Files
remotion_service/docs/superpowers/specs/2026-03-21-advanced-remotion-templates-design.md
2026-03-22 22:42:35 +03:00

9.2 KiB
Raw Permalink Blame History

Advanced Remotion Templates — Design Spec

Summary

Extend the Remotion caption animation system with new highlight styles, segment transitions, and per-word entrance effects. Create two polished system presets ("Шортс" and "Подкаст") using the new capabilities. No new Remotion compositions — presets are style configurations within the existing CaptionedVideo composition.

Context

Current State

  • Remotion service renders captions via a single CaptionedVideo composition
  • CaptionStyleSchema controls all styling: text, layout, animation, background
  • 4 highlight styles: color, scale, underline, color_scale
  • 2 segment transitions: fade, slide, none
  • 3 system presets seeded in DB: "Классические", "Неон", "Минимализм"
  • Frontend has preset grid browser + full style editor with live preview
  • Backend preset CRUD is complete with system/user preset separation

What This Changes

  • Adds 4 new highlight styles, 2 new segment transitions, 3 new animation fields
  • Adds 2 new system presets targeting Shorts/Clips and Podcast content creators
  • All changes are additive — existing presets and rendering continue to work unchanged

Approach

Extend existing schema (Approach A) — add new enum values and fields to CaptionAnimationStyle. All rendering stays in the single Captions.tsx component. Chosen over separate compositions (too much duplication) and plugin architecture (over-engineered for 4-6 new animation types).

Animation System Extensions

New highlight_style Values

Style Visual Effect Implementation
pop_in Each word springs from scale 0→1 when spoken spring() on transform: scale() keyed to word start frame
karaoke Color fills word left→right over its duration CSS linear-gradient with interpolate() shifting stop from 0%→100%
bounce Active word overshoots scale (1→1.15→1.0) with elastic ease spring({ damping: 8 }) on scale, triggers at word start
glow_pulse Active word's text-shadow glow intensity oscillates interpolate() cycling shadow blur/spread over word duration

New segment_transition Values

Transition Visual Effect
zoom_in Old segment scales up + fades out, new segment scales 0.8→1 + fades in
drop_in New segment drops from above with spring bounce

New Fields on CaptionAnimationStyle

Field Type Default Purpose
word_entrance "none" | "pop" | "typewriter" "none" How unspoken words appear. pop: spring from scale 0→1 at word start. typewriter: words become visible sequentially (no scale animation). none: all words in segment visible immediately.
highlight_rotation_deg float (015) 0 Rotation in degrees applied to active word via transform: rotate()
text_transform "none" | "uppercase" | "lowercase" "none" CSS text-transform applied to entire caption container

Backward Compatibility

All new fields have defaults that match current behavior (word_entrance: "none", highlight_rotation_deg: 0, text_transform: "none"). Existing presets and inline configs continue to work without changes.

System Presets

Preset: "Шортс" (Shorts/Clips)

Target: Bold, high-energy captions for TikTok/Reels/Shorts vertical content.

{
  "text": {
    "font_family": "Montserrat",
    "font_size": 72,
    "font_weight": 700,
    "text_color": "#FFFFFF",
    "highlight_color": "#FFE500",
    "text_stroke_width": 3,
    "text_stroke_color": "#000000",
    "text_shadow": "3px 3px 0px #000000"
  },
  "layout": {
    "vertical_position": "bottom",
    "horizontal_alignment": "center",
    "max_width_pct": 85,
    "lines_per_screen": 1,
    "padding_px": 20
  },
  "animation": {
    "highlight_style": "bounce",
    "highlight_scale": 1.15,
    "highlight_rotation_deg": 3,
    "word_entrance": "pop",
    "segment_transition": "zoom_in",
    "fade_duration_frames": 3,
    "animation_speed": 1.0,
    "text_transform": "uppercase"
  },
  "background": {
    "bg_color": "transparent",
    "bg_blur_px": 0,
    "bg_glow_color": null,
    "bg_border_radius_px": 0,
    "bg_padding_px": 0
  }
}

Key characteristics:

  • All caps, 1 line at a time, no background box
  • Words pop in at full size via spring animation
  • Active word: yellow + 1.15x bounce + 3° rotation + subtle glow
  • Heavy text stroke provides contrast without background
  • Zoom transition between segments

Preset: "Подкаст" (Podcast)

Target: Clean, professional captions for long-form podcast/interview content.

{
  "text": {
    "font_family": "Inter",
    "font_size": 44,
    "font_weight": 400,
    "text_color": "#E0E0E0",
    "highlight_color": "#FFFFFF",
    "text_stroke_width": 0,
    "text_stroke_color": null,
    "text_shadow": "1px 1px 3px rgba(0,0,0,0.7)"
  },
  "layout": {
    "vertical_position": "bottom",
    "horizontal_alignment": "center",
    "max_width_pct": 90,
    "lines_per_screen": 2,
    "padding_px": 20
  },
  "animation": {
    "highlight_style": "karaoke",
    "highlight_scale": 1.0,
    "highlight_rotation_deg": 0,
    "word_entrance": "none",
    "segment_transition": "fade",
    "fade_duration_frames": 5,
    "animation_speed": 1.0,
    "text_transform": "none"
  },
  "background": {
    "bg_color": "rgba(0,0,0,0.5)",
    "bg_blur_px": 8,
    "bg_glow_color": null,
    "bg_border_radius_px": 12,
    "bg_padding_px": 16
  }
}

Key characteristics:

  • Normal case, 2 lines, frosted glass background
  • Karaoke wipe fills active word left→right with white
  • All words visible — no entrance animation
  • Subtle fade between segments
  • Inter font, soft white for readability

Changes Per Layer

Remotion Service (remotion_service/)

server/types/CaptionStyleSchema.ts

  • Extend highlight_style union: add "pop_in" | "karaoke" | "bounce" | "glow_pulse"
  • Extend segment_transition union: add "zoom_in" | "drop_in"
  • Add fields: word_entrance, highlight_rotation_deg, text_transform with defaults

src/components/Captions.tsx (~150 lines added)

  • New rendering branches for each highlight style using interpolate() and spring()
  • word_entrance logic: controls opacity/scale of words before their wordStartFrame
  • highlight_rotation_deg: applies transform: rotate() on active word
  • text_transform: CSS text-transform on caption container (lives in animation schema because it's applied at render time alongside animation logic)
  • All animations must use Remotion primitives only — no CSS transitions, no Framer Motion
  • Load Montserrat and Inter via @remotion/google-fonts alongside existing Lobster — dynamically load based on styleConfig.text.font_family

No changes to: Root.tsx, Composition.tsx, useCaptions.ts, server endpoints, queue, S3 logic

Backend (cofee_backend/)

cpv3/modules/captions/schemas.py

  • Extend CaptionAnimationStyle Literal types to include new values
  • Add 3 new Optional fields with defaults matching current behavior

Alembic migration

  • Seed 2 new system presets ("Шортс", "Подкаст") into caption_presets table with is_system=True, user_id=NULL
  • Seed must be idempotent — check for existing name before inserting to avoid duplicates on re-run

No changes to: router, service, repository, task system, webhooks, notifications

Frontend (cofee_frontend/)

features/project/CaptionSettingsStep/StyleEditor.tsx

  • Add 4 new options to highlight style <select>
  • Add 2 new options to segment transition <select>
  • Add 3 new form fields: word_entrance <select>, rotation slider, text_transform <select>
  • Update local FormValues type to include new literal values (it duplicates backend types)

features/project/CaptionSettingsStep/StylePreview.tsx (optional enhancement)

  • Hint at karaoke effect with gradient in static preview
  • Not critical — real preview is the rendered video

No new components, no new files, no new API endpoints.

Data Flow

Unchanged. The existing flow handles this entirely:

  1. User picks preset or edits style → style_config JSON
  2. Submit → POST /api/tasks/captions-generate/ with preset_id or inline config
  3. Backend resolves config → sends to Remotion service
  4. Remotion reads new fields from styleConfig, renders with new animation logic
  5. Output → S3 → webhook → notification → frontend

Testing

  • Remotion: Visual testing via bun run dev (Remotion Studio) — verify each new animation style renders correctly with sample transcription data
  • Backend: Existing integration tests cover preset CRUD — add test cases with new fields to verify persistence and retrieval
  • Frontend: Existing E2E covers preset selection flow — verify new select options appear and are selectable
  • Type-check: bunx tsc --noEmit in both remotion_service/ and cofee_frontend/

Out of Scope

  • New Remotion compositions (only extending existing CaptionedVideo)
  • Layout templates (split-screen, PiP, speaker labels)
  • Social media overlays (progress bars, CTAs)
  • Video cropping/resizing
  • Preview rendering in the style editor (static CSS preview is sufficient)