# Advanced Remotion Templates — Design Spec ## Summary Extend the Remotion caption animation system with new highlight styles, segment transitions, and per-word entrance effects. Create two polished system presets ("Шортс" and "Подкаст") using the new capabilities. No new Remotion compositions — presets are style configurations within the existing `CaptionedVideo` composition. ## Context ### Current State - Remotion service renders captions via a single `CaptionedVideo` composition - `CaptionStyleSchema` controls all styling: text, layout, animation, background - 4 highlight styles: `color`, `scale`, `underline`, `color_scale` - 2 segment transitions: `fade`, `slide`, `none` - 3 system presets seeded in DB: "Классические", "Неон", "Минимализм" - Frontend has preset grid browser + full style editor with live preview - Backend preset CRUD is complete with system/user preset separation ### What This Changes - Adds 4 new highlight styles, 2 new segment transitions, 3 new animation fields - Adds 2 new system presets targeting Shorts/Clips and Podcast content creators - All changes are additive — existing presets and rendering continue to work unchanged ## Approach **Extend existing schema (Approach A)** — add new enum values and fields to `CaptionAnimationStyle`. All rendering stays in the single `Captions.tsx` component. Chosen over separate compositions (too much duplication) and plugin architecture (over-engineered for 4-6 new animation types). ## Animation System Extensions ### New `highlight_style` Values | Style | Visual Effect | Implementation | |-------|--------------|----------------| | `pop_in` | Each word springs from scale 0→1 when spoken | `spring()` on `transform: scale()` keyed to word start frame | | `karaoke` | Color fills word left→right over its duration | CSS `linear-gradient` with `interpolate()` shifting stop from 0%→100% | | `bounce` | Active word overshoots scale (1→1.15→1.0) with elastic ease | `spring({ damping: 8 })` on scale, triggers at word start | | `glow_pulse` | Active word's text-shadow glow intensity oscillates | `interpolate()` cycling shadow blur/spread over word duration | ### New `segment_transition` Values | Transition | Visual Effect | |-----------|--------------| | `zoom_in` | Old segment scales up + fades out, new segment scales 0.8→1 + fades in | | `drop_in` | New segment drops from above with spring bounce | ### New Fields on `CaptionAnimationStyle` | Field | Type | Default | Purpose | |-------|------|---------|---------| | `word_entrance` | `"none" \| "pop" \| "typewriter"` | `"none"` | How unspoken words appear. `pop`: spring from scale 0→1 at word start. `typewriter`: words become visible sequentially (no scale animation). `none`: all words in segment visible immediately. | | `highlight_rotation_deg` | `float` (0–15) | `0` | Rotation in degrees applied to active word via `transform: rotate()` | | `text_transform` | `"none" \| "uppercase" \| "lowercase"` | `"none"` | CSS `text-transform` applied to entire caption container | ### Backward Compatibility All new fields have defaults that match current behavior (`word_entrance: "none"`, `highlight_rotation_deg: 0`, `text_transform: "none"`). Existing presets and inline configs continue to work without changes. ## System Presets ### Preset: "Шортс" (Shorts/Clips) Target: Bold, high-energy captions for TikTok/Reels/Shorts vertical content. ```json { "text": { "font_family": "Montserrat", "font_size": 72, "font_weight": 700, "text_color": "#FFFFFF", "highlight_color": "#FFE500", "text_stroke_width": 3, "text_stroke_color": "#000000", "text_shadow": "3px 3px 0px #000000" }, "layout": { "vertical_position": "bottom", "horizontal_alignment": "center", "max_width_pct": 85, "lines_per_screen": 1, "padding_px": 20 }, "animation": { "highlight_style": "bounce", "highlight_scale": 1.15, "highlight_rotation_deg": 3, "word_entrance": "pop", "segment_transition": "zoom_in", "fade_duration_frames": 3, "animation_speed": 1.0, "text_transform": "uppercase" }, "background": { "bg_color": "transparent", "bg_blur_px": 0, "bg_glow_color": null, "bg_border_radius_px": 0, "bg_padding_px": 0 } } ``` Key characteristics: - All caps, 1 line at a time, no background box - Words pop in at full size via spring animation - Active word: yellow + 1.15x bounce + 3° rotation + subtle glow - Heavy text stroke provides contrast without background - Zoom transition between segments ### Preset: "Подкаст" (Podcast) Target: Clean, professional captions for long-form podcast/interview content. ```json { "text": { "font_family": "Inter", "font_size": 44, "font_weight": 400, "text_color": "#E0E0E0", "highlight_color": "#FFFFFF", "text_stroke_width": 0, "text_stroke_color": null, "text_shadow": "1px 1px 3px rgba(0,0,0,0.7)" }, "layout": { "vertical_position": "bottom", "horizontal_alignment": "center", "max_width_pct": 90, "lines_per_screen": 2, "padding_px": 20 }, "animation": { "highlight_style": "karaoke", "highlight_scale": 1.0, "highlight_rotation_deg": 0, "word_entrance": "none", "segment_transition": "fade", "fade_duration_frames": 5, "animation_speed": 1.0, "text_transform": "none" }, "background": { "bg_color": "rgba(0,0,0,0.5)", "bg_blur_px": 8, "bg_glow_color": null, "bg_border_radius_px": 12, "bg_padding_px": 16 } } ``` Key characteristics: - Normal case, 2 lines, frosted glass background - Karaoke wipe fills active word left→right with white - All words visible — no entrance animation - Subtle fade between segments - Inter font, soft white for readability ## Changes Per Layer ### Remotion Service (`remotion_service/`) **`server/types/CaptionStyleSchema.ts`** - Extend `highlight_style` union: add `"pop_in" | "karaoke" | "bounce" | "glow_pulse"` - Extend `segment_transition` union: add `"zoom_in" | "drop_in"` - Add fields: `word_entrance`, `highlight_rotation_deg`, `text_transform` with defaults **`src/components/Captions.tsx`** (~150 lines added) - New rendering branches for each highlight style using `interpolate()` and `spring()` - `word_entrance` logic: controls opacity/scale of words before their `wordStartFrame` - `highlight_rotation_deg`: applies `transform: rotate()` on active word - `text_transform`: CSS `text-transform` on caption container (lives in animation schema because it's applied at render time alongside animation logic) - All animations must use Remotion primitives only — no CSS transitions, no Framer Motion - Load `Montserrat` and `Inter` via `@remotion/google-fonts` alongside existing `Lobster` — dynamically load based on `styleConfig.text.font_family` **No changes to:** `Root.tsx`, `Composition.tsx`, `useCaptions.ts`, server endpoints, queue, S3 logic ### Backend (`cofee_backend/`) **`cpv3/modules/captions/schemas.py`** - Extend `CaptionAnimationStyle` Literal types to include new values - Add 3 new Optional fields with defaults matching current behavior **Alembic migration** - Seed 2 new system presets ("Шортс", "Подкаст") into `caption_presets` table with `is_system=True`, `user_id=NULL` - Seed must be idempotent — check for existing name before inserting to avoid duplicates on re-run **No changes to:** router, service, repository, task system, webhooks, notifications ### Frontend (`cofee_frontend/`) **`features/project/CaptionSettingsStep/StyleEditor.tsx`** - Add 4 new options to highlight style `` - Add 3 new form fields: word_entrance `` - Update local `FormValues` type to include new literal values (it duplicates backend types) **`features/project/CaptionSettingsStep/StylePreview.tsx`** (optional enhancement) - Hint at karaoke effect with gradient in static preview - Not critical — real preview is the rendered video **No new components, no new files, no new API endpoints.** ## Data Flow Unchanged. The existing flow handles this entirely: 1. User picks preset or edits style → `style_config` JSON 2. Submit → `POST /api/tasks/captions-generate/` with `preset_id` or inline config 3. Backend resolves config → sends to Remotion service 4. Remotion reads new fields from `styleConfig`, renders with new animation logic 5. Output → S3 → webhook → notification → frontend ## Testing - **Remotion**: Visual testing via `bun run dev` (Remotion Studio) — verify each new animation style renders correctly with sample transcription data - **Backend**: Existing integration tests cover preset CRUD — add test cases with new fields to verify persistence and retrieval - **Frontend**: Existing E2E covers preset selection flow — verify new select options appear and are selectable - **Type-check**: `bunx tsc --noEmit` in both `remotion_service/` and `cofee_frontend/` ## Out of Scope - New Remotion compositions (only extending existing `CaptionedVideo`) - Layout templates (split-screen, PiP, speaker labels) - Social media overlays (progress bars, CTAs) - Video cropping/resizing - Preview rendering in the style editor (static CSS preview is sufficient)