Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9.2 KiB
Advanced Remotion Templates — Design Spec
Summary
Extend the Remotion caption animation system with new highlight styles, segment transitions, and per-word entrance effects. Create two polished system presets ("Шортс" and "Подкаст") using the new capabilities. No new Remotion compositions — presets are style configurations within the existing CaptionedVideo composition.
Context
Current State
- Remotion service renders captions via a single
CaptionedVideocomposition CaptionStyleSchemacontrols all styling: text, layout, animation, background- 4 highlight styles:
color,scale,underline,color_scale - 2 segment transitions:
fade,slide,none - 3 system presets seeded in DB: "Классические", "Неон", "Минимализм"
- Frontend has preset grid browser + full style editor with live preview
- Backend preset CRUD is complete with system/user preset separation
What This Changes
- Adds 4 new highlight styles, 2 new segment transitions, 3 new animation fields
- Adds 2 new system presets targeting Shorts/Clips and Podcast content creators
- All changes are additive — existing presets and rendering continue to work unchanged
Approach
Extend existing schema (Approach A) — add new enum values and fields to CaptionAnimationStyle. All rendering stays in the single Captions.tsx component. Chosen over separate compositions (too much duplication) and plugin architecture (over-engineered for 4-6 new animation types).
Animation System Extensions
New highlight_style Values
| Style | Visual Effect | Implementation |
|---|---|---|
pop_in |
Each word springs from scale 0→1 when spoken | spring() on transform: scale() keyed to word start frame |
karaoke |
Color fills word left→right over its duration | CSS linear-gradient with interpolate() shifting stop from 0%→100% |
bounce |
Active word overshoots scale (1→1.15→1.0) with elastic ease | spring({ damping: 8 }) on scale, triggers at word start |
glow_pulse |
Active word's text-shadow glow intensity oscillates | interpolate() cycling shadow blur/spread over word duration |
New segment_transition Values
| Transition | Visual Effect |
|---|---|
zoom_in |
Old segment scales up + fades out, new segment scales 0.8→1 + fades in |
drop_in |
New segment drops from above with spring bounce |
New Fields on CaptionAnimationStyle
| Field | Type | Default | Purpose |
|---|---|---|---|
word_entrance |
"none" | "pop" | "typewriter" |
"none" |
How unspoken words appear. pop: spring from scale 0→1 at word start. typewriter: words become visible sequentially (no scale animation). none: all words in segment visible immediately. |
highlight_rotation_deg |
float (0–15) |
0 |
Rotation in degrees applied to active word via transform: rotate() |
text_transform |
"none" | "uppercase" | "lowercase" |
"none" |
CSS text-transform applied to entire caption container |
Backward Compatibility
All new fields have defaults that match current behavior (word_entrance: "none", highlight_rotation_deg: 0, text_transform: "none"). Existing presets and inline configs continue to work without changes.
System Presets
Preset: "Шортс" (Shorts/Clips)
Target: Bold, high-energy captions for TikTok/Reels/Shorts vertical content.
{
"text": {
"font_family": "Montserrat",
"font_size": 72,
"font_weight": 700,
"text_color": "#FFFFFF",
"highlight_color": "#FFE500",
"text_stroke_width": 3,
"text_stroke_color": "#000000",
"text_shadow": "3px 3px 0px #000000"
},
"layout": {
"vertical_position": "bottom",
"horizontal_alignment": "center",
"max_width_pct": 85,
"lines_per_screen": 1,
"padding_px": 20
},
"animation": {
"highlight_style": "bounce",
"highlight_scale": 1.15,
"highlight_rotation_deg": 3,
"word_entrance": "pop",
"segment_transition": "zoom_in",
"fade_duration_frames": 3,
"animation_speed": 1.0,
"text_transform": "uppercase"
},
"background": {
"bg_color": "transparent",
"bg_blur_px": 0,
"bg_glow_color": null,
"bg_border_radius_px": 0,
"bg_padding_px": 0
}
}
Key characteristics:
- All caps, 1 line at a time, no background box
- Words pop in at full size via spring animation
- Active word: yellow + 1.15x bounce + 3° rotation + subtle glow
- Heavy text stroke provides contrast without background
- Zoom transition between segments
Preset: "Подкаст" (Podcast)
Target: Clean, professional captions for long-form podcast/interview content.
{
"text": {
"font_family": "Inter",
"font_size": 44,
"font_weight": 400,
"text_color": "#E0E0E0",
"highlight_color": "#FFFFFF",
"text_stroke_width": 0,
"text_stroke_color": null,
"text_shadow": "1px 1px 3px rgba(0,0,0,0.7)"
},
"layout": {
"vertical_position": "bottom",
"horizontal_alignment": "center",
"max_width_pct": 90,
"lines_per_screen": 2,
"padding_px": 20
},
"animation": {
"highlight_style": "karaoke",
"highlight_scale": 1.0,
"highlight_rotation_deg": 0,
"word_entrance": "none",
"segment_transition": "fade",
"fade_duration_frames": 5,
"animation_speed": 1.0,
"text_transform": "none"
},
"background": {
"bg_color": "rgba(0,0,0,0.5)",
"bg_blur_px": 8,
"bg_glow_color": null,
"bg_border_radius_px": 12,
"bg_padding_px": 16
}
}
Key characteristics:
- Normal case, 2 lines, frosted glass background
- Karaoke wipe fills active word left→right with white
- All words visible — no entrance animation
- Subtle fade between segments
- Inter font, soft white for readability
Changes Per Layer
Remotion Service (remotion_service/)
server/types/CaptionStyleSchema.ts
- Extend
highlight_styleunion: add"pop_in" | "karaoke" | "bounce" | "glow_pulse" - Extend
segment_transitionunion: add"zoom_in" | "drop_in" - Add fields:
word_entrance,highlight_rotation_deg,text_transformwith defaults
src/components/Captions.tsx (~150 lines added)
- New rendering branches for each highlight style using
interpolate()andspring() word_entrancelogic: controls opacity/scale of words before theirwordStartFramehighlight_rotation_deg: appliestransform: rotate()on active wordtext_transform: CSStext-transformon caption container (lives in animation schema because it's applied at render time alongside animation logic)- All animations must use Remotion primitives only — no CSS transitions, no Framer Motion
- Load
MontserratandIntervia@remotion/google-fontsalongside existingLobster— dynamically load based onstyleConfig.text.font_family
No changes to: Root.tsx, Composition.tsx, useCaptions.ts, server endpoints, queue, S3 logic
Backend (cofee_backend/)
cpv3/modules/captions/schemas.py
- Extend
CaptionAnimationStyleLiteral types to include new values - Add 3 new Optional fields with defaults matching current behavior
Alembic migration
- Seed 2 new system presets ("Шортс", "Подкаст") into
caption_presetstable withis_system=True,user_id=NULL - Seed must be idempotent — check for existing name before inserting to avoid duplicates on re-run
No changes to: router, service, repository, task system, webhooks, notifications
Frontend (cofee_frontend/)
features/project/CaptionSettingsStep/StyleEditor.tsx
- Add 4 new options to highlight style
<select> - Add 2 new options to segment transition
<select> - Add 3 new form fields: word_entrance
<select>, rotation slider, text_transform<select> - Update local
FormValuestype to include new literal values (it duplicates backend types)
features/project/CaptionSettingsStep/StylePreview.tsx (optional enhancement)
- Hint at karaoke effect with gradient in static preview
- Not critical — real preview is the rendered video
No new components, no new files, no new API endpoints.
Data Flow
Unchanged. The existing flow handles this entirely:
- User picks preset or edits style →
style_configJSON - Submit →
POST /api/tasks/captions-generate/withpreset_idor inline config - Backend resolves config → sends to Remotion service
- Remotion reads new fields from
styleConfig, renders with new animation logic - Output → S3 → webhook → notification → frontend
Testing
- Remotion: Visual testing via
bun run dev(Remotion Studio) — verify each new animation style renders correctly with sample transcription data - Backend: Existing integration tests cover preset CRUD — add test cases with new fields to verify persistence and retrieval
- Frontend: Existing E2E covers preset selection flow — verify new select options appear and are selectable
- Type-check:
bunx tsc --noEmitin bothremotion_service/andcofee_frontend/
Out of Scope
- New Remotion compositions (only extending existing
CaptionedVideo) - Layout templates (split-screen, PiP, speaker labels)
- Social media overlays (progress bars, CTAs)
- Video cropping/resizing
- Preview rendering in the style editor (static CSS preview is sufficient)