Files
remotion_service/docs/superpowers/specs/2026-03-21-advanced-remotion-templates-design.md
2026-03-22 22:42:35 +03:00

230 lines
9.2 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Advanced Remotion Templates — Design Spec
## Summary
Extend the Remotion caption animation system with new highlight styles, segment transitions, and per-word entrance effects. Create two polished system presets ("Шортс" and "Подкаст") using the new capabilities. No new Remotion compositions — presets are style configurations within the existing `CaptionedVideo` composition.
## Context
### Current State
- Remotion service renders captions via a single `CaptionedVideo` composition
- `CaptionStyleSchema` controls all styling: text, layout, animation, background
- 4 highlight styles: `color`, `scale`, `underline`, `color_scale`
- 2 segment transitions: `fade`, `slide`, `none`
- 3 system presets seeded in DB: "Классические", "Неон", "Минимализм"
- Frontend has preset grid browser + full style editor with live preview
- Backend preset CRUD is complete with system/user preset separation
### What This Changes
- Adds 4 new highlight styles, 2 new segment transitions, 3 new animation fields
- Adds 2 new system presets targeting Shorts/Clips and Podcast content creators
- All changes are additive — existing presets and rendering continue to work unchanged
## Approach
**Extend existing schema (Approach A)** — add new enum values and fields to `CaptionAnimationStyle`. All rendering stays in the single `Captions.tsx` component. Chosen over separate compositions (too much duplication) and plugin architecture (over-engineered for 4-6 new animation types).
## Animation System Extensions
### New `highlight_style` Values
| Style | Visual Effect | Implementation |
|-------|--------------|----------------|
| `pop_in` | Each word springs from scale 0→1 when spoken | `spring()` on `transform: scale()` keyed to word start frame |
| `karaoke` | Color fills word left→right over its duration | CSS `linear-gradient` with `interpolate()` shifting stop from 0%→100% |
| `bounce` | Active word overshoots scale (1→1.15→1.0) with elastic ease | `spring({ damping: 8 })` on scale, triggers at word start |
| `glow_pulse` | Active word's text-shadow glow intensity oscillates | `interpolate()` cycling shadow blur/spread over word duration |
### New `segment_transition` Values
| Transition | Visual Effect |
|-----------|--------------|
| `zoom_in` | Old segment scales up + fades out, new segment scales 0.8→1 + fades in |
| `drop_in` | New segment drops from above with spring bounce |
### New Fields on `CaptionAnimationStyle`
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `word_entrance` | `"none" \| "pop" \| "typewriter"` | `"none"` | How unspoken words appear. `pop`: spring from scale 0→1 at word start. `typewriter`: words become visible sequentially (no scale animation). `none`: all words in segment visible immediately. |
| `highlight_rotation_deg` | `float` (015) | `0` | Rotation in degrees applied to active word via `transform: rotate()` |
| `text_transform` | `"none" \| "uppercase" \| "lowercase"` | `"none"` | CSS `text-transform` applied to entire caption container |
### Backward Compatibility
All new fields have defaults that match current behavior (`word_entrance: "none"`, `highlight_rotation_deg: 0`, `text_transform: "none"`). Existing presets and inline configs continue to work without changes.
## System Presets
### Preset: "Шортс" (Shorts/Clips)
Target: Bold, high-energy captions for TikTok/Reels/Shorts vertical content.
```json
{
"text": {
"font_family": "Montserrat",
"font_size": 72,
"font_weight": 700,
"text_color": "#FFFFFF",
"highlight_color": "#FFE500",
"text_stroke_width": 3,
"text_stroke_color": "#000000",
"text_shadow": "3px 3px 0px #000000"
},
"layout": {
"vertical_position": "bottom",
"horizontal_alignment": "center",
"max_width_pct": 85,
"lines_per_screen": 1,
"padding_px": 20
},
"animation": {
"highlight_style": "bounce",
"highlight_scale": 1.15,
"highlight_rotation_deg": 3,
"word_entrance": "pop",
"segment_transition": "zoom_in",
"fade_duration_frames": 3,
"animation_speed": 1.0,
"text_transform": "uppercase"
},
"background": {
"bg_color": "transparent",
"bg_blur_px": 0,
"bg_glow_color": null,
"bg_border_radius_px": 0,
"bg_padding_px": 0
}
}
```
Key characteristics:
- All caps, 1 line at a time, no background box
- Words pop in at full size via spring animation
- Active word: yellow + 1.15x bounce + 3° rotation + subtle glow
- Heavy text stroke provides contrast without background
- Zoom transition between segments
### Preset: "Подкаст" (Podcast)
Target: Clean, professional captions for long-form podcast/interview content.
```json
{
"text": {
"font_family": "Inter",
"font_size": 44,
"font_weight": 400,
"text_color": "#E0E0E0",
"highlight_color": "#FFFFFF",
"text_stroke_width": 0,
"text_stroke_color": null,
"text_shadow": "1px 1px 3px rgba(0,0,0,0.7)"
},
"layout": {
"vertical_position": "bottom",
"horizontal_alignment": "center",
"max_width_pct": 90,
"lines_per_screen": 2,
"padding_px": 20
},
"animation": {
"highlight_style": "karaoke",
"highlight_scale": 1.0,
"highlight_rotation_deg": 0,
"word_entrance": "none",
"segment_transition": "fade",
"fade_duration_frames": 5,
"animation_speed": 1.0,
"text_transform": "none"
},
"background": {
"bg_color": "rgba(0,0,0,0.5)",
"bg_blur_px": 8,
"bg_glow_color": null,
"bg_border_radius_px": 12,
"bg_padding_px": 16
}
}
```
Key characteristics:
- Normal case, 2 lines, frosted glass background
- Karaoke wipe fills active word left→right with white
- All words visible — no entrance animation
- Subtle fade between segments
- Inter font, soft white for readability
## Changes Per Layer
### Remotion Service (`remotion_service/`)
**`server/types/CaptionStyleSchema.ts`**
- Extend `highlight_style` union: add `"pop_in" | "karaoke" | "bounce" | "glow_pulse"`
- Extend `segment_transition` union: add `"zoom_in" | "drop_in"`
- Add fields: `word_entrance`, `highlight_rotation_deg`, `text_transform` with defaults
**`src/components/Captions.tsx`** (~150 lines added)
- New rendering branches for each highlight style using `interpolate()` and `spring()`
- `word_entrance` logic: controls opacity/scale of words before their `wordStartFrame`
- `highlight_rotation_deg`: applies `transform: rotate()` on active word
- `text_transform`: CSS `text-transform` on caption container (lives in animation schema because it's applied at render time alongside animation logic)
- All animations must use Remotion primitives only — no CSS transitions, no Framer Motion
- Load `Montserrat` and `Inter` via `@remotion/google-fonts` alongside existing `Lobster` — dynamically load based on `styleConfig.text.font_family`
**No changes to:** `Root.tsx`, `Composition.tsx`, `useCaptions.ts`, server endpoints, queue, S3 logic
### Backend (`cofee_backend/`)
**`cpv3/modules/captions/schemas.py`**
- Extend `CaptionAnimationStyle` Literal types to include new values
- Add 3 new Optional fields with defaults matching current behavior
**Alembic migration**
- Seed 2 new system presets ("Шортс", "Подкаст") into `caption_presets` table with `is_system=True`, `user_id=NULL`
- Seed must be idempotent — check for existing name before inserting to avoid duplicates on re-run
**No changes to:** router, service, repository, task system, webhooks, notifications
### Frontend (`cofee_frontend/`)
**`features/project/CaptionSettingsStep/StyleEditor.tsx`**
- Add 4 new options to highlight style `<select>`
- Add 2 new options to segment transition `<select>`
- Add 3 new form fields: word_entrance `<select>`, rotation slider, text_transform `<select>`
- Update local `FormValues` type to include new literal values (it duplicates backend types)
**`features/project/CaptionSettingsStep/StylePreview.tsx`** (optional enhancement)
- Hint at karaoke effect with gradient in static preview
- Not critical — real preview is the rendered video
**No new components, no new files, no new API endpoints.**
## Data Flow
Unchanged. The existing flow handles this entirely:
1. User picks preset or edits style → `style_config` JSON
2. Submit → `POST /api/tasks/captions-generate/` with `preset_id` or inline config
3. Backend resolves config → sends to Remotion service
4. Remotion reads new fields from `styleConfig`, renders with new animation logic
5. Output → S3 → webhook → notification → frontend
## Testing
- **Remotion**: Visual testing via `bun run dev` (Remotion Studio) — verify each new animation style renders correctly with sample transcription data
- **Backend**: Existing integration tests cover preset CRUD — add test cases with new fields to verify persistence and retrieval
- **Frontend**: Existing E2E covers preset selection flow — verify new select options appear and are selectable
- **Type-check**: `bunx tsc --noEmit` in both `remotion_service/` and `cofee_frontend/`
## Out of Scope
- New Remotion compositions (only extending existing `CaptionedVideo`)
- Layout templates (split-screen, PiP, speaker labels)
- Social media overlays (progress bars, CTAs)
- Video cropping/resizing
- Preview rendering in the style editor (static CSS preview is sufficient)