remotion_service/docs/superpowers/specs/2026-03-14-captions-wizard-integration-design.md


Captions Wizard Integration — Design Spec

Context

The backend captions module (/api/captions/*) and caption generation task (/api/tasks/captions-generate/) are fully implemented but have no frontend UI. This spec covers integrating captions into the Project Wizard as 3 new steps, allowing users to select/manage caption presets, trigger rendering, and view/download the captioned video.

Requirements

  • Add caption-settings, caption-processing, caption-result wizard steps (positions 9-11)
  • Full CRUD for caption presets (system + user presets)
  • Tab-switch layout: preset selection grid and full-page style editor
  • Static preview text in the editor that updates live with style changes
  • Reuse ProcessingStep for caption-processing
  • Result step: video player + download + re-render button
  • All UI text in Russian

Wizard Step Flow

... → subtitle-revision → caption-settings → caption-processing → caption-result
| Step Key | Label | Component | New? |
|---|---|---|---|
| caption-settings | Настройка субтитров | CaptionSettingsStep | Yes |
| caption-processing | Обработка | ProcessingStep | Reused |
| caption-result | Результат | CaptionResultStep | Yes |

Navigation

  • subtitle-revision → caption-settings (change existing "Завершить проект" button to "Далее" + add goToStep("caption-settings") call)
  • caption-settings "Генерировать" → sets active job → auto-navigates to caption-processing
  • caption-processing job completes → auto-navigates to caption-result
  • caption-result "Перегенерировать" → loops back to caption-settings
  • caption-result "Завершить" → marks completed, wizard finished
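The transitions listed above can be read as a small lookup table. The following is a minimal sketch, not the wizard's actual implementation: the event names (`next`, `generate`, `job-done`, `regenerate`, `finish`) and the `nextStep` helper are hypothetical and exist only for this illustration.

```typescript
// Step keys named in this spec; the wizard's earlier steps are omitted.
type CaptionFlowStep =
  | "subtitle-revision"
  | "caption-settings"
  | "caption-processing"
  | "caption-result";

// Hypothetical event names, introduced only for this illustration.
type CaptionFlowEvent = "next" | "generate" | "job-done" | "regenerate" | "finish";

type CaptionFlowTarget = CaptionFlowStep | "done";

// Maps (current step, event) to the next step; "done" marks the end of the wizard.
const TRANSITIONS: Record<CaptionFlowStep, Partial<Record<CaptionFlowEvent, CaptionFlowTarget>>> = {
  "subtitle-revision": { next: "caption-settings" },
  "caption-settings": { generate: "caption-processing" },
  "caption-processing": { "job-done": "caption-result" },
  "caption-result": { regenerate: "caption-settings", finish: "done" },
};

// Returns the target step, or null when the event is not valid for the current step.
function nextStep(step: CaptionFlowStep, event: CaptionFlowEvent): CaptionFlowTarget | null {
  return TRANSITIONS[step][event] ?? null;
}
```

Keeping the transitions in one table makes the loop from caption-result back to caption-settings explicit and easy to test without rendering any component.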

WizardContext Changes

New state fields in WizardContextValue:

captionPresetId: string | null        // Selected preset UUID
captionStyleConfig: object | null     // Inline style override (custom not-yet-saved config)
captionedVideoPath: string | null     // S3 path of rendered captioned video

These are persisted to project.workspace_state.wizard alongside existing fields.
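Projects saved before this change will not have the new fields in their persisted wizard state. A minimal sketch of reading them defensively is shown below; the `readCaptionState` helper and the assumption that the persisted wizard state arrives as a plain record are illustrative, not part of the existing code.

```typescript
// The three new fields from this spec.
interface CaptionWizardState {
  captionPresetId: string | null;
  captionStyleConfig: object | null;
  captionedVideoPath: string | null;
}

// Hypothetical helper: reads the caption fields from a persisted wizard record,
// falling back to null when a field is missing or has an unexpected type.
function readCaptionState(
  wizard: Record<string, unknown> | null | undefined,
): CaptionWizardState {
  const source = wizard ?? {};
  const presetId = source["captionPresetId"];
  const styleConfig = source["captionStyleConfig"];
  const videoPath = source["captionedVideoPath"];
  return {
    captionPresetId: typeof presetId === "string" ? presetId : null,
    captionStyleConfig:
      typeof styleConfig === "object" && styleConfig !== null ? styleConfig : null,
    captionedVideoPath: typeof videoPath === "string" ? videoPath : null,
  };
}
```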

Auto-advance logic (WizardContext effect)

  1. Update isJobActive guard: Add currentStep === "caption-processing" to the polling condition (alongside existing "processing" and "transcription-processing" checks) so task status polling fires during caption processing.
  2. New CAPTIONS_GENERATE case: When activeJobType === "CAPTIONS_GENERATE" and task status becomes DONE → read taskStatus.output_data.output_path to get the captioned video S3 path (this data is NOT available in Redux notifications — it must come from the task status polling response). Store in captionedVideoPath, clear active job, navigate to caption-result.
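The two changes above can be sketched as pure functions, which keeps them testable outside the React effect. This is an illustration under assumptions: the `TaskStatus` shape is reduced to the fields this spec mentions, `shouldPoll` assumes the guard also requires an active job id, and the choice to stay put when `output_path` is missing is a placeholder for whatever error handling the real effect uses.

```typescript
// Reduced task status shape; only the fields mentioned in this spec.
interface TaskStatus {
  status: string; // the spec names "DONE" and "FAILED"
  output_data?: { output_path?: string } | null;
}

// Steps during which task status polling should run.
const POLLING_STEPS = ["processing", "transcription-processing", "caption-processing"];

// Assumed form of the isJobActive guard: an active job plus a polling step.
function shouldPoll(currentStep: string, activeJobId: string | null): boolean {
  return activeJobId !== null && POLLING_STEPS.includes(currentStep);
}

type AdvanceAction =
  | { kind: "none" }
  | { kind: "advance"; captionedVideoPath: string; nextStep: "caption-result" };

// Decides what the effect should do for a CAPTIONS_GENERATE job.
function onCaptionTaskStatus(activeJobType: string | null, task: TaskStatus): AdvanceAction {
  if (activeJobType !== "CAPTIONS_GENERATE" || task.status !== "DONE") {
    return { kind: "none" };
  }
  const path = task.output_data?.output_path;
  if (!path) {
    // Assumption: without a path there is nothing to show, so do not advance.
    return { kind: "none" };
  }
  return { kind: "advance", captionedVideoPath: path, nextStep: "caption-result" };
}
```

On an `advance` result, the effect would store `captionedVideoPath`, clear the active job, and call `goToStep("caption-result")`.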

Where transcription_id comes from

The useSubmitCaptionGenerate hook needs a transcription_id. This comes from transcriptionArtifactId in WizardContext (set during the transcription flow). The hook reads it from context and passes it in the request body.

CaptionSettingsStep

Two sub-views controlled by local state (activeTab: "select" | "editor").

Tab 1: Preset Selection ("Выбор пресета")

Data: api.useQuery("get", "/api/captions/presets/") → returns system + user presets

Layout:

  • Grid of preset cards (3 columns)
  • Each card:
    • Dark preview area with styled "Пример" text (CSS-styled based on style_config)
    • Preset name below preview
    • "Системный" badge for is_system === true
    • Edit (pencil) + Delete (trash) icon buttons — hidden for system presets
  • Last card: "+ Создать пресет" (dashed border, click opens editor)
  • Selected card: highlighted border (indigo)

Footer: "← Назад" (to subtitle-revision) + "Генерировать →" (disabled until preset selected)

Actions:

  • Click card → captionPresetId = preset.id, highlight
  • Click edit → setActiveTab("editor"), load preset's style_config into form
  • Click delete → confirmation dialog → DELETE /api/captions/presets/{id}/ → invalidate query cache
  • Click "+ Создать" → setActiveTab("editor"), form with default values
  • Click "Генерировать" → call useSubmitCaptionGenerate() → on success: setActiveJob(job_id, "CAPTIONS_GENERATE"), markStepCompleted("caption-settings"), goToStep("caption-processing")
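The card and button actions above map naturally onto a reducer over the step's local state. The sketch below is one possible shape, not the component's actual code: the action names and the `editingPresetId` and `selectedPresetId` fields are hypothetical (in the real step, the selection lives in WizardContext as `captionPresetId`), and clearing the selection when the selected preset is deleted is an assumption the spec does not state.

```typescript
type ActiveTab = "select" | "editor";

interface SettingsViewState {
  activeTab: ActiveTab;
  editingPresetId: string | null; // null while creating a new preset
  selectedPresetId: string | null; // mirrors captionPresetId in WizardContext
}

// Hypothetical action names for the interactions listed in the spec.
type SettingsAction =
  | { type: "select"; presetId: string }
  | { type: "edit"; presetId: string }
  | { type: "create" }
  | { type: "cancel" }
  | { type: "saved"; presetId: string }
  | { type: "deleted"; presetId: string };

function settingsReducer(state: SettingsViewState, action: SettingsAction): SettingsViewState {
  switch (action.type) {
    case "select":
      return { ...state, selectedPresetId: action.presetId };
    case "edit":
      return { ...state, activeTab: "editor", editingPresetId: action.presetId };
    case "create":
      return { ...state, activeTab: "editor", editingPresetId: null };
    case "cancel":
      return { ...state, activeTab: "select", editingPresetId: null };
    case "saved":
      // After a successful save, return to the grid and select the saved preset.
      return { activeTab: "select", editingPresetId: null, selectedPresetId: action.presetId };
    case "deleted":
      // Assumption: deleting the selected preset clears the selection.
      return {
        ...state,
        selectedPresetId: state.selectedPresetId === action.presetId ? null : state.selectedPresetId,
      };
  }
}

// The "Генерировать" button stays disabled until a preset is selected.
function canGenerate(state: SettingsViewState): boolean {
  return state.selectedPresetId !== null;
}
```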

Tab 2: Style Editor ("Редактор стиля")

Layout:

  • Top: Large preview panel (dark bg) — "Пример субтитров" text styled live from form values
  • Middle: 4 sub-tabs for style config sections
  • Bottom: Form controls for the active sub-tab
  • Footer: "Отмена" (back to Tab 1) + "Сохранить пресет" (create or update)

Sub-tabs and controls:

| Sub-tab | Field | Control |
|---|---|---|
| Текст | font_family | Select (Lobster, Inter, Roboto, Montserrat, etc.; include Lobster as it's the backend default) |
| Текст | font_size | Slider (16-96px) |
| Текст | font_weight | Select (400: Обычный / 700: Жирный); numeric values, backend expects int |
| Текст | text_color | Color picker |
| Текст | highlight_color | Color picker |
| Текст | text_shadow | Toggle + text input |
| Текст | text_stroke_width | Number input (0-5px) |
| Текст | text_stroke_color | Color picker |
| Позиция | vertical_position | Select (top / center / bottom) |
| Позиция | horizontal_alignment | Select (left / center / right) |
| Позиция | padding_px | Number input |
| Позиция | max_width_pct | Slider (20-100%) |
| Позиция | lines_per_screen | Number input (1-4) |
| Анимация | highlight_style | Select (color / scale / underline / color_scale) |
| Анимация | highlight_scale | Slider (1.0-2.0) |
| Анимация | segment_transition | Select (fade / slide / none) |
| Анимация | fade_duration_frames | Number input |
| Анимация | animation_speed | Slider (0.5-2.0) |
| Фон | bg_color | Color picker |
| Фон | bg_blur_px | Number input (0-20) |
| Фон | bg_glow_color | Color picker |
| Фон | bg_border_radius_px | Number input (0-24) |
| Фон | bg_padding_px | Number input (0-32) |

Form management: react-hook-form with nested CaptionStyleConfig shape. Form field paths use the nested structure matching the backend schema: text.font_family, text.font_size, layout.vertical_position, animation.highlight_style, background.bg_color, etc. Preview panel applies form values as inline CSS.
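As an illustration of how the preview panel might turn form values into inline CSS, the sketch below maps a subset of the style config onto a style object. It assumes that fields nest under `text`, `layout`, and `background` according to their sub-tab (the spec confirms this only for the example paths above), and the `PreviewStyleSubset` type and `toPreviewCss` helper are hypothetical names.

```typescript
// Subset of the nested style config used by the preview; nesting is assumed
// to follow the sub-tabs, as in the example field paths in this spec.
interface PreviewStyleSubset {
  text: {
    font_family: string;
    font_size: number; // px
    font_weight: number; // 400 or 700
    text_color: string;
    text_stroke_width: number; // px
    text_stroke_color: string;
  };
  layout: {
    horizontal_alignment: "left" | "center" | "right";
    max_width_pct: number;
  };
  background: {
    bg_color: string;
    bg_border_radius_px: number;
    bg_padding_px: number;
  };
}

// Converts form values to a camelCase style object suitable for a React `style` prop.
function toPreviewCss(config: PreviewStyleSubset): Record<string, string | number> {
  const { text, layout, background } = config;
  return {
    fontFamily: text.font_family,
    fontSize: `${text.font_size}px`,
    fontWeight: text.font_weight,
    color: text.text_color,
    WebkitTextStroke: `${text.text_stroke_width}px ${text.text_stroke_color}`,
    textAlign: layout.horizontal_alignment,
    maxWidth: `${layout.max_width_pct}%`,
    backgroundColor: background.bg_color,
    borderRadius: `${background.bg_border_radius_px}px`,
    padding: `${background.bg_padding_px}px`,
  };
}
```

With react-hook-form, the component could pass the current watched values through such a function on each render so the preview text restyles as controls change.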

Save flow:

  • If editing existing preset → PATCH /api/captions/presets/{id}/ with name + style_config
  • If creating new → name input + POST /api/captions/presets/ with name + style_config
  • On success: invalidate presets query, switch back to Tab 1, auto-select the new/updated preset
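The create-versus-update branch of the save flow can be isolated in a small helper. This is a sketch only: `buildPresetSaveRequest` and `PresetSaveRequest` are hypothetical names, and preset ids are assumed to be strings (the spec describes them as UUIDs).

```typescript
interface PresetSaveRequest {
  method: "POST" | "PATCH";
  url: string;
  body: { name: string; style_config: object };
}

// Chooses PATCH for an existing preset and POST for a new one.
// `editingPresetId` is null when the editor was opened via "+ Создать пресет".
function buildPresetSaveRequest(
  editingPresetId: string | null,
  name: string,
  styleConfig: object,
): PresetSaveRequest {
  const body = { name, style_config: styleConfig };
  if (editingPresetId === null) {
    return { method: "POST", url: "/api/captions/presets/", body };
  }
  return { method: "PATCH", url: `/api/captions/presets/${editingPresetId}/`, body };
}
```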

CaptionResultStep

Data source: captionedVideoPath from WizardContext → GET /api/files/get_file/?file_path={path} to get presigned URL
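Because the S3 path contains slashes and may contain spaces, it should be encoded when placed in the query string. A minimal sketch, assuming a hypothetical `buildFileUrl` helper; the response shape of the endpoint is not specified here and is left out.

```typescript
// Builds the request URL for the presigned-URL lookup.
// URLSearchParams percent-encodes "/" and encodes spaces as "+".
function buildFileUrl(captionedVideoPath: string): string {
  const query = new URLSearchParams({ file_path: captionedVideoPath });
  return `/api/files/get_file/?${query.toString()}`;
}
```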

Layout:

  • Full-width video player (Vidstack MediaPlayer) with the captioned video
  • Info bar: file name, duration
  • Action buttons:
    • "Скачать" — triggers browser download of the presigned S3 URL
    • "Перегенерировать" — goToStep("caption-settings") to re-render with different preset
    • "Завершить" — markStepCompleted("caption-result"), wizard done

ProcessingStep Integration

ProcessingStep already reads activeJobType and shows different labels. Add to the JOB_TYPE_LABELS map:

"CAPTIONS_GENERATE": "ГЕНЕРАЦИЯ СУБТИТРОВ"

Auto-advance logic in WizardContext needs a new case:

  • When CAPTIONS_GENERATE job is DONE → extract captioned video path from job output, store in captionedVideoPath, navigate to caption-result

API Hooks (New Files)

useSubmitCaptionGenerate.ts

// POST /api/tasks/captions-generate/
// Body: { video_s3_path, folder: "output_files", transcription_id, project_id, preset_id?, style_config? }
// Returns: { job_id, status }
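
The request body described in the comment above could be assembled as follows. This is a sketch under assumptions: the `buildCaptionGenerateBody` helper and its input type are hypothetical, all ids are assumed to be strings, and throwing when `transcriptionArtifactId` is missing is one possible way to handle a wizard state that skipped transcription.

```typescript
interface CaptionGenerateBody {
  video_s3_path: string;
  folder: "output_files";
  transcription_id: string;
  project_id: string;
  preset_id?: string;
  style_config?: object;
}

// Values the hook would read from WizardContext and the current project.
interface CaptionGenerateInput {
  videoS3Path: string;
  projectId: string;
  transcriptionArtifactId: string | null;
  captionPresetId: string | null;
  captionStyleConfig: object | null;
}

function buildCaptionGenerateBody(input: CaptionGenerateInput): CaptionGenerateBody {
  if (input.transcriptionArtifactId === null) {
    // Assumption: generation cannot start without a transcription.
    throw new Error("transcription_id is required to generate captions");
  }
  const body: CaptionGenerateBody = {
    video_s3_path: input.videoS3Path,
    folder: "output_files",
    transcription_id: input.transcriptionArtifactId,
    project_id: input.projectId,
  };
  // Optional fields are included only when set.
  if (input.captionPresetId !== null) body.preset_id = input.captionPresetId;
  if (input.captionStyleConfig !== null) body.style_config = input.captionStyleConfig;
  return body;
}
```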

useCaptionPresets.ts

// GET /api/captions/presets/ → list of CaptionPresetRead
// POST /api/captions/presets/ → CaptionPresetCreate → CaptionPresetRead
// PATCH /api/captions/presets/{id}/ → CaptionPresetUpdate → CaptionPresetRead
// DELETE /api/captions/presets/{id}/ → 204

File Structure

src/features/project/
├── CaptionSettingsStep/
│   ├── index.ts
│   ├── CaptionSettingsStep.tsx        # Main component with tab logic
│   ├── PresetGrid.tsx                 # Tab 1: preset cards grid
│   ├── StyleEditor.tsx                # Tab 2: full style editor
│   ├── StylePreview.tsx               # Live preview panel
│   ├── useCaptionPresets.ts           # Query + mutations for presets
│   └── useSubmitCaptionGenerate.ts    # Caption generation mutation
├── CaptionResultStep/
│   ├── index.ts
│   └── CaptionResultStep.tsx          # Video player + download + re-render

Files to Modify

| File | Change |
|---|---|
| src/shared/context/WizardContext.tsx | Add 3 step keys, 3 state fields, auto-advance for CAPTIONS_GENERATE |
| src/widgets/ProjectWizard/ProjectWizard.tsx | Add steps to WIZARD_STEPS array and STEP_COMPONENTS map |
| src/features/project/ProcessingStep/ProcessingStep.tsx | Add "CAPTIONS_GENERATE" to JOB_TYPE_LABELS |
| src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx | Change "Завершить проект" button to "Далее" and add goToStep("caption-settings") navigation (currently has no forward navigation, only markStepCompleted) |
| src/shared/api/__generated__/openapi.types.ts | Regenerate via bun run gen:api-types |

Prerequisites

  1. Run bun run gen:api-types with backend running to get latest captions preset types
  2. Verify backend /api/captions/presets/ endpoint is accessible

Error Handling

  • Caption generation fails (FAILED status): ProcessingStep already shows failure state with danger-colored progress. User clicks "Назад" (goBack()) → navigates back to caption-settings to re-submit.
  • Preset delete fails (403): Show error toast — system presets cannot be deleted.
  • Preset save fails (validation): Display field-level errors from API response.
  • Result video URL expired: On a player error, re-fetch the presigned URL and retry playback.
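
For the expired-URL case, a bounded retry avoids an endless re-fetch loop when the video itself is broken rather than the URL. The sketch below is illustrative only: the retry budget of 2 and the `PlayerUrlState`, `onPlayerError`, and `onUrlLoaded` names are assumptions, not part of the spec.

```typescript
interface PlayerUrlState {
  url: string | null; // current presigned URL, null while re-fetching
  refetchCount: number;
}

// Assumed retry budget; the spec does not specify one.
const MAX_REFETCHES = 2;

// Called when the player reports an error. Decides whether to re-fetch the URL.
function onPlayerError(state: PlayerUrlState): { next: PlayerUrlState; shouldRefetch: boolean } {
  if (state.refetchCount >= MAX_REFETCHES) {
    // Budget exhausted: keep the state and let the UI show an error instead.
    return { next: state, shouldRefetch: false };
  }
  return {
    next: { url: null, refetchCount: state.refetchCount + 1 },
    shouldRefetch: true,
  };
}

// Called when a fresh presigned URL arrives.
function onUrlLoaded(state: PlayerUrlState, url: string): PlayerUrlState {
  return { url, refetchCount: state.refetchCount };
}
```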

Verification

  1. Navigate to an existing project that has completed subtitle-revision
  2. After subtitle-revision, wizard should advance to "Настройка субтитров"
  3. Verify system presets (Классические, Неон, Минимализм) appear in the grid
  4. Create a custom preset via the style editor, verify it appears in grid
  5. Edit and delete the custom preset, verify CRUD works
  6. Select a preset and click "Генерировать" → verify navigation to processing step
  7. Wait for job completion → verify navigation to result step
  8. Verify captioned video plays in the result step
  9. Click "Скачать" → verify download works
  10. Click "Перегенерировать" → verify return to caption-settings