docs initial

Daniil
2026-04-06 01:44:58 +03:00
parent 2a344ad588
commit 694b8bc77c
84 changed files with 6922 additions and 298 deletions
@@ -0,0 +1,800 @@
# API Services Research: Video Intelligence, STT, TTS & B-Roll
**Date:** April 1, 2026
**Consultants:** ML/AI engineer, backend architect, Remotion engineer, Product Lead, plus 4 research agents
**Context:** In-depth analysis of API services for upcoming features: highlight detection, shorts generation, semantic search, B-Roll
---
## Contents
1. [Executive Summary](#1-executive-summary)
2. [STT: updated comparison](#2-stt-updated-comparison)
3. [TTS: updated comparison](#3-tts-updated-comparison)
4. [Video Intelligence: full comparison](#4-video-intelligence-full-comparison)
5. [TwelveLabs: deep dive](#5-twelvelabs-deep-dive)
6. [Gemini 2.5: the key new player](#6-gemini-25-the-key-new-player)
7. [Clipping platforms (OpusClip, Reap, Vizard)](#7-clipping-platforms)
8. [B-Roll generation](#8-b-roll-generation)
9. [Integration architecture for Coffee Project](#9-integration-architecture-for-coffee-project)
10. [Remotion pipeline: evolution](#10-remotion-pipeline-evolution)
11. [Product strategy and monetization](#11-product-strategy-and-monetization)
12. [Cost summary table](#12-cost-summary-table)
13. [Recommendations and roadmap](#13-recommendations-and-roadmap)
14. [Red flags in the current code](#14-red-flags-in-the-current-code)
15. [Sources](#15-sources)
---
## 1. Executive Summary
### Key findings
1. **Gemini 2.5 Flash is a game-changer.** $0.005/min for video analysis (about 12x cheaper than TwelveLabs at $0.063/min). Sufficient for an MVP of highlight detection.
2. **TwelveLabs pays off only for repeated queries.** The "index once, search many times" model becomes cost-effective at 10+ queries against the same video. For one-off analysis, Gemini is cheaper.
3. **ElevenLabs Scribe v2 is the best STT for our product.** WER 2.3%, accurate word-level timestamps (critical for subtitles), built-in diarization. $0.40/hour.
4. **B-Roll generation is NOT production-ready.** Recommendation: the Pexels API (free) for finding stock video by keywords taken from the transcript.
5. **Reap.video is an unexpectedly strong competitor.** API + CLI + MCP for $9.99/month, 98 subtitle languages. Cheaper and more accessible than OpusClip.
6. **Coffee Project has zero monetization infrastructure.** No plans, pricing tiers, usage tracking, or billing. This blocks any paid feature.
7. **Russian market: first-mover advantage.** No local competitors in AI video clipping. Western tools are unavailable because of sanctions.
### Recommended stack (updated)
| Task | Service | Price | Why this one |
|------|---------|-------|--------------|
| STT (production) | ElevenLabs Scribe v2 | $0.40/hour | Best WER + subtitle-grade timestamps |
| STT (draft/preview) | Whisper v3-turbo (DeepInfra) | $0.06/hour | 253x realtime, instant preview |
| Highlight detection (MVP) | Gemini 2.5 Flash | $0.005/min | About 12x cheaper than TwelveLabs |
| Highlight detection (premium) | TwelveLabs Pegasus 1.2 | $0.063/min | Best accuracy for automation |
| Chapters | Gemini 2.5 Flash | $0.005/min | Good enough quality at the lowest price |
| Semantic search | TwelveLabs Marengo 3.0 | $4/1000 queries | The only one with pre-indexed search |
| B-Roll suggestions | Pexels API | Free | Real footage beats AI generation |
| TTS (Russian) | SaluteSpeech | $2.1/1M characters | Cheapest option for RU |
---
## 2. STT: updated comparison
### Comparison table (April 2026)
| Service | WER (EN) | WER (RU, estimate) | $/hour | Word-level timestamps | Diarization | Notes |
|---------|----------|--------------------|--------|-----------------------|-------------|-------|
| **ElevenLabs Scribe v2** | **2.3%** | ~5-7% | $0.40 | Yes, accurate (subtitle-grade) | Yes (batch) | Audio tagging (laughter, music), 90+ languages |
| **Deepgram Nova-3 Mono** | 5.4% | ~8-12% | $0.46 | Yes, improved in v3 | Yes (+$0.12/hour) | Code-switching across 10 languages in one stream |
| **Deepgram Nova-3 Multi** | 5.4% | ~8-12% | $0.55 | Yes | Yes | Multilingual version |
| **Whisper large-v3 (stock)** | 4.2% | 9.0% | $0.06 (DeepInfra) | Yes, ±500 ms natively | No | Open-source, pay-as-you-go |
| **Whisper large-v3 (fine-tuned RU)** | - | **6.4%** | Self-hosted | Yes, ±500 ms | No | Needs a GPU and infrastructure |
| **Whisper v3-turbo** | 4.8% | 10.2% | $0.06 (DeepInfra) | Yes, less accurate | No | 253x realtime, 6x faster than large |
| **Google Speech V1** (current) | ~6-8% | ~8-12% | ~$1.44 ($0.006 per 15 s) | Yes | Yes | Already integrated |
### Critical takeaway: timestamp accuracy
For Coffee Project, **word-level timestamp accuracy is the main metric**, because subtitles are synchronized frame by frame in Remotion via `WordNode.time.start/end`.
- **ElevenLabs Scribe v2**: built for subtitling. Timestamps are accurate enough without post-processing.
- **Native Whisper**: ±500 ms at the segment level. Word-level timestamps derived from cross-attention weights are noticeably inaccurate. The project already suffers from this problem.
- **Whisper + WhisperX**: much better thanks to wav2vec2 forced alignment, but it adds a second model and extra complexity.
### Recommendation from the ML/AI engineer
**Two-tier STT architecture:**
| Tier | Engine | Latency | $/hour | When |
|------|--------|---------|--------|------|
| Draft (instant) | Whisper v3-turbo (DeepInfra) | ~2-3 s per 5 min | $0.06 | Preview right after upload |
| Production (accurate) | ElevenLabs Scribe v2 | ~15-30 s per 5 min | $0.40 | Replaces the draft, used for rendering |
Savings: 85% on most interactions (viewing, previewing), where the draft is good enough.
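The 85% figure follows directly from the two hourly prices in the table. A minimal sketch of that arithmetic is below; `prod_share` (the fraction of uploads that are re-transcribed with the production engine) is a hypothetical parameter introduced for illustration and does not come from the report.

```python
def tier_savings(draft_per_hour: float = 0.06, prod_per_hour: float = 0.40) -> float:
    """Fraction saved when an hour is served by the draft tier instead of production."""
    return (prod_per_hour - draft_per_hour) / prod_per_hour


def blended_stt_cost(hours: float, prod_share: float,
                     draft_per_hour: float = 0.06,
                     prod_per_hour: float = 0.40) -> float:
    """Monthly STT cost if every upload gets a draft and only `prod_share`
    of the hours (an assumed ratio) is re-transcribed for rendering."""
    return hours * (draft_per_hour + prod_share * prod_per_hour)


if __name__ == "__main__":
    print(f"savings per draft-only hour: {tier_savings():.0%}")
    print(f"100 h, 30% rendered: ${blended_stt_cost(100, 0.30):.2f}")
```

Note that an upload that does reach rendering pays for both tiers, so the overall saving depends on how many uploads are only previewed.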
### What's new in ElevenLabs Scribe v2
- **Audio tagging** (January 2026): detects laughter, applause, music, footsteps, background noise. Tags appear inline in the transcript with timestamps: `(laughter)`, `(music)`.
- **Scribe v2 Realtime**: 30-80 ms latency, 93.5% accuracy across 30 languages.
- **Voice Isolator**: neural speech separation, useful for pre-processing noisy video.
### What's new in Deepgram Nova-3
- **54.2% lower WER** for streaming vs competitors.
- **Live code-switching**: 10 languages (including Russian) in one stream.
- **Keyterm prompting**: multilingual, improves accuracy for domain-specific terms.
- **Audio Intelligence is still EN-only.** Sentiment, topics, and intent work for English only. This is a critical limitation for our product.
---
## 3. TTS: updated comparison
No changes since the initial research, apart from updated Deepgram prices:
| Service | $/1K characters | $/1M characters | Notes |
|---------|-----------------|-----------------|-------|
| **SaluteSpeech** (Sber) | ~$0.0021 | ~$2.1 | Cheapest. RU/EN/KZ |
| **Deepgram Aura-1** | $0.015 | $15 | Previous generation |
| **Deepgram Aura-2** | $0.030 | $30 | Newest model |
| **ElevenLabs Flash/Turbo** | $0.06 | $60 | Business tier, ~75 ms, 32 languages |
| **ElevenLabs Multilingual v2/v3** | $0.12 | $120 | Premium quality, voice cloning |
---
## 4. Video Intelligence: full comparison
### Comparison matrix
| Parameter | TwelveLabs | Gemini 2.5 Pro | Gemini 2.5 Flash | GPT-4o/4.1 | Google Video Intelligence | Azure Video Indexer |
|-----------|-----------|----------------|------------------|------------|--------------------------|---------------------|
| **Type** | Video-native foundation models | General VLM with video input | General VLM (lightweight) | Image-only (frames) | Structured annotation | ML pipeline orchestrator |
| **Architecture** | Marengo (embeddings) + Pegasus (generation) | Multimodal LLM | Multimodal LLM | Multimodal LLM (no video) | Separate ML models | Suite of Azure AI services |
| **Highlight detection** | Native API, timecodes | Via prompt, second-level timecodes | Via prompt | No | No | No |
| **Semantic search** | Pre-indexed (Marengo) | Prompt-based | Prompt-based | No | No | No |
| **Chapters** | Native API | Via prompt | Via prompt | Via prompt | No | No |
| **Object tracking** | Strong, cross-frame | Limited | Limited | None (across frames) | Separate feature ($0.15/min) | Yes |
| **Max duration** | 4 hours (Marengo), 1 hour (Pegasus) | ~6 hours (2M context) | ~6 hours | Limited by frame count | No limit | 12 hours (free tier) |
| **Russian speech** | Yes (36+ languages) | Yes (strong) | Yes | No native audio | 50+ languages | 50+ languages |
| **Price per minute** | $0.063 (index+analyze) | $0.021 (≤200k) | **$0.005** | $0.026-0.23 | $0.025-0.15 (per feature) | Custom |
| **Price per hour** | $3.78 | $1.26 | **$0.36** | $1.56-13.80 | $1.50-9.00 | Custom |
| **Repeated queries** | $4/1000 (cheap) | Re-processed each time (expensive) | Re-processed each time | Re-processed each time | - | - |
| **Benchmarks** | SOTA on VideoMME-Long (30+ min) | 85.2% VideoMME | Below Pro | 72% VideoMME | - | - |
### Key insight: "index once, search many times"
TwelveLabs claims to be ~36,000x cheaper than Gemini for repeated queries against the same video ($0.09 per video-hour per month vs $4.50/1M tokens per query). For **one-off analysis** (highlight detection on a single video), however, Gemini 2.5 Flash is about 12x cheaper.
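To make the trade-off concrete, here is a rough break-even sketch built only from the prices quoted in this report: $0.005/min for Gemini 2.5 Flash, $0.042/min for TwelveLabs indexing, and $4 per 1000 search requests. It assumes that every Gemini query re-processes the whole video, that every TwelveLabs query is a plain search request, and it ignores the monthly index storage fee; real workloads will differ.

```python
import math


def gemini_cost(minutes: float, queries: int, per_min: float = 0.005) -> float:
    """Gemini re-reads the full video on every query."""
    return minutes * per_min * queries


def twelvelabs_cost(minutes: float, queries: int,
                    index_per_min: float = 0.042,
                    per_search: float = 4 / 1000) -> float:
    """Index once, then pay per search request (storage fee ignored)."""
    return minutes * index_per_min + queries * per_search


def break_even_queries(minutes: float,
                       gemini_per_min: float = 0.005,
                       index_per_min: float = 0.042,
                       per_search: float = 4 / 1000) -> int:
    """Smallest query count at which TwelveLabs is no more expensive than Gemini."""
    saving_per_query = minutes * gemini_per_min - per_search
    if saving_per_query <= 0:
        raise ValueError("video too short: a Gemini query is cheaper than one search")
    return math.ceil(minutes * index_per_min / saving_per_query)


if __name__ == "__main__":
    for minutes in (10, 60):
        print(minutes, "min video -> break-even at", break_even_queries(minutes), "queries")
```

Under these assumptions the break-even point lands at roughly 9-10 queries per video, which is in line with the "10+ queries" rule of thumb from the executive summary.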
---
## 5. TwelveLabs: deep dive
### Current models (April 2026)
| Model | Status | Purpose | Key improvements |
|-------|--------|---------|------------------|
| **Marengo 3.0** | GA (current) | Embeddings, Search | 512-dim (was 1024), composed text+image search, sports, 36 languages, 4-hour videos, 2x faster |
| **Pegasus 1.2** | GA (current) | Analyze, generation | 1-hour videos, fewer hallucinations, SOTA on VideoMME-Long |
| Marengo 2.7 | **Sunset March 30, 2026** | - | Deprecated |
| Pegasus 1.1 | **Discontinued** | - | Auto-upgraded to 1.2 |
### Confirmed prices (Developer plan)
| Component | Price | Confirmed |
|-----------|-------|-----------|
| Video indexing (Marengo/Pegasus) | $0.042/min ($2.52/hour) | ✅ |
| Infrastructure (index storage) | $0.0015/min ($0.09/hour/month) | ✅ |
| Analyze API input (Pegasus) | $0.021/min | ✅ |
| Analyze API output | $7.50/1M tokens | ✅ |
| Search API | $4/1000 requests | ✅ |
| Embed API (video) | $0.042/min | ✅ |
| **Embed API (audio only)** | **$0.0083/min** | 🆕 |
| **Embed API (image)** | **$0.10/1000 requests** | 🆕 |
| **Embed API (text)** | **$0.07/1000 requests** | 🆕 |
Free tier: 600 minutes, 100 videos, 90 days of storage.
### SDK and integration
**Python SDK** (`pip install twelvelabs`, v1.2.1):
```python
from twelvelabs import TwelveLabs

# API_KEY is assumed to be defined elsewhere (for example, read from the environment)
client = TwelveLabs(api_key=API_KEY)

# Highlight detection
res = client.generate.summarize(video_id="...", type="highlight")
for hl in res.highlights:
    print(f"{hl.start}s-{hl.end}s: {hl.highlight}")

# Chapter generation
res = client.generate.summarize(video_id="...", type="chapter")
for ch in res.chapters:
    print(f"{ch.start}s-{ch.end}s: {ch.chapter_title}")

# Structured JSON output (new)
# ResponseFormat comes from the SDK; its import is not shown in the source
result = client.analyze(
    video_id="...",
    prompt="Extract key moments",
    response_format=ResponseFormat(type="json_schema", json_schema={...}),
)
```
**Node.js SDK**: `npm install twelvelabs-js` (production-ready).
**OpenAPI spec**: 8,400 lines, available in the [repo](https://github.com/twelvelabs-io/twelvelabs-developer-experience).
### Limitations and gotchas
- Text query: max **77 tokens** (Marengo), **500 tokens** (Marengo 3.0)
- Pegasus prompt: max **375 tokens**
- Video: 360x360 to 5184x2160, aspect ratio 1:1 to 2.4:1, min 4 s
- File size: max 200 MB (direct upload), 4 GB (multipart/URL)
- Indexing: async only; you have to poll the status or use a webhook
- **Webhooks for indexing only**: none for analyze/search/embed
- Rate limits: Free 8 RPM, Dev Tier 1 = 600 RPM (search), auto-upgrade at $200+/month
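Because indexing is asynchronous and webhooks are not available for every operation, a worker will usually need a polling loop. The sketch below is deliberately provider-agnostic: `get_status` is a hypothetical callable that wraps whatever status lookup the SDK offers, and the status strings reuse the `PENDING | INDEXING | READY | FAILED` values proposed for `VideoIndex.index_status` later in this document, not the provider's own vocabulary.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     interval_s: float = 5.0,
                     max_interval_s: float = 60.0,
                     timeout_s: float = 1800.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `get_status` with exponential backoff until a terminal state.

    Returns "READY" or "FAILED"; raises TimeoutError if neither is reached
    within roughly `timeout_s` seconds of accumulated waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("READY", "FAILED"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        # Double the delay each round, capped so long jobs are not hammered
        interval_s = min(interval_s * 2, max_interval_s)
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` is used.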
### Integrations from the repo
- **Vector Store RAG**: ChromaDB, Weaviate, LanceDB, Oracle
- **Real-time monitoring**: VideoDB (RTSP feeds)
- **Visual pipelines**: Langflow
- **Chatbot**: Poe
---
## 6. Gemini 2.5: the key new player
### Why it matters
At $0.005/min, Gemini 2.5 Flash is **about 12x cheaper than TwelveLabs** ($0.063/min) for one-off video analysis. With a 2M-token context it can process ~6 hours of video in a single call. This makes highlight detection affordable even on our product's free tier.
### Pricing per minute of video
Video consumes **258 tokens/s** (1 fps). Audio adds **25 tokens/s**.
| Model | $/min (video) | $/min (video+audio) | $/hour | Batch (50% discount) |
|-------|---------------|---------------------|--------|----------------------|
| **Gemini 2.5 Flash** | **$0.005** | $0.006 | $0.36 | $0.18/hour |
| Gemini 2.5 Pro (≤200k) | $0.019 | $0.021 | $1.26 | $0.63/hour |
| Gemini 2.5 Pro (>200k) | $0.039 | $0.041 | $2.46 | $1.23/hour |
### Gemini vs TwelveLabs: when to use which
| Scenario | Winner | Why |
|----------|--------|-----|
| One-off highlight detection | **Gemini Flash** | 12x cheaper ($0.005 vs $0.063/min) |
| Precise timecodes for automated cutting | **TwelveLabs** | Video-native model, better temporal grounding |
| Repeated queries against a video library | **TwelveLabs** | Index once, query many ($4/1000 requests) |
| Cross-frame object tracking | **TwelveLabs** | Architectural advantage |
| Chapter generation | **Gemini Flash** | Good enough quality, 12x cheaper |
| Semantic search | **TwelveLabs** | The only one with pre-indexed vector search |
| Budget MVP | **Gemini Flash** | Lowest cost of entry |
### GPT-4o/4.1: not recommended for video
- **No native video input**: frames have to be extracted first (OpenCV/ffmpeg)
- 85 tokens/frame (low detail), 765 tokens/frame (high detail)
- $0.026-0.23/min: **more expensive than Gemini with worse quality**
- No audio from the video (a separate Whisper pass is needed)
- No built-in timecodes
- GPT-4.1: improved to 72% on VideoMME, but the fundamental limitation (frames only) remains
---
## 7. Clipping platforms
### API availability comparison
| Platform | API | API price | Highlights | Captions | Reframe | Batch | RU |
|----------|-----|-----------|-----------|----------|---------|-------|-----|
| **OpusClip** | Enterprise only | Custom | ✅ 95%+ mAP | ✅ | ✅ | 50 concurrent | No |
| **Reap.video** | All plans ($9.99+) | Included | ✅ Multi-signal | ✅ 98 languages | ✅ | 5-15 concurrent | ✅ |
| **Vizard** | Paid plans ($20+) | Included | ✅ | ✅ 100+ languages | ✅ | Minimal API | Unknown |
| **Descript** | No public API | - | ✅ "Find Good Clips" | ✅ | ✅ | - | No |
| **CapCut** | No public API | - | ✅ Smart Highlights | ✅ | ✅ | - | Partial |
### OpusClip in detail
- **ClipAnything**: multi-signal AI (visual + audio + sentiment), mAP 0.93
- **Virality Score**: a 0-100 heuristic of questionable accuracy (low-scoring clips often perform better)
- **API**: Enterprise-only, 30 req/min, max 10 hours of video
- **SaaS pricing**: Free 60 min/month → Starter $15 (150 min) → Pro $14.50/month (annual, 3600/year)
- **Barrier**: the API is not available on regular plans
### Reap.video: unexpectedly strong
- **API + CLI + MCP** for $9.99/month, far more accessible than OpusClip
- **MCP Server**: direct integration with Claude Code and other AI agents
- **Prompt-first clipping**: describe the clips you want and the AI finds them
- **98 languages** for subtitles, including Russian
- **80 languages** for dubbing (Russian included)
- **Romanized scripts** (Hinglish, Arabizi): a unique feature
### Competitive map (Product Lead)
```
                    HIGH PRICE
                        |
        Descript        |   (Enterprise)
        $24-35/mo       |
                        |
        OpusClip $29    |
                        |
        Vizard $20-30 --+-- ☕ Coffee Project TARGET: $15-29/mo
                        |   Subtitles + clips in one
                        |
        Reap $9.99      |
                        |
        CapCut          |
        $8-20           |
                        |
                    LOW PRICE
                        |
SUBTITLES ONLY -------------------------- FULL REPURPOSING
```
**Coffee Project positioning**: "The only tool where subtitles AND clips are first-class citizens in a single workflow, priced below the full-editor tax."
---
## 8. B-Roll generation
### Text-to-video models: current state
| Model | Quality | Duration | $/5-s clip | Ready for B-Roll? |
|-------|---------|----------|------------|-------------------|
| **Runway Gen-4 Turbo** | Good, fast | 5-10 s | $0.25 | Almost, but with artifacts |
| **Runway Gen-4.5** | Higher | 5-10 s | $0.60 | Closer |
| **Runway Gen-4 Aleph** | Highest (Runway) | 5-10 s | $0.75 | Closer |
| **Pika 2.2** (via fal.ai) | Good for social media | 5 s | **$0.20** | For non-critical content |
| **Kling 2.6** | Excellent for nature | 5-10 s | $0.45-0.50 | Yes for landscapes |
| **Veo 3.1** (Runway API) | Strong | 5-10 s | $1.00 | Expensive |
### The ML/AI engineer's honest assessment: generation is NOT ready
**No, not yet for professional use.** Reasons:
1. **Consistency**: every generation is independent. You cannot get two clips with the same lighting, location, and camera.
2. **Duration**: 5-10 seconds. Real B-Roll runs 15-60 seconds. Generations have to be chained, which makes the consistency problem worse.
3. **Artifacts**: even Runway Gen-4 produces physics violations, lighting mismatches, and tell-tale "AI markers".
4. **Cost**: 5-10 B-Roll clips × $0.50 (+ 2-3 regenerations) = $7.50-15 per video. Stock footage is cheaper.
### Recommendation: AI-powered stock video search
| Service | Price | Library | API | Semantic search |
|---------|-------|---------|-----|-----------------|
| **Pexels API** | **Free** | ~150K videos | Yes, well documented | Basic keyword |
| **Storyblocks API** | Subscription | 1M+ videos | Yes | Best categorization |
| **Shutterstock API** | Per-download / subscription | Largest | Yes | AI-powered search |
**Phase 1 (launch now): Pexels API.**
Pipeline:
1. Transcription yields text segments with timecodes
2. Gemini Flash analyzes the segments and proposes B-Roll keywords
3. The Pexels API searches for matching stock video
4. The user picks from the suggestions
It is free, real footage looks professional, and it can ship within weeks.
**Phase 2 (once the models mature): AI-generated B-Roll as a premium option.** Revisit in Q3 2026 with Runway Gen-5 / Veo 4.
---
## 9. Integration architecture for Coffee Project
### Current pipeline (recap)
```
Upload → S3 → Media Probe (ffprobe) → Transcription (Whisper/Google) → Captions (Remotion) → S3
Silence Detection (pydub)
```
**What exists:**
- 2 STT engines: LOCAL_WHISPER (default `tiny`, poor quality), GOOGLE_SPEECH_CLOUD
- Dramatiq actors for all background tasks, with webhooks + WebSocket notifications
- An empty `semantic_tags` field in `WordNode`, ready for ML annotations
- Silence detection (pydub + librosa)
**What is missing:**
- Highlight/chapter detection
- Semantic search
- Video intelligence integration
- Monetization (plans, quotas, billing)
### New module: `video_intelligence`
The backend architect recommends **a single new module** with the standard 6-file structure:
```
cpv3/modules/video_intelligence/
    __init__.py
    models.py       # VideoIndex model
    schemas.py      # Index, Highlight, Chapter, Search schemas
    repository.py   # VideoIndexRepository
    service.py      # Provider calls, business logic
    router.py       # API endpoints
```
### Data model
```python
from uuid import UUID

# Base and BaseModelMixin are the project's existing declarative base classes
class VideoIndex(Base, BaseModelMixin):
    user_id: UUID                         # FK users
    project_id: UUID | None               # FK projects
    source_file_id: UUID                  # FK files
    provider: str                         # "TWELVE_LABS" | "GEMINI"
    provider_index_id: str                # Provider-specific ID
    provider_video_id: str                # Provider video ref
    highlights_json: dict | None          # Cached highlights (JSONB)
    chapters_json: dict | None            # Cached chapters (JSONB)
    index_status: str                     # PENDING | INDEXING | READY | FAILED
    video_duration_seconds: float
    indexing_cost_cents: int | None       # Cost tracking
```
Highlights and chapters are stored as JSONB columns (not separate tables), mirroring `Transcription.document`.
### Extended pipeline
```
Upload → S3 → Media Probe
                   |
       +-----------+-----------+
       |                       |
 Transcription          Video Index (user-triggered)
(Whisper/Scribe)        (TwelveLabs/Gemini)
       |                       |
       |              +--------+--------+
       |              |        |        |
       |         Highlights Chapters  Search
       |         (Dramatiq) (Dramatiq) (sync endpoint)
       |              |        |
       +------+-------+--------+
              |
Shorts/Clips Rendering (Remotion)
```
### Operation modes
| Operation | Mode | Why |
|-----------|------|-----|
| Video indexing | **Dramatiq (async)** | Takes minutes |
| Highlight detection | **Dramatiq (async)** | 30-60 s |
| Chapter generation | **Dramatiq (async)** | 30-60 s |
| Semantic search | **Sync endpoint** | 1-3 s response |
| B-Roll suggestions | **Sync endpoint** | Fast lookup |
### New endpoints
**Task endpoints** (async, in `tasks/router.py`):
```
POST /api/tasks/video-index/ → 202 Accepted
POST /api/tasks/highlights-detect/ → 202 Accepted
POST /api/tasks/chapters-generate/ → 202 Accepted
```
**Sync endpoints** (in `video_intelligence/router.py`):
```
GET /api/video-intelligence/{id}/ → VideoIndexRead
GET /api/video-intelligence/{id}/highlights/ → HighlightsResult
GET /api/video-intelligence/{id}/chapters/ → ChaptersResult
POST /api/video-intelligence/search/ → VideoSearchResponse
POST /api/video-intelligence/broll-suggestions/ → BRollSuggestionResponse
```
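### Quotas and cost control
Redis-based per-user quotas:
```python
import hashlib

# user_id, video_index_id and query come from the incoming request

# Check BEFORE creating the Dramatiq task
QUOTA_FREE_INDEX_MINUTES = 60
quota_key = f"vi_quota:{user_id}:indexed_minutes"

# Search query cache (5 min TTL)
query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
cache_key = f"vi_search_cache:{video_index_id}:{query_hash}"
```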
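One way to enforce the quota before enqueueing a task is to reserve the minutes optimistically and roll back if the limit is exceeded. The sketch below is an assumption about how this could look, not the project's implementation: `reserve_index_minutes` is a hypothetical helper, and `store` is anything exposing `incrbyfloat(name, amount)`, which is the signature offered by the redis-py client.

```python
from typing import Protocol

QUOTA_FREE_INDEX_MINUTES = 60


class CounterStore(Protocol):
    """Minimal interface needed here; a redis-py client satisfies it."""

    def incrbyfloat(self, name: str, amount: float) -> float: ...


def reserve_index_minutes(store: CounterStore, user_id: str, minutes: float,
                          limit: float = QUOTA_FREE_INDEX_MINUTES) -> bool:
    """Reserve `minutes` of indexing quota; return False if it would exceed `limit`."""
    key = f"vi_quota:{user_id}:indexed_minutes"
    # Increment first so that concurrent requests cannot both pass a stale check
    total = store.incrbyfloat(key, minutes)
    if total > limit:
        store.incrbyfloat(key, -minutes)  # undo the reservation
        return False
    return True
```

Incrementing before checking avoids the read-then-write race of a plain `GET` followed by `SET`. Resetting the counter at the start of each billing period (for example, with a key expiry) is left out of this sketch.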
### Key architectural decisions
1. **NO automatic task chaining.** The frontend controls the workflow; every task is started explicitly.
2. **NO abstract provider pattern** (YAGNI). A simple string selector, as in the transcription engine.
3. **Retry with backoff for external APIs** (`max_retries=3, min_backoff=15000`), unlike the current actors with `max_retries=0`.
4. **Highlights/chapters are cached in the DB** (JSONB). Search is cached in Redis (5 min TTL).
---
## 10. Remotion pipeline: evolution
### Shorts/Clips rendering
**Hybrid FFmpeg + Remotion approach (2-3x faster than pure Remotion):**
| Step | Tool | Time | Purpose |
|------|------|------|---------|
| 1. Cut the clip | FFmpeg `-c copy` | ~1 s | Stream copy, no re-encoding |
| 2. Render with subtitles | Remotion `ShortVideo` | 10-30 s per clip | Captions + reframe + styles |
| 3. Upload | S3 multipart | ~5 s | Into the `shorts/` folder |
**Comparison for a 10-min video → 5 Shorts of 1 min each:**
| Approach | Total time | Resources |
|----------|-----------|-----------|
| Pure Remotion (5 renders from the full video) | 5-10 min | High: 5 Chromium processes, each seeking within the 10-min video |
| **Hybrid** (FFmpeg cut + 5 lightweight renders) | **2-5 min** | Medium: FFmpeg ~5 s + 5 lightweight Remotion renders |
| Pure FFmpeg (no subtitles) | ~10 s | Minimal |
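Step 1 of the hybrid approach boils down to a single FFmpeg invocation. Below is a minimal sketch of how the argument list might be assembled; `ffmpeg_cut_args` is a hypothetical helper, and the choice of `-ss` before `-i` (fast input seeking) together with `-t` for the duration is one common arrangement rather than the project's confirmed command.

```python
def ffmpeg_cut_args(src: str, dst: str,
                    clip_start: float, clip_end: float) -> list[str]:
    """Build an FFmpeg argv that stream-copies [clip_start, clip_end) from src to dst."""
    if clip_start < 0 or clip_end <= clip_start:
        raise ValueError("clip_end must be greater than a non-negative clip_start")
    duration = clip_end - clip_start
    return [
        "ffmpeg", "-y",               # overwrite the output if it exists
        "-ss", f"{clip_start:.3f}",   # seek on the input side (fast)
        "-i", src,
        "-t", f"{duration:.3f}",      # clip length in seconds
        "-c", "copy",                 # stream copy, no re-encoding
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_cut_args("full.mp4", "clip_01.mp4", 12.5, 72.5)))
```

The list can be passed to `subprocess.run(args, check=True)`. With stream copy the cut snaps to keyframes, so the clip may start slightly earlier than requested; the Remotion step would then need the actual offset to keep subtitles aligned.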
### New composition: `ShortVideo`
```typescript
type ShortCompositionProps = {
  videoSrc: string;
  transcription: Transcription;
  fps: number;
  styleConfig?: CaptionStyleConfig;
  clipStart: number;   // Start, in seconds
  clipEnd: number;     // End, in seconds
  cropConfig?: {
    focusX: number;    // 0-1, crop center
    focusY: number;
    autoReframe: boolean;
  };
};
```
**Adapting subtitles for the vertical format:**
- Font size: 60-70px (instead of 40)
- Lines on screen: 1, max 3-4 words
- Position: bottom with an 80-100px offset (the YouTube Shorts/TikTok/Reels UI covers the bottom)
- Max width: 95% of 1080px
- Background: more opaque
**Auto-reframe:**
- Phase 1: Center crop (the simplest option, 607x1080 out of 1920x1080)
- Phase 2: Speaker-position crop (per-segment `focusX` from ML)
- Phase 3: Per-frame face tracking (future)
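The Phase 1 and Phase 2 crops share the same geometry: a 9:16 window that is as tall as the source and is positioned horizontally by `focusX`. The following is a small sketch of that calculation; `vertical_crop` is a hypothetical helper, and truncating to whole pixels is an assumption (some encoders additionally require even dimensions).

```python
def vertical_crop(src_w: int, src_h: int, focus_x: float = 0.5,
                  aspect: tuple[int, int] = (9, 16)) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of a full-height crop centered on focus_x.

    focus_x is the horizontal focus point in the 0-1 range, as in cropConfig.
    """
    crop_w = min(src_w, int(src_h * aspect[0] / aspect[1]))
    x = int(focus_x * src_w - crop_w / 2)
    # Clamp so the window never leaves the frame
    x = max(0, min(x, src_w - crop_w))
    return x, 0, crop_w, src_h


if __name__ == "__main__":
    print(vertical_crop(1920, 1080))        # center crop
    print(vertical_crop(1920, 1080, 0.9))   # speaker near the right edge
```

For a 1920x1080 source this yields the 607x1080 window mentioned above; Phase 2 would only change where `focus_x` comes from.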
### Chapter markers
A simple overlay, NOT a restructuring of the video:
- `ChapterOverlay` component: title fades in, holds for 2 s, fades out
- `interpolate()` for the animation (not CSS transitions)
- YouTube chapters metadata is the backend's responsibility, not Remotion's
### B-Roll in Remotion
The most complex feature, a multi-source timeline:
```typescript
type BRollSegment = {
  src: string;               // S3 presigned URL
  startTime: number;         // When to show it
  endTime: number;
  mode: "cutaway" | "pip";   // Full replacement or overlay
  transitionIn?: "fade" | "slide" | "cut";
  audio: "mute" | "duck" | "replace";
};
```
- Use `<OffthreadVideo>` (not `<Video>`): decoding happens off-thread
- Docker may need more memory: 4GB → 6-8GB
- Horizontal scaling: N containers on a single BullMQ queue
### Pre-existing bug
`remotion_service/src/themes/default.css:23` has the CSS rule `transition: transform 0.1s ease;` on `.word`. It runs on a browser timer, not the Remotion frame clock. In CSS theme mode the scale animation on `.current-word` renders unpredictably. Inline style mode (with `styleConfig`) is not affected, and that is the main production path.
---
## 11. Product strategy and monetization
### Critical finding: zero monetization infrastructure
The codebase has **none** of the following:
- A `plan` / `subscription` field on the User model
- Usage tracking (render minutes, transcription minutes)
- Quotas and limits
- Payment system integration
- Pricing page / upgrade modal
This is a **blocker** for any paid feature.
### Recommended pricing tiers
| | Free | Starter ($15/mo) | Pro ($29/mo) | Agency ($79/mo) |
|---|---|---|---|---|
| Processing minutes | 30/mo | 150/mo | 400/mo | 1,200/mo |
| Transcription | Whisper base | All engines | All engines | + priority |
| Subtitle styles | 3 basic | 10 | All | + custom branding |
| Clips from video | Preview only | 5/video | Unlimited | Unlimited |
| Chapters | Yes (free) | Yes | Yes | Yes |
| Export quality | 720p + watermark | 1080p | 4K | 4K |
| Highlights engine | Transcript-based | Gemini Flash | TwelveLabs | TwelveLabs + analytics |
| API access | No | No | No | Yes |
| Team seats | 1 | 1 | 1 | 5 |
### Unit economics
**Cost per minute of processing:**
| Component | $/min |
|-----------|-------|
| TwelveLabs indexing | $0.042 |
| TwelveLabs infrastructure | $0.0015/month |
| TwelveLabs search | ~$0.004 |
| Whisper STT (self-hosted) | ~$0.0005 |
| Remotion render (clip) | ~$0.02 |
| S3 storage (amortized) | ~$0.001 |
| **With TwelveLabs** | **~$0.07** |
| **Without TwelveLabs (Gemini)** | **~$0.03** |
**Margins by tier:**
| Tier | Revenue | Avg usage | Cost (with TwelveLabs) | Gross margin |
|------|---------|-----------|------------------------|--------------|
| Starter $15 | $15 | ~80 min | $5.60 | **63%** |
| Pro $29 | $29 | ~200 min | $14.00 | **52%** |
| Agency $79 | $79 | ~600 min | $42.00 | **47%** |
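The margin column is simply revenue minus usage cost, divided by revenue, at ~$0.07 per processed minute. A short sketch that reproduces the table (the function name is illustrative only):

```python
def gross_margin(price: float, avg_minutes: float,
                 cost_per_minute: float = 0.07) -> float:
    """Gross margin as a fraction of revenue for one subscriber."""
    return (price - avg_minutes * cost_per_minute) / price


if __name__ == "__main__":
    tiers = {"Starter": (15, 80), "Pro": (29, 200), "Agency": (79, 600)}
    for name, (price, minutes) in tiers.items():
        print(f"{name}: {gross_margin(price, minutes):.0%}")
```

Swapping `cost_per_minute` for the ~$0.03 Gemini-based figure shows how much headroom the cheaper engine leaves on each tier.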
### Free tier: do NOT use TwelveLabs
The free tier should use **transcript-based highlights** (energy analysis + keywords from the transcript), which cost next to nothing. TwelveLabs is for paid tiers only.
10,000 free users × $2.10/month of TwelveLabs = $21,000/month. Without TwelveLabs = ~$900/month.
### Competitive positioning
| Tool | For ~150 min/month + subtitles + clips | Coffee Project equivalent |
|------|----------------------------------------|---------------------------|
| OpusClip Starter | $15/mo (clips, no subtitles) | $15/mo (subtitles + clips) |
| Vizard Creator | $14.50-30/mo | $15/mo (better subtitles) |
| Descript Hobbyist | $24/mo (full editor) | $15/mo (focused workflow) |
| Reap | $9.99/mo | $15/mo (more processing) |
### Russian market
- **No local competitors** in AI video clipping
- Western tools: payment problems (Stripe is unavailable)
- Payment systems: YooKassa (ЮKassa), CloudPayments, Tinkoff
- Pricing: ₽990/month (Starter), ₽1,990/month (Pro), 30-50% below the USD prices
- Channels: VK, Telegram, YouTube (via VPN)
---
## 12. Cost summary table
### Estimate for 100 hours of video per month (updated)
| Stack | $/month | What we get |
|-------|---------|-------------|
| **Gemini 2.5 Flash** (highlights + chapters) | **~$36** | Highlight detection + chapters. No search |
| **TwelveLabs** (index + infra + analyze + search) | ~$389 | Full video understanding + semantic search |
| **Gemini Flash + TwelveLabs search** (hybrid) | ~$180 | Flash for analysis, TL for library search |
| **DeepInfra Whisper** (STT draft) | ~$6 | Draft transcription |
| **ElevenLabs Scribe** (STT prod) | ~$40 | Production transcription |
| **Pexels API** (B-Roll search) | **$0** | Stock video search |
| **Google Video Intelligence** (labels + shots) | ~$450-600 | Metadata, no highlights |
### Recommended stack by phase
| Phase | Stack | $/month (100 hours) |
|-------|-------|---------------------|
| **MVP** | Gemini Flash + DeepInfra Whisper + Pexels | **~$42** |
| **Growth** | Gemini Flash + Scribe v2 + TwelveLabs search | **~$220** |
| **Scale** | TwelveLabs full + Scribe v2 + Pexels + Runway | **~$470** |
---
## 13. Recommendations and roadmap
### Priorities (RICE scoring by the Product Lead)
| Priority | Feature | Engine | Effort | $/month (100 users) |
|----------|---------|--------|--------|---------------------|
| **P0** | Upgrade STT → Scribe v2 | ElevenLabs API | 2-3 days | $40-80 |
| **P0** | Draft STT tier | Whisper v3-turbo (DeepInfra) | 2-3 days | $6-12 |
| **P0** | Monetization (plans, quotas, billing) | Stripe + YooKassa | 4-6 weeks | - |
| **P1** | Highlight detection MVP | Gemini 2.5 Flash | 1 week | $5-15 |
| **P1** | Shorts rendering | FFmpeg + Remotion ShortVideo | 2-3 weeks | - |
| **P2** | Chapter generation | Gemini 2.5 Flash | 1 week | $5 |
| **P2** | B-Roll suggestions (stock) | Pexels API + Gemini Flash | 2 weeks | $5 + $0 |
| **P3** | Premium highlights | TwelveLabs Pegasus 1.2 | 1 week | $50-200 |
| **P3** | Semantic video search | TwelveLabs Marengo 3.0 | 2 weeks | $20-50 |
| **P4** | AI-generated B-Roll | Runway Gen-4 API | 1 week | Variable |
### Implementation phases
**Pre-Phase: Monetization (4-6 weeks, in parallel with Phase 1)**
- `plan`, `plan_expires_at`, `usage_minutes_current/limit` on the User model
- Usage tracking middleware
- Quota enforcement in the service layer
- Stripe Checkout + YooKassa
- Pricing page + upgrade modal
**Phase 1: "Clips": Highlights + Smart Clipping (8-10 weeks)**
- `video_intelligence` module
- Gemini Flash integration for highlight detection
- Shorts rendering (ShortVideo composition + FFmpeg pre-cut)
- Subtitles on clips (existing styles)
- Free tier: transcript-based highlights (preview only, no export)
- Paid: Gemini Flash highlights + clip export
**Phase 2: Chapters + B-Roll suggestions (4-6 weeks)**
- Chapter generation via Gemini Flash
- Chapter overlay in Remotion
- YouTube chapters metadata export
- Pexels API for B-Roll suggestions
- Chapters are free (activation feature)
**Phase 3: Premium Video Intelligence (future)**
- TwelveLabs for premium highlight detection
- Semantic video search (enterprise)
- Prompt-first clipping
- Batch processing
---
## 14. Red flags in the current code
Found by the agents while analyzing the codebase:
### Backend
1. **Whisper default model = `tiny`** (`schemas.py:122`, `service.py:325`). At least `base` or `small` is needed for acceptable quality.
2. **No `time_limit` on the Dramatiq actor** (`@dramatiq.actor(max_retries=0)`, `service.py:603`). A corrupted file can make the worker hang forever. Add `time_limit=1800000` (30 min).
3. **Google Speech V1 API**. The V2 API offers the Chirp model, which is considerably better for multilingual content.
4. **No transcription caching**. The actor does not check whether a transcription already exists for the same file + engine + model + language. Every repeated transcription wastes money.
5. **The transcription router bypasses the service layer** (`transcription/router.py:30-38`): it calls `TranscriptionRepository` directly from the router. This breaks the Router → Service → Repository pattern.
6. **No pagination** on `list_all_transcriptions`: it returns an unbounded list.
7. **Inline error strings** (`transcription/router.py:65`: `detail="Не найдено"`): there is no `ERROR_` constant.
8. **`tasks/service.py` is already 1400+ lines**: new actors should delegate to `video_intelligence/service.py`.
### Remotion
9. **CSS `transition` in `default.css:23`**: `transition: transform 0.1s ease;` on the `.word` class. It runs on a browser timer, not the Remotion frame clock. Rendering in CSS theme mode is unpredictable.
10. **`<Video>` instead of `<OffthreadVideo>`**: B-Roll with multiple video sources needs `<OffthreadVideo>` (off-thread decoding).
11. **Docker limits**: 2 CPU, 4GB RAM, `MAX_CONCURRENT_RENDERS=2`. Shorts batches + B-Roll will require raising memory to 6-8GB.
---
## 15. Sources
### STT
- [Artificial Analysis STT Leaderboard](https://artificialanalysis.ai/speech-to-text)
- [ElevenLabs Scribe v2](https://elevenlabs.io/blog/introducing-scribe-v2)
- [ElevenLabs Scribe v2 Realtime](https://elevenlabs.io/blog/scribe-v2-realtime-in-elevenlabs-agents)
- [ElevenLabs API Pricing](https://elevenlabs.io/pricing/api)
- [Deepgram Nova-3 Introduction](https://deepgram.com/learn/introducing-nova-3-speech-to-text-api)
- [Deepgram Nova-3 Multilingual WER](https://deepgram.com/learn/nova-3-multilingual-major-wer-improvements-across-languages)
- [Deepgram Models & Languages](https://developers.deepgram.com/docs/models-languages-overview)
- [Deepgram Pricing](https://deepgram.com/pricing)
- [Whisper large-v3-turbo (HuggingFace)](https://huggingface.co/openai/whisper-large-v3-turbo)
- [Whisper large-v3-russian (fine-tuned)](https://huggingface.co/antony66/whisper-large-v3-russian)
- [DeepInfra Whisper API](https://deepinfra.com/openai/whisper-large-v3-turbo/api)
### Video Intelligence
- [TwelveLabs Pricing](https://www.twelvelabs.io/pricing)
- [TwelveLabs Docs](https://docs.twelvelabs.io)
- [TwelveLabs Marengo 3.0](https://www.twelvelabs.io/blog/marengo-3-0)
- [TwelveLabs Pegasus 1.2](https://www.twelvelabs.io/blog/introducing-pegasus-1-2)
- [TwelveLabs Video-to-Text Arena](https://www.twelvelabs.io/blog/video-to-text-arena)
- [TwelveLabs Developer Experience (GitHub)](https://github.com/twelvelabs-io/twelvelabs-developer-experience)
- [Gemini 2.5 Video Understanding](https://developers.googleblog.com/en/gemini-2-5-video-understanding/)
- [Gemini API Pricing](https://ai.google.dev/gemini-api/docs/pricing)
- [GPT-4.1 Multimodal](https://blog.roboflow.com/gpt-4-1-multimodal/)
- [Google Video Intelligence API](https://cloud.google.com/video-intelligence)
### Clipping Platforms
- [OpusClip API](https://help.opus.pro/api-reference/overview)
- [OpusClip Pricing](https://www.opus.pro/pricing)
- [Reap.video API](https://docs.reap.video/api-reference/1_introduction)
- [Reap.video MCP](https://reap.video/mcp)
- [Vizard API Docs](https://docs.vizard.ai/docs/introduction)
- [Descript Pricing](https://www.descript.com/pricing)
- [CapCut Pricing](https://www.gamsgo.com/blog/capcut-pricing)
### B-Roll Generation
- [Runway API Pricing](https://docs.dev.runwayml.com/guides/pricing/)
- [Pika 2.2 on fal.ai](https://fal.ai/models/fal-ai/pika/v2.2/text-to-video)
- [Kling AI Pricing](https://klingai.com/global/dev/pricing)
- [Best Text-to-Video APIs 2026](https://wavespeed.ai/blog/posts/best-text-to-video-api-2026/)
- [Pexels Free API](https://www.pexels.com/api/)
### Market & Competition
- [AI Video Generator Market (Grand View Research)](https://www.grandviewresearch.com/industry-analysis/ai-video-generator-market-report)
- [Descript vs Veed vs Kapwing Growth (YipitData)](https://www.yipitdata.com/resources/blog/descript-vs-veed-vs-kapwing-ai-video-tools)
- [OpusClip Highlight Accuracy](https://www.opus.pro/blog/ai-tools-for-precise-video-highlight-search-accuracy)
- [SaaS Freemium Conversion Benchmarks](https://firstpagesage.com/seo-blog/saas-freemium-conversion-rates/)
---
*Document prepared by 8 parallel research agents: 4 external researchers (TwelveLabs repo, TwelveLabs pricing, Other Services, Coffee Architecture) + 4 domain specialists (ML/AI Engineer, Backend Architect, Product Lead, Remotion Engineer).*
@@ -0,0 +1,888 @@
# Docker Infrastructure Hardening — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Harden all Docker infrastructure across the monorepo — security, build optimization, service organization, health checks, and networking.
**Architecture:** 4-phase approach: quick config fixes first (no code changes), then Dockerfile improvements, then health endpoints + networking, then resource limits. Each phase produces a working stack.
**Tech Stack:** Docker, Docker Compose, FastAPI (Python), ElysiaJS (Bun/TypeScript), PostgreSQL, Redis, MinIO
---
### Task 1: Add .env to .gitignore files
**Files:**
- Modify: `cofee_backend/.gitignore`
- Modify: `cofee_frontend/.gitignore`
- [ ] **Step 1: Add .env exclusion to backend .gitignore**
Append to `cofee_backend/.gitignore`:
```
# Environment
.env
.env.*
```
- [ ] **Step 2: Add .env exclusion to frontend .gitignore**
The frontend `.gitignore` has `.env*.local` but not `.env` itself. Add before the `# local env files` section in `cofee_frontend/.gitignore`:
```
# Environment
.env
```
Note: Keep the existing `.env*.local` line too.
- [ ] **Step 3: Verify .env files are not tracked**
Run: `git ls-files | grep '\.env'`
Expected: no output. If any .env files are tracked, run `git rm --cached <file>` for each.
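
Note that `grep '\.env'` matches the substring anywhere in a path, so a file such as `config.envoy.yaml` would also show up. A stricter check looks only at the file name. The sketch below is one hedged way to do that. The helper name `tracked_env_files` is hypothetical and it expects the lines printed by `git ls-files` as input.

```python
from pathlib import PurePosixPath


def tracked_env_files(tracked_paths: list[str]) -> list[str]:
    """Return paths whose file name is `.env` or starts with `.env.`.

    Hypothetical helper; feed it the output lines of `git ls-files`.
    """
    hits = []
    for path in tracked_paths:
        name = PurePosixPath(path).name
        if name == ".env" or name.startswith(".env."):
            hits.append(path)
    return hits
```

An empty result corresponds to the "no output" expectation of this step.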
- [ ] **Step 4: Commit**
```bash
git add cofee_backend/.gitignore cofee_frontend/.gitignore
git commit -m "fix(infra): add .env to backend and frontend .gitignore"
```
---
### Task 2: Add .env to backend .dockerignore
**Files:**
- Modify: `cofee_backend/.dockerignore`
- [ ] **Step 1: Add .env exclusion**
Add to `cofee_backend/.dockerignore`:
```
.env
.env.*
```
- [ ] **Step 2: Commit**
```bash
git add cofee_backend/.dockerignore
git commit -m "fix(infra): exclude .env from backend Docker build context"
```
---
### Task 3: DRY up docker-compose env vars with YAML anchor
**Files:**
- Modify: `cofee_backend/docker-compose.yml`
The `api` and `worker` services share 14 identical env vars. Extract into an `x-backend-env` anchor. Also adds the missing `JWT_SECRET_KEY` to worker.
- [ ] **Step 1: Add x-backend-env anchor and refactor services**
Replace the entire `cofee_backend/docker-compose.yml` with:
```yaml
x-backend-image: &backend-image
  image: cpv3-backend:dev
  build:
    context: .
    dockerfile: Dockerfile
    target: dev

x-backend-env: &backend-env
  DEBUG: ${DEBUG:-1}
  JWT_SECRET_KEY: ${JWT_SECRET_KEY:-dev-secret}
  POSTGRES_USER: ${POSTGRES_USER:-postgres}
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
  POSTGRES_HOST: db
  POSTGRES_PORT: 5432
  POSTGRES_DATABASE: ${POSTGRES_DATABASE:-coffee_project_db}
  STORAGE_BACKEND: ${STORAGE_BACKEND:-S3}
  S3_ACCESS_KEY: ${MINIO_ROOT_USER:-minioadmin}
  S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-minioadmin}
  S3_BUCKET_NAME: ${S3_BUCKET_NAME:-coffee-bucket}
  S3_ENDPOINT_URL_INTERNAL: http://minio:9000
  S3_ENDPOINT_URL_PUBLIC: http://localhost:9000
  REDIS_URL: redis://redis:6379/0
  WEBHOOK_BASE_URL: http://api:8000
  REMOTION_SERVICE_URL: ${REMOTION_SERVICE_URL:-http://remotion:3001}

services:
  db:
    container_name: cpv3_postgres
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
      POSTGRES_DB: ${POSTGRES_DATABASE:-coffee_project_db}
    ports:
      - "127.0.0.1:5332:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DATABASE:-coffee_project_db}"]
      interval: 5s
      timeout: 3s
      retries: 20
    volumes:
      - cpv3_db:/var/lib/postgresql/data

  minio:
    container_name: cpv3_minio
    image: minio/minio:RELEASE.2024-11-07T00-52-20Z
    restart: unless-stopped
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minioadmin}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minioadmin}
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - cpv3_minio:/data

  redis:
    container_name: cpv3_redis
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    volumes:
      - cpv3_redis:/data

  api:
    container_name: cpv3_api
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - ./cpv3:/app/cpv3
      - ./alembic:/app/alembic
      - ./alembic.ini:/app/alembic.ini

  worker:
    container_name: cpv3_worker
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    command: >
      watchfiles --filter python 'dramatiq cpv3.modules.tasks.service --processes 1 --threads 2' /app/cpv3
    volumes:
      - ./cpv3:/app/cpv3

volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:
```
Key changes in this file:
- `x-backend-env` anchor with all shared env vars (DRY)
- `JWT_SECRET_KEY` added to worker (was missing)
- `restart: unless-stopped` on all services
- All ports bound to `127.0.0.1` (not `0.0.0.0`)
- MinIO pinned to `RELEASE.2024-11-07T00-52-20Z`
- MinIO health check added (`curl` on `/minio/health/live`)
- Removed inline comments for cleanliness
- [ ] **Step 2: Validate compose syntax**
Run: `cd cofee_backend && docker compose config > /dev/null`
Expected: no errors.
- [ ] **Step 3: Test stack starts**
Run: `cd cofee_backend && docker compose up -d`
Wait 30s, then: `docker compose ps`
Expected: all services `Up` or `Up (healthy)`.
- [ ] **Step 4: Commit**
```bash
git add cofee_backend/docker-compose.yml
git commit -m "refactor(infra): DRY env vars, pin images, bind localhost, add restart policies"
```
---
### Task 4: Move build-essential out of base stage in backend Dockerfile
**Files:**
- Modify: `cofee_backend/Dockerfile`
`build-essential` is only needed during `uv sync` (compiling C extensions). Moving it from `base` to `deps` saves ~200MB in the prod image: `prod` now inherits from `base` and copies only the compiled `.venv` from `deps`, so the compiler toolchain never reaches the production image.
- [ ] **Step 1: Restructure Dockerfile stages**
Replace the entire `cofee_backend/Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1.7
# ---------------------------------------------------------------------------
# Stage 1: base — minimal runtime dependencies (shared by dev and prod)
# ---------------------------------------------------------------------------
FROM python:3.11-slim AS base
COPY --from=ghcr.io/astral-sh/uv:0.8.15 /uv /uvx /bin/
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PATH="/app/.venv/bin:${PATH}"
WORKDIR /app
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
# ---------------------------------------------------------------------------
# Stage 2: deps — install Python dependencies (build-essential here only)
# ---------------------------------------------------------------------------
FROM base AS deps
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --no-dev --no-install-project
# ---------------------------------------------------------------------------
# Stage 3: dev — development target (used by docker-compose)
# ---------------------------------------------------------------------------
FROM deps AS dev
ENV PYTHONPATH=/app
EXPOSE 8000
CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir /app/cpv3"]
# ---------------------------------------------------------------------------
# Stage 4: prod — production target (no build-essential, non-root user)
# ---------------------------------------------------------------------------
FROM base AS prod
RUN groupadd --gid 1000 app && \
useradd --uid 1000 --gid app --create-home app
COPY --from=deps /app/.venv /app/.venv
COPY pyproject.toml uv.lock ./
ENV UV_LINK_MODE=copy
COPY cpv3 ./cpv3
COPY alembic ./alembic
COPY alembic.ini ./
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --no-dev
RUN chown -R app:app /app
USER app
EXPOSE 8000
CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000"]
```
Key changes:
- `build-essential` moved from `base` to `deps` — prod image is ~200MB smaller
- `prod` stage inherits from `base` (not `deps`) — no compiler in production
- `prod` copies only `.venv` from `deps` stage — gets compiled packages without build tools
- Non-root `app` user (uid 1000) added to `prod` stage
- `dev` stage still inherits from `deps` (has build-essential for potential ad-hoc installs)
- [ ] **Step 2: Build and verify prod stage**
Run: `cd cofee_backend && docker build --target prod -t cpv3-backend:prod-test .`
Expected: builds successfully.
- [ ] **Step 3: Build and verify dev stage**
Run: `cd cofee_backend && docker build --target dev -t cpv3-backend:dev-test .`
Expected: builds successfully.
- [ ] **Step 4: Verify dev stack still works**
Run: `cd cofee_backend && docker compose up -d --build`
Wait 30s, then: `docker compose ps`
Expected: all services running.
- [ ] **Step 5: Commit**
```bash
git add cofee_backend/Dockerfile
git commit -m "perf(infra): move build-essential to deps stage, add non-root user to prod"
```
---
### Task 5: Add BuildKit cache mounts and non-root user to Remotion Dockerfile
**Files:**
- Modify: `remotion_service/Dockerfile`
- [ ] **Step 1: Update Remotion Dockerfile**
Replace the entire `remotion_service/Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1.7-labs
FROM oven/bun:1.3.10 AS base
ENV APP_HOME=/app \
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 \
REMOTION_PUPPETEER_NO_SANDBOX=1 \
NODE_ENV=production
WORKDIR ${APP_HOME}
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
ca-certificates \
ffmpeg \
chromium \
libglib2.0-0 \
libnss3 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libdrm2 \
libxkbcommon0 \
libgbm1 \
fonts-noto-color-emoji \
curl \
&& rm -rf /var/lib/apt/lists/*
FROM base AS deps
WORKDIR ${APP_HOME}
COPY package.json bun.lock ./
RUN NODE_ENV=development bun install --frozen-lockfile
FROM base AS runner
WORKDIR ${APP_HOME}
RUN groupadd --gid 1000 app && \
useradd --uid 1000 --gid app --create-home app
COPY --from=deps ${APP_HOME}/node_modules ./node_modules
COPY package.json bun.lock ./
COPY tsconfig.json remotion.config.ts ./
COPY public ./public
COPY src ./src
COPY server ./server
RUN mkdir -p out && chown -R app:app /app
USER app
EXPOSE 3001
CMD ["bun", "run", "server"]
```
Key changes:
- BuildKit apt cache mounts added (matches backend pattern)
- Non-root `app` user (uid 1000) in runner stage
- `chown` before `USER app` so the app owns all files including `out/`
- [ ] **Step 2: Build and verify**
Run: `cd remotion_service && docker build --target runner -t remotion:test .`
Expected: builds successfully.
- [ ] **Step 3: Commit**
```bash
git add remotion_service/Dockerfile
git commit -m "perf(infra): add BuildKit cache mounts and non-root user to Remotion Dockerfile"
```
---
### Task 6: Add resource limits and cap_drop to Remotion docker-compose
**Files:**
- Modify: `remotion_service/docker-compose.yml`
- [ ] **Step 1: Update Remotion docker-compose.yml**
Replace the entire `remotion_service/docker-compose.yml` with:
```yaml
services:
  remotion:
    build:
      context: .
      dockerfile: Dockerfile
      target: runner
    command: >
      sh -lc "NODE_ENV=development bun install --frozen-lockfile && bun run server"
    restart: unless-stopped
    env_file: .env
    environment:
      S3_ENDPOINT_URL: http://minio:9000
      REDIS_URL: redis://redis:6379/0
    ports:
      - "127.0.0.1:3001:3001"
    deploy:
      resources:
        limits:
          memory: 4g
          cpus: "2"
        reservations:
          memory: 1g
          cpus: "0.5"
    cap_drop:
      - ALL
    cap_add:
      - SYS_ADMIN
    volumes:
      - .:/app:cached
      - remotion_node_modules:/app/node_modules
    networks:
      - backend
    stdin_open: true
    tty: true

volumes:
  remotion_node_modules:

networks:
  backend:
    external: true
    name: cofee_backend_default
```
Key changes:
- `restart: unless-stopped`
- Port bound to `127.0.0.1`
- Resource limits: 4GB memory / 2 CPUs (Chromium + FFmpeg need this)
- Resource reservations: 1GB / 0.5 CPU (scheduling guarantees)
- `cap_drop: ALL` + `cap_add: SYS_ADMIN` (SYS_ADMIN needed for Chromium sandbox)
- [ ] **Step 2: Validate compose syntax**
Run: `cd remotion_service && docker compose config > /dev/null`
Expected: no errors.
- [ ] **Step 3: Commit**
```bash
git add remotion_service/docker-compose.yml
git commit -m "fix(infra): add resource limits, cap_drop, restart policy to Remotion compose"
```
---
### Task 7: Add resource limits and cap_drop to backend docker-compose
**Files:**
- Modify: `cofee_backend/docker-compose.yml`
- [ ] **Step 1: Add deploy and cap_drop sections to each service**
Add to the `db` service after `volumes`:
```yaml
cap_drop:
- ALL
cap_add:
- CHOWN
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
```
Add to the `minio` service after `volumes`:
```yaml
cap_drop:
- ALL
cap_add:
- CHOWN
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
```
Add to the `redis` service after `volumes`:
```yaml
cap_drop:
  - ALL
cap_add:
  - CHOWN
  - SETGID
  - SETUID
```
Add to the `api` service after `volumes`:
```yaml
deploy:
  resources:
    limits:
      memory: 512m
      cpus: "1"
cap_drop:
  - ALL
```
Add to the `worker` service after `volumes`:
```yaml
deploy:
  resources:
    limits:
      memory: 1g
      cpus: "1"
cap_drop:
  - ALL
```
- [ ] **Step 2: Validate compose syntax**
Run: `cd cofee_backend && docker compose config > /dev/null`
Expected: no errors.
- [ ] **Step 3: Commit**
```bash
git add cofee_backend/docker-compose.yml
git commit -m "fix(infra): add resource limits and capability dropping to backend compose"
```
---
### Task 8: Add health check endpoint to backend API
**Files:**
- Modify: `cofee_backend/cpv3/modules/system/router.py`
The existing `/api/ping/` only returns a static response. We need a `/api/health/` endpoint that checks DB connectivity for Docker health checks. Redis is not checked in this step.
- [ ] **Step 1: Add health endpoint to system router**
Replace the contents of `cofee_backend/cpv3/modules/system/router.py` with:
```python
from __future__ import annotations

from fastapi import APIRouter, Depends, Response
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from cpv3.db.session import get_db
from cpv3.infrastructure.settings import get_settings

router = APIRouter(prefix="/api", tags=["System"])

_settings = get_settings()


@router.get("/ping/")
async def ping() -> dict[str, str]:
    return {"status": "ok"}


@router.get("/health/")
async def health(response: Response, db: AsyncSession = Depends(get_db)) -> dict[str, str]:
    """Health check for Docker/K8s probes. Verifies DB connectivity."""
    try:
        await db.execute(text("SELECT 1"))
        db_status = "connected"
    except Exception:
        db_status = "disconnected"
        # Report 503 so HTTP-based probes fail while the DB is unreachable.
        response.status_code = 503

    status = "ok" if db_status == "connected" else "degraded"
    return {"status": status, "database": db_status}
```
- [ ] **Step 2: Run linter**
Run: `cd cofee_backend && uv run ruff check cpv3/modules/system/router.py`
Expected: no errors.
- [ ] **Step 3: Run existing tests**
Run: `cd cofee_backend && uv run pytest -x -q 2>&1 | tail -10`
Expected: all tests pass (health endpoint is additive, no breaking changes).
- [ ] **Step 4: Commit**
```bash
git add cofee_backend/cpv3/modules/system/router.py
git commit -m "feat(backend): add /api/health/ endpoint for Docker health checks"
```
---
### Task 9: Add health check endpoint to Remotion service
**Files:**
- Modify: `remotion_service/server/index.ts`
- [ ] **Step 1: Add /health endpoint before app.listen**
Add before the `app.listen(...)` line (around line 138) in `remotion_service/server/index.ts`:
```typescript
app.get("/health", async () => {
return { status: "ok" };
});
```
Note: The route is registered directly on the Elysia instance, which is configured with `prefix: "/api"`, so the endpoint is served at `GET /api/health`.
- [ ] **Step 2: Type check**
Run: `cd remotion_service && bunx tsc --noEmit`
Expected: no new errors.
- [ ] **Step 3: Commit**
```bash
git add remotion_service/server/index.ts
git commit -m "feat(remotion): add /api/health endpoint for Docker health checks"
```
---
### Task 10: Add health checks for api, worker, and remotion in compose files
**Files:**
- Modify: `cofee_backend/docker-compose.yml`
- Modify: `remotion_service/docker-compose.yml`
- [ ] **Step 1: Add healthcheck to api service**
Add to `api` service in `cofee_backend/docker-compose.yml` (after `depends_on`):
```yaml
healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health/')"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
```
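
The one-liner in this health check relies on `urllib.request.urlopen` raising an exception both when the connection fails and when the server answers with an HTTP error status. A non-zero exit code from `python -c` then marks the container unhealthy. The sketch below spells that behaviour out as a function. The name `is_healthy` and the injectable `opener` parameter are hypothetical, added only so the logic can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str, opener: Callable = urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if the URL answers without a connection or HTTP error.

    Hypothetical helper mirroring the inline health check command.
    """
    try:
        # urlopen raises URLError/HTTPError (both OSError subclasses) on failure.
        with opener(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

A probe like this only notices a database outage if the endpoint responds with an error status in that case. An endpoint that answers `200` with a "degraded" body would still be reported as healthy.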
- [ ] **Step 2: Add healthcheck to worker service**
The worker has no HTTP port. Use a process check. Add to `worker` service:
```yaml
healthcheck:
  test: ["CMD-SHELL", "pgrep -f dramatiq || exit 1"]
  interval: 15s
  timeout: 5s
  retries: 3
```
- [ ] **Step 3: Add healthcheck to remotion service**
Add to `remotion` service in `remotion_service/docker-compose.yml` (after `environment`):
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3001/api/health"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 15s
```
- [ ] **Step 4: Validate both compose files**
Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`
Expected: no errors.
- [ ] **Step 5: Commit**
```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add health checks to api, worker, and remotion services"
```
---
### Task 11: Add network segmentation to backend compose
**Files:**
- Modify: `cofee_backend/docker-compose.yml`
Currently all services share one flat network. Separate into `db-net` (data stores) and `app-net` (application services). This prevents Remotion from reaching DB/Redis directly.
- [ ] **Step 1: Add networks to compose**
Add at the bottom of `cofee_backend/docker-compose.yml`, replacing the existing `volumes:` section:
```yaml
volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:

networks:
  db-net:
    driver: bridge
  app-net:
    driver: bridge
```
- [ ] **Step 2: Add network assignments to each service**
Add to `db`:
```yaml
networks:
- db-net
```
Add to `redis`:
```yaml
networks:
- db-net
```
Add to `minio`:
```yaml
networks:
- db-net
- app-net
```
Add to `api`:
```yaml
networks:
- db-net
- app-net
```
Add to `worker`:
```yaml
networks:
- db-net
- app-net
```
- [ ] **Step 3: Update Remotion compose to use app-net**
In `remotion_service/docker-compose.yml`, change the networks section:
```yaml
networks:
  backend:
    external: true
    name: cofee_backend_app-net
```
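This ensures Remotion can reach MinIO and the API (on `app-net`) but NOT PostgreSQL or Redis (on `db-net`). The Remotion compose file from Task 6 still sets `REDIS_URL: redis://redis:6379/0`; if the Remotion service actually connects to Redis, attach `redis` to `app-net` as well, otherwise that hostname will not resolve from the Remotion container.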
- [ ] **Step 4: Validate both compose files**
Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`
Expected: no errors.
- [ ] **Step 5: Test full stack connectivity**
Run:
```bash
cd cofee_backend && docker compose down && docker compose up -d
# Wait for healthy
cd ../remotion_service && docker compose down && docker compose up -d
```
Verify API can reach DB, Redis, MinIO. Verify Remotion can reach MinIO but NOT DB.
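
The reachability checks in this step can be scripted as plain TCP connection attempts. The following is a minimal sketch under assumptions: the helper name `can_connect` is hypothetical, and it needs a container that ships Python (the backend `api` image does, while the Bun-based Remotion image would need an equivalent check in another tool).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False
```

Run inside the `api` container, `can_connect("db", 5432)`, `can_connect("redis", 6379)`, and `can_connect("minio", 9000)` would all be expected to return `True`. From a container attached only to `app-net`, the `db` hostname should fail to resolve and return `False`.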
- [ ] **Step 6: Commit**
```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add network segmentation — db-net and app-net isolation"
```
---
### Task 12: Final verification
- [ ] **Step 1: Bring down everything**
```bash
cd cofee_backend && docker compose down
cd ../remotion_service && docker compose down
```
- [ ] **Step 2: Clean build**
```bash
cd cofee_backend && docker compose build --no-cache
cd ../remotion_service && docker compose build --no-cache
```
- [ ] **Step 3: Start backend stack**
```bash
cd cofee_backend && docker compose up -d
```
Wait for: `docker compose ps` shows all services healthy.
- [ ] **Step 4: Start Remotion stack**
```bash
cd remotion_service && docker compose up -d
```
Wait for: `docker compose ps` shows remotion healthy.
- [ ] **Step 5: Test API health**
Run: `curl http://127.0.0.1:8000/api/health/`
Expected: `{"status":"ok","database":"connected"}`
- [ ] **Step 6: Test Remotion health**
Run: `curl http://127.0.0.1:3001/api/health`
Expected: `{"status":"ok"}`
- [ ] **Step 7: Verify port binding**
Run: `docker compose -f cofee_backend/docker-compose.yml ps --format '{{.Name}} {{.Ports}}'`
Expected: all ports show `127.0.0.1:XXXX->YYYY/tcp` (not `0.0.0.0`).
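
For a scripted version of this check, the port column can be scanned for any published mapping that does not start with `127.0.0.1:`. This is a sketch under assumptions: the helper name `non_loopback_bindings` is hypothetical, and it assumes mappings are comma-separated in the `host:port->container/proto` form shown above.

```python
def non_loopback_bindings(port_column: str) -> list[str]:
    """Return published port mappings that are not bound to 127.0.0.1.

    Hypothetical helper; assumes comma-separated `host:port->port/proto` entries.
    """
    offenders = []
    for mapping in port_column.split(","):
        mapping = mapping.strip()
        # Entries without "->" are exposed-only ports, not published to the host.
        if "->" not in mapping:
            continue
        if not mapping.startswith("127.0.0.1:"):
            offenders.append(mapping)
    return offenders
```

An empty list for every service matches the expectation of this step. Any entry that begins with `0.0.0.0:` or an IPv6 wildcard is returned as an offender.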
- [ ] **Step 8: Verify resource limits**
Run: `docker inspect cpv3_api --format '{{.HostConfig.Memory}}'`
Expected: `536870912` (512MB).
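Run (from the repo root; the Remotion compose file sets no `container_name`, so resolve the container ID first): `docker inspect "$(docker compose -f remotion_service/docker-compose.yml ps -q remotion)" --format '{{.HostConfig.Memory}}'`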
Expected: `4294967296` (4GB).
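
The expected values are the compose memory limits converted to bytes with binary units (`512m` = 512 × 1024², `4g` = 4 × 1024³). A small sketch of that arithmetic follows. The helper name `compose_memory_to_bytes` is hypothetical and it handles only the single-letter suffixes used in this plan.

```python
# Binary multipliers for the single-letter suffixes used in this plan.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def compose_memory_to_bytes(value: str) -> int:
    """Convert a value such as "512m" or "4g" to a byte count.

    Hypothetical helper; two-letter suffixes like "mb" are not handled.
    """
    text = value.strip().lower()
    suffix = text[-1]
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    # No suffix: the value is already a byte count.
    return int(text)
```

Comparing `compose_memory_to_bytes("512m")` with the `docker inspect` output gives a quick check that the limit was actually applied to the container.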
@@ -0,0 +1,478 @@
# Subtitle Revision Workspace Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Redesign the subtitle-revision screen into a more cohesive editorial workspace while staying inside the current frontend design system.
**Architecture:** Keep the existing component boundaries and logic intact, then improve hierarchy through coordinated shell styling and small presentation-only markup changes. The work is isolated to the shared stepper chrome, the subtitle-revision step layout, the transcription editor surface, and the timeline dock so the redesign remains low-risk and easy to verify.
**Tech Stack:** Next.js 16, React, TypeScript, SCSS Modules, Vidstack, Lucide, Chrome DevTools MCP
---
## File Structure
- Modify: `cofee_frontend/src/shared/ui/Stepper/Stepper.module.scss`
Purpose: Reduce stepper visual dominance and align it with the calmer workspace shell.
- Modify: `cofee_frontend/src/widgets/ProjectWizard/ProjectWizard.module.scss`
Purpose: Introduce softer page-level spacing/canvas treatment around the active step content.
- Modify: `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx`
Purpose: Add minimal presentational structure for player/editor panel headers and shell grouping.
- Modify: `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.module.scss`
Purpose: Build the unified editorial workspace shell and responsive balanced split behavior.
- Modify: `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.tsx`
Purpose: Add small semantic wrappers for a stronger editor header and cleaner segment metadata grouping.
- Modify: `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.module.scss`
Purpose: Redesign the editor surface, segment cards, and add-segment action within current tokens.
- Modify: `cofee_frontend/src/widgets/TimelinePanel/TimelinePanel.module.scss`
Purpose: Make the timeline feel like a docked rail within the same workspace shell.
## Task 1: Soften the Stepper and Page Canvas
**Files:**
- Modify: `cofee_frontend/src/shared/ui/Stepper/Stepper.module.scss`
- Modify: `cofee_frontend/src/widgets/ProjectWizard/ProjectWizard.module.scss`
- Test: `cd cofee_frontend && bunx tsc --noEmit`
- [ ] **Step 1: Inspect the current stepper and wizard shell before editing**
Run:
```bash
sed -n '1,220p' cofee_frontend/src/shared/ui/Stepper/Stepper.module.scss
sed -n '1,220p' cofee_frontend/src/widgets/ProjectWizard/ProjectWizard.module.scss
```
Expected: confirm the current stepper uses a saturated active pill and the wizard root is mostly structural with minimal page-level styling.
- [ ] **Step 2: Update the stepper to feel quieter and more integrated**
Apply changes in `cofee_frontend/src/shared/ui/Stepper/Stepper.module.scss` so the active step is calmer and the bar reads as context instead of a hero element.
Use this shape for the key selectors:
```scss
.root {
  position: relative;
  background: linear-gradient(180deg, variables.$bg-default 0%, variables.$bg-surface 100%);
  border-bottom: 1px solid variables.$border-subtle;
}

.scrollContainer {
  gap: 10px;
  padding: 18px 28px 14px;
}

.step {
  padding: 8px 14px 8px 8px;
  border-radius: 999px;
  background: rgba(255, 255, 255, 0.42);
  border: 1px solid transparent;
}

.stepActive {
  background: variables.$bg-surface;
  border-color: rgba(139, 92, 246, 0.16);
  box-shadow: 0 10px 24px rgba(24, 24, 27, 0.06);
}

.stepCompleted {
  background: rgba(255, 255, 255, 0.28);
}
```
- [ ] **Step 3: Give the wizard a softer canvas around the active step**
Update `cofee_frontend/src/widgets/ProjectWizard/ProjectWizard.module.scss` so the content area gets breathing room without changing behavior.
Use this shape:
```scss
.root {
  display: flex;
  flex-direction: column;
  height: calc(100vh - var(--header-height));
  overflow: hidden;
  background: linear-gradient(180deg, variables.$bg-default 0%, rgba(255, 255, 255, 0.55) 100%);
}

.content {
  flex: 1;
  display: flex;
  flex-direction: column;
  overflow-y: auto;
  min-height: 0;
  padding: 18px 24px 24px;
}
```
- [ ] **Step 4: Run the frontend type-check after the shell changes**
Run:
```bash
cd cofee_frontend && bunx tsc --noEmit
```
Expected: exit code `0`.
- [ ] **Step 5: Commit the shell changes**
```bash
git add cofee_frontend/src/shared/ui/Stepper/Stepper.module.scss cofee_frontend/src/widgets/ProjectWizard/ProjectWizard.module.scss
git commit -m "feat: refine project wizard shell"
```
## Task 2: Build the Subtitle Revision Workspace Shell
**Files:**
- Modify: `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx`
- Modify: `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.module.scss`
- Test: `cd cofee_frontend && bunx tsc --noEmit`
- [ ] **Step 1: Inspect the current subtitle-revision markup and styles**
Run:
```bash
sed -n '1,260p' cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx
sed -n '1,260p' cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.module.scss
```
Expected: confirm the current main grid, timeline, and footer are separate blocks with minimal shared shell styling.
- [ ] **Step 2: Add panel headers and a single workspace shell in the TSX**
Update `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx` so the player and editor live inside named panels.
Use this structure inside the `MediaPlayer` content:
```tsx
<div className={styles.workspaceShell}>
  <div className={styles.mainGrid}>
    <section className={styles.panel}>
      <div className={styles.panelHeader}>
        <div>
          <p className={styles.eyebrow}>Просмотр</p>
          <h3 className={styles.panelTitle}>Видео проекта</h3>
        </div>
      </div>
      <div className={styles.playerColumn}>...</div>
    </section>
    <section className={styles.panel}>
      <div className={styles.panelHeader}>
        <div>
          <p className={styles.eyebrow}>Редактор</p>
          <h3 className={styles.panelTitle}>Транскрипция</h3>
        </div>
      </div>
      <div className={styles.editorColumn}>...</div>
    </section>
  </div>
  <div className={styles.timelineWrapper}>...</div>
  <div className={styles.footer}>...</div>
</div>
```
- [ ] **Step 3: Style the workspace shell, balanced split, and responsive stack**
Update `cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.module.scss` with a single rounded shell, two equal panels, and a docked lower rail.
Use this shape for the key selectors:
```scss
.workspaceShell {
  display: flex;
  flex-direction: column;
  flex: 1;
  min-height: 0;
  border: 1px solid rgba(24, 24, 27, 0.08);
  border-radius: variables.$radius-lg;
  background: linear-gradient(180deg, rgba(255, 255, 255, 0.72) 0%, variables.$bg-surface 100%);
  box-shadow: 0 18px 48px rgba(24, 24, 27, 0.08);
  overflow: hidden;
}

.mainGrid {
  display: grid;
  grid-template-columns: minmax(0, 1fr) minmax(0, 1fr);
  gap: 18px;
  padding: 20px;
  flex: 1;
  min-height: 0;
}

.panel {
  display: flex;
  flex-direction: column;
  min-height: 0;
  border: 1px solid rgba(24, 24, 27, 0.08);
  border-radius: variables.$radius-lg;
  background: rgba(255, 255, 255, 0.58);
  overflow: hidden;
}
```
Add responsive collapse:
```scss
@media (max-width: 1024px) {
  .mainGrid {
    grid-template-columns: 1fr;
  }
}
```
- [ ] **Step 4: Verify the layout still type-checks**
Run:
```bash
cd cofee_frontend && bunx tsc --noEmit
```
Expected: exit code `0`.
- [ ] **Step 5: Commit the workspace shell changes**
```bash
git add cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.tsx cofee_frontend/src/features/project/SubtitleRevisionStep/SubtitleRevisionStep.module.scss
git commit -m "feat: redesign subtitle revision workspace shell"
```
## Task 3: Redesign the Transcription Editor Surface
**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.tsx`
- Modify: `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.module.scss`
- Test: `cd cofee_frontend && bunx tsc --noEmit`
- [ ] **Step 1: Inspect the current transcription editor structure**
Run:
```bash
sed -n '1,320p' cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.tsx
sed -n '1,320p' cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.module.scss
```
Expected: confirm the current editor has a plain header, dense segment rows, and a dashed add button.
- [ ] **Step 2: Add semantic wrappers for a stronger header and cleaner metadata row**
Update `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.tsx` with small presentation-only wrappers.
Use this shape:
```tsx
<div className={styles.headerMeta}>
<p className={styles.kicker}>Редактура</p>
<h3 className={styles.title}>Редактор транскрипции</h3>
</div>
<div className={styles.segmentMetaRow}>
<div className={styles.timesGroup}>...</div>
<div className={styles.actionsGroup}>...</div>
</div>
```
For each timing field, wrap the label and input in a chip-like container:
```tsx
<label className={styles.timeChip}>
<span className={styles.timeLabelText}>Начало</span>
<input className={styles.timeInput} ... />
</label>
```
- [ ] **Step 3: Rework the editor styling into a calmer editorial surface**
Update `cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.module.scss` so the editor looks less like a raw form and more like a reading/editing workspace.
Use this shape for the key selectors:
```scss
.root {
display: flex;
flex-direction: column;
height: 100%;
min-height: 0;
background: transparent;
}
.header {
display: flex;
align-items: center;
justify-content: space-between;
padding: 18px 20px 14px;
border-bottom: 1px solid rgba(24, 24, 27, 0.08);
background: rgba(255, 255, 255, 0.68);
}
.segment {
border: 1px solid rgba(24, 24, 27, 0.08);
border-radius: variables.$radius-lg;
padding: 14px;
background: rgba(255, 255, 255, 0.82);
box-shadow: 0 8px 24px rgba(24, 24, 27, 0.04);
}
.timeChip {
display: inline-flex;
align-items: center;
gap: 8px;
padding: 6px 10px;
border-radius: 999px;
background: variables.$bg-hover;
border: 1px solid transparent;
}
.textArea {
padding: 14px 16px;
border-radius: variables.$radius-md;
line-height: 1.65;
background: rgba(244, 244, 245, 0.92);
}
```
Replace the dashed add button treatment with a quieter inset surface:
```scss
.addButton {
margin: 0 20px 18px;
padding: 12px 14px;
border: 1px solid rgba(24, 24, 27, 0.08);
border-radius: variables.$radius-md;
background: rgba(255, 255, 255, 0.6);
}
```
- [ ] **Step 4: Run the frontend type-check after the editor redesign**
Run:
```bash
cd cofee_frontend && bunx tsc --noEmit
```
Expected: exit code `0`.
- [ ] **Step 5: Commit the editor redesign**
```bash
git add cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.tsx cofee_frontend/src/features/project/TranscriptionEditor/TranscriptionEditor.module.scss
git commit -m "feat: refine transcription editor presentation"
```
## Task 4: Dock the Timeline and Verify in Chrome
**Files:**
- Modify: `cofee_frontend/src/widgets/TimelinePanel/TimelinePanel.module.scss`
- Test: `cd cofee_frontend && bunx tsc --noEmit`
- Verify: Chrome at `http://localhost:3000/projects/83eb1396-8217-4ceb-ae32-b3b63cd01982`
- [ ] **Step 1: Inspect the current timeline chrome**
Run:
```bash
sed -n '1,260p' cofee_frontend/src/widgets/TimelinePanel/TimelinePanel.module.scss
```
Expected: confirm the toolbar and label column are functional but visually flat and poorly integrated with the workspace shell.
- [ ] **Step 2: Update the timeline dock styling to match the workspace**
Modify `cofee_frontend/src/widgets/TimelinePanel/TimelinePanel.module.scss` so the toolbar, labels column, and scroll area feel like a lower editing rail.
Use this shape:
```scss
.root {
display: flex;
flex-direction: column;
align-self: stretch;
height: 100%;
min-width: 0;
overflow: hidden;
background: rgba(255, 255, 255, 0.56);
}
.toolbar {
height: 40px;
padding: 0 14px;
border-bottom: 1px solid rgba(24, 24, 27, 0.08);
background: rgba(255, 255, 255, 0.72);
}
.labelsColumn {
width: 68px;
background: rgba(255, 255, 255, 0.48);
}
.zoomBtn {
width: 28px;
height: 28px;
border-radius: 999px;
}
```
- [ ] **Step 3: Run the frontend type-check before browser verification**
Run:
```bash
cd cofee_frontend && bunx tsc --noEmit
```
Expected: exit code `0`.
- [ ] **Step 4: Verify the redesigned screen in Chrome**
Check the route:
```text
http://localhost:3000/projects/83eb1396-8217-4ceb-ae32-b3b63cd01982
```
Verify all of the following:
- the stepper is still readable but less dominant
- the player and editor read as one shell
- the desktop split still feels balanced
- the transcription cards are calmer and easier to scan
- the timeline feels docked to the workspace
- the footer stays visually anchored
- the layout still holds together at a narrower viewport
- [ ] **Step 5: Commit the timeline changes once browser verification passes**
```bash
git add cofee_frontend/src/widgets/TimelinePanel/TimelinePanel.module.scss
git commit -m "feat: align timeline dock with subtitle workspace"
```
## Self-Review
### Spec Coverage
- Quieter stepper: covered by Task 1
- Single workspace shell: covered by Task 2
- Stronger transcription editor hierarchy: covered by Task 3
- Docked timeline integration: covered by Task 4
- Responsive balanced split: covered by Task 2 and Task 4 browser verification
- Design-system constraint: enforced in every task by reusing existing tokens and limiting scope to SCSS/module presentation
### Placeholder Scan
- No `TODO`, `TBD`, or deferred implementation notes remain
- Each task lists exact files and commands
- Each styling task includes concrete selector/code shapes instead of abstract guidance
### Type Consistency
- `workspaceShell`, `panel`, `panelHeader`, `eyebrow`, and `panelTitle` are introduced in the step component only
- `headerMeta`, `kicker`, `segmentMetaRow`, and `timeChip` are introduced in the editor only
- No new logic APIs or renamed behavioral props are required
@@ -0,0 +1,687 @@
# Subtitle Preset Grid Redesign Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Redesign the preset preview cards so they match the uploaded video's aspect ratio, refresh their visual design, and show each preset's style characteristics.
**Architecture:** Fetch video metadata to calculate the aspect ratio and pass it to the preset cards via props. Update StylePreview for dynamic sizing. Add a loading skeleton state. Use the Catppuccin Mocha color palette to match the project theme.
**Tech Stack:** React, TypeScript, SCSS Modules, TanStack Query, openapi-react-query
**Design Spec:** `docs/superpowers/specs/2026-04-06-subtitle-preset-grid-redesign.md`
---
## File Structure
| File | Purpose |
|------|---------|
| `src/features/project/CaptionSettingsStep/useVideoMetadata.ts` | New hook to fetch video metadata and calculate aspect ratio |
| `src/features/project/CaptionSettingsStep/PresetGrid.tsx` | Modified - adds aspect ratio fetching, loading state, passes ratio to cards |
| `src/features/project/CaptionSettingsStep/PresetGrid.module.scss` | Modified - grid layout, responsive styles |
| `src/features/project/CaptionSettingsStep/PresetCard.tsx` | Modified - adds style characteristics display, checkmark indicator, updated styling |
| `src/features/project/CaptionSettingsStep/PresetCard.module.scss` | Modified - new card design with Catppuccin Mocha colors |
| `src/features/project/CaptionSettingsStep/StylePreview.tsx` | Modified - accepts aspectRatio prop for dynamic sizing |
| `src/features/project/CaptionSettingsStep/StylePreview.module.scss` | Modified - dynamic aspect-ratio container |
| `src/features/project/CaptionSettingsStep/PresetCardSkeleton.tsx` | New - skeleton loading component for preset cards |
| `src/features/project/CaptionSettingsStep/PresetCardSkeleton.module.scss` | New - skeleton styles with shimmer animation |
---
## Task 1: Create useVideoMetadata Hook
**Files:**
- Create: `src/features/project/CaptionSettingsStep/useVideoMetadata.ts`
**Context:** This hook fetches video metadata from the API and calculates the aspect ratio. It uses the existing `api` export from `@shared/api`, which is an openapi-react-query client.
- [ ] **Step 1: Write the hook implementation**
```typescript
import { useMemo } from "react"
import api from "@shared/api"
interface UseVideoMetadataResult {
aspectRatio: number
isLoading: boolean
isError: boolean
}
const DEFAULT_ASPECT_RATIO = 16 / 9
export function useVideoMetadata(fileId: string | null): UseVideoMetadataResult {
const { data: mediaFile, isLoading, isError } = api.useQuery(
"get",
"/api/media/mediafiles/{media_file_id}/",
{
params: {
path: {
media_file_id: fileId ?? "",
},
},
},
{
enabled: !!fileId,
}
)
const aspectRatio = useMemo(() => {
if (!mediaFile?.width || !mediaFile?.height) {
return DEFAULT_ASPECT_RATIO
}
return mediaFile.width / mediaFile.height
}, [mediaFile])
return {
aspectRatio,
isLoading,
isError,
}
}
```
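The ratio calculation inside the hook can also be written as a pure function, which makes the 16:9 fallback easy to check in isolation. The sketch below is illustrative only: `computeAspectRatio` and the `Dimensions` shape are hypothetical names, and it assumes the media file response exposes numeric `width` and `height` fields, as the hook does.

```typescript
// Fallback used when dimensions are missing or unusable (same value as the hook).
const DEFAULT_ASPECT_RATIO = 16 / 9

// Hypothetical shape: only the two fields the calculation needs.
interface Dimensions {
  width?: number | null
  height?: number | null
}

// Returns width / height, or the default when either value is absent,
// non-finite, zero, or negative.
function computeAspectRatio(dims: Dimensions | null | undefined): number {
  const width = dims?.width ?? 0
  const height = dims?.height ?? 0
  const usable =
    Number.isFinite(width) && Number.isFinite(height) && width > 0 && height > 0
  return usable ? width / height : DEFAULT_ASPECT_RATIO
}
```

A hook such as `useVideoMetadata` could call this inside `useMemo` instead of inlining the check.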
- [ ] **Step 2: Commit**
```bash
git add src/features/project/CaptionSettingsStep/useVideoMetadata.ts
git commit -m "feat: add useVideoMetadata hook for aspect ratio calculation"
```
---
## Task 2: Create PresetCardSkeleton Component
**Files:**
- Create: `src/features/project/CaptionSettingsStep/PresetCardSkeleton.tsx`
- Create: `src/features/project/CaptionSettingsStep/PresetCardSkeleton.module.scss`
- [ ] **Step 1: Write the SCSS module**
```scss
// PresetCardSkeleton.module.scss
.skeletonCard {
border-radius: 12px;
overflow: hidden;
background: var(--bg-default);
border: 1px solid var(--border-subtle);
display: flex;
flex-direction: column;
}
.skeletonPreview {
aspect-ratio: 16 / 9;
background: var(--bg-surface);
position: relative;
overflow: hidden;
&::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(203, 166, 247, 0.08) 50%,
transparent 100%
);
animation: shimmer 1.5s infinite;
}
}
@keyframes shimmer {
0% {
transform: translateX(-100%);
}
100% {
transform: translateX(100%);
}
}
.skeletonFooter {
padding: 14px 16px;
background: linear-gradient(to top, var(--bg-surface), var(--bg-default));
border-top: 1px solid var(--border-subtle);
display: flex;
flex-direction: column;
gap: 10px;
}
.skeletonLine {
height: 14px;
background: var(--bg-hover);
border-radius: 4px;
width: 60%;
position: relative;
overflow: hidden;
&::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(203, 166, 247, 0.06) 50%,
transparent 100%
);
animation: shimmer 1.5s infinite;
}
}
.skeletonLineShort {
composes: skeletonLine;
width: 40%;
height: 10px;
}
```
- [ ] **Step 2: Write the component**
```typescript
// PresetCardSkeleton.tsx
import type { FunctionComponent, JSX } from "react"
import styles from "./PresetCardSkeleton.module.scss"
interface IPresetCardSkeletonProps {
aspectRatio?: number
}
export const PresetCardSkeleton: FunctionComponent<IPresetCardSkeletonProps> = ({
aspectRatio = 16 / 9,
}): JSX.Element => {
return (
<div className={styles.skeletonCard}>
<div
className={styles.skeletonPreview}
style={{ aspectRatio }}
/>
<div className={styles.skeletonFooter}>
<div className={styles.skeletonLine} />
<div className={styles.skeletonLineShort} />
</div>
</div>
)
}
```
- [ ] **Step 3: Add barrel export**
Add to `src/features/project/CaptionSettingsStep/index.ts`:
```typescript
export { PresetCardSkeleton } from "./PresetCardSkeleton"
```
- [ ] **Step 4: Commit**
```bash
git add src/features/project/CaptionSettingsStep/PresetCardSkeleton.tsx
git add src/features/project/CaptionSettingsStep/PresetCardSkeleton.module.scss
git add src/features/project/CaptionSettingsStep/index.ts
git commit -m "feat: add PresetCardSkeleton component with shimmer animation"
```
---
## Task 3: Update StylePreview for Dynamic Aspect Ratio
**Files:**
- Modify: `src/features/project/CaptionSettingsStep/StylePreview.tsx`
- Modify: `src/features/project/CaptionSettingsStep/StylePreview.module.scss`
- [ ] **Step 1: Read existing StylePreview files**
```bash
cat src/features/project/CaptionSettingsStep/StylePreview.tsx
cat src/features/project/CaptionSettingsStep/StylePreview.module.scss
```
- [ ] **Step 2: Update StylePreview.module.scss**
Add or modify the preview container to accept dynamic aspect-ratio:
```scss
// Add to existing StylePreview.module.scss
.previewContainer {
position: relative;
width: 100%;
overflow: hidden;
background: #0c0a1a;
display: flex;
align-items: center;
justify-content: center;
}
```
- [ ] **Step 3: Update StylePreview.tsx**
Add `aspectRatio` prop and apply it to the container:
```typescript
// Add to existing imports
import type { CSSProperties } from "react"
// Update interface to include aspectRatio
interface IStylePreviewProps {
preset: CaptionPresetRead
aspectRatio?: number
className?: string
}
// In component, apply aspect ratio
export const StylePreview: FunctionComponent<IStylePreviewProps> = ({
preset,
aspectRatio = 9 / 16, // Default to vertical (original behavior)
className,
}): JSX.Element => {
// ... existing logic ...
const containerStyle: CSSProperties = {
aspectRatio: String(aspectRatio),
}
return (
<div
className={cs(styles.previewContainer, className)}
style={containerStyle}
>
{/* ... existing preview content ... */}
</div>
)
}
```
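`String(aspectRatio)` produces `"NaN"` or `"0"` when the prop carries an unusable number, and neither is a valid CSS `aspect-ratio` value. One way to guard against that is sketched below. `toCssAspectRatio` is a hypothetical helper, and the 9:16 fallback is an assumption taken from the component's default prop.

```typescript
// Converts a numeric ratio into a CSS aspect-ratio string,
// substituting the fallback for undefined, NaN, infinite, or non-positive input.
function toCssAspectRatio(ratio: number | undefined, fallback: number = 9 / 16): string {
  const usable = ratio !== undefined && Number.isFinite(ratio) && ratio > 0
  return String(usable ? ratio : fallback)
}
```

The container style could then be built as `{ aspectRatio: toCssAspectRatio(aspectRatio) }`.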
- [ ] **Step 4: Commit**
```bash
git add src/features/project/CaptionSettingsStep/StylePreview.tsx
git add src/features/project/CaptionSettingsStep/StylePreview.module.scss
git commit -m "feat: add aspectRatio prop to StylePreview for dynamic sizing"
```
---
## Task 4: Update PresetCard with New Design
**Files:**
- Modify: `src/features/project/CaptionSettingsStep/PresetCard.tsx`
- Modify: `src/features/project/CaptionSettingsStep/PresetCard.module.scss`
- [ ] **Step 1: Read existing PresetCard files**
```bash
cat src/features/project/CaptionSettingsStep/PresetCard.tsx
cat src/features/project/CaptionSettingsStep/PresetCard.module.scss
```
- [ ] **Step 2: Rewrite PresetCard.module.scss with new design**
```scss
// PresetCard.module.scss
.presetCard {
position: relative;
border-radius: 12px;
overflow: hidden;
background: var(--bg-default);
border: 1px solid var(--border-subtle);
cursor: pointer;
transition: all 0.2s cubic-bezier(0.2, 0.8, 0.2, 1);
display: flex;
flex-direction: column;
&:hover {
border-color: var(--purple-400);
transform: translateY(-2px);
box-shadow: var(--shadow-md);
}
}
.selected {
border-color: var(--purple-400);
box-shadow:
0 0 0 1px var(--purple-400),
0 0 20px rgba(203, 166, 247, 0.25),
var(--shadow-lg);
&::before {
content: "";
position: absolute;
inset: -1px;
border-radius: 12px;
padding: 1px;
background: linear-gradient(135deg, var(--purple-400), var(--purple-600));
-webkit-mask:
linear-gradient(#fff 0 0) content-box,
linear-gradient(#fff 0 0);
-webkit-mask-composite: xor;
mask-composite: exclude;
pointer-events: none;
}
}
.previewArea {
position: relative;
overflow: hidden;
}
.selectedIndicator {
position: absolute;
top: 8px;
right: 8px;
width: 20px;
height: 20px;
background: var(--purple-400);
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
box-shadow: 0 2px 8px rgba(203, 166, 247, 0.4);
svg {
width: 12px;
height: 12px;
color: var(--bg-canvas);
}
}
.cardFooter {
padding: 14px 16px;
background: linear-gradient(to top, var(--bg-surface), var(--bg-default));
border-top: 1px solid var(--border-subtle);
}
.presetName {
font-size: 14px;
font-weight: 600;
color: var(--text-primary);
margin-bottom: 6px;
display: flex;
align-items: center;
gap: 8px;
}
.systemBadge {
font-size: 10px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
padding: 2px 8px;
background: var(--purple-100);
color: var(--purple-400);
border-radius: 4px;
}
.styleChars {
font-size: 12px;
color: var(--text-tertiary);
display: flex;
align-items: center;
gap: 8px;
}
.colorDot {
width: 8px;
height: 8px;
border-radius: 50%;
display: inline-block;
box-shadow: 0 0 0 1px rgba(255, 255, 255, 0.1);
}
.divider {
color: var(--border-default);
}
```
- [ ] **Step 3: Update PresetCard.tsx with style characteristics**
```typescript
// Add helper to extract style characteristics
function getStyleCharacteristics(preset: CaptionPresetRead): {
fontFamily: string
accentColor: string | null
accentName: string | null
} {
const style = preset.style_config
const fontFamily = style?.text?.font_family ?? "Inter"
// Extract accent color from highlight or text color
const highlightColor = style?.highlight?.color
const textColor = style?.text?.color
// Simple color name mapping (expand as needed)
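  const colorMap: Record<string, string> = {
    // Keys are lowercase hex so the normalized lookup below always matches
    "#ffd700": "Золотой",
    "#00ffff": "Неоновый",
    "#ffffff": "Белый",
    "#ff006e": "Розовый",
    "#cba6f7": "Пурпурный",
    "#f38ba8": "Розовый",
    "#a6e3a1": "Зеленый",
    "#f9e2af": "Желтый",
    "#89dceb": "Голубой",
  }
  // Coerce a missing color to null so the value matches the declared return type
  const accentColor = highlightColor || textColor || null
  // Compare case-insensitively: "#FFD700" and "#ffd700" resolve to the same name
  const accentName = accentColor ? (colorMap[accentColor.toLowerCase()] ?? null) : null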
return {
fontFamily,
accentColor,
accentName,
}
}
// Update component to render characteristics
export const PresetCard: FunctionComponent<IPresetCardProps> = ({
preset,
isSelected,
aspectRatio,
onSelect,
onEdit,
onDelete,
}): JSX.Element => {
const { fontFamily, accentColor, accentName } = getStyleCharacteristics(preset)
return (
<div
className={cs(styles.presetCard, { [styles.selected]: isSelected })}
onClick={onSelect}
>
<div className={styles.previewArea}>
<StylePreview preset={preset} aspectRatio={aspectRatio} />
{isSelected && (
<div className={styles.selectedIndicator}>
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="3">
<polyline points="20 6 9 17 4 12" />
</svg>
</div>
)}
</div>
<div className={styles.cardFooter}>
<div className={styles.presetName}>
{preset.name}
{preset.is_system && <span className={styles.systemBadge}>Системный</span>}
</div>
<div className={styles.styleChars}>
{fontFamily}
{accentColor && accentName && (
<>
<span className={styles.divider}>·</span>
<span
className={styles.colorDot}
style={{ background: accentColor }}
/>
<span style={{ color: accentColor }}>{accentName}</span>
</>
)}
</div>
</div>
{/* Context menu for edit/delete - preserve existing */}
</div>
)
}
```
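Hex colors can arrive in different casing or in 3-digit shorthand, so the name lookup is more stable if colors are normalized first. The sketch below assumes `style_config` stores CSS hex strings. `normalizeHex`, `accentNameFor`, and `COLOR_NAMES` are hypothetical names, and the map is a small subset of the one in the plan.

```typescript
// Hypothetical subset of the plan's color map, keyed by lowercase 6-digit hex.
const COLOR_NAMES: Record<string, string> = {
  "#ffd700": "Золотой",
  "#00ffff": "Неоновый",
  "#ffffff": "Белый",
}

// Returns a lowercase 6-digit hex string, or null when the input is not a hex color.
function normalizeHex(color: string): string | null {
  const match = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i.exec(color.trim())
  if (!match) return null
  const hex = (match[1] ?? "").toLowerCase()
  // Expand shorthand such as "fff" to "ffffff" by doubling each digit.
  const full = hex.length === 3 ? hex.split("").map((c) => c + c).join("") : hex
  return `#${full}`
}

// Looks up a display name for a color; unknown or non-hex colors yield null.
function accentNameFor(color: string | null | undefined): string | null {
  if (!color) return null
  const key = normalizeHex(color)
  return key ? (COLOR_NAMES[key] ?? null) : null
}
```

Colors without a name still render as a dot-less footer in the card, because the component only shows the accent when both the color and its name are present.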
- [ ] **Step 4: Commit**
```bash
git add src/features/project/CaptionSettingsStep/PresetCard.tsx
git add src/features/project/CaptionSettingsStep/PresetCard.module.scss
git commit -m "feat: redesign PresetCard with style characteristics and checkmark indicator"
```
---
## Task 5: Update PresetGrid with Aspect Ratio and Loading State
**Files:**
- Modify: `src/features/project/CaptionSettingsStep/PresetGrid.tsx`
- Modify: `src/features/project/CaptionSettingsStep/PresetGrid.module.scss`
- [ ] **Step 1: Read existing PresetGrid files**
```bash
cat src/features/project/CaptionSettingsStep/PresetGrid.tsx
cat src/features/project/CaptionSettingsStep/PresetGrid.module.scss
```
- [ ] **Step 2: Update PresetGrid.module.scss**
```scss
// PresetGrid.module.scss
.presetGrid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
gap: 20px;
@media (max-width: 768px) {
grid-template-columns: repeat(2, 1fr);
gap: 12px;
}
}
// Optional: Add fade-in animation for cards
@keyframes fadeInUp {
from {
opacity: 0;
transform: translateY(10px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.cardWrapper {
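  // `both` holds the card at the `from` keyframe during its delay,
  // so delayed cards do not flash at full opacity before animating in
  animation: fadeInUp 0.3s ease both;
  // Staggered animation delay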
@for $i from 1 through 10 {
&:nth-child(#{$i}) {
animation-delay: #{$i * 50}ms;
}
}
}
```
- [ ] **Step 3: Update PresetGrid.tsx**
```typescript
// Add imports
import { useVideoMetadata } from "./useVideoMetadata"
import { PresetCardSkeleton } from "./PresetCardSkeleton"
import { useWizard } from "../WizardContext"
// In component
export const PresetGrid: FunctionComponent<IPresetGridProps> = ({
presets,
selectedPresetId,
onSelect,
onEdit,
onDelete,
onCreate,
}): JSX.Element => {
const { primaryFileId } = useWizard()
const { aspectRatio, isLoading: isLoadingMetadata } = useVideoMetadata(primaryFileId)
if (isLoadingMetadata) {
return (
<div className={styles.presetGrid}>
{Array.from({ length: 4 }).map((_, i) => (
<PresetCardSkeleton key={i} aspectRatio={aspectRatio} />
))}
</div>
)
}
return (
<div className={styles.presetGrid}>
{presets.map((preset, index) => (
<div
key={preset.id}
className={styles.cardWrapper}
style={{ animationDelay: `${index * 50}ms` }}
>
<PresetCard
preset={preset}
isSelected={preset.id === selectedPresetId}
aspectRatio={aspectRatio}
onSelect={() => onSelect(preset.id)}
onEdit={() => onEdit(preset.id)}
onDelete={() => onDelete(preset.id)}
/>
</div>
))}
{/* Create new card - preserve existing */}
</div>
)
}
```
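The grid staggers card entry by 50 ms per index, and the inline `animationDelay` grows without bound as the preset list gets longer. One possible refinement, sketched here with a hypothetical `staggerDelayMs` helper, is to cap the delay. The 50 ms step comes from the plan; the cap of 10 steps mirrors the `@for` loop bound in the SCSS and is otherwise an assumption.

```typescript
// Returns the entry delay in milliseconds for the card at `index`,
// clamped so that no card waits longer than maxSteps * stepMs.
function staggerDelayMs(index: number, stepMs: number = 50, maxSteps: number = 10): number {
  const clamped = Math.min(Math.max(index, 0), maxSteps)
  return clamped * stepMs
}
```

In the grid this could replace the inline expression, for example `style={{ animationDelay: `${staggerDelayMs(index)}ms` }}`.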
- [ ] **Step 4: Commit**
```bash
git add src/features/project/CaptionSettingsStep/PresetGrid.tsx
git add src/features/project/CaptionSettingsStep/PresetGrid.module.scss
git commit -m "feat: add aspect ratio and loading state to PresetGrid"
```
---
## Task 6: Type Check and Verify
- [ ] **Step 1: Run type check**
```bash
cd cofee_frontend && bunx tsc --noEmit
```
Expected: No TypeScript errors
- [ ] **Step 2: Run lint check**
```bash
cd cofee_frontend && bun run lint
```
Expected: no lint errors. If the project defines no `lint` script, bun reports that the script is missing; in that case rely on the type check from Step 1.
- [ ] **Step 3: Final commit**
```bash
git add .
git commit -m "feat: complete subtitle preset grid redesign with dynamic aspect ratio"
```
---
## Verification Checklist
- [ ] Preset cards display with correct aspect ratio based on uploaded video
- [ ] Loading state shows skeleton cards with shimmer animation
- [ ] Style characteristics (font, color) visible on card footers
- [ ] Selected card shows checkmark indicator and purple glow border
- [ ] Grid is responsive (2 columns on mobile, more on desktop)
- [ ] Hover effects work smoothly
- [ ] Falls back to 16:9 when no video is available
- [ ] All existing functionality preserved (select, edit, delete, create)
@@ -0,0 +1,33 @@
# Codex Team Policy Fixes
Date: 2026-04-05
## Scope
- Fix the `.codex/memories` path convention so the shared rule and per-agent instructions use the same agent IDs.
- Tighten the team-first wording so non-trivial repo work consults the team by default.
- Remove role skill assignments that depend on unavailable review infrastructure.
## Approved Approach
Use a minimal patch:
- Standardize memory paths on the actual Codex agent IDs, which use underscores.
- Change the consultation policy from "before deep analysis" to "before any non-trivial repo task", while keeping a narrow exception for purely mechanical actions and explicit user opt-outs.
- Replace non-executable `requesting-code-review` entries with executable installed skills only.
## Intended Changes
### Memory paths
- Update `.codex/agent-team.md` to state that memories live under `.codex/memories/<agent_id>/`.
- Update every `.codex/agents/*.toml` file to reference underscore-based memory directories matching the agent names.
- Update `.codex/memories/README.md` examples to use `<agent_id>` wording.
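
As an illustration of the convention, a small helper could derive the memory directory from an agent ID and reject IDs that do not follow the underscore form. `memoryDirFor` and the sample ID `backend_architect` are hypothetical, and the allowed character set (lowercase letters, digits, underscores) is an assumption rather than something this policy states.

```typescript
// Assumed ID format: lowercase alphanumeric words joined by single underscores.
const AGENT_ID_PATTERN = /^[a-z0-9]+(?:_[a-z0-9]+)*$/

// Builds the memory directory path for an agent, throwing on IDs
// that use hyphens or any other character outside the assumed format.
function memoryDirFor(agentId: string): string {
  if (!AGENT_ID_PATTERN.test(agentId)) {
    throw new Error(`Agent ID "${agentId}" must use lowercase words joined by underscores`)
  }
  return `.codex/memories/${agentId}/`
}
```

A check like this could run over the names in `.codex/agents/*.toml` to confirm that the shared rule and the per-agent instructions resolve to the same directories.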
### Team-first policy
- Update `AGENTS.md` and `.codex/agent-team.md` to require team consultation before any non-trivial repo task.
- Keep a narrow local-only exception for purely mechanical actions that cannot materially change behavior, architecture, or risk.
### Skill map
- Remove `requesting-code-review` from roles because the required `superpowers:code-reviewer` subagent is not available in this workspace.
- Keep the map limited to executable skills already installed in the current environment.
## Success Criteria
- Shared policy and per-agent instructions point to the same memory paths.
- The root guidance no longer leaves "deep analysis" as the main threshold for consulting the team.
- The skill map contains only practically usable role assignments for this environment.
@@ -0,0 +1,625 @@
<!DOCTYPE html>
<html lang="ru">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Subtitle Preset Grid Redesign Demo</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=Lobster&display=swap" rel="stylesheet">
<style>
/* Catppuccin Mocha - Matching the actual project colors */
:root {
/* Backgrounds */
--bg-canvas: #11111b;
--bg-default: #1e1e2e;
--bg-surface: #313244;
--bg-hover: #45475a;
--bg-default-invert: #eff1f5;
/* Text */
--text-primary: #cdd6f4;
--text-secondary: #bac2de;
--text-tertiary: #9399b2;
/* Borders */
--border-default: #45475a;
--border-subtle: #313244;
/* Purples (accent) */
--purple-400: #cba6f7;
--purple-500: #d9bcfa;
--purple-600: #e4cffc;
--purple-700: #eddfff;
--purple-300: #6a5a93;
--purple-200: #4b4168;
--purple-100: #362f4c;
--purple-50: #2b253b;
/* Shadows */
--shadow-sm: 0 1px 2px rgba(17, 17, 27, 0.5);
--shadow-md: 0 4px 6px -1px rgba(17, 17, 27, 0.58), 0 24px 48px -12px rgba(17, 17, 27, 0.52);
--shadow-lg: 0 10px 15px -3px rgba(17, 17, 27, 0.6), 0 40px 80px -20px rgba(17, 17, 27, 0.7);
/* Accent glow */
--accent-shadow: rgba(203, 166, 247, 0.22);
--accent-shadow-hover: rgba(203, 166, 247, 0.3);
/* Preview background */
--preview-bg: #0c0a1a;
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Inter', sans-serif;
background: var(--bg-canvas);
color: var(--text-primary);
padding: 40px 20px;
min-height: 100vh;
}
body::before {
content: "";
position: fixed;
inset: 0;
background: radial-gradient(circle at 50% 0%, rgba(203, 166, 247, 0.08) 0%, transparent 55%);
pointer-events: none;
z-index: -1;
}
.container {
max-width: 1400px;
margin: 0 auto;
}
h1 {
font-size: 24px;
font-weight: 600;
margin-bottom: 32px;
color: var(--text-primary);
}
/* Controls */
.controls {
display: flex;
gap: 16px;
margin-bottom: 32px;
flex-wrap: wrap;
}
.control-group {
display: flex;
flex-direction: column;
gap: 8px;
}
.control-group label {
font-size: 11px;
color: var(--text-tertiary);
text-transform: uppercase;
letter-spacing: 0.8px;
font-weight: 500;
}
.aspect-buttons {
display: flex;
gap: 8px;
}
.aspect-btn {
padding: 8px 14px;
background: var(--bg-surface);
border: 1px solid var(--border-default);
color: var(--text-secondary);
border-radius: 8px;
cursor: pointer;
font-size: 13px;
font-weight: 500;
transition: all 0.15s ease;
}
.aspect-btn:hover {
background: var(--bg-hover);
border-color: var(--purple-400);
color: var(--text-primary);
}
.aspect-btn.active {
background: var(--purple-100);
border-color: var(--purple-400);
color: var(--purple-400);
}
/* Grid */
.preset-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
gap: 20px;
margin-bottom: 48px;
}
/* Preset Card */
.preset-card {
position: relative;
border-radius: 12px;
overflow: hidden;
background: var(--bg-default);
border: 1px solid var(--border-subtle);
cursor: pointer;
transition: all 0.2s cubic-bezier(0.2, 0.8, 0.2, 1);
display: flex;
flex-direction: column;
}
.preset-card:hover {
border-color: var(--purple-400);
transform: translateY(-2px);
box-shadow: var(--shadow-md);
}
.preset-card.selected {
border-color: var(--purple-400);
box-shadow:
0 0 0 1px var(--purple-400),
0 0 20px rgba(203, 166, 247, 0.25),
var(--shadow-lg);
}
.preset-card.selected::before {
content: "";
position: absolute;
inset: -1px;
border-radius: 12px;
padding: 1px;
background: linear-gradient(135deg, var(--purple-400), var(--purple-600));
-webkit-mask:
linear-gradient(#fff 0 0) content-box,
linear-gradient(#fff 0 0);
-webkit-mask-composite: xor;
mask-composite: exclude;
pointer-events: none;
}
/* Preview Area */
.preview-area {
position: relative;
aspect-ratio: 16 / 9;
background: var(--preview-bg);
overflow: hidden;
transition: aspect-ratio 0.3s ease;
display: flex;
align-items: center;
justify-content: center;
}
.preview-area.vertical {
aspect-ratio: 9 / 16;
}
.preview-area.square {
aspect-ratio: 1 / 1;
}
.preview-area.instagram {
aspect-ratio: 4 / 5;
}
.preview-text {
text-align: center;
padding: 16px;
font-size: 28px;
line-height: 1.4;
z-index: 1;
}
.preview-text .highlight {
font-weight: 700;
}
/* Card Footer */
.card-footer {
padding: 14px 16px;
background: linear-gradient(to top, var(--bg-surface), var(--bg-default));
border-top: 1px solid var(--border-subtle);
}
.preset-name {
font-size: 14px;
font-weight: 600;
color: var(--text-primary);
margin-bottom: 6px;
display: flex;
align-items: center;
gap: 8px;
}
.system-badge {
font-size: 10px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
padding: 2px 8px;
background: var(--purple-100);
color: var(--purple-400);
border-radius: 4px;
}
.style-chars {
font-size: 12px;
color: var(--text-tertiary);
display: flex;
align-items: center;
gap: 8px;
}
.color-dot {
width: 8px;
height: 8px;
border-radius: 50%;
display: inline-block;
box-shadow: 0 0 0 1px rgba(255, 255, 255, 0.1);
}
/* Create New Card */
.create-card {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
gap: 12px;
aspect-ratio: 16 / 9;
background: transparent;
border: 2px dashed var(--border-default);
border-radius: 12px;
cursor: pointer;
transition: all 0.2s ease;
color: var(--text-tertiary);
min-height: 100%;
}
.create-card:hover {
border-color: var(--purple-400);
background: rgba(203, 166, 247, 0.05);
color: var(--text-secondary);
}
.create-card svg {
width: 32px;
height: 32px;
opacity: 0.6;
}
/* Skeleton Loading - Improved */
.skeleton-card {
border-radius: 12px;
overflow: hidden;
background: var(--bg-default);
border: 1px solid var(--border-subtle);
display: flex;
flex-direction: column;
}
.skeleton-preview {
aspect-ratio: 16 / 9;
background: var(--bg-surface);
position: relative;
overflow: hidden;
}
.skeleton-preview::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(203, 166, 247, 0.08) 50%,
transparent 100%
);
animation: shimmer 1.5s infinite;
}
@keyframes shimmer {
0% { transform: translateX(-100%); }
100% { transform: translateX(100%); }
}
.skeleton-footer {
padding: 14px 16px;
background: linear-gradient(to top, var(--bg-surface), var(--bg-default));
border-top: 1px solid var(--border-subtle);
display: flex;
flex-direction: column;
gap: 10px;
}
.skeleton-line {
height: 14px;
background: var(--bg-hover);
border-radius: 4px;
width: 60%;
position: relative;
overflow: hidden;
}
.skeleton-line.short {
width: 40%;
height: 10px;
}
.skeleton-line::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(203, 166, 247, 0.06) 50%,
transparent 100%
);
animation: shimmer 1.5s infinite;
}
/* Section Label */
.section-label {
font-size: 11px;
color: var(--text-tertiary);
text-transform: uppercase;
letter-spacing: 0.8px;
margin-bottom: 12px;
font-weight: 500;
}
/* Selected indicator */
.selected-indicator {
position: absolute;
top: 8px;
right: 8px;
width: 20px;
height: 20px;
background: var(--purple-400);
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
box-shadow: 0 2px 8px rgba(203, 166, 247, 0.4);
}
.selected-indicator svg {
width: 12px;
height: 12px;
color: var(--bg-canvas);
}
/* Responsive */
@media (max-width: 768px) {
.preset-grid {
grid-template-columns: repeat(2, 1fr);
gap: 12px;
}
}
</style>
</head>
<body>
<div class="container">
<h1>Выбор пресета субтитров</h1>
<!-- Controls -->
<div class="controls">
<div class="control-group">
<label>Аспектное соотношение видео</label>
<div class="aspect-buttons">
<button class="aspect-btn active" data-ratio="16:9">16:9 (Широкое)</button>
<button class="aspect-btn" data-ratio="9:16">9:16 (Вертикальное)</button>
<button class="aspect-btn" data-ratio="1:1">1:1 (Квадрат)</button>
<button class="aspect-btn" data-ratio="4:5">4:5 (Instagram)</button>
</div>
</div>
</div>
<!-- Section: Ready Presets -->
<div class="section-label">Пример: готовые пресеты</div>
<div class="preset-grid" id="presetGrid">
<!-- Card 1: Классические -->
<div class="preset-card selected" data-preset="classic">
<div class="preview-area">
<div class="preview-text" style="font-family: 'Lobster', cursive; color: #ffffff;">
Пример <span class="highlight" style="color: #FFD700;">субтитров</span>
</div>
<div class="selected-indicator">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3">
<polyline points="20 6 9 17 4 12"></polyline>
</svg>
</div>
</div>
<div class="card-footer">
<div class="preset-name">
Классические
<span class="system-badge">Системный</span>
</div>
<div class="style-chars">
Lobster
<span style="color: var(--border-default);">·</span>
<span class="color-dot" style="background: #FFD700;"></span>
<span style="color: #FFD700;">Золотой</span>
</div>
</div>
</div>
<!-- Card 2: Неон -->
<div class="preset-card" data-preset="neon">
<div class="preview-area">
<div class="preview-text" style="font-family: 'Inter', sans-serif; font-weight: 700; color: #ffffff; text-shadow: 0 0 10px #00ffff, 0 0 20px #00ffff;">
Пример <span class="highlight" style="color: #00ffff;">субтитров</span>
</div>
</div>
<div class="card-footer">
<div class="preset-name">
Неон
<span class="system-badge">Системный</span>
</div>
<div class="style-chars">
Inter Bold
<span style="color: var(--border-default);">·</span>
<span class="color-dot" style="background: #00ffff; box-shadow: 0 0 6px #00ffff;"></span>
<span style="color: #00ffff;">Неоновый</span>
</div>
</div>
</div>
<!-- Card 3: Минимализм -->
<div class="preset-card" data-preset="minimal">
<div class="preview-area">
<div class="preview-text" style="font-family: 'Inter', sans-serif; font-weight: 400; color: #e0e0e0; font-size: 24px;">
Пример <span class="highlight" style="color: #ffffff; font-weight: 500;">субтитров</span>
</div>
</div>
<div class="card-footer">
<div class="preset-name">
Минимализм
<span class="system-badge">Системный</span>
</div>
<div class="style-chars">
Inter Regular
<span style="color: var(--border-default);">·</span>
<span class="color-dot" style="background: #ffffff;"></span>
<span style="color: #ffffff;">Белый</span>
</div>
</div>
</div>
<!-- Card 4: Жирный -->
<div class="preset-card" data-preset="bold">
<div class="preview-area">
<div class="preview-text" style="font-family: 'Inter', sans-serif; font-weight: 900; color: #ffffff; -webkit-text-stroke: 2px #000000; font-size: 32px;">
Пример <span class="highlight" style="color: #ff006e;">субтитров</span>
</div>
</div>
<div class="card-footer">
<div class="preset-name">
Жирный
<span class="system-badge">Системный</span>
</div>
<div class="style-chars">
Inter Black
<span style="color: var(--border-default);">·</span>
<span class="color-dot" style="background: #ff006e;"></span>
<span style="color: #ff006e;">Розовый</span>
</div>
</div>
</div>
<!-- Create New Card -->
<div class="create-card">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<line x1="12" y1="5" x2="12" y2="19"></line>
<line x1="5" y1="12" x2="19" y2="12"></line>
</svg>
<span style="font-size: 14px; font-weight: 500;">Создать пресет</span>
</div>
</div>
<!-- Section: Loading State -->
<div class="section-label">Пример: состояние загрузки</div>
<div class="preset-grid">
<div class="skeleton-card">
<div class="skeleton-preview"></div>
<div class="skeleton-footer">
<div class="skeleton-line"></div>
<div class="skeleton-line short"></div>
</div>
</div>
<div class="skeleton-card">
<div class="skeleton-preview"></div>
<div class="skeleton-footer">
<div class="skeleton-line"></div>
<div class="skeleton-line short"></div>
</div>
</div>
<div class="skeleton-card">
<div class="skeleton-preview"></div>
<div class="skeleton-footer">
<div class="skeleton-line"></div>
<div class="skeleton-line short"></div>
</div>
</div>
<div class="skeleton-card">
<div class="skeleton-preview"></div>
<div class="skeleton-footer">
<div class="skeleton-line"></div>
<div class="skeleton-line short"></div>
</div>
</div>
</div>
</div>
<script>
// Aspect ratio switching
const aspectButtons = document.querySelectorAll('.aspect-btn');
const previewAreas = document.querySelectorAll('.preview-area');
const skeletonPreviews = document.querySelectorAll('.skeleton-preview');
const createCard = document.querySelector('.create-card');
aspectButtons.forEach(btn => {
btn.addEventListener('click', () => {
// Update active button
aspectButtons.forEach(b => b.classList.remove('active'));
btn.classList.add('active');
const ratio = btn.dataset.ratio;
// Update preview areas
previewAreas.forEach(preview => {
preview.classList.remove('vertical', 'square', 'instagram');
if (ratio === '9:16') preview.classList.add('vertical');
if (ratio === '1:1') preview.classList.add('square');
if (ratio === '4:5') preview.classList.add('instagram');
});
// Map the selected ratio to a CSS aspect-ratio value (16:9 is the default)
const cssRatio = { '9:16': '9 / 16', '1:1': '1 / 1', '4:5': '4 / 5' }[ratio] ?? '16 / 9';
// Update skeleton previews
skeletonPreviews.forEach(preview => {
preview.style.aspectRatio = cssRatio;
});
// Update create card
createCard.style.aspectRatio = cssRatio;
});
});
// Card selection
const presetCards = document.querySelectorAll('.preset-card');
presetCards.forEach(card => {
card.addEventListener('click', () => {
presetCards.forEach(c => {
c.classList.remove('selected');
const indicator = c.querySelector('.selected-indicator');
if (indicator) indicator.remove();
});
card.classList.add('selected');
// Add checkmark indicator
const preview = card.querySelector('.preview-area');
if (!preview.querySelector('.selected-indicator')) {
const indicator = document.createElement('div');
indicator.className = 'selected-indicator';
indicator.innerHTML = `
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3">
<polyline points="20 6 9 17 4 12"></polyline>
</svg>
`;
preview.appendChild(indicator);
}
});
});
</script>
</body>
</html>
@@ -0,0 +1,212 @@
# Subtitle Preset Grid Redesign - Design Document
**Date:** 2026-04-06
**Scope:** Redesign preset preview cards in Caption Settings step to match uploaded video aspect ratio with modern visual refresh
---
## Overview
Redesign the subtitle preset selection grid to:
1. Display preset previews with the **same aspect ratio as the uploaded video**
2. Apply a **modern visual refresh** consistent with the app's design language
3. Show **style characteristics** (font, colors) as subtle hints
4. Maintain **responsive layout** across screen sizes
---
## Core Functionality
### Dynamic Aspect Ratio
**Data Flow:**
1. Fetch video metadata via `GET /api/media/mediafiles/{media_file_id}/` using `primaryFileId` from WizardContext
2. Extract `width` and `height` from `MediaFileRead` response
3. Calculate aspect ratio: `width / height`
4. Apply as CSS `aspect-ratio` to preset cards via inline style or CSS variable
5. Handle loading state while fetching metadata
6. Fallback to 16:9 if no video is uploaded or API error occurs
**Implementation Notes:**
- Store aspect ratio in WizardContext alongside other video metadata
- Update ratio when `primaryFileId` changes
- Cards use container queries for responsive sizing
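
The data flow above can be sketched as a pair of small pure helpers. This is a minimal sketch under stated assumptions, not the project's implementation: `MediaDimensions`, `resolveAspectRatio`, `aspectRatioStyle`, and the CSS variable name `--preset-aspect-ratio` are hypothetical, and only the `width` and `height` fields are taken from the text.

```typescript
// Hypothetical subset of the MediaFileRead response; only width/height are assumed.
interface MediaDimensions {
  width?: number | null;
  height?: number | null;
}

// Fallback used when no video is uploaded or the request fails (step 6).
const FALLBACK_RATIO = 16 / 9;

// Returns width / height, or the 16:9 fallback when metadata is missing or invalid.
function resolveAspectRatio(media?: MediaDimensions | null): number {
  const width = media?.width ?? 0;
  const height = media?.height ?? 0;
  // Written as !(x > 0) so that NaN is rejected as well as zero and negatives.
  if (!(width > 0) || !(height > 0)) return FALLBACK_RATIO;
  return width / height;
}

// Builds an inline style object that exposes the ratio as a CSS custom property.
// The variable name is an assumption made for this sketch.
function aspectRatioStyle(media?: MediaDimensions | null): Record<string, string> {
  return { "--preset-aspect-ratio": String(resolveAspectRatio(media)) };
}
```

A card stylesheet could then read the value with `aspect-ratio: var(--preset-aspect-ratio)`, which keeps the ratio in one place instead of repeating it per card.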
---
## Visual Design (5 Pillars Applied)
### 1. Typography with Character
- Keep existing font system (consistent with app)
- Style name: `font-weight: 500`, `font-size: 14px`
- Characteristic labels: `font-size: 12px`, muted color (`--gray-10`)
### 2. Committed Color & Theme
- Uses **Catppuccin Mocha** palette matching the project:
- Canvas: `--bg-canvas: #11111b`
- Cards: `--bg-default: #1e1e2e`
- Surfaces: `--bg-surface: #313244`
- Borders: `--border-default: #45475a`, `--border-subtle: #313244`
- Text: `--text-primary: #cdd6f4`, `--text-secondary: #bac2de`, `--text-tertiary: #9399b2`
- Selected state: purple accent (`--purple-400: #cba6f7`) with glow shadow
- Card hover: border transitions to purple accent
- System badge: purple-100 background with purple-400 text
- Checkmark indicator on selected card (top-right corner)
### 3. Purposeful Motion
- Cards fade in with staggered animation (50ms delay per card)
- Smooth border-color transition on hover (150ms ease)
- Selection change: immediate border color change
- Loading skeleton: shimmer animation
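
The staggered fade-in can be expressed as a per-card delay derived from the card index. The sketch below assumes the delay is applied through an inline `animationDelay` style; `staggerDelayMs` and `staggerStyle` are hypothetical helper names.

```typescript
// Per-card delay from the motion spec above.
const STAGGER_STEP_MS = 50;

// Returns the fade-in delay for the card at a zero-based index.
// Negative or fractional indices are normalized so the delay is never negative.
function staggerDelayMs(index: number, stepMs: number = STAGGER_STEP_MS): number {
  return Math.max(0, Math.floor(index)) * stepMs;
}

// Inline style for a card; `animationDelay` is the camelCase form of `animation-delay`.
function staggerStyle(index: number): { animationDelay: string } {
  return { animationDelay: `${staggerDelayMs(index)}ms` };
}
```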
### 4. Brave Spatial Composition
- CSS Grid with `repeat(auto-fill, minmax(200px, 1fr))`
- Consistent 16px gap between cards
- Cards maintain video aspect ratio without stretching
- Responsive: more columns on wide screens, fewer on narrow
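
To estimate how many columns an `auto-fill` grid yields at a given container width, the column count can be derived from the minimum track width and the gap. This is a worked sketch assuming the 200px minimum and 16px gap listed above; `autoFillColumns` is a hypothetical helper used only for illustration.

```typescript
// Number of columns produced by repeat(auto-fill, minmax(minWidth, 1fr))
// for a given container width and column gap (all values in px).
function autoFillColumns(containerWidth: number, minWidth = 200, gap = 16): number {
  // n columns fit when n * minWidth + (n - 1) * gap <= containerWidth,
  // which rearranges to n <= (containerWidth + gap) / (minWidth + gap).
  const columns = Math.floor((containerWidth + gap) / (minWidth + gap));
  // The grid always renders at least one column, even if it overflows.
  return Math.max(1, columns);
}
```

For example, a 416px container fits exactly two 200px cards plus one 16px gap, while a 400px container drops to a single column.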
### 5. Atmosphere & Depth
- Card background: subtle gradient overlay for depth
- Selected card: elevated with `box-shadow` + accent glow
- Dark preview background (`#0c0a1a`) preserved from existing StylePreview
- Rounded corners: `border-radius: 12px`
---
## Component Structure
### PresetCard
```
┌──────────────────────────────────────┐
│ │
│ [StylePreview Component] │ ← Dynamic aspect-ratio
│ "Пример субтитров" │ based on video
│ │
├──────────────────────────────────────┤
│ Style Name [Системный] │ ← Footer
│ Lobster · Yellow accent │ ← Characteristics (subtle)
└──────────────────────────────────────┘
```
**Props:**
- `preset: CaptionPresetRead`
- `isSelected: boolean`
- `aspectRatio: number` (width/height, e.g., 1.777 for 16:9)
- `onSelect: () => void`
- `onEdit: () => void`
- `onDelete: () => void`
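
The props above could be typed roughly as follows. This is a sketch only: the real `CaptionPresetRead` type is not shown in this document, so a placeholder with a single assumed field path stands in for it.

```typescript
// Placeholder for the project's type; the real shape is not shown here.
// Only the font_family path mentioned elsewhere in this document is assumed.
type CaptionPresetRead = {
  style_config: { text: { font_family: string } };
};

interface PresetCardProps {
  preset: CaptionPresetRead;
  isSelected: boolean;
  /** width / height, e.g. 16 / 9 for a landscape video */
  aspectRatio: number;
  onSelect: () => void;
  onEdit: () => void;
  onDelete: () => void;
}

// Example value showing how a parent grid might fill the props.
const exampleProps: PresetCardProps = {
  preset: { style_config: { text: { font_family: "Lobster" } } },
  isSelected: true,
  aspectRatio: 16 / 9,
  onSelect: () => {},
  onEdit: () => {},
  onDelete: () => {},
};
```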
### StylePreview Updates
**New Props:**
- `aspectRatio?: number` - overrides default 9/16
**Behavior:**
- Uses passed `aspectRatio` for container sizing
- Falls back to 9/16 if not provided
- Maintains all existing text styling logic
### PresetGrid Updates
**New Behavior:**
- Fetches video metadata via `useVideoMetadata()` hook
- Passes `aspectRatio` to all PresetCard children
- Shows skeleton loading state while fetching
- Responsive grid layout
---
## Style Characteristics Display
Each card footer shows:
- **Font family** (e.g., "Lobster", "Inter") - extracted from `preset.style_config.text.font_family`
- **Accent color** - small color dot + name if distinct from default
- The characteristics line is hidden on cards narrower than 180px (responsive)
**Format:**
```
{font_family} · {accent_color_name}
```
Example: `Lobster · Золотой` ("Gold") or `Inter · Неоновый` ("Neon")
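
The footer format can be sketched as a small formatting helper. `PresetLike` and `formatCharacteristics` are hypothetical names; only the `style_config.text.font_family` path comes from the text above, and the accent color name is assumed to be resolved elsewhere.

```typescript
// Hypothetical minimal preset shape; only the font_family path is taken from the text.
interface PresetLike {
  style_config: { text: { font_family: string } };
}

// Builds "{font_family} · {accent_color_name}".
// The accent part is omitted when no distinct accent color name is supplied.
function formatCharacteristics(preset: PresetLike, accentColorName?: string): string {
  const font = preset.style_config.text.font_family;
  return accentColorName ? `${font} · ${accentColorName}` : font;
}
```

Calling `formatCharacteristics(preset, "Неоновый")` for a preset whose font family is `Inter` yields `Inter · Неоновый`.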
---
## Loading State
**Skeleton Card:**
- Same aspect ratio as target (default 16:9 while loading)
- Shimmer animation on preview area
- Gray placeholder for text
- 4-6 skeleton cards shown while loading
---
## Responsive Behavior
| Screen Width | Grid Columns | Card Min Width |
|--------------|--------------|----------------|
| < 480px      | 2            | 140px          |
| 480-767px    | 3            | 160px          |
| 768-1200px   | 4            | 180px          |
| > 1200px     | 5-6          | 200px          |
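
The table can be encoded as an ordered lookup. This is a sketch assuming a width of exactly 768px belongs to the 4-column tier and exactly 1200px stays in it; `GridBreakpoint`, `BREAKPOINTS`, and `breakpointFor` are hypothetical names.

```typescript
interface GridBreakpoint {
  matches: (screenWidth: number) => boolean;
  columns: string;      // column count as written in the table
  cardMinWidth: number; // px
}

// Rows are checked in order, so each predicate only needs an upper bound.
const BREAKPOINTS: GridBreakpoint[] = [
  { matches: (w) => w < 480, columns: "2", cardMinWidth: 140 },
  { matches: (w) => w < 768, columns: "3", cardMinWidth: 160 },
  { matches: (w) => w <= 1200, columns: "4", cardMinWidth: 180 },
  { matches: () => true, columns: "5-6", cardMinWidth: 200 },
];

// Returns the first tier whose predicate accepts the given screen width.
function breakpointFor(screenWidth: number): GridBreakpoint {
  const tier = BREAKPOINTS.find((bp) => bp.matches(screenWidth));
  // The last row matches everything, so the fallback is only a type-level guard.
  return tier ?? BREAKPOINTS[BREAKPOINTS.length - 1];
}
```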
---
## API Integration
### New Hook: `useVideoMetadata`
```typescript
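// Fetches metadata for the given media file; the query stays disabled until a file ID exists.
function useVideoMetadata(fileId: string | null) {
  return api.useQuery(
    "get",
    "/api/media/mediafiles/{media_file_id}/",
    // The empty-string placeholder is never requested: `enabled` is false when fileId is null.
    { params: { path: { media_file_id: fileId ?? "" } } },
    { enabled: !!fileId }
  )
}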
```
### Aspect Ratio Calculation
```typescript
// width / height of the uploaded video; falls back to 16:9 when metadata is missing
const aspectRatio = useMemo(() => {
  if (!mediaFile?.width || !mediaFile?.height) return 16 / 9
  return mediaFile.width / mediaFile.height
}, [mediaFile])
```
---
## Edge Cases
1. **No video uploaded:** Fall back to 16:9 aspect ratio
2. **Video metadata unavailable:** Show error toast, fall back to 16:9
3. **Very wide video (>21:9):** Cap max card width to prevent overflow
4. **Very tall video (9:16 or taller):** Limit max height, allow scrolling if needed
5. **No presets:** Show empty state with "Создать пресет" card only
---
## Files Modified
1. `src/features/project/CaptionSettingsStep/PresetGrid.tsx` - Grid logic, aspect ratio distribution
2. `src/features/project/CaptionSettingsStep/PresetGrid.module.scss` - Grid styles, responsive layout
3. `src/features/project/CaptionSettingsStep/StylePreview.tsx` - Accept aspect ratio prop
4. `src/features/project/CaptionSettingsStep/StylePreview.module.scss` - Dynamic sizing
5. `src/features/project/CaptionSettingsStep/useVideoMetadata.ts` - New hook (or inline in PresetGrid)
---
## Acceptance Criteria
- [ ] Preset cards display with uploaded video's aspect ratio
- [ ] Grid is responsive and works on mobile/desktop
- [ ] Loading state shows skeleton cards
- [ ] Style characteristics (font, color) visible on cards
- [ ] Selected state clearly visible with accent border
- [ ] Hover effects smooth and purposeful
- [ ] Fallback to 16:9 when no video available
- [ ] All existing functionality preserved (select, edit, delete, create)