# SaluteSpeech Transcription Engine — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add SaluteSpeech (Sber) as a third transcription engine with an async REST API, domain-specific models, and word-level timestamps.

**Architecture:** Direct integration following the existing engine pattern — plain functions in `transcription/service.py`, `if/elif` dispatch in the Dramatiq actor, no new abstractions. SaluteSpeech uses a five-step REST flow (auth → upload → create task → poll → download) with a thread-safe OAuth token cache.

**Tech Stack:** Python, httpx (sync), FastAPI, Dramatiq, React/TypeScript

**Spec:** `docs/superpowers/specs/2026-04-03-salutespeech-transcription-design.md`

---

## File Map

| File | Action | Responsibility |
|------|--------|----------------|
| `cofee_backend/.certs/russian_trusted_root_ca.pem` | Create | Russian CA certificate for TLS |
| `cofee_backend/cpv3/infrastructure/settings.py` | Modify | 3 new SaluteSpeech settings fields |
| `cofee_backend/cpv3/modules/transcription/schemas.py` | Modify | New schema types, extend engine enum + type unions |
| `cofee_backend/cpv3/modules/transcription/service.py` | Modify | ~9 new functions for the SaluteSpeech flow |
| `cofee_backend/cpv3/modules/transcription/router.py` | Modify | Direct `/salute-speech/` endpoint |
| `cofee_backend/cpv3/modules/tasks/schemas.py` | Modify | Extend engine Literal |
| `cofee_backend/cpv3/modules/tasks/service.py` | Modify | ENGINE_MAP + elif dispatch branch |
| `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx` | Modify | Engine option, split model options, engine change effect |
| `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx` | Modify | Same as TranscriptionModal |
| `cofee_backend/tests/integration/test_salutespeech_parsing.py` | Create | Unit tests for timestamp parsing + result conversion |

---

### Task 1: Bundle TLS Certificate

**Files:**
- Create: `cofee_backend/.certs/russian_trusted_root_ca.pem`

- [ ] **Step 1: Create `.certs` directory**

```bash
mkdir -p cofee_backend/.certs
```

- [ ] **Step 2: Download the Russian root CA certificate**

```bash
curl -k "https://gu-st.ru/content/Other/doc/russian_trusted_root_ca.cer" \
  -o cofee_backend/.certs/russian_trusted_root_ca.pem
```

- [ ] **Step 3: Verify the cert is valid PEM format**

```bash
openssl x509 -in cofee_backend/.certs/russian_trusted_root_ca.pem -noout -subject -dates
```

Expected: prints subject (Russian CA) and validity dates without errors.

If the file is DER format instead of PEM, convert it — write to a temp file first, since openssl should not read and write the same path:

```bash
openssl x509 -inform DER -in cofee_backend/.certs/russian_trusted_root_ca.pem \
  -outform PEM -out cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
  && mv cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
        cofee_backend/.certs/russian_trusted_root_ca.pem
```

- [ ] **Step 4: Verify the cert is not gitignored**

The `.certs/` directory should NOT be gitignored — this is a public root CA, safe to commit. Verify it's not caught by any existing gitignore pattern:

```bash
cd cofee_backend && git check-ignore .certs/russian_trusted_root_ca.pem
```

Expected: no output (not ignored).
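If you ever need to script the DER-vs-PEM check from Step 3 (e.g. in CI), the distinction is just a text-header test. A minimal sketch — `looks_like_pem` is a hypothetical helper for illustration, not part of the plan's code:

```python
def looks_like_pem(data: bytes) -> bool:
    """Heuristic: PEM certificates are base64 text wrapped in a BEGIN
    CERTIFICATE header; DER files are raw binary ASN.1."""
    return data.lstrip().startswith(b"-----BEGIN CERTIFICATE-----")


# A PEM header passes; raw DER bytes (ASN.1 SEQUENCE tag 0x30) do not.
print(looks_like_pem(b"-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"))  # True
print(looks_like_pem(b"\x30\x82\x05\xf8\x30\x82\x03\xe0"))  # False
```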
- [ ] **Step 5: Commit**

```bash
git add cofee_backend/.certs/russian_trusted_root_ca.pem
git commit -m "chore(backend): bundle Russian root CA cert for SaluteSpeech TLS"
```

---

### Task 2: Add SaluteSpeech Settings

**Files:**
- Modify: `cofee_backend/cpv3/infrastructure/settings.py:97` (after `webhook_base_url` field)

- [ ] **Step 1: Add 3 new fields to the Settings class**

In `cofee_backend/cpv3/infrastructure/settings.py`, after the `webhook_base_url` field (line 97) and before `def get_database_url(self)` (line 99), add:

```python
# SaluteSpeech
salute_auth_key: str = Field(default="", alias="SALUTE_AUTH_KEY")
salute_ca_cert_path: Path | None = Field(
    default=None, alias="SALUTE_CA_CERT_PATH"
)
salute_scope: str = Field(
    default="SALUTE_SPEECH_PERS", alias="SALUTE_SCOPE"
)
```

Ensure `from pathlib import Path` is present at the top of the file; add it if the settings module doesn't already import it.

- [ ] **Step 2: Verify settings load without errors**

```bash
cd cofee_backend && uv run python -c "from cpv3.infrastructure.settings import get_settings; s = get_settings(); print(f'salute_auth_key={s.salute_auth_key!r}, salute_ca_cert_path={s.salute_ca_cert_path!r}, salute_scope={s.salute_scope!r}')"
```

Expected: `salute_auth_key='', salute_ca_cert_path=None, salute_scope='SALUTE_SPEECH_PERS'`

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/cpv3/infrastructure/settings.py
git commit -m "feat(backend): add SaluteSpeech settings (auth key, cert path, scope)"
```

---

### Task 3: Add SaluteSpeech Schemas

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/schemas.py:10` (engine enum) and after line 147 (EOF, new classes)

- [ ] **Step 1: Extend `TranscriptionEngineEnum`**

In `cofee_backend/cpv3/modules/transcription/schemas.py`, line 10, change:

```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD"]
```

to:

```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD", "SALUTE_SPEECH"]
```

- [ ] **Step 2: Add SaluteSpeech schema classes**

After the `GoogleSpeechParams` class (line 147, end of file), add:

```python
# ---------------------------------- SaluteSpeech Models ----------------------------------


class SaluteSpeechWord(Schema):
    word: str
    start: float
    end: float


class SaluteSpeechSegment(Schema):
    text: str
    start: float
    end: float
    words: list[SaluteSpeechWord] = []


class SaluteSpeechResult(Schema):
    text: str
    segments: list[SaluteSpeechSegment]
    language: str


class SaluteSpeechParams(Schema):
    file_path: str
    language: str | None = None
    model: str = "general"
```

- [ ] **Step 3: Verify schemas import correctly**

```bash
cd cofee_backend && uv run python -c "from cpv3.modules.transcription.schemas import SaluteSpeechWord, SaluteSpeechSegment, SaluteSpeechResult, SaluteSpeechParams, TranscriptionEngineEnum; print('OK')"
```

Expected: `OK`

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/schemas.py
git commit -m "feat(backend): add SaluteSpeech schema types and extend engine enum"
```

---

### Task 4: Extend Type Unions in Service

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py:44` and `service.py:222` (type unions)
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` imports (top of file)

- [ ] **Step 1: Add SaluteSpeech imports**

In `cofee_backend/cpv3/modules/transcription/service.py`, in the existing imports from `transcription.schemas`, add `SaluteSpeechSegment` to the import list:

```python
from cpv3.modules.transcription.schemas import (
    Document,
    GoogleSpeechResult,
    GoogleSpeechSegment,
    GoogleSpeechWord,
    LineNode,
    SaluteSpeechSegment,
    SegmentNode,
    Tag,
    TimeRange,
    WhisperResult,
    WhisperSegment,
    WhisperWord,
    WordNode,
    WordOptions,
)
```

- [ ] **Step 2: Extend `compute_segment_lines` type hint**

At line 44, change:

```python
def compute_segment_lines(
    self, segment: WhisperSegment | GoogleSpeechSegment, max_chars_per_line: int
) -> list[LineNode]:
```

to:

```python
def compute_segment_lines(
    self,
    segment: WhisperSegment | GoogleSpeechSegment | SaluteSpeechSegment,
    max_chars_per_line: int,
) -> list[LineNode]:
```

- [ ] **Step 3: Extend `_make_document_from_segments` type hint**

At line 222, change:

```python
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
```

to:

```python
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment] | list[SaluteSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
```

- [ ] **Step 4: Run lint to verify**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```

Expected: no errors.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): extend type unions to accept SaluteSpeechSegment"
```

---

### Task 5: Write Tests for SaluteSpeech Parsing

**Files:**
- Create: `cofee_backend/tests/integration/test_salutespeech_parsing.py`

- [ ] **Step 1: Write the test file**

Create `cofee_backend/tests/integration/test_salutespeech_parsing.py`:

```python
"""Tests for SaluteSpeech result parsing and document building."""

from cpv3.modules.transcription.service import (
    _build_document_from_salute_result,
    _parse_salute_time,
)


class TestParseSaluteTime:
    def test_simple_timestamp(self):
        assert _parse_salute_time("0.480s") == 0.48

    def test_zero(self):
        assert _parse_salute_time("0.000s") == 0.0

    def test_large_timestamp(self):
        assert _parse_salute_time("123.456s") == 123.456

    def test_integer_timestamp(self):
        assert _parse_salute_time("5s") == 5.0


class TestBuildDocumentFromSaluteResult:
    def _make_raw_result(self):
        """Minimal SaluteSpeech API response for testing."""
        return [
            {
                "results": [
                    {
                        "text": "привет мир",
                        "normalized_text": "Привет мир.",
                        "start": "0.480s",
                        "end": "1.200s",
                        "word_alignments": [
                            {"word": "привет", "start": "0.480s", "end": "0.840s"},
                            {"word": "мир", "start": "0.960s", "end": "1.200s"},
                        ],
                    },
                    {
                        "text": "это тест",
                        "normalized_text": "Это тест.",
                        "start": "1.500s",
                        "end": "2.100s",
                        "word_alignments": [
                            {"word": "это", "start": "1.500s", "end": "1.700s"},
                            {"word": "тест", "start": "1.800s", "end": "2.100s"},
                        ],
                    },
                ],
                "channel": 0,
            }
        ]

    def test_returns_document_with_segments(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 2

    def test_segment_text(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert doc.segments[0].lines[0].text == "привет мир"

    def test_word_timestamps(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        first_word = doc.segments[0].lines[0].words[0]
        assert first_word.text == "привет"
        assert first_word.time.start == 0.48
        assert first_word.time.end == 0.84

    def test_segment_time_range(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert doc.segments[0].time.start == 0.48
        assert doc.segments[0].time.end == 1.2

    def test_empty_results(self):
        raw = [{"results": [], "channel": 0}]
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 0

    def test_missing_word_alignments(self):
        raw = [
            {
                "results": [
                    {
                        "text": "привет",
                        "normalized_text": "Привет.",
                        "start": "0.000s",
                        "end": "0.500s",
                    }
                ],
                "channel": 0,
            }
        ]
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 1
        # No words but segment still created
        assert doc.segments[0].time.start == 0.0
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v 2>&1 | head -20
```

Expected: `ImportError` — `_build_document_from_salute_result` and `_parse_salute_time` don't exist yet.
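The contract these tests pin down can be exercised standalone. A sketch of what the fixture encodes — SaluteSpeech emits durations as decimal strings with a trailing `s`; `parse_time` here is an illustrative stand-in for the `_parse_salute_time` that Task 6 implements:

```python
def parse_time(value: str) -> float:
    """'0.480s' -> 0.48: strip the trailing unit suffix, parse as float."""
    return float(value.rstrip("s"))


# One result item shaped like the test fixture above.
raw = {
    "text": "привет мир",
    "start": "0.480s",
    "end": "1.200s",
    "word_alignments": [
        {"word": "привет", "start": "0.480s", "end": "0.840s"},
        {"word": "мир", "start": "0.960s", "end": "1.200s"},
    ],
}

# Word-level timestamps come straight from word_alignments.
words = [
    (w["word"], parse_time(w["start"]), parse_time(w["end"]))
    for w in raw["word_alignments"]
]
print(words)  # [('привет', 0.48, 0.84), ('мир', 0.96, 1.2)]
```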
- [ ] **Step 3: Commit test file**

```bash
git add cofee_backend/tests/integration/test_salutespeech_parsing.py
git commit -m "test(backend): add SaluteSpeech parsing and document building tests"
```

---

### Task 6: Implement SaluteSpeech Service Functions

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` (append after line 430)

This is the core task — all 9 SaluteSpeech functions.

- [ ] **Step 1: Add new imports at top of file**

In `cofee_backend/cpv3/modules/transcription/service.py`, add these imports at the top (after the existing imports, around line 10):

```python
import threading
import time
import uuid
from pathlib import Path

import httpx
```

Note: `time` may already be imported — check and avoid duplicates. `asyncio` and `anyio` are already imported.

Also add `SaluteSpeechWord` to the schema imports block. `SaluteSpeechSegment` was already added there in Task 4; `SaluteSpeechResult` and `SaluteSpeechParams` are consumed only by the router (Task 8), and importing them here would trip ruff's unused-import check:

```python
from cpv3.modules.transcription.schemas import (
    ...  # existing imports
    SaluteSpeechWord,
)
```

- [ ] **Step 2: Add constants and token cache**

After the imports (before the `DocumentBuilder` class), add:

```python
# ---------------------------------- SaluteSpeech Constants ----------------------------------

SALUTE_AUTH_URL = "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"
SALUTE_API_BASE = "https://smartspeech.sber.ru/rest/v1"
SALUTE_POLL_INTERVAL_SECONDS = 5.0
SALUTE_POLL_TIMEOUT_SECONDS = 600
SALUTE_TOKEN_REFRESH_MARGIN_SECONDS = 60

SALUTE_ENCODING_MAP: dict[str, str] = {
    ".mp3": "MP3",
    ".wav": "PCM_S16LE",
    ".ogg": "opus",
    ".flac": "FLAC",
}
SALUTE_CONTENT_TYPE_MAP: dict[str, str] = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}
SALUTE_LANGUAGE_MAP: dict[str, str] = {
    "ru": "ru-RU",
    "en": "en-US",
}

ERROR_SALUTE_AUTH_FAILED = "Ошибка авторизации SaluteSpeech: {detail}"
ERROR_SALUTE_UPLOAD_FAILED = "Ошибка загрузки файла в SaluteSpeech: {detail}"
ERROR_SALUTE_TASK_FAILED = "Ошибка распознавания SaluteSpeech: {detail}"
ERROR_SALUTE_TIMEOUT = "Превышено время ожидания распознавания SaluteSpeech"
ERROR_SALUTE_UNSUPPORTED_FORMAT = "Неподдерживаемый формат аудио для SaluteSpeech: {ext}"

_salute_token_lock = threading.Lock()
_salute_token: str | None = None
_salute_token_expires_at: float = 0.0
```

- [ ] **Step 3: Add the SaluteSpeech functions**

At the end of the file (after `transcribe_with_google_speech`), append all SaluteSpeech functions:

```python
# ---------------------------------- SaluteSpeech Engine ----------------------------------


def _parse_salute_time(s: str) -> float:
    """Parse SaluteSpeech timestamp string '0.480s' → 0.48."""
    return float(s.rstrip("s"))


def _get_salute_access_token(client: httpx.Client) -> str:
    """Get or refresh the SaluteSpeech OAuth token. Thread-safe."""
    global _salute_token, _salute_token_expires_at
    with _salute_token_lock:
        if _salute_token and time.monotonic() < (
            _salute_token_expires_at - SALUTE_TOKEN_REFRESH_MARGIN_SECONDS
        ):
            return _salute_token

        settings = get_settings()
        response = client.post(
            SALUTE_AUTH_URL,
            headers={
                "Authorization": f"Basic {settings.salute_auth_key}",
                "RqUID": str(uuid.uuid4()),
                "Content-Type": "application/x-www-form-urlencoded",
            },
            content=f"scope={settings.salute_scope}",
        )
        if response.status_code != 200:
            raise RuntimeError(
                ERROR_SALUTE_AUTH_FAILED.format(detail=response.text[:200])
            )
        data = response.json()
        _salute_token = data["access_token"]
        # expires_at is epoch milliseconds; convert to a monotonic deadline
        expires_in_seconds = (data["expires_at"] / 1000) - time.time()
        _salute_token_expires_at = time.monotonic() + expires_in_seconds
        return _salute_token


def _upload_salute_audio(
    client: httpx.Client, token: str, audio_data: bytes, content_type: str
) -> str:
    """Upload audio to SaluteSpeech, return request_file_id."""
    response = client.post(
        f"{SALUTE_API_BASE}/data:upload",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        content=audio_data,
        timeout=120.0,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_UPLOAD_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["request_file_id"]


def _create_salute_task(
    client: httpx.Client,
    token: str,
    file_id: str,
    *,
    language: str,
    model: str,
    audio_encoding: str,
    sample_rate: int,
) -> str:
    """Create an async recognition task, return task_id."""
    body = {
        "options": {
            "audio_encoding": audio_encoding,
            "sample_rate": sample_rate,
            "language": language,
            "model": model,
            "channels_count": 1,
            "hypotheses_count": 1,
        },
        "request_file_id": file_id,
    }
    response = client.post(
        f"{SALUTE_API_BASE}/speech:async_recognize",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_TASK_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["id"]


def _poll_salute_task(
    client: httpx.Client,
    token: str,
    task_id: str,
    job_uuid: uuid.UUID | None,
    on_progress: ProgressCallback | None,
) -> str:
    """Poll the task until DONE, return response_file_id.

    Checks job cancellation each iteration.
    """
    from cpv3.modules.tasks.service import _raise_if_job_cancelled

    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed > SALUTE_POLL_TIMEOUT_SECONDS:
            raise TimeoutError(ERROR_SALUTE_TIMEOUT)
        if job_uuid is not None:
            _raise_if_job_cancelled(job_uuid)

        response = client.get(
            f"{SALUTE_API_BASE}/task:get",
            params={"id": task_id},
            headers={"Authorization": f"Bearer {token}"},
        )
        response.raise_for_status()
        result = response.json()["result"]
        status = result["status"]

        if status == "DONE":
            return result["response_file_id"]
        if status == "ERROR":
            error_msg = result.get("error", "unknown error")
            raise RuntimeError(
                ERROR_SALUTE_TASK_FAILED.format(detail=error_msg)
            )

        if on_progress is not None:
            pct = min(elapsed / SALUTE_POLL_TIMEOUT_SECONDS * 100, 95.0)
            on_progress(pct)
        time.sleep(SALUTE_POLL_INTERVAL_SECONDS)


def _download_salute_result(
    client: httpx.Client, token: str, response_file_id: str
) -> list[dict]:
    """Download the recognition result JSON."""
    response = client.get(
        f"{SALUTE_API_BASE}/data:download",
        params={"response_file_id": response_file_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()


def _build_document_from_salute_result(
    raw_channels: list[dict], *, language: str
) -> Document:
    """Convert SaluteSpeech result JSON to a Document."""
    builder = DocumentBuilder()
    words_options = WordOptions()

    all_segments: list[SaluteSpeechSegment] = []
    for channel_data in raw_channels:
        for result_item in channel_data.get("results", []):
            word_alignments = result_item.get("word_alignments", [])
            words = [
                SaluteSpeechWord(
                    word=w["word"],
                    start=_parse_salute_time(w["start"]),
                    end=_parse_salute_time(w["end"]),
                )
                for w in word_alignments
            ]
            text = result_item.get("text", "")
            # Segment boundaries come from the result item itself, not the
            # word alignments, so they survive even when word_alignments is absent.
            seg_start = _parse_salute_time(result_item.get("start", "0s"))
            seg_end = _parse_salute_time(result_item.get("end", "0s"))
            all_segments.append(
                SaluteSpeechSegment(
                    text=text,
                    start=seg_start,
                    end=seg_end,
                    words=words,
                )
            )

    document = _make_document_from_segments(
        builder, all_segments, max_line_width=words_options.max_line_width
    )
    return builder.process_document(document)


def _salute_transcribe_sync(
    *,
    local_file_path: str,
    language: str | None,
    model: str,
    sample_rate: int,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Synchronous SaluteSpeech transcription (runs in a Dramatiq worker thread)."""
    settings = get_settings()
    ext = Path(local_file_path).suffix.lower()
    audio_encoding = SALUTE_ENCODING_MAP.get(ext)
    content_type = SALUTE_CONTENT_TYPE_MAP.get(ext)
    if not audio_encoding or not content_type:
        raise ValueError(ERROR_SALUTE_UNSUPPORTED_FORMAT.format(ext=ext))

    salute_language = SALUTE_LANGUAGE_MAP.get(language or "", "ru-RU")
    verify = str(settings.salute_ca_cert_path) if settings.salute_ca_cert_path else True

    with httpx.Client(verify=verify, timeout=30.0) as client:
        token = _get_salute_access_token(client)
        with open(local_file_path, "rb") as f:
            audio_data = f.read()
        file_id = _upload_salute_audio(client, token, audio_data, content_type)
        task_id = _create_salute_task(
            client,
            token,
            file_id,
            language=salute_language,
            model=model,
            audio_encoding=audio_encoding,
            sample_rate=sample_rate,
        )
        response_file_id = _poll_salute_task(
            client, token, task_id, job_id, on_progress
        )
        raw_result = _download_salute_result(client, token, response_file_id)

    return _build_document_from_salute_result(raw_result, language=salute_language)


async def transcribe_with_salute_speech(
    storage: StorageService,
    *,
    file_key: str,
    language: str | None = None,
    model: str = "general",
    sample_rate: int = 16000,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Async wrapper for SaluteSpeech transcription."""
    tmp = await storage.download_to_temp(file_key)
    try:
        return await anyio.to_thread.run_sync(
            lambda: _salute_transcribe_sync(
                local_file_path=tmp.path,
                language=language,
                model=model,
                sample_rate=sample_rate,
                job_id=job_id,
                on_progress=on_progress,
            )
        )
    finally:
        tmp.cleanup()
```

- [ ] **Step 4: Run the parsing tests**

```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v
```

Expected: all tests pass.

- [ ] **Step 5: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```

Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): implement SaluteSpeech transcription engine (9 functions)"
```

---

### Task 7: Add Task Dispatch

**Files:**
- Modify: `cofee_backend/cpv3/modules/tasks/schemas.py:86` (engine Literal)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:88-91` (ENGINE_MAP)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:613-616` (actor import)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:700` (elif branch)

- [ ] **Step 1: Extend engine Literal in task schema**

In `cofee_backend/cpv3/modules/tasks/schemas.py`, line 86, change:

```python
engine: Literal["whisper", "google"] = Field(
```

to:

```python
engine: Literal["whisper", "google", "salutespeech"] = Field(
```

- [ ] **Step 2: Add to ENGINE_MAP**

In `cofee_backend/cpv3/modules/tasks/service.py`, lines 88-91, change:

```python
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
}
```

to:

```python
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
    "salutespeech": "SALUTE_SPEECH",
}
```

- [ ] **Step 3: Add import in actor**

In `cofee_backend/cpv3/modules/tasks/service.py`, inside `transcription_generate_actor` (lines 613-616), change:

```python
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_whisper,
)
```

to:

```python
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_salute_speech,
    transcribe_with_whisper,
)
```

- [ ] **Step 4: Add elif dispatch branch**

In `cofee_backend/cpv3/modules/tasks/service.py`, after the Google branch (after line 700, before the `else:`), add:

```python
elif engine == "salutespeech":
    # Extract sample rate from probe if available
    audio_stream = next(
        (s for s in probe.streams if s.codec_type == "audio"), None
    )
    sr = (
        int(audio_stream.sample_rate)
        if audio_stream and audio_stream.sample_rate
        else 16000
    )
    document = _run_async(
        transcribe_with_salute_speech(
            storage,
            file_key=file_key,
            language=language,
            model=model,
            sample_rate=sr,
            job_id=job_uuid,
            on_progress=_on_whisper_progress,
        )
    )
```

- [ ] **Step 5: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/tasks/service.py cpv3/modules/tasks/schemas.py
```

Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add cofee_backend/cpv3/modules/tasks/schemas.py cofee_backend/cpv3/modules/tasks/service.py
git commit -m "feat(backend): add SaluteSpeech to task dispatch (ENGINE_MAP + elif branch)"
```

---

### Task 8: Add Direct Endpoint (Optional)

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/router.py` (after line 145)

- [ ] **Step 1: Add route**

In `cofee_backend/cpv3/modules/transcription/router.py`, add the imports at the top alongside the existing ones:

```python
from cpv3.modules.transcription.schemas import (
    ...  # existing
    SaluteSpeechParams,
)
from cpv3.modules.transcription.service import (
    ...  # existing
    transcribe_with_salute_speech,
)
```

Then append after the last endpoint (after line 145):

```python
@router.post("/salute-speech/", response_model=Document)
async def salute_speech_transcribe(
    body: SaluteSpeechParams,
    current_user: User = Depends(get_current_user),
    storage: StorageService = Depends(get_storage),
) -> Document:
    _ = current_user
    return await transcribe_with_salute_speech(
        storage,
        file_key=body.file_path,
        language=body.language,
        model=body.model,
    )
```

- [ ] **Step 2: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/router.py
```

Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/router.py
git commit -m "feat(backend): add direct /salute-speech/ transcription endpoint"
```

---

### Task 9: Frontend — TranscriptionModal

**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx`

- [ ] **Step 1: Extend type**

At line 17, change:

```typescript
engine: "whisper" | "google"
```

to:

```typescript
engine: "whisper" | "google" | "salutespeech"
```

- [ ] **Step 2: Add engine option**

At lines 22-25, change:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
```

to:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
```

- [ ] **Step 3: Split model options**

Rename the existing `MODEL_OPTIONS` (lines 33-38) and add SaluteSpeech models:

```typescript
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
```

- [ ] **Step 4: Update model dropdown guard**

At line 162, change the model dropdown conditional from:

```typescript
{engine === "whisper" && (
```

to:

```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```

And inside, change the options reference from `MODEL_OPTIONS` to:

```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```

- [ ] **Step 5: Add model reset on engine change**

Find the component function body (after the `useForm` call). Add a `useEffect` that resets the model when the engine changes:

```typescript
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
```

Note: `watch` and `setValue` come from `useForm` — check that they're destructured, and make sure `useEffect` is imported from React. If `watch("engine")` is already used elsewhere, reuse that variable.

- [ ] **Step 6: Type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.

- [ ] **Step 7: Commit**

```bash
git add cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionModal"
```

---

### Task 10: Frontend — TranscriptionSettingsStep

**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx`

Apply the same changes as Task 9 to this file (constants are duplicated).
- [ ] **Step 1: Extend type**

At line 22, change:

```typescript
engine: "whisper" | "google"
```

to:

```typescript
engine: "whisper" | "google" | "salutespeech"
```

- [ ] **Step 2: Add engine option**

At lines 27-30, change:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
```

to:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
```

- [ ] **Step 3: Split model options**

Rename `MODEL_OPTIONS` (lines 38-43) and add SaluteSpeech models:

```typescript
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
```

- [ ] **Step 4: Update model dropdown guard**

At line 263, change:

```typescript
{engine === "whisper" && (
```

to:

```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```

And change the options reference from `MODEL_OPTIONS` to:

```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```

- [ ] **Step 5: Add model reset on engine change**

Same `useEffect` as Task 9:

```typescript
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
```

- [ ] **Step 6: Type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.
- [ ] **Step 7: Commit**

```bash
git add cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionSettingsStep"
```

---

### Task 11: Final Verification

**Files:** None (verification only)

- [ ] **Step 1: Backend lint**

```bash
cd cofee_backend && uv run ruff check cpv3/ 2>&1 | head -20
```

Expected: no errors.

- [ ] **Step 2: Backend tests**

```bash
cd cofee_backend && uv run pytest 2>&1 | tail -30
```

Expected: all tests pass (including the new SaluteSpeech parsing tests).

- [ ] **Step 3: Frontend type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.

- [ ] **Step 4: Write verification report**

```
VERIFICATION REPORT
===================
Subproject: backend + frontend
Level: base
Type check: [PASS/FAIL]
Lint: [PASS/FAIL]
Tests: [PASS/FAIL] (X passed, Y failed)
Build: SKIPPED
E2E: SKIPPED
Files changed: ~10
Status: [READY/NOT READY]
```