# SaluteSpeech Transcription Engine — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add SaluteSpeech (Sber) as a third transcription engine with async REST API, domain-specific models, and word-level timestamps.

**Architecture:** Direct integration following existing engine pattern — plain functions in `transcription/service.py`, `if/elif` dispatch in Dramatiq actor, no new abstractions. SaluteSpeech uses a 5-step REST flow (auth → upload → create task → poll → download) with a thread-safe OAuth token cache.

**Tech Stack:** Python, httpx (sync), FastAPI, Dramatiq, React/TypeScript

**Spec:** `docs/superpowers/specs/2026-04-03-salutespeech-transcription-design.md`

---

## File Map
| File | Action | Responsibility |
|------|--------|----------------|
| `cofee_backend/.certs/russian_trusted_root_ca.pem` | Create | Russian CA certificate for TLS |
| `cofee_backend/cpv3/infrastructure/settings.py` | Modify | 3 new SaluteSpeech settings fields |
| `cofee_backend/cpv3/modules/transcription/schemas.py` | Modify | New schema types, extend engine enum + type unions |
| `cofee_backend/cpv3/modules/transcription/service.py` | Modify | 9 new functions for the SaluteSpeech flow |
| `cofee_backend/cpv3/modules/transcription/router.py` | Modify | Direct `/salute-speech/` endpoint |
| `cofee_backend/cpv3/modules/tasks/schemas.py` | Modify | Extend engine Literal |
| `cofee_backend/cpv3/modules/tasks/service.py` | Modify | ENGINE_MAP + elif dispatch branch |
| `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx` | Modify | Engine option, split model options, engine change effect |
| `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx` | Modify | Same as TranscriptionModal |
| `cofee_backend/tests/integration/test_salutespeech_parsing.py` | Create | Unit tests for timestamp parsing + result conversion |

---
### Task 1: Bundle TLS Certificate

**Files:**
- Create: `cofee_backend/.certs/russian_trusted_root_ca.pem`

- [ ] **Step 1: Create `.certs` directory**

```bash
mkdir -p cofee_backend/.certs
```

- [ ] **Step 2: Download the Russian root CA certificate**

```bash
curl -k "https://gu-st.ru/content/Other/doc/russian_trusted_root_ca.cer" \
  -o cofee_backend/.certs/russian_trusted_root_ca.pem
```

- [ ] **Step 3: Verify the cert is valid PEM format**

```bash
openssl x509 -in cofee_backend/.certs/russian_trusted_root_ca.pem -noout -subject -dates
```

Expected: prints subject (Russian CA) and validity dates without errors. If the file is DER format instead of PEM, convert it (write to a temporary file first; passing the same path to `-in` and `-out` risks truncating the input):

```bash
openssl x509 -inform DER -in cofee_backend/.certs/russian_trusted_root_ca.pem \
  -out cofee_backend/.certs/russian_trusted_root_ca.pem.tmp -outform PEM
mv cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
  cofee_backend/.certs/russian_trusted_root_ca.pem
```

- [ ] **Step 4: Verify the cert is not gitignored**

The `.certs/` directory should NOT be gitignored — this is a public root CA, safe to commit. Verify it's not caught by any existing gitignore pattern:

```bash
cd cofee_backend && git check-ignore .certs/russian_trusted_root_ca.pem
```

Expected: no output (not ignored).

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/.certs/russian_trusted_root_ca.pem
git commit -m "chore(backend): bundle Russian root CA cert for SaluteSpeech TLS"
```

---
### Task 2: Add SaluteSpeech Settings

**Files:**
- Modify: `cofee_backend/cpv3/infrastructure/settings.py:97` (after `webhook_base_url` field)

- [ ] **Step 1: Add 3 new fields to Settings class**

In `cofee_backend/cpv3/infrastructure/settings.py`, after the `webhook_base_url` field (line 97) and before `def get_database_url(self)` (line 99), add:

```python
# SaluteSpeech
salute_auth_key: str = Field(default="", alias="SALUTE_AUTH_KEY")
salute_ca_cert_path: Path | None = Field(
    default=None, alias="SALUTE_CA_CERT_PATH"
)
salute_scope: str = Field(
    default="SALUTE_SPEECH_PERS", alias="SALUTE_SCOPE"
)
```
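These fields read from environment variables via their aliases. A hypothetical `.env` fragment for local development (the key value is a placeholder; the real `SALUTE_AUTH_KEY` is the Base64 authorization key that the code later sends verbatim in the `Authorization: Basic …` header):

```bash
SALUTE_AUTH_KEY=PLACEHOLDER_BASE64_AUTH_KEY
SALUTE_CA_CERT_PATH=.certs/russian_trusted_root_ca.pem
SALUTE_SCOPE=SALUTE_SPEECH_PERS
```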
- [ ] **Step 2: Verify settings load without errors**

```bash
cd cofee_backend && uv run python -c "from cpv3.infrastructure.settings import get_settings; s = get_settings(); print(f'salute_auth_key={s.salute_auth_key!r}, salute_ca_cert_path={s.salute_ca_cert_path!r}, salute_scope={s.salute_scope!r}')"
```

Expected: `salute_auth_key='', salute_ca_cert_path=None, salute_scope='SALUTE_SPEECH_PERS'`

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/cpv3/infrastructure/settings.py
git commit -m "feat(backend): add SaluteSpeech settings (auth key, cert path, scope)"
```

---
### Task 3: Add SaluteSpeech Schemas

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/schemas.py:10` (engine enum) and after line 147 (EOF, new classes)

- [ ] **Step 1: Extend `TranscriptionEngineEnum`**

In `cofee_backend/cpv3/modules/transcription/schemas.py`, line 10, change:

```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD"]
```

to:

```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD", "SALUTE_SPEECH"]
```

- [ ] **Step 2: Add SaluteSpeech schema classes**

After the `GoogleSpeechParams` class (line 147, end of file), add:

```python
# ---------------------------------- SaluteSpeech Models ----------------------------------


class SaluteSpeechWord(Schema):
    word: str
    start: float
    end: float


class SaluteSpeechSegment(Schema):
    text: str
    start: float
    end: float
    words: list[SaluteSpeechWord] = []


class SaluteSpeechResult(Schema):
    text: str
    segments: list[SaluteSpeechSegment]
    language: str


class SaluteSpeechParams(Schema):
    file_path: str
    language: str | None = None
    model: str = "general"
```

- [ ] **Step 3: Verify schemas import correctly**

```bash
cd cofee_backend && uv run python -c "from cpv3.modules.transcription.schemas import SaluteSpeechWord, SaluteSpeechSegment, SaluteSpeechResult, SaluteSpeechParams, TranscriptionEngineEnum; print('OK')"
```

Expected: `OK`

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/schemas.py
git commit -m "feat(backend): add SaluteSpeech schema types and extend engine enum"
```

---
### Task 4: Extend Type Unions in Service

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py:44` and `service.py:222` (type unions)
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` imports (top of file)

- [ ] **Step 1: Add SaluteSpeech imports**

In `cofee_backend/cpv3/modules/transcription/service.py`, in the imports from `transcription.schemas` (around line 229–243), add `SaluteSpeechSegment` to the import list:

```python
from cpv3.modules.transcription.schemas import (
    Document,
    GoogleSpeechResult,
    GoogleSpeechSegment,
    GoogleSpeechWord,
    LineNode,
    SaluteSpeechSegment,
    SegmentNode,
    Tag,
    TimeRange,
    WhisperResult,
    WhisperSegment,
    WhisperWord,
    WordNode,
    WordOptions,
)
```

- [ ] **Step 2: Extend `compute_segment_lines` type hint**

At line 44, change:

```python
def compute_segment_lines(
    self, segment: WhisperSegment | GoogleSpeechSegment, max_chars_per_line: int
) -> list[LineNode]:
```

to:

```python
def compute_segment_lines(
    self,
    segment: WhisperSegment | GoogleSpeechSegment | SaluteSpeechSegment,
    max_chars_per_line: int,
) -> list[LineNode]:
```

- [ ] **Step 3: Extend `_make_document_from_segments` type hint**

At line 222, change:

```python
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
```

to:

```python
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment] | list[SaluteSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
```

- [ ] **Step 4: Run lint to verify**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```

Expected: no errors.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): extend type unions to accept SaluteSpeechSegment"
```

---
### Task 5: Write Tests for SaluteSpeech Parsing

**Files:**
- Create: `cofee_backend/tests/integration/test_salutespeech_parsing.py`

- [ ] **Step 1: Write the test file**

Create `cofee_backend/tests/integration/test_salutespeech_parsing.py`:

```python
"""Tests for SaluteSpeech result parsing and document building."""

from cpv3.modules.transcription.service import (
    _build_document_from_salute_result,
    _parse_salute_time,
)


class TestParseSaluteTime:
    def test_simple_timestamp(self):
        assert _parse_salute_time("0.480s") == 0.48

    def test_zero(self):
        assert _parse_salute_time("0.000s") == 0.0

    def test_large_timestamp(self):
        assert _parse_salute_time("123.456s") == 123.456

    def test_integer_timestamp(self):
        assert _parse_salute_time("5s") == 5.0


class TestBuildDocumentFromSaluteResult:
    def _make_raw_result(self):
        """Minimal SaluteSpeech API response for testing."""
        return [
            {
                "results": [
                    {
                        "text": "привет мир",
                        "normalized_text": "Привет мир.",
                        "start": "0.480s",
                        "end": "1.200s",
                        "word_alignments": [
                            {"word": "привет", "start": "0.480s", "end": "0.840s"},
                            {"word": "мир", "start": "0.960s", "end": "1.200s"},
                        ],
                    },
                    {
                        "text": "это тест",
                        "normalized_text": "Это тест.",
                        "start": "1.500s",
                        "end": "2.100s",
                        "word_alignments": [
                            {"word": "это", "start": "1.500s", "end": "1.700s"},
                            {"word": "тест", "start": "1.800s", "end": "2.100s"},
                        ],
                    },
                ],
                "channel": 0,
            }
        ]

    def test_returns_document_with_segments(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 2

    def test_segment_text(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert doc.segments[0].lines[0].text == "привет мир"

    def test_word_timestamps(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        first_word = doc.segments[0].lines[0].words[0]
        assert first_word.text == "привет"
        assert first_word.time.start == 0.48
        assert first_word.time.end == 0.84

    def test_segment_time_range(self):
        raw = self._make_raw_result()
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert doc.segments[0].time.start == 0.48
        assert doc.segments[0].time.end == 1.2

    def test_empty_results(self):
        raw = [{"results": [], "channel": 0}]
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 0

    def test_missing_word_alignments(self):
        raw = [
            {
                "results": [
                    {
                        "text": "привет",
                        "normalized_text": "Привет.",
                        "start": "0.000s",
                        "end": "0.500s",
                    }
                ],
                "channel": 0,
            }
        ]
        doc = _build_document_from_salute_result(raw, language="ru-RU")
        assert len(doc.segments) == 1
        # No words but segment still created
        assert doc.segments[0].time.start == 0.0
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v 2>&1 | head -20
```

Expected: `ImportError` — `_build_document_from_salute_result` and `_parse_salute_time` don't exist yet.

- [ ] **Step 3: Commit test file**

```bash
git add cofee_backend/tests/integration/test_salutespeech_parsing.py
git commit -m "test(backend): add SaluteSpeech parsing and document building tests"
```

---
### Task 6: Implement SaluteSpeech Service Functions

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` (append after line 430)

This is the core task — all nine SaluteSpeech functions.

- [ ] **Step 1: Add new imports at top of file**

In `cofee_backend/cpv3/modules/transcription/service.py`, add these imports at the top (after the existing imports, around line 10):

```python
import threading
import time
import uuid
from pathlib import Path

import httpx
```

Note: `time` may already be imported. Check and avoid duplicates. `asyncio` is already imported. `anyio` is already imported.

Also add to the schema imports block:

```python
from cpv3.modules.transcription.schemas import (
    ...  # existing imports
    SaluteSpeechParams,
    SaluteSpeechResult,
    SaluteSpeechSegment,
    SaluteSpeechWord,
)
```

- [ ] **Step 2: Add constants and token cache**

After the existing imports (before the `DocumentBuilder` class), add:

```python
# ---------------------------------- SaluteSpeech Constants ----------------------------------

SALUTE_AUTH_URL = "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"
SALUTE_API_BASE = "https://smartspeech.sber.ru/rest/v1"
SALUTE_POLL_INTERVAL_SECONDS = 5.0
SALUTE_POLL_TIMEOUT_SECONDS = 600
SALUTE_TOKEN_REFRESH_MARGIN_SECONDS = 60

SALUTE_ENCODING_MAP: dict[str, str] = {
    ".mp3": "MP3",
    ".wav": "PCM_S16LE",
    ".ogg": "opus",
    ".flac": "FLAC",
}

SALUTE_CONTENT_TYPE_MAP: dict[str, str] = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}

SALUTE_LANGUAGE_MAP: dict[str, str] = {
    "ru": "ru-RU",
    "en": "en-US",
}

ERROR_SALUTE_AUTH_FAILED = "Ошибка авторизации SaluteSpeech: {detail}"
ERROR_SALUTE_UPLOAD_FAILED = "Ошибка загрузки файла в SaluteSpeech: {detail}"
ERROR_SALUTE_TASK_FAILED = "Ошибка распознавания SaluteSpeech: {detail}"
ERROR_SALUTE_TIMEOUT = "Превышено время ожидания распознавания SaluteSpeech"
ERROR_SALUTE_UNSUPPORTED_FORMAT = "Неподдерживаемый формат аудио для SaluteSpeech: {ext}"

_salute_token_lock = threading.Lock()
_salute_token: str | None = None
_salute_token_expires_at: float = 0.0
```
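The extension maps above are keyed by lowercase suffixes, so any lookup must normalize case. A minimal standalone sketch of the lookup the transcription entry point performs (map values copied from the constants above; the helper name is illustrative only):

```python
from pathlib import Path

SALUTE_ENCODING_MAP = {".mp3": "MP3", ".wav": "PCM_S16LE", ".ogg": "opus", ".flac": "FLAC"}


def encoding_for(file_path: str) -> str:
    # Path.suffix keeps the leading dot; lower() handles names like "call.MP3"
    ext = Path(file_path).suffix.lower()
    encoding = SALUTE_ENCODING_MAP.get(ext)
    if encoding is None:
        raise ValueError(f"unsupported audio format: {ext}")
    return encoding


print(encoding_for("meeting.MP3"))  # MP3
print(encoding_for("note.wav"))     # PCM_S16LE
```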
- [ ] **Step 3: Add helper functions**

At the end of the file (after `transcribe_with_google_speech`), append all the SaluteSpeech functions:

```python
# ---------------------------------- SaluteSpeech Engine ----------------------------------


def _parse_salute_time(s: str) -> float:
    """Parse SaluteSpeech timestamp string '0.480s' → 0.48."""
    return float(s.rstrip("s"))


def _get_salute_access_token(client: httpx.Client) -> str:
    """Get or refresh SaluteSpeech OAuth token. Thread-safe."""
    global _salute_token, _salute_token_expires_at
    with _salute_token_lock:
        if _salute_token and time.monotonic() < (
            _salute_token_expires_at - SALUTE_TOKEN_REFRESH_MARGIN_SECONDS
        ):
            return _salute_token

        settings = get_settings()
        response = client.post(
            SALUTE_AUTH_URL,
            headers={
                "Authorization": f"Basic {settings.salute_auth_key}",
                "RqUID": str(uuid.uuid4()),
                "Content-Type": "application/x-www-form-urlencoded",
            },
            content=f"scope={settings.salute_scope}",
        )
        if response.status_code != 200:
            raise RuntimeError(
                ERROR_SALUTE_AUTH_FAILED.format(detail=response.text[:200])
            )
        data = response.json()
        _salute_token = data["access_token"]
        expires_in_seconds = (data["expires_at"] / 1000) - time.time()
        _salute_token_expires_at = time.monotonic() + expires_in_seconds
        return _salute_token


def _upload_salute_audio(
    client: httpx.Client, token: str, audio_data: bytes, content_type: str
) -> str:
    """Upload audio to SaluteSpeech, return request_file_id."""
    response = client.post(
        f"{SALUTE_API_BASE}/data:upload",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        content=audio_data,
        timeout=120.0,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_UPLOAD_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["request_file_id"]


def _create_salute_task(
    client: httpx.Client,
    token: str,
    file_id: str,
    *,
    language: str,
    model: str,
    audio_encoding: str,
    sample_rate: int,
) -> str:
    """Create async recognition task, return task_id."""
    body = {
        "options": {
            "audio_encoding": audio_encoding,
            "sample_rate": sample_rate,
            "language": language,
            "model": model,
            "channels_count": 1,
            "hypotheses_count": 1,
        },
        "request_file_id": file_id,
    }
    response = client.post(
        f"{SALUTE_API_BASE}/speech:async_recognize",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_TASK_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["id"]


def _poll_salute_task(
    client: httpx.Client,
    token: str,
    task_id: str,
    job_uuid: uuid.UUID | None,
    on_progress: ProgressCallback | None,
) -> str:
    """Poll task until DONE, return response_file_id. Checks job cancellation each iteration."""
    from cpv3.modules.tasks.service import _raise_if_job_cancelled

    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed > SALUTE_POLL_TIMEOUT_SECONDS:
            raise TimeoutError(ERROR_SALUTE_TIMEOUT)

        if job_uuid is not None:
            _raise_if_job_cancelled(job_uuid)

        response = client.get(
            f"{SALUTE_API_BASE}/task:get",
            params={"id": task_id},
            headers={"Authorization": f"Bearer {token}"},
        )
        response.raise_for_status()
        result = response.json()["result"]
        status = result["status"]

        if status == "DONE":
            return result["response_file_id"]
        if status == "ERROR":
            error_msg = result.get("error", "unknown error")
            raise RuntimeError(
                ERROR_SALUTE_TASK_FAILED.format(detail=error_msg)
            )

        if on_progress is not None:
            pct = min(elapsed / SALUTE_POLL_TIMEOUT_SECONDS * 100, 95.0)
            on_progress(pct)

        time.sleep(SALUTE_POLL_INTERVAL_SECONDS)


def _download_salute_result(
    client: httpx.Client, token: str, response_file_id: str
) -> list[dict]:
    """Download recognition result JSON."""
    response = client.get(
        f"{SALUTE_API_BASE}/data:download",
        params={"response_file_id": response_file_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()


def _build_document_from_salute_result(
    raw_channels: list[dict], *, language: str
) -> Document:
    """Convert SaluteSpeech result JSON to Document."""
    builder = DocumentBuilder()
    words_options = WordOptions()

    all_segments: list[SaluteSpeechSegment] = []

    for channel_data in raw_channels:
        for result_item in channel_data.get("results", []):
            word_alignments = result_item.get("word_alignments", [])
            words = [
                SaluteSpeechWord(
                    word=w["word"],
                    start=_parse_salute_time(w["start"]),
                    end=_parse_salute_time(w["end"]),
                )
                for w in word_alignments
            ]

            text = result_item.get("text", "")
            seg_start = _parse_salute_time(result_item["start"]) if words else 0.0
            seg_end = _parse_salute_time(result_item["end"]) if words else 0.0

            all_segments.append(
                SaluteSpeechSegment(
                    text=text,
                    start=seg_start,
                    end=seg_end,
                    words=words,
                )
            )

    document = _make_document_from_segments(
        builder, all_segments, max_line_width=words_options.max_line_width
    )
    return builder.process_document(document)


def _salute_transcribe_sync(
    *,
    local_file_path: str,
    language: str | None,
    model: str,
    sample_rate: int,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Synchronous SaluteSpeech transcription (runs in Dramatiq worker thread)."""
    settings = get_settings()

    ext = Path(local_file_path).suffix.lower()
    audio_encoding = SALUTE_ENCODING_MAP.get(ext)
    content_type = SALUTE_CONTENT_TYPE_MAP.get(ext)
    if not audio_encoding or not content_type:
        raise ValueError(ERROR_SALUTE_UNSUPPORTED_FORMAT.format(ext=ext))

    salute_language = SALUTE_LANGUAGE_MAP.get(language or "", "ru-RU")

    verify = str(settings.salute_ca_cert_path) if settings.salute_ca_cert_path else True
    with httpx.Client(verify=verify, timeout=30.0) as client:
        token = _get_salute_access_token(client)

        with open(local_file_path, "rb") as f:
            audio_data = f.read()

        file_id = _upload_salute_audio(client, token, audio_data, content_type)
        task_id = _create_salute_task(
            client,
            token,
            file_id,
            language=salute_language,
            model=model,
            audio_encoding=audio_encoding,
            sample_rate=sample_rate,
        )
        response_file_id = _poll_salute_task(
            client, token, task_id, job_id, on_progress
        )
        raw_result = _download_salute_result(client, token, response_file_id)

    return _build_document_from_salute_result(raw_result, language=salute_language)


async def transcribe_with_salute_speech(
    storage: StorageService,
    *,
    file_key: str,
    language: str | None = None,
    model: str = "general",
    sample_rate: int = 16000,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Async wrapper for SaluteSpeech transcription."""
    tmp = await storage.download_to_temp(file_key)
    try:
        return await anyio.to_thread.run_sync(
            lambda: _salute_transcribe_sync(
                local_file_path=tmp.path,
                language=language,
                model=model,
                sample_rate=sample_rate,
                job_id=job_id,
                on_progress=on_progress,
            )
        )
    finally:
        tmp.cleanup()
```
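The token cache converts the OAuth response's `expires_at` (a Unix timestamp in milliseconds, as the auth helper above assumes) into a monotonic-clock deadline and refreshes one margin early. A standalone sketch of that arithmetic with hypothetical numbers:

```python
import time

SALUTE_TOKEN_REFRESH_MARGIN_SECONDS = 60


def token_still_valid(expires_at_monotonic: float, now_monotonic: float) -> bool:
    # Refresh SALUTE_TOKEN_REFRESH_MARGIN_SECONDS before the real expiry
    return now_monotonic < expires_at_monotonic - SALUTE_TOKEN_REFRESH_MARGIN_SECONDS


# Simulate an OAuth response: expires_at is wall-clock time in milliseconds.
wall_now = time.time()
mono_now = time.monotonic()
expires_at_ms = (wall_now + 1800) * 1000  # token valid for 30 minutes

# Same conversion as _get_salute_access_token: wall-clock delta, monotonic deadline
expires_in_seconds = expires_at_ms / 1000 - wall_now
expires_at_monotonic = mono_now + expires_in_seconds

assert token_still_valid(expires_at_monotonic, mono_now)             # fresh token
assert not token_still_valid(expires_at_monotonic, mono_now + 1770)  # 30 s left < margin
```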
- [ ] **Step 4: Run the parsing tests**

```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v
```

Expected: all tests pass.

- [ ] **Step 5: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```

Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): implement SaluteSpeech transcription engine"
```

---
### Task 7: Add Task Dispatch

**Files:**
- Modify: `cofee_backend/cpv3/modules/tasks/schemas.py:86` (engine Literal)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:88-91` (ENGINE_MAP)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:613-616` (actor import)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:700` (elif branch)

- [ ] **Step 1: Extend engine Literal in task schema**

In `cofee_backend/cpv3/modules/tasks/schemas.py`, line 86, change:

```python
engine: Literal["whisper", "google"] = Field(
```

to:

```python
engine: Literal["whisper", "google", "salutespeech"] = Field(
```

- [ ] **Step 2: Add to ENGINE_MAP**

In `cofee_backend/cpv3/modules/tasks/service.py`, lines 88-91, change:

```python
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
}
```

to:

```python
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
    "salutespeech": "SALUTE_SPEECH",
}
```

- [ ] **Step 3: Add import in actor**

In `cofee_backend/cpv3/modules/tasks/service.py`, inside `transcription_generate_actor` (lines 613-616), change:

```python
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_whisper,
)
```

to:

```python
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_salute_speech,
    transcribe_with_whisper,
)
```

- [ ] **Step 4: Add elif dispatch branch**

In `cofee_backend/cpv3/modules/tasks/service.py`, after the Google branch (after line 700, before the `else:`), add:

```python
elif engine == "salutespeech":
    # Extract sample rate from probe if available
    audio_stream = next(
        (s for s in probe.streams if s.codec_type == "audio"), None
    )
    sr = (
        int(audio_stream.sample_rate)
        if audio_stream and audio_stream.sample_rate
        else 16000
    )
    document = _run_async(
        transcribe_with_salute_speech(
            storage,
            file_key=file_key,
            language=language,
            model=model,
            sample_rate=sr,
            job_id=job_uuid,
            on_progress=_on_whisper_progress,
        )
    )
```
- [ ] **Step 5: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/tasks/service.py cpv3/modules/tasks/schemas.py
```

Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add cofee_backend/cpv3/modules/tasks/schemas.py cofee_backend/cpv3/modules/tasks/service.py
git commit -m "feat(backend): add SaluteSpeech to task dispatch (ENGINE_MAP + elif branch)"
```

---
### Task 8: Add Direct Endpoint (Optional)

**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/router.py` (after line 145)

- [ ] **Step 1: Add route**

In `cofee_backend/cpv3/modules/transcription/router.py`, add the import at the top alongside existing imports:

```python
from cpv3.modules.transcription.schemas import (
    ...  # existing
    SaluteSpeechParams,
)
from cpv3.modules.transcription.service import (
    ...  # existing
    transcribe_with_salute_speech,
)
```

Then append after the last endpoint (after line 145):

```python
@router.post("/salute-speech/", response_model=Document)
async def salute_speech_transcribe(
    body: SaluteSpeechParams,
    current_user: User = Depends(get_current_user),
    storage: StorageService = Depends(get_storage),
) -> Document:
    _ = current_user
    return await transcribe_with_salute_speech(
        storage,
        file_key=body.file_path,
        language=body.language,
        model=body.model,
    )
```

- [ ] **Step 2: Run lint**

```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/router.py
```

Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/cpv3/modules/transcription/router.py
git commit -m "feat(backend): add direct /salute-speech/ transcription endpoint"
```

---
### Task 9: Frontend — TranscriptionModal

**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx`

- [ ] **Step 1: Extend type**

At line 17, change:

```typescript
engine: "whisper" | "google"
```

to:

```typescript
engine: "whisper" | "google" | "salutespeech"
```

- [ ] **Step 2: Add engine option**

At lines 22-25, change:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
```

to:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
```

- [ ] **Step 3: Split model options**

Rename the existing `MODEL_OPTIONS` (lines 33-38) and add SaluteSpeech models:

```typescript
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
```

- [ ] **Step 4: Update model dropdown guard**

At line 162, change the model dropdown conditional from:

```typescript
{engine === "whisper" && (
```

to:

```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```

And inside, change the options reference from `MODEL_OPTIONS` to:

```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```

- [ ] **Step 5: Add model reset on engine change**

Find the component function body (after the `useForm` call). Add a `useEffect` that resets the model when engine changes:

```typescript
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
```

Note: `watch` and `setValue` come from `useForm` — check that they're destructured, and that `useEffect` is imported from `react`. If `watch("engine")` is already used elsewhere, reuse that variable.

- [ ] **Step 6: Type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.

- [ ] **Step 7: Commit**

```bash
git add cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionModal"
```

---

### Task 10: Frontend — TranscriptionSettingsStep

**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx`

Apply the same changes as Task 9 to this file (constants are duplicated).

- [ ] **Step 1: Extend type**

At line 22, change:

```typescript
engine: "whisper" | "google"
```

to:

```typescript
engine: "whisper" | "google" | "salutespeech"
```
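
Extending the union is what lets the compiler flag any branch that forgets the new engine. A standalone sketch (the `engineLabel` helper is hypothetical, not part of the component):

```typescript
type Engine = "whisper" | "google" | "salutespeech"

// With a declared string return type, tsc rejects this switch if a new
// Engine member is added to the union without a matching case.
function engineLabel(engine: Engine): string {
  switch (engine) {
    case "whisper":
      return "Whisper (локальный)"
    case "google":
      return "Google Speech"
    case "salutespeech":
      return "SaluteSpeech"
  }
}
```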

- [ ] **Step 2: Add engine option**

At lines 27-30, change:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
```

to:

```typescript
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
```

- [ ] **Step 3: Split model options**

Rename `MODEL_OPTIONS` (lines 38-43) and add SaluteSpeech models:

```typescript
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
```

- [ ] **Step 4: Update model dropdown guard**

At line 263, change:

```typescript
{engine === "whisper" && (
```

to:

```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```

And change the options reference from `MODEL_OPTIONS` to:

```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```

- [ ] **Step 5: Add model reset on engine change**

Same `useEffect` as Task 9:

```typescript
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
```

- [ ] **Step 6: Type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.

- [ ] **Step 7: Commit**

```bash
git add cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionSettingsStep"
```

---

### Task 11: Final Verification

**Files:** None (verification only)

- [ ] **Step 1: Backend lint**

```bash
cd cofee_backend && uv run ruff check cpv3/ 2>&1 | head -20
```

Expected: no errors.

- [ ] **Step 2: Backend tests**

```bash
cd cofee_backend && uv run pytest 2>&1 | tail -30
```

Expected: all tests pass (including new SaluteSpeech parsing tests).

- [ ] **Step 3: Frontend type check**

```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```

Expected: no new errors.

- [ ] **Step 4: Write verification report**

```
VERIFICATION REPORT
===================
Subproject: backend + frontend
Level: base
Type check: [PASS/FAIL]
Lint: [PASS/FAIL]
Tests: [PASS/FAIL] (X passed, Y failed)
Build: SKIPPED
E2E: SKIPPED

Files changed: ~10
Status: [READY/NOT READY]
```