remotion_service/docs/superpowers/plans/2026-04-03-salutespeech-transcription.md
2026-04-03 23:47:58 +03:00
# SaluteSpeech Transcription Engine — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add SaluteSpeech (Sber) as a third transcription engine with async REST API, domain-specific models, and word-level timestamps.

**Architecture:** Direct integration following the existing engine pattern: plain functions in `transcription/service.py`, `if/elif` dispatch in the Dramatiq actor, no new abstractions. SaluteSpeech uses a five-step REST flow (auth → upload → create task → poll → download) with a thread-safe OAuth token cache.

**Tech Stack:** Python, httpx (sync), FastAPI, Dramatiq, React/TypeScript

**Spec:** `docs/superpowers/specs/2026-04-03-salutespeech-transcription-design.md`

---
## File Map
| File | Action | Responsibility |
|------|--------|----------------|
| `cofee_backend/.certs/russian_trusted_root_ca.pem` | Create | Russian CA certificate for TLS |
| `cofee_backend/cpv3/infrastructure/settings.py` | Modify | 3 new SaluteSpeech settings fields |
| `cofee_backend/cpv3/modules/transcription/schemas.py` | Modify | New schema types, extend engine enum + type unions |
| `cofee_backend/cpv3/modules/transcription/service.py` | Modify | 9 new functions for SaluteSpeech flow |
| `cofee_backend/cpv3/modules/transcription/router.py` | Modify | Direct `/salute-speech/` endpoint |
| `cofee_backend/cpv3/modules/tasks/schemas.py` | Modify | Extend engine Literal |
| `cofee_backend/cpv3/modules/tasks/service.py` | Modify | ENGINE_MAP + elif dispatch branch |
| `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx` | Modify | Engine option, split model options, engine change effect |
| `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx` | Modify | Same as TranscriptionModal |
| `cofee_backend/tests/integration/test_salutespeech_parsing.py` | Create | Unit tests for timestamp parsing + result conversion |
---
### Task 1: Bundle TLS Certificate
**Files:**
- Create: `cofee_backend/.certs/russian_trusted_root_ca.pem`
- [ ] **Step 1: Create `.certs` directory**
```bash
mkdir -p cofee_backend/.certs
```
- [ ] **Step 2: Download the Russian root CA certificate**
```bash
# -k skips TLS verification; the download host may itself chain to this
# not-yet-trusted CA. Verify the certificate in Step 3 before committing.
curl -k "https://gu-st.ru/content/Other/doc/russian_trusted_root_ca.cer" \
  -o cofee_backend/.certs/russian_trusted_root_ca.pem
```
- [ ] **Step 3: Verify the cert is valid PEM format**
```bash
openssl x509 -in cofee_backend/.certs/russian_trusted_root_ca.pem -noout -subject -dates
```
Expected: prints subject (Russian CA) and validity dates without errors. If the file is DER format instead of PEM, convert:
```bash
# Write to a temp file first: using the same path for -in and -out can truncate the input.
openssl x509 -inform DER -in cofee_backend/.certs/russian_trusted_root_ca.pem \
  -outform PEM -out cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
  && mv cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
    cofee_backend/.certs/russian_trusted_root_ca.pem
```
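To tell which format was actually downloaded before running the conversion, the PEM header can be checked directly. A small sketch (`detect_cert_format` is an illustrative helper, not part of the plan):

```python
from pathlib import Path


def detect_cert_format(path: str) -> str:
    """Return 'PEM' if the file carries the ASCII certificate header, else 'DER'."""
    data = Path(path).read_bytes()
    return "PEM" if b"-----BEGIN CERTIFICATE-----" in data else "DER"
```

PEM is base64 text wrapped in `BEGIN/END CERTIFICATE` markers, while DER is raw binary, so scanning for the header is enough here.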
- [ ] **Step 4: Verify the cert is not gitignored**
The `.certs/` directory should NOT be gitignored: this is a public root CA, safe to commit. Verify it's not caught by any existing gitignore pattern:
```bash
cd cofee_backend && git check-ignore .certs/russian_trusted_root_ca.pem
```
Expected: no output, and exit code 1 (the file is not ignored).
- [ ] **Step 5: Commit**
```bash
git add cofee_backend/.certs/russian_trusted_root_ca.pem
git commit -m "chore(backend): bundle Russian root CA cert for SaluteSpeech TLS"
```
---
### Task 2: Add SaluteSpeech Settings
**Files:**
- Modify: `cofee_backend/cpv3/infrastructure/settings.py:97` (after `webhook_base_url` field)
- [ ] **Step 1: Add 3 new fields to Settings class**
In `cofee_backend/cpv3/infrastructure/settings.py`, after the `webhook_base_url` field (line 97) and before `def get_database_url(self)` (line 99), add:
```python
# SaluteSpeech
salute_auth_key: str = Field(default="", alias="SALUTE_AUTH_KEY")
salute_ca_cert_path: Path | None = Field(
default=None, alias="SALUTE_CA_CERT_PATH"
)
salute_scope: str = Field(
default="SALUTE_SPEECH_PERS", alias="SALUTE_SCOPE"
)
```
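The alias mapping can be pictured as plain env lookups. A pydantic-free sketch for illustration only (the real class uses `Field` aliases on a pydantic Settings class):

```python
def load_salute_settings(env: dict[str, str]) -> dict:
    """Mirror of the three Field aliases above, using plain dict lookups."""
    return {
        "salute_auth_key": env.get("SALUTE_AUTH_KEY", ""),
        # unset or empty env var -> None, matching the Path | None default
        "salute_ca_cert_path": env.get("SALUTE_CA_CERT_PATH") or None,
        "salute_scope": env.get("SALUTE_SCOPE", "SALUTE_SPEECH_PERS"),
    }
```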
- [ ] **Step 2: Verify settings load without errors**
```bash
cd cofee_backend && uv run python -c "from cpv3.infrastructure.settings import get_settings; s = get_settings(); print(f'salute_auth_key={s.salute_auth_key!r}, salute_ca_cert_path={s.salute_ca_cert_path!r}, salute_scope={s.salute_scope!r}')"
```
Expected: `salute_auth_key='', salute_ca_cert_path=None, salute_scope='SALUTE_SPEECH_PERS'`
- [ ] **Step 3: Commit**
```bash
git add cofee_backend/cpv3/infrastructure/settings.py
git commit -m "feat(backend): add SaluteSpeech settings (auth key, cert path, scope)"
```
---
### Task 3: Add SaluteSpeech Schemas
**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/schemas.py:10` (engine enum) and after line 147 (EOF, new classes)
- [ ] **Step 1: Extend `TranscriptionEngineEnum`**
In `cofee_backend/cpv3/modules/transcription/schemas.py`, line 10, change:
```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD"]
```
to:
```python
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD", "SALUTE_SPEECH"]
```
- [ ] **Step 2: Add SaluteSpeech schema classes**
After the `GoogleSpeechParams` class (line 147, end of file), add:
```python
# ---------------------------------- SaluteSpeech Models ----------------------------------
class SaluteSpeechWord(Schema):
    word: str
    start: float
    end: float


class SaluteSpeechSegment(Schema):
    text: str
    start: float
    end: float
    words: list[SaluteSpeechWord] = []


class SaluteSpeechResult(Schema):
    text: str
    segments: list[SaluteSpeechSegment]
    language: str


class SaluteSpeechParams(Schema):
    file_path: str
    language: str | None = None
    model: str = "general"
```
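The nesting of these types can be sanity-checked with plain dataclasses standing in for `Schema` (a sketch; the real classes inherit from the project's `Schema` base):

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    word: str
    start: float
    end: float


@dataclass
class Segment:
    text: str
    start: float
    end: float
    # defaults to an empty list when word alignments are missing
    words: list[Word] = field(default_factory=list)


seg = Segment(
    text="привет мир",
    start=0.48,
    end=1.2,
    words=[Word("привет", 0.48, 0.84), Word("мир", 0.96, 1.2)],
)
```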
- [ ] **Step 3: Verify schemas import correctly**
```bash
cd cofee_backend && uv run python -c "from cpv3.modules.transcription.schemas import SaluteSpeechWord, SaluteSpeechSegment, SaluteSpeechResult, SaluteSpeechParams, TranscriptionEngineEnum; print('OK')"
```
Expected: `OK`
- [ ] **Step 4: Commit**
```bash
git add cofee_backend/cpv3/modules/transcription/schemas.py
git commit -m "feat(backend): add SaluteSpeech schema types and extend engine enum"
```
---
### Task 4: Extend Type Unions in Service
**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py:44` and `service.py:222` (type unions)
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` imports (top of file)
- [ ] **Step 1: Add SaluteSpeech imports**
In `cofee_backend/cpv3/modules/transcription/service.py`, in the imports from `transcription.schemas` (near the top of the file), add `SaluteSpeechSegment` to the import list:
```python
from cpv3.modules.transcription.schemas import (
Document,
GoogleSpeechResult,
GoogleSpeechSegment,
GoogleSpeechWord,
LineNode,
SaluteSpeechSegment,
SegmentNode,
Tag,
TimeRange,
WhisperResult,
WhisperSegment,
WhisperWord,
WordNode,
WordOptions,
)
```
- [ ] **Step 2: Extend `compute_segment_lines` type hint**
At line 44, change:
```python
def compute_segment_lines(
self, segment: WhisperSegment | GoogleSpeechSegment, max_chars_per_line: int
) -> list[LineNode]:
```
to:
```python
def compute_segment_lines(
self,
segment: WhisperSegment | GoogleSpeechSegment | SaluteSpeechSegment,
max_chars_per_line: int,
) -> list[LineNode]:
```
- [ ] **Step 3: Extend `_make_document_from_segments` type hint**
At line 222, change:
```python
def _make_document_from_segments(
builder: DocumentBuilder,
segments: list[WhisperSegment] | list[GoogleSpeechSegment],
*,
max_line_width: int,
) -> Document:
```
to:
```python
def _make_document_from_segments(
builder: DocumentBuilder,
segments: list[WhisperSegment] | list[GoogleSpeechSegment] | list[SaluteSpeechSegment],
*,
max_line_width: int,
) -> Document:
```
- [ ] **Step 4: Run lint to verify**
```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```
Expected: no errors.
- [ ] **Step 5: Commit**
```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): extend type unions to accept SaluteSpeechSegment"
```
---
### Task 5: Write Tests for SaluteSpeech Parsing
**Files:**
- Create: `cofee_backend/tests/integration/test_salutespeech_parsing.py`
- [ ] **Step 1: Write the test file**
Create `cofee_backend/tests/integration/test_salutespeech_parsing.py`:
```python
"""Tests for SaluteSpeech result parsing and document building."""
from cpv3.modules.transcription.service import (
_build_document_from_salute_result,
_parse_salute_time,
)
class TestParseSaluteTime:
def test_simple_timestamp(self):
assert _parse_salute_time("0.480s") == 0.48
def test_zero(self):
assert _parse_salute_time("0.000s") == 0.0
def test_large_timestamp(self):
assert _parse_salute_time("123.456s") == 123.456
def test_integer_timestamp(self):
assert _parse_salute_time("5s") == 5.0
class TestBuildDocumentFromSaluteResult:
def _make_raw_result(self):
"""Minimal SaluteSpeech API response for testing."""
return [
{
"results": [
{
"text": "привет мир",
"normalized_text": "Привет мир.",
"start": "0.480s",
"end": "1.200s",
"word_alignments": [
{"word": "привет", "start": "0.480s", "end": "0.840s"},
{"word": "мир", "start": "0.960s", "end": "1.200s"},
],
},
{
"text": "это тест",
"normalized_text": "Это тест.",
"start": "1.500s",
"end": "2.100s",
"word_alignments": [
{"word": "это", "start": "1.500s", "end": "1.700s"},
{"word": "тест", "start": "1.800s", "end": "2.100s"},
],
},
],
"channel": 0,
}
]
def test_returns_document_with_segments(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 2
def test_segment_text(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert doc.segments[0].lines[0].text == "привет мир"
def test_word_timestamps(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
first_word = doc.segments[0].lines[0].words[0]
assert first_word.text == "привет"
assert first_word.time.start == 0.48
assert first_word.time.end == 0.84
def test_segment_time_range(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert doc.segments[0].time.start == 0.48
assert doc.segments[0].time.end == 1.2
def test_empty_results(self):
raw = [{"results": [], "channel": 0}]
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 0
def test_missing_word_alignments(self):
raw = [
{
"results": [
{
"text": "привет",
"normalized_text": "Привет.",
"start": "0.000s",
"end": "0.500s",
}
],
"channel": 0,
}
]
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 1
# No words but segment still created
assert doc.segments[0].time.start == 0.0
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v 2>&1 | head -20
```
Expected: `ImportError`, since `_build_document_from_salute_result` and `_parse_salute_time` don't exist yet.
- [ ] **Step 3: Commit test file**
```bash
git add cofee_backend/tests/integration/test_salutespeech_parsing.py
git commit -m "test(backend): add SaluteSpeech parsing and document building tests"
```
---
### Task 6: Implement SaluteSpeech Service Functions
**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/service.py` (append after line 430)
This is the core task: all nine SaluteSpeech service functions.
- [ ] **Step 1: Add new imports at top of file**
In `cofee_backend/cpv3/modules/transcription/service.py`, add these imports at the top (after the existing imports, around line 10):
```python
import threading
import time
import uuid
from pathlib import Path

import httpx
```
Note: `time` may already be imported. Check and avoid duplicates. `asyncio` is already imported. `anyio` is already imported.
Also add `SaluteSpeechWord` to the schema imports block. `SaluteSpeechSegment` was already added there in Task 4; do not import `SaluteSpeechResult` or `SaluteSpeechParams` here, since they are unused in `service.py` and would trip ruff's unused-import check:
```python
from cpv3.modules.transcription.schemas import (
    ... # existing imports
    SaluteSpeechWord,
)
```
- [ ] **Step 2: Add constants and token cache**
After the existing imports (before the `DocumentBuilder` class), add:
```python
# ---------------------------------- SaluteSpeech Constants ----------------------------------
SALUTE_AUTH_URL = "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"
SALUTE_API_BASE = "https://smartspeech.sber.ru/rest/v1"
SALUTE_POLL_INTERVAL_SECONDS = 5.0
SALUTE_POLL_TIMEOUT_SECONDS = 600
SALUTE_TOKEN_REFRESH_MARGIN_SECONDS = 60
SALUTE_ENCODING_MAP: dict[str, str] = {
".mp3": "MP3",
".wav": "PCM_S16LE",
".ogg": "opus",
".flac": "FLAC",
}
SALUTE_CONTENT_TYPE_MAP: dict[str, str] = {
".mp3": "audio/mpeg",
".wav": "audio/wav",
".ogg": "audio/ogg",
".flac": "audio/flac",
}
SALUTE_LANGUAGE_MAP: dict[str, str] = {
"ru": "ru-RU",
"en": "en-US",
}
ERROR_SALUTE_AUTH_FAILED = "Ошибка авторизации SaluteSpeech: {detail}"
ERROR_SALUTE_UPLOAD_FAILED = "Ошибка загрузки файла в SaluteSpeech: {detail}"
ERROR_SALUTE_TASK_FAILED = "Ошибка распознавания SaluteSpeech: {detail}"
ERROR_SALUTE_TIMEOUT = "Превышено время ожидания распознавания SaluteSpeech"
ERROR_SALUTE_UNSUPPORTED_FORMAT = "Неподдерживаемый формат аудио для SaluteSpeech: {ext}"
_salute_token_lock = threading.Lock()
_salute_token: str | None = None
_salute_token_expires_at: float = 0.0
```
- [ ] **Step 3: Add helper functions**
After the end of file (after `transcribe_with_google_speech`), append all SaluteSpeech functions:
```python
# ---------------------------------- SaluteSpeech Engine ----------------------------------
def _parse_salute_time(s: str) -> float:
    """Parse SaluteSpeech timestamp string '0.480s' → 0.48."""
    return float(s.rstrip("s"))


def _get_salute_access_token(client: httpx.Client) -> str:
    """Get or refresh SaluteSpeech OAuth token. Thread-safe."""
    global _salute_token, _salute_token_expires_at
    with _salute_token_lock:
        if _salute_token and time.monotonic() < (
            _salute_token_expires_at - SALUTE_TOKEN_REFRESH_MARGIN_SECONDS
        ):
            return _salute_token
        settings = get_settings()
        response = client.post(
            SALUTE_AUTH_URL,
            headers={
                "Authorization": f"Basic {settings.salute_auth_key}",
                "RqUID": str(uuid.uuid4()),
                "Content-Type": "application/x-www-form-urlencoded",
            },
            content=f"scope={settings.salute_scope}",
        )
        if response.status_code != 200:
            raise RuntimeError(
                ERROR_SALUTE_AUTH_FAILED.format(detail=response.text[:200])
            )
        data = response.json()
        _salute_token = data["access_token"]
        # expires_at is epoch milliseconds; convert to a monotonic-clock deadline
        expires_in_seconds = (data["expires_at"] / 1000) - time.time()
        _salute_token_expires_at = time.monotonic() + expires_in_seconds
        return _salute_token


def _upload_salute_audio(
    client: httpx.Client, token: str, audio_data: bytes, content_type: str
) -> str:
    """Upload audio to SaluteSpeech, return request_file_id."""
    response = client.post(
        f"{SALUTE_API_BASE}/data:upload",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        content=audio_data,
        timeout=120.0,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_UPLOAD_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["request_file_id"]


def _create_salute_task(
    client: httpx.Client,
    token: str,
    file_id: str,
    *,
    language: str,
    model: str,
    audio_encoding: str,
    sample_rate: int,
) -> str:
    """Create async recognition task, return task_id."""
    body = {
        "options": {
            "audio_encoding": audio_encoding,
            "sample_rate": sample_rate,
            "language": language,
            "model": model,
            "channels_count": 1,
            "hypotheses_count": 1,
        },
        "request_file_id": file_id,
    }
    response = client.post(
        f"{SALUTE_API_BASE}/speech:async_recognize",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_TASK_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["id"]


def _poll_salute_task(
    client: httpx.Client,
    token: str,
    task_id: str,
    job_uuid: uuid.UUID | None,
    on_progress: ProgressCallback | None,
) -> str:
    """Poll task until DONE, return response_file_id. Checks job cancellation each iteration."""
    # Local import to avoid a circular dependency with the tasks module.
    from cpv3.modules.tasks.service import _raise_if_job_cancelled

    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed > SALUTE_POLL_TIMEOUT_SECONDS:
            raise TimeoutError(ERROR_SALUTE_TIMEOUT)
        if job_uuid is not None:
            _raise_if_job_cancelled(job_uuid)
        response = client.get(
            f"{SALUTE_API_BASE}/task:get",
            params={"id": task_id},
            headers={"Authorization": f"Bearer {token}"},
        )
        response.raise_for_status()
        result = response.json()["result"]
        status = result["status"]
        if status == "DONE":
            return result["response_file_id"]
        if status == "ERROR":
            error_msg = result.get("error", "unknown error")
            raise RuntimeError(ERROR_SALUTE_TASK_FAILED.format(detail=error_msg))
        if on_progress is not None:
            pct = min(elapsed / SALUTE_POLL_TIMEOUT_SECONDS * 100, 95.0)
            on_progress(pct)
        time.sleep(SALUTE_POLL_INTERVAL_SECONDS)


def _download_salute_result(
    client: httpx.Client, token: str, response_file_id: str
) -> list[dict]:
    """Download recognition result JSON."""
    response = client.get(
        f"{SALUTE_API_BASE}/data:download",
        params={"response_file_id": response_file_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()


def _build_document_from_salute_result(
    raw_channels: list[dict], *, language: str
) -> Document:
    """Convert SaluteSpeech result JSON to Document."""
    builder = DocumentBuilder()
    words_options = WordOptions()
    all_segments: list[SaluteSpeechSegment] = []
    for channel_data in raw_channels:
        for result_item in channel_data.get("results", []):
            word_alignments = result_item.get("word_alignments", [])
            words = [
                SaluteSpeechWord(
                    word=w["word"],
                    start=_parse_salute_time(w["start"]),
                    end=_parse_salute_time(w["end"]),
                )
                for w in word_alignments
            ]
            text = result_item.get("text", "")
            # Fall back to 0.0 only when the API omits segment timestamps.
            seg_start = (
                _parse_salute_time(result_item["start"])
                if "start" in result_item
                else 0.0
            )
            seg_end = (
                _parse_salute_time(result_item["end"])
                if "end" in result_item
                else 0.0
            )
            all_segments.append(
                SaluteSpeechSegment(
                    text=text,
                    start=seg_start,
                    end=seg_end,
                    words=words,
                )
            )
    document = _make_document_from_segments(
        builder, all_segments, max_line_width=words_options.max_line_width
    )
    return builder.process_document(document)


def _salute_transcribe_sync(
    *,
    local_file_path: str,
    language: str | None,
    model: str,
    sample_rate: int,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Synchronous SaluteSpeech transcription (runs in Dramatiq worker thread)."""
    settings = get_settings()
    ext = Path(local_file_path).suffix.lower()
    audio_encoding = SALUTE_ENCODING_MAP.get(ext)
    content_type = SALUTE_CONTENT_TYPE_MAP.get(ext)
    if not audio_encoding or not content_type:
        raise ValueError(ERROR_SALUTE_UNSUPPORTED_FORMAT.format(ext=ext))
    salute_language = SALUTE_LANGUAGE_MAP.get(language or "", "ru-RU")
    verify = str(settings.salute_ca_cert_path) if settings.salute_ca_cert_path else True
    with httpx.Client(verify=verify, timeout=30.0) as client:
        token = _get_salute_access_token(client)
        with open(local_file_path, "rb") as f:
            audio_data = f.read()
        file_id = _upload_salute_audio(client, token, audio_data, content_type)
        task_id = _create_salute_task(
            client,
            token,
            file_id,
            language=salute_language,
            model=model,
            audio_encoding=audio_encoding,
            sample_rate=sample_rate,
        )
        response_file_id = _poll_salute_task(
            client, token, task_id, job_id, on_progress
        )
        raw_result = _download_salute_result(client, token, response_file_id)
    return _build_document_from_salute_result(raw_result, language=salute_language)


async def transcribe_with_salute_speech(
    storage: StorageService,
    *,
    file_key: str,
    language: str | None = None,
    model: str = "general",
    sample_rate: int = 16000,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Async wrapper for SaluteSpeech transcription."""
    tmp = await storage.download_to_temp(file_key)
    try:
        return await anyio.to_thread.run_sync(
            lambda: _salute_transcribe_sync(
                local_file_path=tmp.path,
                language=language,
                model=model,
                sample_rate=sample_rate,
                job_id=job_id,
                on_progress=on_progress,
            )
        )
    finally:
        tmp.cleanup()
```
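The polling loop gets no real progress signal from the API, so `_poll_salute_task` maps elapsed time onto a percentage, capped at 95% so the bar never reads complete before the download and parsing steps finish. The mapping in isolation:

```python
def poll_progress(elapsed_seconds: float, timeout_seconds: float = 600.0) -> float:
    """Time-based progress estimate, capped at 95%."""
    return min(elapsed_seconds / timeout_seconds * 100, 95.0)
```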
- [ ] **Step 4: Run the parsing tests**
```bash
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v
```
Expected: all tests pass.
- [ ] **Step 5: Run lint**
```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
```
Expected: no errors.
- [ ] **Step 6: Commit**
```bash
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): implement SaluteSpeech transcription engine (8 functions)"
```
---
### Task 7: Add Task Dispatch
**Files:**
- Modify: `cofee_backend/cpv3/modules/tasks/schemas.py:86` (engine Literal)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:88-91` (ENGINE_MAP)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:613-616` (actor import)
- Modify: `cofee_backend/cpv3/modules/tasks/service.py:700` (elif branch)
- [ ] **Step 1: Extend engine Literal in task schema**
In `cofee_backend/cpv3/modules/tasks/schemas.py`, line 86, change:
```python
engine: Literal["whisper", "google"] = Field(
```
to:
```python
engine: Literal["whisper", "google", "salutespeech"] = Field(
```
- [ ] **Step 2: Add to ENGINE_MAP**
In `cofee_backend/cpv3/modules/tasks/service.py`, lines 88-91, change:
```python
ENGINE_MAP: dict[str, str] = {
"whisper": "LOCAL_WHISPER",
"google": "GOOGLE_SPEECH_CLOUD",
}
```
to:
```python
ENGINE_MAP: dict[str, str] = {
"whisper": "LOCAL_WHISPER",
"google": "GOOGLE_SPEECH_CLOUD",
"salutespeech": "SALUTE_SPEECH",
}
```
- [ ] **Step 3: Add import in actor**
In `cofee_backend/cpv3/modules/tasks/service.py`, inside `transcription_generate_actor` (lines 613-616), change:
```python
from cpv3.modules.transcription.service import (
transcribe_with_google_speech,
transcribe_with_whisper,
)
```
to:
```python
from cpv3.modules.transcription.service import (
transcribe_with_google_speech,
transcribe_with_salute_speech,
transcribe_with_whisper,
)
```
- [ ] **Step 4: Add elif dispatch branch**
In `cofee_backend/cpv3/modules/tasks/service.py`, after the Google branch (after line 700, before the `else:`), add:
```python
elif engine == "salutespeech":
# Extract sample rate from probe if available
audio_stream = next(
(s for s in probe.streams if s.codec_type == "audio"), None
)
sr = int(audio_stream.sample_rate) if audio_stream and audio_stream.sample_rate else 16000
document = _run_async(
transcribe_with_salute_speech(
storage,
file_key=file_key,
language=language,
model=model,
sample_rate=sr,
job_id=job_uuid,
on_progress=_on_whisper_progress,
)
)
```
- [ ] **Step 5: Run lint**
```bash
cd cofee_backend && uv run ruff check cpv3/modules/tasks/service.py cpv3/modules/tasks/schemas.py
```
Expected: no errors.
- [ ] **Step 6: Commit**
```bash
git add cofee_backend/cpv3/modules/tasks/schemas.py cofee_backend/cpv3/modules/tasks/service.py
git commit -m "feat(backend): add SaluteSpeech to task dispatch (ENGINE_MAP + elif branch)"
```
---
### Task 8: Add Direct Endpoint (Optional)
**Files:**
- Modify: `cofee_backend/cpv3/modules/transcription/router.py` (after line 145)
- [ ] **Step 1: Add route**
In `cofee_backend/cpv3/modules/transcription/router.py`, add the import at the top alongside existing imports:
```python
from cpv3.modules.transcription.schemas import (
... # existing
SaluteSpeechParams,
)
from cpv3.modules.transcription.service import (
... # existing
transcribe_with_salute_speech,
)
```
Then append after the last endpoint (after line 145):
```python
@router.post("/salute-speech/", response_model=Document)
async def salute_speech_transcribe(
body: SaluteSpeechParams,
current_user: User = Depends(get_current_user),
storage: StorageService = Depends(get_storage),
) -> Document:
_ = current_user
return await transcribe_with_salute_speech(
storage,
file_key=body.file_path,
language=body.language,
model=body.model,
)
```
- [ ] **Step 2: Run lint**
```bash
cd cofee_backend && uv run ruff check cpv3/modules/transcription/router.py
```
Expected: no errors.
- [ ] **Step 3: Commit**
```bash
git add cofee_backend/cpv3/modules/transcription/router.py
git commit -m "feat(backend): add direct /salute-speech/ transcription endpoint"
```
---
### Task 9: Frontend — TranscriptionModal
**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx`
- [ ] **Step 1: Extend type**
At line 17, change:
```typescript
engine: "whisper" | "google"
```
to:
```typescript
engine: "whisper" | "google" | "salutespeech"
```
- [ ] **Step 2: Add engine option**
At lines 22-25, change:
```typescript
const ENGINE_OPTIONS = [
{ value: "whisper", label: "Whisper (локальный)" },
{ value: "google", label: "Google Speech" },
]
```
to:
```typescript
const ENGINE_OPTIONS = [
{ value: "whisper", label: "Whisper (локальный)" },
{ value: "google", label: "Google Speech" },
{ value: "salutespeech", label: "SaluteSpeech" },
]
```
- [ ] **Step 3: Split model options**
Rename the existing `MODEL_OPTIONS` (lines 33-38) and add SaluteSpeech models:
```typescript
const WHISPER_MODEL_OPTIONS = [
{ value: "base", label: "Базовая" },
{ value: "small", label: "Малая" },
{ value: "medium", label: "Средняя" },
{ value: "large", label: "Большая" },
]
const SALUTE_MODEL_OPTIONS = [
{ value: "general", label: "Общая" },
{ value: "finance", label: "Финансы" },
{ value: "medicine", label: "Медицина" },
]
```
- [ ] **Step 4: Update model dropdown guard**
At line 162, change the model dropdown conditional from:
```typescript
{engine === "whisper" && (
```
to:
```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```
And inside, change the options reference from `MODEL_OPTIONS` to:
```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```
- [ ] **Step 5: Add model reset on engine change**
Find the component function body (after the `useForm` call). Add a `useEffect` that resets the model when engine changes:
```typescript
const engine = watch("engine")
useEffect(() => {
if (engine === "salutespeech") {
setValue("model", "general")
} else if (engine === "whisper") {
setValue("model", "base")
}
}, [engine, setValue])
```
Note: `watch` and `setValue` come from `useForm`; check that they're destructured, and make sure `useEffect` is imported from `react`. If `watch("engine")` is already used elsewhere, reuse that variable.
- [ ] **Step 6: Type check**
```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```
Expected: no new errors.
- [ ] **Step 7: Commit**
```bash
git add cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionModal"
```
---
### Task 10: Frontend — TranscriptionSettingsStep
**Files:**
- Modify: `cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx`
Apply the same changes as Task 9 to this file (constants are duplicated).
- [ ] **Step 1: Extend type**
At line 22, change:
```typescript
engine: "whisper" | "google"
```
to:
```typescript
engine: "whisper" | "google" | "salutespeech"
```
- [ ] **Step 2: Add engine option**
At lines 27-30, change:
```typescript
const ENGINE_OPTIONS = [
{ value: "whisper", label: "Whisper (локальный)" },
{ value: "google", label: "Google Speech" },
]
```
to:
```typescript
const ENGINE_OPTIONS = [
{ value: "whisper", label: "Whisper (локальный)" },
{ value: "google", label: "Google Speech" },
{ value: "salutespeech", label: "SaluteSpeech" },
]
```
- [ ] **Step 3: Split model options**
Rename `MODEL_OPTIONS` (lines 38-43) and add SaluteSpeech models:
```typescript
const WHISPER_MODEL_OPTIONS = [
{ value: "base", label: "Базовая" },
{ value: "small", label: "Малая" },
{ value: "medium", label: "Средняя" },
{ value: "large", label: "Большая" },
]
const SALUTE_MODEL_OPTIONS = [
{ value: "general", label: "Общая" },
{ value: "finance", label: "Финансы" },
{ value: "medicine", label: "Медицина" },
]
```
- [ ] **Step 4: Update model dropdown guard**
At line 263, change:
```typescript
{engine === "whisper" && (
```
to:
```typescript
{(engine === "whisper" || engine === "salutespeech") && (
```
And change the options reference from `MODEL_OPTIONS` to:
```typescript
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
```
- [ ] **Step 5: Add model reset on engine change**
Same `useEffect` as Task 9:
```typescript
const engine = watch("engine")
useEffect(() => {
if (engine === "salutespeech") {
setValue("model", "general")
} else if (engine === "whisper") {
setValue("model", "base")
}
}, [engine, setValue])
```
- [ ] **Step 6: Type check**
```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```
Expected: no new errors.
- [ ] **Step 7: Commit**
```bash
git add cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionSettingsStep"
```
---
### Task 11: Final Verification
**Files:** None (verification only)
- [ ] **Step 1: Backend lint**
```bash
cd cofee_backend && uv run ruff check cpv3/ 2>&1 | head -20
```
Expected: no errors.
- [ ] **Step 2: Backend tests**
```bash
cd cofee_backend && uv run pytest 2>&1 | tail -30
```
Expected: all tests pass (including new SaluteSpeech parsing tests).
- [ ] **Step 3: Frontend type check**
```bash
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
```
Expected: no new errors.
- [ ] **Step 4: Write verification report**
```
VERIFICATION REPORT
===================
Subproject: backend + frontend
Level: base
Type check: [PASS/FAIL]
Lint: [PASS/FAIL]
Tests: [PASS/FAIL] (X passed, Y failed)
Build: SKIPPED
E2E: SKIPPED
Files changed: ~10
Status: [READY/NOT READY]
```