SaluteSpeech Transcription Engine — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add SaluteSpeech (Sber) as a third transcription engine with async REST API, domain-specific models, and word-level timestamps.
Architecture: Direct integration following existing engine pattern — plain functions in transcription/service.py, if/elif dispatch in Dramatiq actor, no new abstractions. SaluteSpeech uses a 4-step REST flow (auth → upload → create task → poll → download) with a thread-safe OAuth token cache.
Tech Stack: Python, httpx (sync), FastAPI, Dramatiq, React/TypeScript
Spec: docs/superpowers/specs/2026-04-03-salutespeech-transcription-design.md
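The 4-step REST flow described above can be sketched as a pipeline of injected steps. This is a minimal illustration only: the callables here are placeholders, not the real service functions that Task 6 defines.

```python
def run_salute_flow(get_token, upload, create_task, poll, download, audio: bytes):
    """Illustrative orchestration of the SaluteSpeech REST flow (placeholder callables)."""
    token = get_token()                    # 1. OAuth token (cached and refreshed in the real code)
    file_id = upload(token, audio)         # 2. upload the audio, get request_file_id
    task_id = create_task(token, file_id)  # 3. create the async recognition task
    result_id = poll(token, task_id)       # 4. poll task:get until status == DONE
    return download(token, result_id)      # 5. download the result JSON
```

Each step depends only on the token plus the previous step's identifier, which is why the real implementation can run the whole chain inside a single httpx.Client context.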
File Map
| File | Action | Responsibility |
|---|---|---|
| cofee_backend/.certs/russian_trusted_root_ca.pem | Create | Russian CA certificate for TLS |
| cofee_backend/cpv3/infrastructure/settings.py | Modify | 3 new SaluteSpeech settings fields |
| cofee_backend/cpv3/modules/transcription/schemas.py | Modify | New schema types, extend engine enum + type unions |
| cofee_backend/cpv3/modules/transcription/service.py | Modify | ~8 new functions for SaluteSpeech flow |
| cofee_backend/cpv3/modules/transcription/router.py | Modify | Direct /salute-speech/ endpoint |
| cofee_backend/cpv3/modules/tasks/schemas.py | Modify | Extend engine Literal |
| cofee_backend/cpv3/modules/tasks/service.py | Modify | ENGINE_MAP + elif dispatch branch |
| cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx | Modify | Engine option, split model options, engine change effect |
| cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx | Modify | Same as TranscriptionModal |
| cofee_backend/tests/integration/test_salutespeech_parsing.py | Create | Unit tests for timestamp parsing + result conversion |
Task 1: Bundle TLS Certificate
Files:
- Create: cofee_backend/.certs/russian_trusted_root_ca.pem

- Step 1: Create the .certs directory
mkdir -p cofee_backend/.certs
- Step 2: Download the Russian root CA certificate (curl -k skips TLS verification here, since this CA may not yet be in the local trust store)
curl -k "https://gu-st.ru/content/Other/doc/russian_trusted_root_ca.cer" \
-o cofee_backend/.certs/russian_trusted_root_ca.pem
- Step 3: Verify the cert is valid PEM format
openssl x509 -in cofee_backend/.certs/russian_trusted_root_ca.pem -noout -subject -dates
Expected: prints the subject (Russian CA) and validity dates without errors. If the file turns out to be DER-encoded rather than PEM, convert it — write to a temporary file first, since openssl may truncate the output file before it finishes reading it as input:
openssl x509 -inform DER -in cofee_backend/.certs/russian_trusted_root_ca.pem \
  -outform PEM -out cofee_backend/.certs/russian_trusted_root_ca.pem.tmp \
  && mv cofee_backend/.certs/russian_trusted_root_ca.pem.tmp cofee_backend/.certs/russian_trusted_root_ca.pem
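To check the encoding programmatically before converting: a PEM certificate starts with an ASCII armor header, while DER is raw binary ASN.1. A minimal sketch (the helper name is ours, not part of the plan):

```python
def looks_like_pem(data: bytes) -> bool:
    """PEM certificates are ASCII-armored; DER is raw binary ASN.1."""
    return data.lstrip().startswith(b"-----BEGIN")

# Usage (path from this task):
# with open("cofee_backend/.certs/russian_trusted_root_ca.pem", "rb") as f:
#     print("PEM" if looks_like_pem(f.read()) else "probably DER")
```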
- Step 4: Verify the cert is not gitignored
The .certs/ directory should NOT be gitignored — this is a public root CA, safe to commit. Verify it isn't caught by any existing gitignore pattern:
cd cofee_backend && git check-ignore .certs/russian_trusted_root_ca.pem
Expected: no output (not ignored).
- Step 5: Commit
git add cofee_backend/.certs/russian_trusted_root_ca.pem
git commit -m "chore(backend): bundle Russian root CA cert for SaluteSpeech TLS"
Task 2: Add SaluteSpeech Settings
Files:
- Modify: cofee_backend/cpv3/infrastructure/settings.py:97 (after the webhook_base_url field)

- Step 1: Add 3 new fields to the Settings class
In cofee_backend/cpv3/infrastructure/settings.py, after the webhook_base_url field (line 97) and before def get_database_url(self) (line 99), add (ensure Path is imported from pathlib at the top of the file):
    # SaluteSpeech
    salute_auth_key: str = Field(default="", alias="SALUTE_AUTH_KEY")
    salute_ca_cert_path: Path | None = Field(
        default=None, alias="SALUTE_CA_CERT_PATH"
    )
    salute_scope: str = Field(
        default="SALUTE_SPEECH_PERS", alias="SALUTE_SCOPE"
    )
- Step 2: Verify settings load without errors
cd cofee_backend && uv run python -c "from cpv3.infrastructure.settings import get_settings; s = get_settings(); print(f'salute_auth_key={s.salute_auth_key!r}, salute_ca_cert_path={s.salute_ca_cert_path!r}, salute_scope={s.salute_scope!r}')"
Expected: salute_auth_key='', salute_ca_cert_path=None, salute_scope='SALUTE_SPEECH_PERS'
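For reference when filling SALUTE_AUTH_KEY: Sber's OAuth convention is that the authorization key is the Base64 of client_id:client_secret issued in the developer console — treat this as an assumption to verify against the current SaluteSpeech docs. A sketch (make_salute_auth_key is our illustrative name, not part of the plan):

```python
import base64

def make_salute_auth_key(client_id: str, client_secret: str) -> str:
    # The value sent as "Authorization: Basic <key>" to the OAuth endpoint
    # (assumed format; confirm against Sber's documentation).
    return base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
```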
- Step 3: Commit
git add cofee_backend/cpv3/infrastructure/settings.py
git commit -m "feat(backend): add SaluteSpeech settings (auth key, cert path, scope)"
Task 3: Add SaluteSpeech Schemas
Files:
- Modify: cofee_backend/cpv3/modules/transcription/schemas.py:10 (engine enum) and after line 147 (EOF, new classes)

- Step 1: Extend TranscriptionEngineEnum
In cofee_backend/cpv3/modules/transcription/schemas.py, line 10, change:
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD"]
to:
TranscriptionEngineEnum = Literal["LOCAL_WHISPER", "GOOGLE_SPEECH_CLOUD", "SALUTE_SPEECH"]
- Step 2: Add SaluteSpeech schema classes
After the GoogleSpeechParams class (line 147, end of file), add:
# ---------------------------------- SaluteSpeech Models ----------------------------------
class SaluteSpeechWord(Schema):
    word: str
    start: float
    end: float


class SaluteSpeechSegment(Schema):
    text: str
    start: float
    end: float
    words: list[SaluteSpeechWord] = []


class SaluteSpeechResult(Schema):
    text: str
    segments: list[SaluteSpeechSegment]
    language: str


class SaluteSpeechParams(Schema):
    file_path: str
    language: str | None = None
    model: str = "general"
- Step 3: Verify schemas import correctly
cd cofee_backend && uv run python -c "from cpv3.modules.transcription.schemas import SaluteSpeechWord, SaluteSpeechSegment, SaluteSpeechResult, SaluteSpeechParams, TranscriptionEngineEnum; print('OK')"
Expected: OK
- Step 4: Commit
git add cofee_backend/cpv3/modules/transcription/schemas.py
git commit -m "feat(backend): add SaluteSpeech schema types and extend engine enum"
Task 4: Extend Type Unions in Service
Files:
- Modify: cofee_backend/cpv3/modules/transcription/service.py:44 and service.py:222 (type unions)
- Modify: cofee_backend/cpv3/modules/transcription/service.py imports (top of file)

- Step 1: Add SaluteSpeech imports
In cofee_backend/cpv3/modules/transcription/service.py, add SaluteSpeechSegment to the imports from transcription.schemas (around lines 229–243):
from cpv3.modules.transcription.schemas import (
    Document,
    GoogleSpeechResult,
    GoogleSpeechSegment,
    GoogleSpeechWord,
    LineNode,
    SaluteSpeechSegment,
    SegmentNode,
    Tag,
    TimeRange,
    WhisperResult,
    WhisperSegment,
    WhisperWord,
    WordNode,
    WordOptions,
)
- Step 2: Extend the compute_segment_lines type hint
At line 44, change:
def compute_segment_lines(
    self, segment: WhisperSegment | GoogleSpeechSegment, max_chars_per_line: int
) -> list[LineNode]:
to:
def compute_segment_lines(
    self,
    segment: WhisperSegment | GoogleSpeechSegment | SaluteSpeechSegment,
    max_chars_per_line: int,
) -> list[LineNode]:
- Step 3: Extend the _make_document_from_segments type hint
At line 222, change:
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
to:
def _make_document_from_segments(
    builder: DocumentBuilder,
    segments: list[WhisperSegment] | list[GoogleSpeechSegment] | list[SaluteSpeechSegment],
    *,
    max_line_width: int,
) -> Document:
- Step 4: Run lint to verify
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
Expected: no errors.
- Step 5: Commit
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): extend type unions to accept SaluteSpeechSegment"
Task 5: Write Tests for SaluteSpeech Parsing
Files:
- Create: cofee_backend/tests/integration/test_salutespeech_parsing.py

- Step 1: Write the test file
Create cofee_backend/tests/integration/test_salutespeech_parsing.py:
"""Tests for SaluteSpeech result parsing and document building."""
from cpv3.modules.transcription.service import (
_build_document_from_salute_result,
_parse_salute_time,
)
class TestParseSaluteTime:
def test_simple_timestamp(self):
assert _parse_salute_time("0.480s") == 0.48
def test_zero(self):
assert _parse_salute_time("0.000s") == 0.0
def test_large_timestamp(self):
assert _parse_salute_time("123.456s") == 123.456
def test_integer_timestamp(self):
assert _parse_salute_time("5s") == 5.0
class TestBuildDocumentFromSaluteResult:
def _make_raw_result(self):
"""Minimal SaluteSpeech API response for testing."""
return [
{
"results": [
{
"text": "привет мир",
"normalized_text": "Привет мир.",
"start": "0.480s",
"end": "1.200s",
"word_alignments": [
{"word": "привет", "start": "0.480s", "end": "0.840s"},
{"word": "мир", "start": "0.960s", "end": "1.200s"},
],
},
{
"text": "это тест",
"normalized_text": "Это тест.",
"start": "1.500s",
"end": "2.100s",
"word_alignments": [
{"word": "это", "start": "1.500s", "end": "1.700s"},
{"word": "тест", "start": "1.800s", "end": "2.100s"},
],
},
],
"channel": 0,
}
]
def test_returns_document_with_segments(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 2
def test_segment_text(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert doc.segments[0].lines[0].text == "привет мир"
def test_word_timestamps(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
first_word = doc.segments[0].lines[0].words[0]
assert first_word.text == "привет"
assert first_word.time.start == 0.48
assert first_word.time.end == 0.84
def test_segment_time_range(self):
raw = self._make_raw_result()
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert doc.segments[0].time.start == 0.48
assert doc.segments[0].time.end == 1.2
def test_empty_results(self):
raw = [{"results": [], "channel": 0}]
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 0
def test_missing_word_alignments(self):
raw = [
{
"results": [
{
"text": "привет",
"normalized_text": "Привет.",
"start": "0.000s",
"end": "0.500s",
}
],
"channel": 0,
}
]
doc = _build_document_from_salute_result(raw, language="ru-RU")
assert len(doc.segments) == 1
# No words but segment still created
assert doc.segments[0].time.start == 0.0
- Step 2: Run tests to verify they fail
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v 2>&1 | head -20
Expected: ImportError — _build_document_from_salute_result and _parse_salute_time don't exist yet.
- Step 3: Commit test file
git add cofee_backend/tests/integration/test_salutespeech_parsing.py
git commit -m "test(backend): add SaluteSpeech parsing and document building tests"
Task 6: Implement SaluteSpeech Service Functions
Files:
- Modify: cofee_backend/cpv3/modules/transcription/service.py (append after line 430)
This is the core task — all 8 SaluteSpeech functions.
- Step 1: Add new imports at top of file
In cofee_backend/cpv3/modules/transcription/service.py, add these imports at the top (after the existing imports, around line 10):
import threading
import time
import uuid
from pathlib import Path
import httpx
Note: time may already be imported. Check and avoid duplicates. asyncio is already imported. anyio is already imported.
Also add SaluteSpeechWord to the schema imports block (SaluteSpeechSegment is already there after Task 4; do not import SaluteSpeechResult or SaluteSpeechParams here — the service never references them, and ruff would flag them as unused):
from cpv3.modules.transcription.schemas import (
    ... # existing imports
    SaluteSpeechSegment,
    SaluteSpeechWord,
)
- Step 2: Add constants and token cache
After the existing imports (before the DocumentBuilder class), add:
# ---------------------------------- SaluteSpeech Constants ----------------------------------
SALUTE_AUTH_URL = "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"
SALUTE_API_BASE = "https://smartspeech.sber.ru/rest/v1"
SALUTE_POLL_INTERVAL_SECONDS = 5.0
SALUTE_POLL_TIMEOUT_SECONDS = 600
SALUTE_TOKEN_REFRESH_MARGIN_SECONDS = 60

SALUTE_ENCODING_MAP: dict[str, str] = {
    ".mp3": "MP3",
    ".wav": "PCM_S16LE",
    ".ogg": "opus",
    ".flac": "FLAC",
}
SALUTE_CONTENT_TYPE_MAP: dict[str, str] = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}
SALUTE_LANGUAGE_MAP: dict[str, str] = {
    "ru": "ru-RU",
    "en": "en-US",
}

ERROR_SALUTE_AUTH_FAILED = "Ошибка авторизации SaluteSpeech: {detail}"
ERROR_SALUTE_UPLOAD_FAILED = "Ошибка загрузки файла в SaluteSpeech: {detail}"
ERROR_SALUTE_TASK_FAILED = "Ошибка распознавания SaluteSpeech: {detail}"
ERROR_SALUTE_TIMEOUT = "Превышено время ожидания распознавания SaluteSpeech"
ERROR_SALUTE_UNSUPPORTED_FORMAT = "Неподдерживаемый формат аудио для SaluteSpeech: {ext}"

_salute_token_lock = threading.Lock()
_salute_token: str | None = None
_salute_token_expires_at: float = 0.0
- Step 3: Add helper functions
At the end of the file (after transcribe_with_google_speech), append the SaluteSpeech functions:
# ---------------------------------- SaluteSpeech Engine ----------------------------------
def _parse_salute_time(s: str) -> float:
    """Parse a SaluteSpeech timestamp string '0.480s' into 0.48."""
    return float(s.rstrip("s"))


def _get_salute_access_token(client: httpx.Client) -> str:
    """Get or refresh the SaluteSpeech OAuth token. Thread-safe."""
    global _salute_token, _salute_token_expires_at
    with _salute_token_lock:
        if _salute_token and time.monotonic() < (
            _salute_token_expires_at - SALUTE_TOKEN_REFRESH_MARGIN_SECONDS
        ):
            return _salute_token
        settings = get_settings()
        response = client.post(
            SALUTE_AUTH_URL,
            headers={
                "Authorization": f"Basic {settings.salute_auth_key}",
                "RqUID": str(uuid.uuid4()),
                "Content-Type": "application/x-www-form-urlencoded",
            },
            content=f"scope={settings.salute_scope}",
        )
        if response.status_code != 200:
            raise RuntimeError(
                ERROR_SALUTE_AUTH_FAILED.format(detail=response.text[:200])
            )
        data = response.json()
        _salute_token = data["access_token"]
        # expires_at is a Unix timestamp in milliseconds; convert to a monotonic deadline
        expires_in_seconds = (data["expires_at"] / 1000) - time.time()
        _salute_token_expires_at = time.monotonic() + expires_in_seconds
        return _salute_token


def _upload_salute_audio(
    client: httpx.Client, token: str, audio_data: bytes, content_type: str
) -> str:
    """Upload audio to SaluteSpeech, return request_file_id."""
    response = client.post(
        f"{SALUTE_API_BASE}/data:upload",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        content=audio_data,
        timeout=120.0,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_UPLOAD_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["request_file_id"]


def _create_salute_task(
    client: httpx.Client,
    token: str,
    file_id: str,
    *,
    language: str,
    model: str,
    audio_encoding: str,
    sample_rate: int,
) -> str:
    """Create an async recognition task, return task_id."""
    body = {
        "options": {
            "audio_encoding": audio_encoding,
            "sample_rate": sample_rate,
            "language": language,
            "model": model,
            "channels_count": 1,
            "hypotheses_count": 1,
        },
        "request_file_id": file_id,
    }
    response = client.post(
        f"{SALUTE_API_BASE}/speech:async_recognize",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    if response.status_code != 200:
        raise RuntimeError(
            ERROR_SALUTE_TASK_FAILED.format(detail=response.text[:200])
        )
    return response.json()["result"]["id"]


def _poll_salute_task(
    client: httpx.Client,
    token: str,
    task_id: str,
    job_uuid: uuid.UUID | None,
    on_progress: ProgressCallback | None,
) -> str:
    """Poll the task until DONE, return response_file_id. Checks job cancellation each iteration."""
    from cpv3.modules.tasks.service import _raise_if_job_cancelled

    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed > SALUTE_POLL_TIMEOUT_SECONDS:
            raise TimeoutError(ERROR_SALUTE_TIMEOUT)
        if job_uuid is not None:
            _raise_if_job_cancelled(job_uuid)
        response = client.get(
            f"{SALUTE_API_BASE}/task:get",
            params={"id": task_id},
            headers={"Authorization": f"Bearer {token}"},
        )
        response.raise_for_status()
        result = response.json()["result"]
        status = result["status"]
        if status == "DONE":
            return result["response_file_id"]
        if status == "ERROR":
            error_msg = result.get("error", "unknown error")
            raise RuntimeError(
                ERROR_SALUTE_TASK_FAILED.format(detail=error_msg)
            )
        if on_progress is not None:
            pct = min(elapsed / SALUTE_POLL_TIMEOUT_SECONDS * 100, 95.0)
            on_progress(pct)
        time.sleep(SALUTE_POLL_INTERVAL_SECONDS)


def _download_salute_result(
    client: httpx.Client, token: str, response_file_id: str
) -> list[dict]:
    """Download the recognition result JSON."""
    response = client.get(
        f"{SALUTE_API_BASE}/data:download",
        params={"response_file_id": response_file_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()


def _build_document_from_salute_result(
    raw_channels: list[dict], *, language: str
) -> Document:
    """Convert SaluteSpeech result JSON to a Document."""
    builder = DocumentBuilder()
    words_options = WordOptions()
    all_segments: list[SaluteSpeechSegment] = []
    for channel_data in raw_channels:
        for result_item in channel_data.get("results", []):
            word_alignments = result_item.get("word_alignments", [])
            words = [
                SaluteSpeechWord(
                    word=w["word"],
                    start=_parse_salute_time(w["start"]),
                    end=_parse_salute_time(w["end"]),
                )
                for w in word_alignments
            ]
            text = result_item.get("text", "")
            seg_start = (
                _parse_salute_time(result_item["start"])
                if "start" in result_item
                else 0.0
            )
            seg_end = (
                _parse_salute_time(result_item["end"])
                if "end" in result_item
                else 0.0
            )
            all_segments.append(
                SaluteSpeechSegment(
                    text=text,
                    start=seg_start,
                    end=seg_end,
                    words=words,
                )
            )
    document = _make_document_from_segments(
        builder, all_segments, max_line_width=words_options.max_line_width
    )
    return builder.process_document(document)


def _salute_transcribe_sync(
    *,
    local_file_path: str,
    language: str | None,
    model: str,
    sample_rate: int,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Synchronous SaluteSpeech transcription (runs in a Dramatiq worker thread)."""
    settings = get_settings()
    ext = Path(local_file_path).suffix.lower()
    audio_encoding = SALUTE_ENCODING_MAP.get(ext)
    content_type = SALUTE_CONTENT_TYPE_MAP.get(ext)
    if not audio_encoding or not content_type:
        raise ValueError(ERROR_SALUTE_UNSUPPORTED_FORMAT.format(ext=ext))
    salute_language = SALUTE_LANGUAGE_MAP.get(language or "", "ru-RU")
    verify = str(settings.salute_ca_cert_path) if settings.salute_ca_cert_path else True
    with httpx.Client(verify=verify, timeout=30.0) as client:
        token = _get_salute_access_token(client)
        with open(local_file_path, "rb") as f:
            audio_data = f.read()
        file_id = _upload_salute_audio(client, token, audio_data, content_type)
        task_id = _create_salute_task(
            client,
            token,
            file_id,
            language=salute_language,
            model=model,
            audio_encoding=audio_encoding,
            sample_rate=sample_rate,
        )
        response_file_id = _poll_salute_task(
            client, token, task_id, job_id, on_progress
        )
        raw_result = _download_salute_result(client, token, response_file_id)
    return _build_document_from_salute_result(raw_result, language=salute_language)


async def transcribe_with_salute_speech(
    storage: StorageService,
    *,
    file_key: str,
    language: str | None = None,
    model: str = "general",
    sample_rate: int = 16000,
    job_id: uuid.UUID | None = None,
    on_progress: ProgressCallback | None = None,
) -> Document:
    """Async wrapper for SaluteSpeech transcription."""
    tmp = await storage.download_to_temp(file_key)
    try:
        return await anyio.to_thread.run_sync(
            lambda: _salute_transcribe_sync(
                local_file_path=tmp.path,
                language=language,
                model=model,
                sample_rate=sample_rate,
                job_id=job_id,
                on_progress=on_progress,
            )
        )
    finally:
        tmp.cleanup()
- Step 4: Run the parsing tests
cd cofee_backend && uv run pytest tests/integration/test_salutespeech_parsing.py -v
Expected: all tests pass.
- Step 5: Run lint
cd cofee_backend && uv run ruff check cpv3/modules/transcription/service.py
Expected: no errors.
- Step 6: Commit
git add cofee_backend/cpv3/modules/transcription/service.py
git commit -m "feat(backend): implement SaluteSpeech transcription engine (8 functions)"
Task 7: Add Task Dispatch
Files:
- Modify: cofee_backend/cpv3/modules/tasks/schemas.py:86 (engine Literal)
- Modify: cofee_backend/cpv3/modules/tasks/service.py:88-91 (ENGINE_MAP)
- Modify: cofee_backend/cpv3/modules/tasks/service.py:613-616 (actor import)
- Modify: cofee_backend/cpv3/modules/tasks/service.py:700 (elif branch)

- Step 1: Extend the engine Literal in the task schema
In cofee_backend/cpv3/modules/tasks/schemas.py, line 86, change:
engine: Literal["whisper", "google"] = Field(
to:
engine: Literal["whisper", "google", "salutespeech"] = Field(
- Step 2: Add to ENGINE_MAP
In cofee_backend/cpv3/modules/tasks/service.py, lines 88-91, change:
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
}
to:
ENGINE_MAP: dict[str, str] = {
    "whisper": "LOCAL_WHISPER",
    "google": "GOOGLE_SPEECH_CLOUD",
    "salutespeech": "SALUTE_SPEECH",
}
- Step 3: Add import in actor
In cofee_backend/cpv3/modules/tasks/service.py, inside transcription_generate_actor (lines 613-616), change:
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_whisper,
)
to:
from cpv3.modules.transcription.service import (
    transcribe_with_google_speech,
    transcribe_with_salute_speech,
    transcribe_with_whisper,
)
- Step 4: Add elif dispatch branch
In cofee_backend/cpv3/modules/tasks/service.py, after the Google branch (after line 700, before the else:), add:
elif engine == "salutespeech":
    # Extract the sample rate from the probe if available
    audio_stream = next(
        (s for s in probe.streams if s.codec_type == "audio"), None
    )
    sr = (
        int(audio_stream.sample_rate)
        if audio_stream and audio_stream.sample_rate
        else 16000
    )
    document = _run_async(
        transcribe_with_salute_speech(
            storage,
            file_key=file_key,
            language=language,
            model=model,
            sample_rate=sr,
            job_id=job_uuid,
            on_progress=_on_whisper_progress,
        )
    )
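The sample-rate fallback in this branch can be exercised on its own with stand-in stream objects. A sketch (pick_sample_rate is our illustrative name; SimpleNamespace substitutes for the real probe's stream type):

```python
from types import SimpleNamespace

DEFAULT_SAMPLE_RATE = 16000

def pick_sample_rate(streams) -> int:
    """First audio stream's sample rate, else the 16 kHz default."""
    audio = next((s for s in streams if s.codec_type == "audio"), None)
    if audio is not None and getattr(audio, "sample_rate", None):
        return int(audio.sample_rate)
    return DEFAULT_SAMPLE_RATE
```

Note the truthiness check also covers probes that report sample_rate as None or an empty string, which is why the branch above guards on audio_stream.sample_rate rather than on the stream alone.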
- Step 5: Run lint
cd cofee_backend && uv run ruff check cpv3/modules/tasks/service.py cpv3/modules/tasks/schemas.py
Expected: no errors.
- Step 6: Commit
git add cofee_backend/cpv3/modules/tasks/schemas.py cofee_backend/cpv3/modules/tasks/service.py
git commit -m "feat(backend): add SaluteSpeech to task dispatch (ENGINE_MAP + elif branch)"
Task 8: Add Direct Endpoint (Optional)
Files:
- Modify: cofee_backend/cpv3/modules/transcription/router.py (after line 145)

- Step 1: Add the route
In cofee_backend/cpv3/modules/transcription/router.py, add the imports at the top alongside the existing ones:
from cpv3.modules.transcription.schemas import (
    ... # existing
    SaluteSpeechParams,
)
from cpv3.modules.transcription.service import (
    ... # existing
    transcribe_with_salute_speech,
)
Then append after the last endpoint (after line 145):
@router.post("/salute-speech/", response_model=Document)
async def salute_speech_transcribe(
    body: SaluteSpeechParams,
    current_user: User = Depends(get_current_user),
    storage: StorageService = Depends(get_storage),
) -> Document:
    _ = current_user
    return await transcribe_with_salute_speech(
        storage,
        file_key=body.file_path,
        language=body.language,
        model=body.model,
    )
- Step 2: Run lint
cd cofee_backend && uv run ruff check cpv3/modules/transcription/router.py
Expected: no errors.
- Step 3: Commit
git add cofee_backend/cpv3/modules/transcription/router.py
git commit -m "feat(backend): add direct /salute-speech/ transcription endpoint"
Task 9: Frontend — TranscriptionModal
Files:
- Modify: cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx

- Step 1: Extend the type
At line 17, change:
engine: "whisper" | "google"
to:
engine: "whisper" | "google" | "salutespeech"
- Step 2: Add engine option
At lines 22-25, change:
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
to:
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
- Step 3: Split model options
Rename the existing MODEL_OPTIONS (lines 33-38) and add SaluteSpeech models:
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
- Step 4: Update model dropdown guard
At line 162, change the model dropdown conditional from:
{engine === "whisper" && (
to:
{(engine === "whisper" || engine === "salutespeech") && (
And inside, change the options reference from MODEL_OPTIONS to:
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
- Step 5: Add model reset on engine change
Find the component function body (after the useForm call). Add a useEffect that resets the model when engine changes:
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
Note: watch and setValue come from useForm — check that they're destructured. If watch("engine") is already used elsewhere, reuse that variable.
- Step 6: Type check
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
Expected: no new errors.
- Step 7: Commit
git add cofee_frontend/src/features/project/TranscriptionModal/TranscriptionModal.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionModal"
Task 10: Frontend — TranscriptionSettingsStep
Files:
- Modify: cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx
Apply the same changes as Task 9 to this file (constants are duplicated).
- Step 1: Extend type
At line 22, change:
engine: "whisper" | "google"
to:
engine: "whisper" | "google" | "salutespeech"
- Step 2: Add engine option
At lines 27-30, change:
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
]
to:
const ENGINE_OPTIONS = [
  { value: "whisper", label: "Whisper (локальный)" },
  { value: "google", label: "Google Speech" },
  { value: "salutespeech", label: "SaluteSpeech" },
]
- Step 3: Split model options
Rename MODEL_OPTIONS (lines 38-43) and add SaluteSpeech models:
const WHISPER_MODEL_OPTIONS = [
  { value: "base", label: "Базовая" },
  { value: "small", label: "Малая" },
  { value: "medium", label: "Средняя" },
  { value: "large", label: "Большая" },
]

const SALUTE_MODEL_OPTIONS = [
  { value: "general", label: "Общая" },
  { value: "finance", label: "Финансы" },
  { value: "medicine", label: "Медицина" },
]
- Step 4: Update model dropdown guard
At line 263, change:
{engine === "whisper" && (
to:
{(engine === "whisper" || engine === "salutespeech") && (
And change the options reference from MODEL_OPTIONS to:
{(engine === "whisper" ? WHISPER_MODEL_OPTIONS : SALUTE_MODEL_OPTIONS).map((opt) => (
- Step 5: Add model reset on engine change
Same useEffect as Task 9:
const engine = watch("engine")

useEffect(() => {
  if (engine === "salutespeech") {
    setValue("model", "general")
  } else if (engine === "whisper") {
    setValue("model", "base")
  }
}, [engine, setValue])
- Step 6: Type check
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
Expected: no new errors.
- Step 7: Commit
git add cofee_frontend/src/features/project/TranscriptionSettingsStep/TranscriptionSettingsStep.tsx
git commit -m "feat(frontend): add SaluteSpeech engine option to TranscriptionSettingsStep"
Task 11: Final Verification
Files: None (verification only)
- Step 1: Backend lint
cd cofee_backend && uv run ruff check cpv3/ 2>&1 | head -20
Expected: no errors.
- Step 2: Backend tests
cd cofee_backend && uv run pytest 2>&1 | tail -30
Expected: all tests pass (including new SaluteSpeech parsing tests).
- Step 3: Frontend type check
cd cofee_frontend && bunx tsc --noEmit 2>&1 | grep -v "app/template.tsx" | grep -v "CreateProjectModal" | head -20
Expected: no new errors.
- Step 4: Write verification report
VERIFICATION REPORT
===================
Subproject: backend + frontend
Level: base
Type check: [PASS/FAIL]
Lint: [PASS/FAIL]
Tests: [PASS/FAIL] (X passed, Y failed)
Build: SKIPPED
E2E: SKIPPED
Files changed: ~10
Status: [READY/NOT READY]