# Backend Comprehensive Refactor — Design Spec **Date:** 2026-03-15 **Scope:** `cofee_backend/` — all modules, common layer, infrastructure layer **Approach:** Top-down (decompose tasks/ first, then extract shared helpers, then apply across modules) ## Problem Summary The backend has grown feature-by-feature and accumulated structural debt: 1. **`tasks/service.py`** is 1673 lines mixing four concerns: broker infrastructure, actor lifecycle, artifact persistence, and task submission orchestration. Eight actors repeat an identical ~40-line lifecycle skeleton. 2. **CRUD modules** repeat the same patterns: repository update loops (`model_dump` + commit + refresh), router permission checks (fetch-or-404, owner-or-staff, raise 403), inline error strings. 3. **Auth/ownership** is inconsistent — some modules enforce ownership, others don't. 4. **Layering** is inconsistent — some routers bypass the service layer and call repositories directly. 5. **Error messages** are inline strings instead of `ERROR_` constants. 6. **Dead code**, naming issues, and minor bugs exist across several modules. ## Guiding Principles - **DRY:** Extract repeated patterns into small free functions, not base classes - **KISS:** Helpers are plain functions — no magic, no decorators, no hidden behavior - **Developer-readable:** Every call site shows what it does; no hidden abstractions - **Preserve structure:** Keep the existing module pattern; allow extra files only when a single `service.py` would exceed ~300 lines - **Shared helpers in `common/`:** Application-level helpers (DB update, HTTP permission checks) - **Infrastructure for infrastructure:** Only broker/transport concerns go in `infrastructure/` ## Phase 1: Decompose `tasks/service.py` ### 1.1 New file layout ``` cpv3/modules/tasks/ ├── __init__.py (unchanged) ├── schemas.py (unchanged) ├── router.py (unchanged) ├── service.py TaskService class only: submit_* methods + record_webhook_event + cancel_job ├── actors.py NEW: all @dramatiq.actor functions, each using _run_actor() ├── artifacts.py NEW: _save_transcription_artifacts, _save_convert_artifacts, _save_captions_artifacts ├── webhooks.py NEW: webhook transport (_send_webhook_event, _build_webhook_url, event helpers) └── constants.py NEW: ERROR_, MESSAGE_, STATUS_ constants, JobCancelledError, all in Russian ``` New infrastructure file: ``` cpv3/infrastructure/dramatiq.py - RedisBroker setup and initialization - _serialize_broker_reference() / _parse_broker_reference() - _get_job_status_sync() (raw psycopg2 cancellation check) - _raise_if_job_cancelled() ``` ### 1.2 The `_run_actor()` pattern One helper function replaces the repeated lifecycle skeleton across all 8 actors: ```python def _run_actor( job_id: str, webhook_url: str, work_fn: Callable[..., dict[str, Any]], *, starting_message: str = MESSAGE_STARTING, **work_kwargs: Any, ) -> None: """Shared actor lifecycle: cancel-check → RUNNING → work → DONE/FAILED.""" job_uuid = uuid.UUID(job_id) try: _raise_if_job_cancelled(job_uuid) except JobCancelledError: logger.info("Actor cancelled before start: %s", job_uuid) return _send_webhook_event(webhook_url, TaskWebhookEvent( status=JOB_STATUS_RUNNING, current_message=starting_message, started_at=_utc_now(), )) try: result = work_fn(job_uuid=job_uuid, webhook_url=webhook_url, **work_kwargs) _send_webhook_event(webhook_url, TaskWebhookEvent( status=JOB_STATUS_DONE, output=result, finished_at=_utc_now(), )) except JobCancelledError: logger.info("Actor cancelled mid-run: %s", job_uuid) except Exception as exc: logger.exception("Actor failed: %s", job_uuid) _send_webhook_event(webhook_url, TaskWebhookEvent( status=JOB_STATUS_FAILED, error_message=str(exc), finished_at=_utc_now(), )) ``` Each actor becomes ~5-15 lines defining only its unique work function and parameters. ### 1.3 Artifact persistence → owning modules Move artifact-saving logic to the modules that own those domain concepts: - `_save_transcription_artifacts()` → `transcription/service.py` as `TranscriptionService.save_artifacts()` - `_save_convert_artifacts()` → `media/service.py` as `MediaService.save_artifacts()` - `_save_captions_artifacts()` → `captions/service.py` as `CaptionService.save_artifacts()` `TaskService.record_webhook_event()` dispatches to these module-owned methods. ### 1.4 What stays in `tasks/service.py` Only the `TaskService` class (~300-400 lines): - `__init__()` with session + job repo - `submit_media_probe()`, `submit_silence_detect()`, `submit_silence_remove()`, `submit_convert()`, `submit_frame_extract()`, `submit_transcription_generate()`, `submit_captions_generate()` - `record_webhook_event()` — orchestration only, delegates artifact saving - `cancel_job()` ### 1.5 Constants language All user-facing constants in `tasks/constants.py` are in Russian. The current mix of English and Russian (`tasks/service.py:93-103`) is unified to Russian. ## Phase 2: Shared Helpers in `common/` ### 2.1 `common/db.py` — Repository helpers Three free functions: ```python async def apply_partial_update(instance: object, data: BaseModel) -> None: """Apply non-None fields from a Pydantic schema to a SQLAlchemy model.""" async def commit_and_refresh(session: AsyncSession, instance: object) -> None: """Commit the session and refresh the instance.""" async def soft_delete( session: AsyncSession, instance: object, *, field: str = "is_active", value: bool = False, ) -> None: """Mark an instance as soft-deleted and commit. Supports both is_active=False and is_deleted=True patterns.""" ``` Soft-delete documentation comment explains which modules use which pattern. No unification of `is_active` vs `is_deleted` in this refactor — that requires a data migration. ### 2.2 `common/http.py` — Router helpers Two free functions + two error constants: ```python ERROR_NOT_FOUND = "Объект не найден" ERROR_FORBIDDEN = "Доступ запрещён" def ensure_found(instance: object | None, detail: str = ERROR_NOT_FOUND) -> None: """Raise 404 if instance is None.""" def ensure_owner_or_staff( instance: object, user: User, *, owner_field: str = "owner_id", ) -> None: """Raise 403 if user is not staff and doesn't own the resource.""" ``` ## Phase 3: Apply Shared Helpers Across All CRUD Modules ### 3.1 Repository updates (7 modules) Replace `model_dump` loop + commit + refresh with `apply_partial_update()` + `commit_and_refresh()`: - `files/repository.py` - `media/repository.py` (×2: MediaFile + Artifact) - `projects/repository.py` - `jobs/repository.py` (×2: Job + JobEvent) - `webhooks/repository.py` - `transcription/repository.py` - `captions/repository.py` (special case: `style_config` nested serialization stays explicit) ### 3.2 Router permission checks (7 modules) Replace inline 404/403 checks with `ensure_found()` + `ensure_owner_or_staff()`: - `users/router.py` (3 sites) - `projects/router.py` (3 sites) - `files/router.py` (3 sites + storage path check) - `media/router.py` (6 sites: mediafiles + artifacts) - `jobs/router.py` (6 sites: jobs + events) - `webhooks/router.py` (3 sites) - `captions/router.py` (2 sites) ### 3.3 Add missing permission checks While applying helpers, add ownership checks where currently absent: | Module | Current state | Fix | |--------|--------------|-----| | `transcription/router.py` | No ownership checks | Add `ensure_owner_or_staff()` via parent project | | `media/router.py` artifacts | `_ = current_user` (ignores user) | Add ownership check via parent media file | | `captions/router.py` get-by-id | Returns any preset by ID | Only return if system preset or owned by requester | | `jobs/router.py` events | No user scoping on events | Add ownership check via parent job | ## Phase 4: Error Handling, Naming, Dead Code, Security ### 4.1 Error message constants Extract inline strings into `ERROR_` constants at the top of each file: - `infrastructure/auth.py` → `ERROR_TOKEN_EXPIRED`, `ERROR_INVALID_TOKEN`, `ERROR_USER_NOT_FOUND` - `users/service.py` → `ERROR_EMAIL_ALREADY_EXISTS`, `ERROR_INVALID_CREDENTIALS` - `users/router.py` → `ERROR_PASSWORD_MISMATCH`, `ERROR_INCORRECT_PASSWORD` - `files/router.py` → `ERROR_FILE_TOO_LARGE`, `ERROR_EMPTY_FILE` - `media/service.py` → `ERROR_PROBE_FAILED`, `ERROR_NO_VIDEO_STREAM`, `ERROR_FFMPEG_FAILED` ### 4.2 Naming fix Rename `project_pct` → `progress_pct` in: - `jobs/models.py` - `jobs/schemas.py` (3 occurrences) - Alembic migration for the DB column rename - Check frontend for field dependency before applying ### 4.3 Dead code removal | Dead code | Location | |-----------|----------| | Legacy service function exports | `users/service.py:60-105`, `projects/service.py:46-69` | | `WordOptions.max_line_count` | `transcription/schemas.py:84-88` | | Duplicate `import json` | `media/service.py:7` (top-level unused) | | `filename_without_ext` variable | `media/service.py:396, 434` | ### 4.4 Notification completeness Add missing entries to `notifications/service.py`: - `JOB_TYPE_LABELS`: add `FRAME_EXTRACT` (Russian label) - `STATUS_TITLES`: add `CANCELLED` (Russian label) - All notification messages must be in Russian ### 4.5 Duplicate WebSocket auth Extract `decode_user_id_from_token(token: str) -> uuid.UUID` in `infrastructure/auth.py`. Both `get_current_user()` and `notifications/router.py` WebSocket handler reuse it. WebSocket handler keeps its own session management. ### 4.6 Settings fix Change `remotion_service_url` default from `http://localhost:8001` to `http://localhost:3001` in `infrastructure/settings.py`. ### 4.7 Path traversal fix Add root boundary check in `files/router.py` local file serving: ```python full_path = (settings.local_storage_dir / file_path).resolve() root = Path(settings.local_storage_dir).resolve() if not str(full_path).startswith(str(root)): raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=ERROR_FORBIDDEN) ``` ### 4.8 Mutable default fix Change `media/schemas.py:138` from `streams: list[StreamSchema] = []` to `streams: list[StreamSchema] = Field(default_factory=list)`. ## Phase 5: Consistent Layering and Tests ### 5.1 Fix Router → Service → Repository layering | Module | Current state | Fix | |--------|--------------|-----| | `media/router.py` | CRUD calls `MediaFileRepository` directly | Route through `MediaService` — add thin CRUD wrappers | | `transcription/router.py` | All routes call `TranscriptionRepository` directly | Route through `TranscriptionService` — add thin CRUD wrappers | | `notifications/router.py` | Uses `NotificationRepository` directly | Route through `NotificationService` — add list/get/mark_read | Service methods are pass-through calls to the repository. No new business logic invented. Value is consistent architecture. ### 5.2 Test additions | Test | Reason | |------|--------| | Unit test `_run_actor()` lifecycle (cancel/success/failure) | Core new abstraction | | Unit test `ensure_found()` / `ensure_owner_or_staff()` | Shared helpers | | Unit test `apply_partial_update()` / `commit_and_refresh()` | Shared helpers | | Integration tests for new ownership checks (transcription, media artifacts, caption presets, job events) | New permission enforcement | | Test path traversal prevention on local file route | Security fix | No full test rewrites. No new integration test infrastructure. ### 5.3 Update `CLAUDE.md` Reflect the new architecture: - Module file rule updated: extra files allowed when `service.py` would exceed ~300 lines - Document `common/db.py` and `common/http.py` - Document `infrastructure/dramatiq.py` - Document `ensure_found` / `ensure_owner_or_staff` as the standard router permission pattern - Document `apply_partial_update` / `commit_and_refresh` as the standard repository update pattern ## Files Changed Summary ### New files | File | Purpose | |------|---------| | `cpv3/common/db.py` | Repository helpers: apply_partial_update, commit_and_refresh, soft_delete | | `cpv3/common/http.py` | Router helpers: ensure_found, ensure_owner_or_staff, ERROR_ constants | | `cpv3/infrastructure/dramatiq.py` | RedisBroker setup, broker-id serialization, cancellation check | | `cpv3/modules/tasks/actors.py` | All @dramatiq.actor functions + _run_actor() lifecycle helper | | `cpv3/modules/tasks/artifacts.py` | Artifact persistence helpers (moved from tasks/service.py) | | `cpv3/modules/tasks/webhooks.py` | Webhook transport helpers | | `cpv3/modules/tasks/constants.py` | All ERROR_, MESSAGE_, STATUS_ constants (Russian) | ### Modified files (by phase) **Phase 1 — tasks decomposition:** - `cpv3/modules/tasks/service.py` — stripped to TaskService class only - `cpv3/modules/transcription/service.py` — receives `save_artifacts()` - `cpv3/modules/media/service.py` — receives `save_artifacts()` - `cpv3/modules/captions/service.py` — receives `save_artifacts()` **Phase 3 — apply shared helpers:** - All 7 module `repository.py` files — use `apply_partial_update()` + `commit_and_refresh()` - All 7 module `router.py` files — use `ensure_found()` + `ensure_owner_or_staff()` **Phase 4 — error handling, cleanup:** - `cpv3/infrastructure/auth.py` — error constants + shared token decoder - `cpv3/infrastructure/settings.py` — fix remotion port default - `cpv3/modules/users/service.py` — error constants + remove dead exports - `cpv3/modules/users/router.py` — error constants - `cpv3/modules/projects/service.py` — remove dead exports - `cpv3/modules/files/router.py` — path traversal fix + error constants - `cpv3/modules/media/service.py` — error constants + dead code removal - `cpv3/modules/media/schemas.py` — mutable default fix - `cpv3/modules/transcription/schemas.py` — remove dead `max_line_count` - `cpv3/modules/notifications/service.py` — complete label maps (Russian) - `cpv3/modules/notifications/router.py` — use shared token decoder - `cpv3/modules/jobs/models.py` — rename `project_pct` → `progress_pct` - `cpv3/modules/jobs/schemas.py` — rename `project_pct` → `progress_pct` **Phase 5 — layering + tests:** - `cpv3/modules/media/router.py` — route through service - `cpv3/modules/transcription/router.py` — route through service - `cpv3/modules/notifications/router.py` — route through service - `cofee_backend/CLAUDE.md` — updated architecture docs - New test files for helpers and permission checks ### Alembic migration One migration for `project_pct` → `progress_pct` column rename in jobs table. ## Risks and Mitigations | Risk | Mitigation | |------|-----------| | Actor refactor breaks background processing | Test `_run_actor()` lifecycle thoroughly before deploying. Keep old actors available on a branch for rollback. | | `progress_pct` rename breaks frontend | Check frontend API types for `project_pct` usage before applying migration. Coordinate with frontend update. | | Missing auth checks break existing workflows | New permission checks only restrict access that was unintentionally open. Test with real user flows. | | Shared helpers become a dumping ground | Keep `common/db.py` and `common/http.py` minimal. New helpers need clear justification. |