Files
main_backend/docs/superpowers/specs/2026-03-15-backend-comprehensive-refactor-design.md
T
Daniil 2944040ac7 docs: add backend comprehensive refactor design spec
Covers decomposition of tasks/service.py, shared CRUD/auth helpers
in common/, permission fixes, error handling, naming, and layering
consistency across all backend modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:07:22 +03:00

361 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Backend Comprehensive Refactor — Design Spec
**Date:** 2026-03-15
**Scope:** `cofee_backend/` — all modules, common layer, infrastructure layer
**Approach:** Top-down (decompose tasks/ first, then extract shared helpers, then apply across modules)
## Problem Summary
The backend has grown feature-by-feature and accumulated structural debt:
1. **`tasks/service.py`** is 1673 lines mixing four concerns: broker infrastructure, actor lifecycle, artifact persistence, and task submission orchestration. Eight actors repeat an identical ~40-line lifecycle skeleton.
2. **CRUD modules** repeat the same patterns: repository update loops (`model_dump` + commit + refresh), router permission checks (fetch-or-404, owner-or-staff, raise 403), inline error strings.
3. **Auth/ownership** is inconsistent — some modules enforce ownership, others don't.
4. **Layering** is inconsistent — some routers bypass the service layer and call repositories directly.
5. **Error messages** are inline strings instead of `ERROR_` constants.
6. **Dead code**, naming issues, and minor bugs exist across several modules.
## Guiding Principles
- **DRY:** Extract repeated patterns into small free functions, not base classes
- **KISS:** Helpers are plain functions — no magic, no decorators, no hidden behavior
- **Developer-readable:** Every call site shows what it does; no hidden abstractions
- **Preserve structure:** Keep the existing module pattern; allow extra files only when a single `service.py` would exceed ~300 lines
- **Shared helpers in `common/`:** Application-level helpers (DB update, HTTP permission checks)
- **Infrastructure for infrastructure:** Only broker/transport concerns go in `infrastructure/`
## Phase 1: Decompose `tasks/service.py`
### 1.1 New file layout
```
cpv3/modules/tasks/
├── __init__.py (unchanged)
├── schemas.py (unchanged)
├── router.py (unchanged)
├── service.py TaskService class only: submit_* methods + record_webhook_event + cancel_job
├── actors.py NEW: all @dramatiq.actor functions, each using _run_actor()
├── artifacts.py NEW: _save_transcription_artifacts, _save_convert_artifacts, _save_captions_artifacts
├── webhooks.py NEW: webhook transport (_send_webhook_event, _build_webhook_url, event helpers)
└── constants.py NEW: ERROR_, MESSAGE_, STATUS_ constants, JobCancelledError, all in Russian
```
New infrastructure file:
```
cpv3/infrastructure/dramatiq.py
- RedisBroker setup and initialization
- _serialize_broker_reference() / _parse_broker_reference()
- _get_job_status_sync() (raw psycopg2 cancellation check)
- _raise_if_job_cancelled()
```
### 1.2 The `_run_actor()` pattern
One helper function replaces the repeated lifecycle skeleton across all 8 actors:
```python
def _run_actor(
job_id: str,
webhook_url: str,
work_fn: Callable[..., dict[str, Any]],
*,
starting_message: str = MESSAGE_STARTING,
**work_kwargs: Any,
) -> None:
"""Shared actor lifecycle: cancel-check → RUNNING → work → DONE/FAILED."""
job_uuid = uuid.UUID(job_id)
try:
_raise_if_job_cancelled(job_uuid)
except JobCancelledError:
logger.info("Actor cancelled before start: %s", job_uuid)
return
_send_webhook_event(webhook_url, TaskWebhookEvent(
status=JOB_STATUS_RUNNING,
current_message=starting_message,
started_at=_utc_now(),
))
try:
result = work_fn(job_uuid=job_uuid, webhook_url=webhook_url, **work_kwargs)
_send_webhook_event(webhook_url, TaskWebhookEvent(
status=JOB_STATUS_DONE,
output=result,
finished_at=_utc_now(),
))
except JobCancelledError:
logger.info("Actor cancelled mid-run: %s", job_uuid)
except Exception as exc:
logger.exception("Actor failed: %s", job_uuid)
_send_webhook_event(webhook_url, TaskWebhookEvent(
status=JOB_STATUS_FAILED,
error_message=str(exc),
finished_at=_utc_now(),
))
```
Each actor becomes ~5-15 lines defining only its unique work function and parameters.
### 1.3 Artifact persistence → owning modules
Move artifact-saving logic to the modules that own those domain concepts:
- `_save_transcription_artifacts()``transcription/service.py` as `TranscriptionService.save_artifacts()`
- `_save_convert_artifacts()``media/service.py` as `MediaService.save_artifacts()`
- `_save_captions_artifacts()``captions/service.py` as `CaptionService.save_artifacts()`
`TaskService.record_webhook_event()` dispatches to these module-owned methods.
### 1.4 What stays in `tasks/service.py`
Only the `TaskService` class (~300-400 lines):
- `__init__()` with session + job repo
- `submit_media_probe()`, `submit_silence_detect()`, `submit_silence_remove()`, `submit_convert()`, `submit_frame_extract()`, `submit_transcription_generate()`, `submit_captions_generate()`
- `record_webhook_event()` — orchestration only, delegates artifact saving
- `cancel_job()`
### 1.5 Constants language
All user-facing constants in `tasks/constants.py` are in Russian. The current mix of English and Russian (`tasks/service.py:93-103`) is unified to Russian.
## Phase 2: Shared Helpers in `common/`
### 2.1 `common/db.py` — Repository helpers
Three free functions:
```python
async def apply_partial_update(instance: object, data: BaseModel) -> None:
"""Apply non-None fields from a Pydantic schema to a SQLAlchemy model."""
async def commit_and_refresh(session: AsyncSession, instance: object) -> None:
"""Commit the session and refresh the instance."""
async def soft_delete(
session: AsyncSession,
instance: object,
*,
field: str = "is_active",
value: bool = False,
) -> None:
"""Mark an instance as soft-deleted and commit.
Supports both is_active=False and is_deleted=True patterns."""
```
Soft-delete documentation comment explains which modules use which pattern.
No unification of `is_active` vs `is_deleted` in this refactor — that requires a data migration.
### 2.2 `common/http.py` — Router helpers
Two free functions + two error constants:
```python
ERROR_NOT_FOUND = "Объект не найден"
ERROR_FORBIDDEN = "Доступ запрещён"
def ensure_found(instance: object | None, detail: str = ERROR_NOT_FOUND) -> None:
"""Raise 404 if instance is None."""
def ensure_owner_or_staff(
instance: object,
user: User,
*,
owner_field: str = "owner_id",
) -> None:
"""Raise 403 if user is not staff and doesn't own the resource."""
```
## Phase 3: Apply Shared Helpers Across All CRUD Modules
### 3.1 Repository updates (7 modules)
Replace `model_dump` loop + commit + refresh with `apply_partial_update()` + `commit_and_refresh()`:
- `files/repository.py`
- `media/repository.py` (×2: MediaFile + Artifact)
- `projects/repository.py`
- `jobs/repository.py` (×2: Job + JobEvent)
- `webhooks/repository.py`
- `transcription/repository.py`
- `captions/repository.py` (special case: `style_config` nested serialization stays explicit)
### 3.2 Router permission checks (7 modules)
Replace inline 404/403 checks with `ensure_found()` + `ensure_owner_or_staff()`:
- `users/router.py` (3 sites)
- `projects/router.py` (3 sites)
- `files/router.py` (3 sites + storage path check)
- `media/router.py` (6 sites: mediafiles + artifacts)
- `jobs/router.py` (6 sites: jobs + events)
- `webhooks/router.py` (3 sites)
- `captions/router.py` (2 sites)
### 3.3 Add missing permission checks
While applying helpers, add ownership checks where currently absent:
| Module | Current state | Fix |
|--------|--------------|-----|
| `transcription/router.py` | No ownership checks | Add `ensure_owner_or_staff()` via parent project |
| `media/router.py` artifacts | `_ = current_user` (ignores user) | Add ownership check via parent media file |
| `captions/router.py` get-by-id | Returns any preset by ID | Only return if system preset or owned by requester |
| `jobs/router.py` events | No user scoping on events | Add ownership check via parent job |
## Phase 4: Error Handling, Naming, Dead Code, Security
### 4.1 Error message constants
Extract inline strings into `ERROR_` constants at the top of each file:
- `infrastructure/auth.py``ERROR_TOKEN_EXPIRED`, `ERROR_INVALID_TOKEN`, `ERROR_USER_NOT_FOUND`
- `users/service.py``ERROR_EMAIL_ALREADY_EXISTS`, `ERROR_INVALID_CREDENTIALS`
- `users/router.py``ERROR_PASSWORD_MISMATCH`, `ERROR_INCORRECT_PASSWORD`
- `files/router.py``ERROR_FILE_TOO_LARGE`, `ERROR_EMPTY_FILE`
- `media/service.py``ERROR_PROBE_FAILED`, `ERROR_NO_VIDEO_STREAM`, `ERROR_FFMPEG_FAILED`
### 4.2 Naming fix
Rename `project_pct``progress_pct` in:
- `jobs/models.py`
- `jobs/schemas.py` (3 occurrences)
- Alembic migration for the DB column rename
- Check frontend for field dependency before applying
### 4.3 Dead code removal
| Dead code | Location |
|-----------|----------|
| Legacy service function exports | `users/service.py:60-105`, `projects/service.py:46-69` |
| `WordOptions.max_line_count` | `transcription/schemas.py:84-88` |
| Duplicate `import json` | `media/service.py:7` (top-level unused) |
| `filename_without_ext` variable | `media/service.py:396, 434` |
### 4.4 Notification completeness
Add missing entries to `notifications/service.py`:
- `JOB_TYPE_LABELS`: add `FRAME_EXTRACT` (Russian label)
- `STATUS_TITLES`: add `CANCELLED` (Russian label)
- All notification messages must be in Russian
### 4.5 Duplicate WebSocket auth
Extract `decode_user_id_from_token(token: str) -> uuid.UUID` in `infrastructure/auth.py`.
Both `get_current_user()` and `notifications/router.py` WebSocket handler reuse it.
WebSocket handler keeps its own session management.
### 4.6 Settings fix
Change `remotion_service_url` default from `http://localhost:8001` to `http://localhost:3001` in `infrastructure/settings.py`.
### 4.7 Path traversal fix
Add root boundary check in `files/router.py` local file serving:
```python
full_path = (settings.local_storage_dir / file_path).resolve()
root = Path(settings.local_storage_dir).resolve()
if not str(full_path).startswith(str(root)):
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=ERROR_FORBIDDEN)
```
### 4.8 Mutable default fix
Change `media/schemas.py:138` from `streams: list[StreamSchema] = []` to `streams: list[StreamSchema] = Field(default_factory=list)`.
## Phase 5: Consistent Layering and Tests
### 5.1 Fix Router → Service → Repository layering
| Module | Current state | Fix |
|--------|--------------|-----|
| `media/router.py` | CRUD calls `MediaFileRepository` directly | Route through `MediaService` — add thin CRUD wrappers |
| `transcription/router.py` | All routes call `TranscriptionRepository` directly | Route through `TranscriptionService` — add thin CRUD wrappers |
| `notifications/router.py` | Uses `NotificationRepository` directly | Route through `NotificationService` — add list/get/mark_read |
Service methods are pass-through calls to the repository. No new business logic invented. Value is consistent architecture.
### 5.2 Test additions
| Test | Reason |
|------|--------|
| Unit test `_run_actor()` lifecycle (cancel/success/failure) | Core new abstraction |
| Unit test `ensure_found()` / `ensure_owner_or_staff()` | Shared helpers |
| Unit test `apply_partial_update()` / `commit_and_refresh()` | Shared helpers |
| Integration tests for new ownership checks (transcription, media artifacts, caption presets, job events) | New permission enforcement |
| Test path traversal prevention on local file route | Security fix |
No full test rewrites. No new integration test infrastructure.
### 5.3 Update `CLAUDE.md`
Reflect the new architecture:
- Module file rule updated: extra files allowed when `service.py` would exceed ~300 lines
- Document `common/db.py` and `common/http.py`
- Document `infrastructure/dramatiq.py`
- Document `ensure_found` / `ensure_owner_or_staff` as the standard router permission pattern
- Document `apply_partial_update` / `commit_and_refresh` as the standard repository update pattern
## Files Changed Summary
### New files
| File | Purpose |
|------|---------|
| `cpv3/common/db.py` | Repository helpers: apply_partial_update, commit_and_refresh, soft_delete |
| `cpv3/common/http.py` | Router helpers: ensure_found, ensure_owner_or_staff, ERROR_ constants |
| `cpv3/infrastructure/dramatiq.py` | RedisBroker setup, broker-id serialization, cancellation check |
| `cpv3/modules/tasks/actors.py` | All @dramatiq.actor functions + _run_actor() lifecycle helper |
| `cpv3/modules/tasks/artifacts.py` | Artifact persistence helpers (moved from tasks/service.py) |
| `cpv3/modules/tasks/webhooks.py` | Webhook transport helpers |
| `cpv3/modules/tasks/constants.py` | All ERROR_, MESSAGE_, STATUS_ constants (Russian) |
### Modified files (by phase)
**Phase 1 — tasks decomposition:**
- `cpv3/modules/tasks/service.py` — stripped to TaskService class only
- `cpv3/modules/transcription/service.py` — receives `save_artifacts()`
- `cpv3/modules/media/service.py` — receives `save_artifacts()`
- `cpv3/modules/captions/service.py` — receives `save_artifacts()`
**Phase 3 — apply shared helpers:**
- All 7 module `repository.py` files — use `apply_partial_update()` + `commit_and_refresh()`
- All 7 module `router.py` files — use `ensure_found()` + `ensure_owner_or_staff()`
**Phase 4 — error handling, cleanup:**
- `cpv3/infrastructure/auth.py` — error constants + shared token decoder
- `cpv3/infrastructure/settings.py` — fix remotion port default
- `cpv3/modules/users/service.py` — error constants + remove dead exports
- `cpv3/modules/users/router.py` — error constants
- `cpv3/modules/projects/service.py` — remove dead exports
- `cpv3/modules/files/router.py` — path traversal fix + error constants
- `cpv3/modules/media/service.py` — error constants + dead code removal
- `cpv3/modules/media/schemas.py` — mutable default fix
- `cpv3/modules/transcription/schemas.py` — remove dead `max_line_count`
- `cpv3/modules/notifications/service.py` — complete label maps (Russian)
- `cpv3/modules/notifications/router.py` — use shared token decoder
- `cpv3/modules/jobs/models.py` — rename `project_pct``progress_pct`
- `cpv3/modules/jobs/schemas.py` — rename `project_pct``progress_pct`
**Phase 5 — layering + tests:**
- `cpv3/modules/media/router.py` — route through service
- `cpv3/modules/transcription/router.py` — route through service
- `cpv3/modules/notifications/router.py` — route through service
- `cofee_backend/CLAUDE.md` — updated architecture docs
- New test files for helpers and permission checks
### Alembic migration
One migration for `project_pct``progress_pct` column rename in jobs table.
## Risks and Mitigations
| Risk | Mitigation |
|------|-----------|
| Actor refactor breaks background processing | Test `_run_actor()` lifecycle thoroughly before deploying. Keep old actors available on a branch for rollback. |
| `progress_pct` rename breaks frontend | Check frontend API types for `project_pct` usage before applying migration. Coordinate with frontend update. |
| Missing auth checks break existing workflows | New permission checks only restrict access that was unintentionally open. Test with real user flows. |
| Shared helpers become a dumping ground | Keep `common/db.py` and `common/http.py` minimal. New helpers need clear justification. |