docs: add backend comprehensive refactor design spec
Covers decomposition of tasks/service.py, shared CRUD/auth helpers in common/, permission fixes, error handling, naming, and layering consistency across all backend modules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,360 @@
|
|||||||
|
# Backend Comprehensive Refactor — Design Spec
|
||||||
|
|
||||||
|
**Date:** 2026-03-15
|
||||||
|
**Scope:** `cofee_backend/` — all modules, common layer, infrastructure layer
|
||||||
|
**Approach:** Top-down (decompose tasks/ first, then extract shared helpers, then apply across modules)
|
||||||
|
|
||||||
|
## Problem Summary
|
||||||
|
|
||||||
|
The backend has grown feature-by-feature and accumulated structural debt:
|
||||||
|
|
||||||
|
1. **`tasks/service.py`** is 1673 lines mixing four concerns: broker infrastructure, actor lifecycle, artifact persistence, and task submission orchestration. Eight actors repeat an identical ~40-line lifecycle skeleton.
|
||||||
|
2. **CRUD modules** repeat the same patterns: repository update loops (`model_dump` + commit + refresh), router permission checks (fetch-or-404, owner-or-staff, raise 403), inline error strings.
|
||||||
|
3. **Auth/ownership** is inconsistent — some modules enforce ownership, others don't.
|
||||||
|
4. **Layering** is inconsistent — some routers bypass the service layer and call repositories directly.
|
||||||
|
5. **Error messages** are inline strings instead of `ERROR_` constants.
|
||||||
|
6. **Dead code**, naming issues, and minor bugs exist across several modules.
|
||||||
|
|
||||||
|
## Guiding Principles
|
||||||
|
|
||||||
|
- **DRY:** Extract repeated patterns into small free functions, not base classes
|
||||||
|
- **KISS:** Helpers are plain functions — no magic, no decorators, no hidden behavior
|
||||||
|
- **Developer-readable:** Every call site shows what it does; no hidden abstractions
|
||||||
|
- **Preserve structure:** Keep the existing module pattern; allow extra files only when a single `service.py` would exceed ~300 lines
|
||||||
|
- **Shared helpers in `common/`:** Application-level helpers (DB update, HTTP permission checks)
|
||||||
|
- **Infrastructure for infrastructure:** Only broker/transport concerns go in `infrastructure/`
|
||||||
|
|
||||||
|
## Phase 1: Decompose `tasks/service.py`
|
||||||
|
|
||||||
|
### 1.1 New file layout
|
||||||
|
|
||||||
|
```
|
||||||
|
cpv3/modules/tasks/
|
||||||
|
├── __init__.py (unchanged)
|
||||||
|
├── schemas.py (unchanged)
|
||||||
|
├── router.py (unchanged)
|
||||||
|
├── service.py TaskService class only: submit_* methods + record_webhook_event + cancel_job
|
||||||
|
├── actors.py NEW: all @dramatiq.actor functions, each using _run_actor()
|
||||||
|
├── artifacts.py NEW: _save_transcription_artifacts, _save_convert_artifacts, _save_captions_artifacts
|
||||||
|
├── webhooks.py NEW: webhook transport (_send_webhook_event, _build_webhook_url, event helpers)
|
||||||
|
└── constants.py NEW: ERROR_, MESSAGE_, STATUS_ constants, JobCancelledError, all in Russian
|
||||||
|
```
|
||||||
|
|
||||||
|
New infrastructure file:
|
||||||
|
|
||||||
|
```
|
||||||
|
cpv3/infrastructure/dramatiq.py
|
||||||
|
- RedisBroker setup and initialization
|
||||||
|
- _serialize_broker_reference() / _parse_broker_reference()
|
||||||
|
- _get_job_status_sync() (raw psycopg2 cancellation check)
|
||||||
|
- _raise_if_job_cancelled()
|
||||||
|
```
|
||||||
|
|
||||||
|
### 1.2 The `_run_actor()` pattern
|
||||||
|
|
||||||
|
One helper function replaces the repeated lifecycle skeleton across all 8 actors:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def _run_actor(
|
||||||
|
job_id: str,
|
||||||
|
webhook_url: str,
|
||||||
|
work_fn: Callable[..., dict[str, Any]],
|
||||||
|
*,
|
||||||
|
starting_message: str = MESSAGE_STARTING,
|
||||||
|
**work_kwargs: Any,
|
||||||
|
) -> None:
|
||||||
|
"""Shared actor lifecycle: cancel-check → RUNNING → work → DONE/FAILED."""
|
||||||
|
job_uuid = uuid.UUID(job_id)
|
||||||
|
try:
|
||||||
|
_raise_if_job_cancelled(job_uuid)
|
||||||
|
except JobCancelledError:
|
||||||
|
logger.info("Actor cancelled before start: %s", job_uuid)
|
||||||
|
return
|
||||||
|
|
||||||
|
_send_webhook_event(webhook_url, TaskWebhookEvent(
|
||||||
|
status=JOB_STATUS_RUNNING,
|
||||||
|
current_message=starting_message,
|
||||||
|
started_at=_utc_now(),
|
||||||
|
))
|
||||||
|
|
||||||
|
try:
|
||||||
|
result = work_fn(job_uuid=job_uuid, webhook_url=webhook_url, **work_kwargs)
|
||||||
|
_send_webhook_event(webhook_url, TaskWebhookEvent(
|
||||||
|
status=JOB_STATUS_DONE,
|
||||||
|
output=result,
|
||||||
|
finished_at=_utc_now(),
|
||||||
|
))
|
||||||
|
except JobCancelledError:
|
||||||
|
logger.info("Actor cancelled mid-run: %s", job_uuid)
|
||||||
|
except Exception as exc:
|
||||||
|
logger.exception("Actor failed: %s", job_uuid)
|
||||||
|
_send_webhook_event(webhook_url, TaskWebhookEvent(
|
||||||
|
status=JOB_STATUS_FAILED,
|
||||||
|
error_message=str(exc),
|
||||||
|
finished_at=_utc_now(),
|
||||||
|
))
|
||||||
|
```
|
||||||
|
|
||||||
|
Each actor becomes ~5-15 lines defining only its unique work function and parameters.
|
||||||
|
|
||||||
|
### 1.3 Artifact persistence → owning modules
|
||||||
|
|
||||||
|
Move artifact-saving logic to the modules that own those domain concepts:
|
||||||
|
|
||||||
|
- `_save_transcription_artifacts()` → `transcription/service.py` as `TranscriptionService.save_artifacts()`
|
||||||
|
- `_save_convert_artifacts()` → `media/service.py` as `MediaService.save_artifacts()`
|
||||||
|
- `_save_captions_artifacts()` → `captions/service.py` as `CaptionService.save_artifacts()`
|
||||||
|
|
||||||
|
`TaskService.record_webhook_event()` dispatches to these module-owned methods.
|
||||||
|
|
||||||
|
### 1.4 What stays in `tasks/service.py`
|
||||||
|
|
||||||
|
Only the `TaskService` class (~300-400 lines):
|
||||||
|
|
||||||
|
- `__init__()` with session + job repo
|
||||||
|
- `submit_media_probe()`, `submit_silence_detect()`, `submit_silence_remove()`, `submit_convert()`, `submit_frame_extract()`, `submit_transcription_generate()`, `submit_captions_generate()`
|
||||||
|
- `record_webhook_event()` — orchestration only, delegates artifact saving
|
||||||
|
- `cancel_job()`
|
||||||
|
|
||||||
|
### 1.5 Constants language
|
||||||
|
|
||||||
|
All user-facing constants in `tasks/constants.py` are in Russian. The current mix of English and Russian (`tasks/service.py:93-103`) is unified to Russian.
|
||||||
|
|
||||||
|
## Phase 2: Shared Helpers in `common/`
|
||||||
|
|
||||||
|
### 2.1 `common/db.py` — Repository helpers
|
||||||
|
|
||||||
|
Three free functions:
|
||||||
|
|
||||||
|
```python
|
||||||
|
async def apply_partial_update(instance: object, data: BaseModel) -> None:
|
||||||
|
"""Apply non-None fields from a Pydantic schema to a SQLAlchemy model."""
|
||||||
|
|
||||||
|
async def commit_and_refresh(session: AsyncSession, instance: object) -> None:
|
||||||
|
"""Commit the session and refresh the instance."""
|
||||||
|
|
||||||
|
async def soft_delete(
|
||||||
|
session: AsyncSession,
|
||||||
|
instance: object,
|
||||||
|
*,
|
||||||
|
field: str = "is_active",
|
||||||
|
value: bool = False,
|
||||||
|
) -> None:
|
||||||
|
"""Mark an instance as soft-deleted and commit.
|
||||||
|
Supports both is_active=False and is_deleted=True patterns."""
|
||||||
|
```
|
||||||
|
|
||||||
|
Soft-delete documentation comment explains which modules use which pattern.
|
||||||
|
No unification of `is_active` vs `is_deleted` in this refactor — that requires a data migration.
|
||||||
|
|
||||||
|
### 2.2 `common/http.py` — Router helpers
|
||||||
|
|
||||||
|
Two free functions + two error constants:
|
||||||
|
|
||||||
|
```python
|
||||||
|
ERROR_NOT_FOUND = "Объект не найден"
|
||||||
|
ERROR_FORBIDDEN = "Доступ запрещён"
|
||||||
|
|
||||||
|
def ensure_found(instance: object | None, detail: str = ERROR_NOT_FOUND) -> None:
|
||||||
|
"""Raise 404 if instance is None."""
|
||||||
|
|
||||||
|
def ensure_owner_or_staff(
|
||||||
|
instance: object,
|
||||||
|
user: User,
|
||||||
|
*,
|
||||||
|
owner_field: str = "owner_id",
|
||||||
|
) -> None:
|
||||||
|
"""Raise 403 if user is not staff and doesn't own the resource."""
|
||||||
|
```
|
||||||
|
|
||||||
|
## Phase 3: Apply Shared Helpers Across All CRUD Modules
|
||||||
|
|
||||||
|
### 3.1 Repository updates (7 modules)
|
||||||
|
|
||||||
|
Replace `model_dump` loop + commit + refresh with `apply_partial_update()` + `commit_and_refresh()`:
|
||||||
|
|
||||||
|
- `files/repository.py`
|
||||||
|
- `media/repository.py` (×2: MediaFile + Artifact)
|
||||||
|
- `projects/repository.py`
|
||||||
|
- `jobs/repository.py` (×2: Job + JobEvent)
|
||||||
|
- `webhooks/repository.py`
|
||||||
|
- `transcription/repository.py`
|
||||||
|
- `captions/repository.py` (special case: `style_config` nested serialization stays explicit)
|
||||||
|
|
||||||
|
### 3.2 Router permission checks (7 modules)
|
||||||
|
|
||||||
|
Replace inline 404/403 checks with `ensure_found()` + `ensure_owner_or_staff()`:
|
||||||
|
|
||||||
|
- `users/router.py` (3 sites)
|
||||||
|
- `projects/router.py` (3 sites)
|
||||||
|
- `files/router.py` (3 sites + storage path check)
|
||||||
|
- `media/router.py` (6 sites: mediafiles + artifacts)
|
||||||
|
- `jobs/router.py` (6 sites: jobs + events)
|
||||||
|
- `webhooks/router.py` (3 sites)
|
||||||
|
- `captions/router.py` (2 sites)
|
||||||
|
|
||||||
|
### 3.3 Add missing permission checks
|
||||||
|
|
||||||
|
While applying helpers, add ownership checks where currently absent:
|
||||||
|
|
||||||
|
| Module | Current state | Fix |
|
||||||
|
|--------|--------------|-----|
|
||||||
|
| `transcription/router.py` | No ownership checks | Add `ensure_owner_or_staff()` via parent project |
|
||||||
|
| `media/router.py` artifacts | `_ = current_user` (ignores user) | Add ownership check via parent media file |
|
||||||
|
| `captions/router.py` get-by-id | Returns any preset by ID | Only return if system preset or owned by requester |
|
||||||
|
| `jobs/router.py` events | No user scoping on events | Add ownership check via parent job |
|
||||||
|
|
||||||
|
## Phase 4: Error Handling, Naming, Dead Code, Security
|
||||||
|
|
||||||
|
### 4.1 Error message constants
|
||||||
|
|
||||||
|
Extract inline strings into `ERROR_` constants at the top of each file:
|
||||||
|
|
||||||
|
- `infrastructure/auth.py` → `ERROR_TOKEN_EXPIRED`, `ERROR_INVALID_TOKEN`, `ERROR_USER_NOT_FOUND`
|
||||||
|
- `users/service.py` → `ERROR_EMAIL_ALREADY_EXISTS`, `ERROR_INVALID_CREDENTIALS`
|
||||||
|
- `users/router.py` → `ERROR_PASSWORD_MISMATCH`, `ERROR_INCORRECT_PASSWORD`
|
||||||
|
- `files/router.py` → `ERROR_FILE_TOO_LARGE`, `ERROR_EMPTY_FILE`
|
||||||
|
- `media/service.py` → `ERROR_PROBE_FAILED`, `ERROR_NO_VIDEO_STREAM`, `ERROR_FFMPEG_FAILED`
|
||||||
|
|
||||||
|
### 4.2 Naming fix
|
||||||
|
|
||||||
|
Rename `project_pct` → `progress_pct` in:
|
||||||
|
- `jobs/models.py`
|
||||||
|
- `jobs/schemas.py` (3 occurrences)
|
||||||
|
- Alembic migration for the DB column rename
|
||||||
|
- Check frontend for field dependency before applying
|
||||||
|
|
||||||
|
### 4.3 Dead code removal
|
||||||
|
|
||||||
|
| Dead code | Location |
|
||||||
|
|-----------|----------|
|
||||||
|
| Legacy service function exports | `users/service.py:60-105`, `projects/service.py:46-69` |
|
||||||
|
| `WordOptions.max_line_count` | `transcription/schemas.py:84-88` |
|
||||||
|
| Duplicate `import json` | `media/service.py:7` (top-level unused) |
|
||||||
|
| `filename_without_ext` variable | `media/service.py:396, 434` |
|
||||||
|
|
||||||
|
### 4.4 Notification completeness
|
||||||
|
|
||||||
|
Add missing entries to `notifications/service.py`:
|
||||||
|
- `JOB_TYPE_LABELS`: add `FRAME_EXTRACT` (Russian label)
|
||||||
|
- `STATUS_TITLES`: add `CANCELLED` (Russian label)
|
||||||
|
- All notification messages must be in Russian
|
||||||
|
|
||||||
|
### 4.5 Duplicate WebSocket auth
|
||||||
|
|
||||||
|
Extract `decode_user_id_from_token(token: str) -> uuid.UUID` in `infrastructure/auth.py`.
|
||||||
|
Both `get_current_user()` and `notifications/router.py` WebSocket handler reuse it.
|
||||||
|
WebSocket handler keeps its own session management.
|
||||||
|
|
||||||
|
### 4.6 Settings fix
|
||||||
|
|
||||||
|
Change `remotion_service_url` default from `http://localhost:8001` to `http://localhost:3001` in `infrastructure/settings.py`.
|
||||||
|
|
||||||
|
### 4.7 Path traversal fix
|
||||||
|
|
||||||
|
Add root boundary check in `files/router.py` local file serving:
|
||||||
|
|
||||||
|
```python
|
||||||
|
full_path = (settings.local_storage_dir / file_path).resolve()
|
||||||
|
root = Path(settings.local_storage_dir).resolve()
|
||||||
|
if not str(full_path).startswith(str(root)):
|
||||||
|
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=ERROR_FORBIDDEN)
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4.8 Mutable default fix
|
||||||
|
|
||||||
|
Change `media/schemas.py:138` from `streams: list[StreamSchema] = []` to `streams: list[StreamSchema] = Field(default_factory=list)`.
|
||||||
|
|
||||||
|
## Phase 5: Consistent Layering and Tests
|
||||||
|
|
||||||
|
### 5.1 Fix Router → Service → Repository layering
|
||||||
|
|
||||||
|
| Module | Current state | Fix |
|
||||||
|
|--------|--------------|-----|
|
||||||
|
| `media/router.py` | CRUD calls `MediaFileRepository` directly | Route through `MediaService` — add thin CRUD wrappers |
|
||||||
|
| `transcription/router.py` | All routes call `TranscriptionRepository` directly | Route through `TranscriptionService` — add thin CRUD wrappers |
|
||||||
|
| `notifications/router.py` | Uses `NotificationRepository` directly | Route through `NotificationService` — add list/get/mark_read |
|
||||||
|
|
||||||
|
Service methods are pass-through calls to the repository. No new business logic invented. Value is consistent architecture.
|
||||||
|
|
||||||
|
### 5.2 Test additions
|
||||||
|
|
||||||
|
| Test | Reason |
|
||||||
|
|------|--------|
|
||||||
|
| Unit test `_run_actor()` lifecycle (cancel/success/failure) | Core new abstraction |
|
||||||
|
| Unit test `ensure_found()` / `ensure_owner_or_staff()` | Shared helpers |
|
||||||
|
| Unit test `apply_partial_update()` / `commit_and_refresh()` | Shared helpers |
|
||||||
|
| Integration tests for new ownership checks (transcription, media artifacts, caption presets, job events) | New permission enforcement |
|
||||||
|
| Test path traversal prevention on local file route | Security fix |
|
||||||
|
|
||||||
|
No full test rewrites. No new integration test infrastructure.
|
||||||
|
|
||||||
|
### 5.3 Update `CLAUDE.md`
|
||||||
|
|
||||||
|
Reflect the new architecture:
|
||||||
|
- Module file rule updated: extra files allowed when `service.py` would exceed ~300 lines
|
||||||
|
- Document `common/db.py` and `common/http.py`
|
||||||
|
- Document `infrastructure/dramatiq.py`
|
||||||
|
- Document `ensure_found` / `ensure_owner_or_staff` as the standard router permission pattern
|
||||||
|
- Document `apply_partial_update` / `commit_and_refresh` as the standard repository update pattern
|
||||||
|
|
||||||
|
## Files Changed Summary
|
||||||
|
|
||||||
|
### New files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `cpv3/common/db.py` | Repository helpers: apply_partial_update, commit_and_refresh, soft_delete |
|
||||||
|
| `cpv3/common/http.py` | Router helpers: ensure_found, ensure_owner_or_staff, ERROR_ constants |
|
||||||
|
| `cpv3/infrastructure/dramatiq.py` | RedisBroker setup, broker-id serialization, cancellation check |
|
||||||
|
| `cpv3/modules/tasks/actors.py` | All @dramatiq.actor functions + _run_actor() lifecycle helper |
|
||||||
|
| `cpv3/modules/tasks/artifacts.py` | Artifact persistence helpers (moved from tasks/service.py) |
|
||||||
|
| `cpv3/modules/tasks/webhooks.py` | Webhook transport helpers |
|
||||||
|
| `cpv3/modules/tasks/constants.py` | All ERROR_, MESSAGE_, STATUS_ constants (Russian) |
|
||||||
|
|
||||||
|
### Modified files (by phase)
|
||||||
|
|
||||||
|
**Phase 1 — tasks decomposition:**
|
||||||
|
- `cpv3/modules/tasks/service.py` — stripped to TaskService class only
|
||||||
|
- `cpv3/modules/transcription/service.py` — receives `save_artifacts()`
|
||||||
|
- `cpv3/modules/media/service.py` — receives `save_artifacts()`
|
||||||
|
- `cpv3/modules/captions/service.py` — receives `save_artifacts()`
|
||||||
|
|
||||||
|
**Phase 3 — apply shared helpers:**
|
||||||
|
- All 7 module `repository.py` files — use `apply_partial_update()` + `commit_and_refresh()`
|
||||||
|
- All 7 module `router.py` files — use `ensure_found()` + `ensure_owner_or_staff()`
|
||||||
|
|
||||||
|
**Phase 4 — error handling, cleanup:**
|
||||||
|
- `cpv3/infrastructure/auth.py` — error constants + shared token decoder
|
||||||
|
- `cpv3/infrastructure/settings.py` — fix remotion port default
|
||||||
|
- `cpv3/modules/users/service.py` — error constants + remove dead exports
|
||||||
|
- `cpv3/modules/users/router.py` — error constants
|
||||||
|
- `cpv3/modules/projects/service.py` — remove dead exports
|
||||||
|
- `cpv3/modules/files/router.py` — path traversal fix + error constants
|
||||||
|
- `cpv3/modules/media/service.py` — error constants + dead code removal
|
||||||
|
- `cpv3/modules/media/schemas.py` — mutable default fix
|
||||||
|
- `cpv3/modules/transcription/schemas.py` — remove dead `max_line_count`
|
||||||
|
- `cpv3/modules/notifications/service.py` — complete label maps (Russian)
|
||||||
|
- `cpv3/modules/notifications/router.py` — use shared token decoder
|
||||||
|
- `cpv3/modules/jobs/models.py` — rename `project_pct` → `progress_pct`
|
||||||
|
- `cpv3/modules/jobs/schemas.py` — rename `project_pct` → `progress_pct`
|
||||||
|
|
||||||
|
**Phase 5 — layering + tests:**
|
||||||
|
- `cpv3/modules/media/router.py` — route through service
|
||||||
|
- `cpv3/modules/transcription/router.py` — route through service
|
||||||
|
- `cpv3/modules/notifications/router.py` — route through service
|
||||||
|
- `cofee_backend/CLAUDE.md` — updated architecture docs
|
||||||
|
- New test files for helpers and permission checks
|
||||||
|
|
||||||
|
### Alembic migration
|
||||||
|
|
||||||
|
One migration for `project_pct` → `progress_pct` column rename in jobs table.
|
||||||
|
|
||||||
|
## Risks and Mitigations
|
||||||
|
|
||||||
|
| Risk | Mitigation |
|
||||||
|
|------|-----------|
|
||||||
|
| Actor refactor breaks background processing | Test `_run_actor()` lifecycle thoroughly before deploying. Keep old actors available on a branch for rollback. |
|
||||||
|
| `progress_pct` rename breaks frontend | Check frontend API types for `project_pct` usage before applying migration. Coordinate with frontend update. |
|
||||||
|
| Missing auth checks break existing workflows | New permission checks only restrict access that was unintentionally open. Test with real user flows. |
|
||||||
|
| Shared helpers become a dumping ground | Keep `common/db.py` and `common/http.py` minimal. New helpers need clear justification. |
|
||||||
Reference in New Issue
Block a user