diff --git a/docs/superpowers/specs/2026-03-15-backend-comprehensive-refactor-design.md b/docs/superpowers/specs/2026-03-15-backend-comprehensive-refactor-design.md index 692ce3b..586d063 100644 --- a/docs/superpowers/specs/2026-03-15-backend-comprehensive-refactor-design.md +++ b/docs/superpowers/specs/2026-03-15-backend-comprehensive-refactor-design.md @@ -34,12 +34,13 @@ cpv3/modules/tasks/ ├── schemas.py (unchanged) ├── router.py (unchanged) ├── service.py TaskService class only: submit_* methods + record_webhook_event + cancel_job -├── actors.py NEW: all @dramatiq.actor functions, each using _run_actor() -├── artifacts.py NEW: _save_transcription_artifacts, _save_convert_artifacts, _save_captions_artifacts +├── actors.py NEW: all @dramatiq.actor functions, _run_actor() helper, _run_async(), _parse_frame_rate() ├── webhooks.py NEW: webhook transport (_send_webhook_event, _build_webhook_url, event helpers) └── constants.py NEW: ERROR_, MESSAGE_, STATUS_ constants, JobCancelledError, all in Russian ``` +**Note:** `tasks/artifacts.py` is NOT created. Artifact persistence moves directly to the owning modules (see Section 1.3). + New infrastructure file: ``` @@ -48,11 +49,18 @@ cpv3/infrastructure/dramatiq.py - _serialize_broker_reference() / _parse_broker_reference() - _get_job_status_sync() (raw psycopg2 cancellation check) - _raise_if_job_cancelled() + - _run_async() (sync wrapper for running async code in worker context) ``` +**Important:** `infrastructure/dramatiq.py` MUST be imported before any actor module to ensure the broker is set. Add `import cpv3.infrastructure.dramatiq` at the top of `tasks/actors.py`. The Dramatiq worker command changes from `uv run dramatiq cpv3.modules.tasks.service` to `uv run dramatiq cpv3.modules.tasks.actors`. Update `CLAUDE.md`, `docker-compose.yml`, and any deployment configs accordingly. 
+ ### 1.2 The `_run_actor()` pattern -One helper function replaces the repeated lifecycle skeleton across all 8 actors: +Two variants: a simple lifecycle helper for straightforward actors, and manual lifecycle management for complex actors that need intermediate progress or polling. + +#### Simple actors (5 of 8): `_run_actor()` + +Used by: `media_probe_actor`, `silence_remove_actor`, `silence_detect_actor`, `silence_apply_actor`, `media_convert_actor` ```python def _run_actor( @@ -63,7 +71,12 @@ def _run_actor( starting_message: str = MESSAGE_STARTING, **work_kwargs: Any, ) -> None: - """Shared actor lifecycle: cancel-check → RUNNING → work → DONE/FAILED.""" + """Shared actor lifecycle: cancel-check → RUNNING → work → DONE/FAILED. + + work_fn is synchronous. For async work, use _run_async() inside work_fn. + work_fn receives job_uuid (uuid.UUID) and webhook_url (str) as kwargs + plus any additional **work_kwargs. + """ job_uuid = uuid.UUID(job_id) try: _raise_if_job_cancelled(job_uuid) @@ -95,7 +108,19 @@ def _run_actor( )) ``` -Each actor becomes ~5-15 lines defining only its unique work function and parameters. +Each simple actor becomes ~5-15 lines defining only its unique work function and parameters. + +#### Complex actors (3 of 8): manual lifecycle with shared helpers + +`transcription_generate_actor`, `captions_generate_actor`, and `frame_extract_actor` do NOT use `_run_actor()` because they need: +- Intermediate progress webhooks from inside the work function +- Polling loops with `time.sleep()` and repeated cancellation checks (`captions_generate`) +- Progress callbacks from external tools (`_on_whisper_progress` in transcription) +- Multi-phase status updates (e.g., "deleting old frames → extracting → uploading" in frame_extract) + +These actors keep their own lifecycle management but use the shared helpers from `tasks/webhooks.py` (`_send_webhook_event`) and `infrastructure/dramatiq.py` (`_raise_if_job_cancelled`). 
They still benefit from the decomposition because constants, webhook transport, and broker infrastructure are no longer duplicated — just the lifecycle wrapper is actor-specific. + +Approximate line reduction: ~40-60 lines each (down from ~100-130) due to extracted constants, imports, and webhook helpers. ### 1.3 Artifact persistence → owning modules @@ -124,28 +149,27 @@ All user-facing constants in `tasks/constants.py` are in Russian. The current mi ### 2.1 `common/db.py` — Repository helpers -Three free functions: +Two free functions: ```python -async def apply_partial_update(instance: object, data: BaseModel) -> None: - """Apply non-None fields from a Pydantic schema to a SQLAlchemy model.""" +def apply_partial_update(instance: object, data: BaseModel) -> None: + """Apply non-None fields from a Pydantic schema to a SQLAlchemy model. + Synchronous — no I/O, just attribute setting.""" async def commit_and_refresh(session: AsyncSession, instance: object) -> None: """Commit the session and refresh the instance.""" - -async def soft_delete( - session: AsyncSession, - instance: object, - *, - field: str = "is_active", - value: bool = False, -) -> None: - """Mark an instance as soft-deleted and commit. - Supports both is_active=False and is_deleted=True patterns.""" ``` -Soft-delete documentation comment explains which modules use which pattern. -No unification of `is_active` vs `is_deleted` in this refactor — that requires a data migration. +**Note:** `apply_partial_update` is deliberately synchronous — it only does `model_dump` + `setattr`, no I/O. `soft_delete` is NOT extracted as a shared helper — it's a one-liner (`setattr` + commit) and the `is_active` vs `is_deleted` split makes a generic helper more confusing than helpful. Each repository keeps its own delete method with a comment noting which soft-delete pattern it uses. 
+ +Documentation comment in `common/db.py` explains the two soft-delete patterns: +```python +# Soft-delete patterns in the codebase: +# - is_active=False: users, projects, jobs, webhooks, transcription, captions +# - is_deleted=True: files, media (MediaFile, ArtifactMediaFile) +# NOT unified in this refactor — requires data migration. +# Each repository handles its own soft-delete; see the module's repository.py. +``` ### 2.2 `common/http.py` — Router helpers @@ -221,6 +245,10 @@ Extract inline strings into `ERROR_` constants at the top of each file: Rename `project_pct` → `progress_pct` in: - `jobs/models.py` - `jobs/schemas.py` (3 occurrences) +- `tasks/service.py` (references in `record_webhook_event` and submit methods) +- `tasks/router.py` (maps job field to response) +- `notifications/service.py` (reads progress for notification display) +- `tests/integration/test_jobs_endpoints.py` (test data and assertions) - Alembic migration for the DB column rename - Check frontend for field dependency before applying @@ -229,9 +257,10 @@ Rename `project_pct` → `progress_pct` in: | Dead code | Location | |-----------|----------| | Legacy service function exports | `users/service.py:60-105`, `projects/service.py:46-69` | -| `WordOptions.max_line_count` | `transcription/schemas.py:84-88` | -| Duplicate `import json` | `media/service.py:7` (top-level unused) | -| `filename_without_ext` variable | `media/service.py:396, 434` | +| `WordOptions.max_line_count` field | `transcription/schemas.py` — field defined but never read in any code path. Check frontend API types before removing to confirm it's not sent by the client. If frontend sends it, keep the field but add a `# TODO: implement` comment. 
| +| `filename_without_ext` dead assignment | `media/service.py:434` — the variable is assigned but never used at this line (earlier assignments at ~396 are valid) | + +**Correction:** The top-level `import json` in `media/service.py:7` is NOT dead — `json.loads()` and `json.dumps()` are used at multiple points. Do not remove it. ### 4.4 Notification completeness @@ -304,11 +333,10 @@ Reflect the new architecture: | File | Purpose | |------|---------| -| `cpv3/common/db.py` | Repository helpers: apply_partial_update, commit_and_refresh, soft_delete | +| `cpv3/common/db.py` | Repository helpers: apply_partial_update, commit_and_refresh | | `cpv3/common/http.py` | Router helpers: ensure_found, ensure_owner_or_staff, ERROR_ constants | -| `cpv3/infrastructure/dramatiq.py` | RedisBroker setup, broker-id serialization, cancellation check | -| `cpv3/modules/tasks/actors.py` | All @dramatiq.actor functions + _run_actor() lifecycle helper | -| `cpv3/modules/tasks/artifacts.py` | Artifact persistence helpers (moved from tasks/service.py) | +| `cpv3/infrastructure/dramatiq.py` | RedisBroker setup, broker-id serialization, cancellation check, _run_async() | +| `cpv3/modules/tasks/actors.py` | All @dramatiq.actor functions + _run_actor() lifecycle helper + _parse_frame_rate() | | `cpv3/modules/tasks/webhooks.py` | Webhook transport helpers | | `cpv3/modules/tasks/constants.py` | All ERROR_, MESSAGE_, STATUS_ constants (Russian) | @@ -355,6 +383,7 @@ One migration for `project_pct` → `progress_pct` column rename in jobs table. | Risk | Mitigation | |------|-----------| | Actor refactor breaks background processing | Test `_run_actor()` lifecycle thoroughly before deploying. Keep old actors available on a branch for rollback. | +| Dramatiq broker import order | `infrastructure/dramatiq.py` must be imported before any actor module. Enforce via explicit import at top of `tasks/actors.py`. Worker command changes to `uv run dramatiq cpv3.modules.tasks.actors`. 
| | `progress_pct` rename breaks frontend | Check frontend API types for `project_pct` usage before applying migration. Coordinate with frontend update. | | Missing auth checks break existing workflows | New permission checks only restrict access that was unintentionally open. Test with real user flows. | | Shared helpers become a dumping ground | Keep `common/db.py` and `common/http.py` minimal. New helpers need clear justification. |
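The `project_pct` → `progress_pct` column rename called out in the Migrations section is a single-statement Alembic migration. A sketch — the revision identifiers are placeholders and the `jobs` table name is taken from the spec; replace both with the autogenerated values:

```python
"""Rename jobs.project_pct -> jobs.progress_pct."""
from alembic import op

revision = "xxxxxxxxxxxx"       # placeholder — use the generated id
down_revision = "yyyyyyyyyyyy"  # placeholder — previous head


def upgrade() -> None:
    op.alter_column("jobs", "project_pct", new_column_name="progress_pct")


def downgrade() -> None:
    op.alter_column("jobs", "progress_pct", new_column_name="project_pct")
```

Because this is a pure rename (no type change), it is metadata-only on PostgreSQL and safe to run online; the frontend coordination noted in the risks table is the only sequencing concern.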