394 lines
13 KiB
Markdown
394 lines
13 KiB
Markdown
# CODEX Refactor Plan
|
||
|
||
## Chosen direction
|
||
|
||
This plan adopts **Option 1: in-place cleanup inside current boundaries**.
|
||
|
||
The goal is to make backend feature work safer and the code easier to read **without moving far from the current implementation**.
|
||
|
||
This means:
|
||
|
||
- keep the existing module structure (`models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`)
|
||
- keep public HTTP routes and response contracts stable
|
||
- avoid hidden abstractions, base classes, registries, or plugin-style indirection
|
||
- prefer explicit, local helpers over framework-like refactors
|
||
- optimize first for **developer readability and safer future changes**
|
||
|
||
## Why this option
|
||
|
||
The biggest current readability and change-risk hotspot is `cpv3/modules/tasks/service.py`.
|
||
|
||
Right now it mixes several responsibilities in one place:
|
||
|
||
- broker/bootstrap concerns
|
||
- Dramatiq actor definitions
|
||
- task submission flow
|
||
- job lifecycle side effects
|
||
- webhook application
|
||
- artifact persistence
|
||
- notification sending
|
||
- cancellation side effects
|
||
|
||
The highest concentration of risk is in:
|
||
|
||
- `cpv3/modules/tasks/service.py` — especially `TaskService` and `record_webhook_event()`
|
||
- `cpv3/modules/jobs/router.py` — cancellation path delegates into task orchestration
|
||
- `cpv3/modules/captions/service.py` — already contains both preset CRUD and Remotion client logic
|
||
|
||
A bigger ownership move would be possible later, but it is not the safest first step.
|
||
|
||
## Primary objectives
|
||
|
||
1. Make task/job orchestration easier to understand.
|
||
2. Reduce the blast radius of changes in `TaskService`.
|
||
3. Replace raw internal JSON/dict handling with typed, explicit internal contracts.
|
||
4. Keep behavior stable for current API consumers and existing module boundaries.
|
||
5. Improve confidence with focused unit and integration coverage before and during refactoring.
|
||
|
||
## Constraints
|
||
|
||
### Must keep
|
||
|
||
- Current backend module structure.
|
||
- Current route layout and public API behavior.
|
||
- Current job types and overall task flow.
|
||
- Current ownership model at module level.
|
||
- Explicit code over clever abstractions.
|
||
|
||
### Allowed
|
||
|
||
- Small internal contract cleanup.
|
||
- Small private helper extraction inside existing files.
|
||
- Better naming, smaller methods, and clearer sequencing.
|
||
- Add or expand tests.
|
||
|
||
### Not allowed in this phase
|
||
|
||
- New module subdirectories or extra architecture layers.
|
||
- Generic handler registries.
|
||
- Base service hierarchies.
|
||
- Large movement of lifecycle ownership from `tasks` into `jobs`.
|
||
- Pushing more orchestration into `captions`, `media`, or `transcription` before task contracts are cleaned up.
|
||
- Broad request/response or frontend-facing contract changes.
|
||
|
||
## Refactor scope
|
||
|
||
### In scope
|
||
|
||
- `cpv3/modules/tasks/service.py`
|
||
- `cpv3/modules/tasks/schemas.py`
|
||
- `cpv3/modules/tasks/router.py` where needed for clarity only
|
||
- `cpv3/modules/jobs/router.py` where needed for clearer delegation only
|
||
- tests around task submission, webhook handling, cancellation, and caption task behavior
|
||
|
||
### Nearby modules to touch carefully if needed
|
||
|
||
- `cpv3/modules/captions/service.py`
|
||
- `cpv3/modules/media/service.py`
|
||
- `cpv3/modules/transcription/repository.py`
|
||
- `cpv3/modules/jobs/service.py`
|
||
|
||
These are supporting touches only. The refactor should remain centered in `tasks`.
|
||
|
||
## Target end state
|
||
|
||
After this phase:
|
||
|
||
- `TaskService` still owns task orchestration,
|
||
- but each major step is explicit and small,
|
||
- stored job payloads are validated through typed internal schemas,
|
||
- webhook application is broken into readable stages,
|
||
- repeated submission logic is reduced through simple private helpers,
|
||
- actor flow is easier to scan,
|
||
- tests cover the critical seams that future feature work will depend on.
|
||
|
||
`tasks/service.py` may still be large after this phase, but it should stop being a "many responsibilities blurred together" file.
|
||
|
||
## Main decisions
|
||
|
||
### 1. Keep orchestration in `tasks` for now
|
||
|
||
Do **not** move job lifecycle ownership into `jobs/service.py` in this phase.
|
||
|
||
Reason:
|
||
|
||
- it is a valid long-term direction,
|
||
- but it changes semantic ownership of cancellation and webhook application,
|
||
- which raises regression risk before the current flow is properly typed and decomposed.
|
||
|
||
### 2. Add typed internal payload schemas
|
||
|
||
Add explicit internal schemas in `cpv3/modules/tasks/schemas.py` for per-job stored payloads.
|
||
|
||
These schemas are for internal readability and validation only.
|
||
|
||
They should:
|
||
|
||
- represent the current JSON shape stored in `job.input_data` / `job.output_data`
|
||
- remain compatible with existing persisted data
|
||
- avoid forcing a database migration in this phase
|
||
- be explicit per job type rather than generic
|
||
|
||
Examples of the kind of models expected:
|
||
|
||
- media probe input/output
|
||
- silence detect input/output
|
||
- silence apply input/output
|
||
- media convert input/output
|
||
- transcription generate input/output
|
||
- frame extract input/output
|
||
- captions generate input/output
|
||
|
||
Important: the database can still store JSON, but `TaskService` should stop passing raw dicts around when internal meaning matters.
|
||
|
||
### 3. Split `record_webhook_event()` into explicit stages
|
||
|
||
Refactor `record_webhook_event()` into small private methods with one purpose each.
|
||
|
||
Target shape:
|
||
|
||
- load and validate job state
|
||
- ignore terminal jobs when appropriate
|
||
- apply status update
|
||
- append job event row
|
||
- run job-type-specific done hooks
|
||
- send notification
|
||
- return updated job
|
||
|
||
This keeps behavior in one place while making the sequencing obvious.
|
||
|
||
### 4. Reduce duplication only where it is obvious
|
||
|
||
Allowed helper extraction:
|
||
|
||
- output folder resolution
|
||
- broker reference update
|
||
- source file lookup/create flow
|
||
- artifact persistence helpers
|
||
- notification dispatch helper
|
||
- common actor start/fail envelope if the resulting code is still explicit
|
||
|
||
Do **not** introduce a generic task framework.
|
||
|
||
### 5. Keep domain persistence mostly where it is today
|
||
|
||
Do not push transcription/media/captions result persistence back into their modules yet.
|
||
|
||
Reason:
|
||
|
||
- domain ownership may improve long-term,
|
||
- but doing it before task contracts are typed risks scattering orchestration across several large files.
|
||
|
||
This phase should first make the existing orchestration understandable.
|
||
|
||
## Execution phases
|
||
|
||
### Phase 0 — Lock behavior with tests
|
||
|
||
Before changing the flow, expand tests around the current seams.
|
||
|
||
Priority coverage:
|
||
|
||
- `tests/unit/test_task_service.py`
|
||
- duplicate active job reuse
|
||
- terminal job webhook ignore behavior
|
||
- webhook event update sequencing
|
||
- artifact persistence trigger points
|
||
- cancellation path behavior
|
||
- `tests/unit/test_caption_tasks.py`
|
||
- caption render fallback behavior
|
||
- callback confirmation behavior
|
||
- cancellation-related safety where relevant
|
||
- integration coverage where current behavior is externally visible
|
||
- `tests/integration/test_jobs_endpoints.py`
|
||
- `tests/integration/test_captions_endpoints.py` if affected
|
||
|
||
The goal is not broad test expansion. The goal is to pin the behavior of the exact code being refactored.
|
||
|
||
### Phase 1 — Introduce typed internal task payloads
|
||
|
||
Add internal schemas in `cpv3/modules/tasks/schemas.py` for job payloads.
|
||
|
||
Implementation rules:
|
||
|
||
- keep schema names explicit per job type
|
||
- support current field names
|
||
- make optional fields tolerant enough for existing rows where necessary
|
||
- use these schemas at service boundaries where `input_data` and `output_data` are read or produced
|
||
|
||
Expected outcome:
|
||
|
||
- internal payload meaning becomes visible from types instead of being inferred from raw dict access
|
||
- future changes become easier to review safely
|
||
|
||
### Phase 2 — Decompose webhook application flow
|
||
|
||
Refactor `TaskService.record_webhook_event()` into a sequence of small private methods.
|
||
|
||
Recommended decomposition:
|
||
|
||
- `_load_job_for_webhook(...)`
|
||
- `_should_ignore_webhook(...)`
|
||
- `_apply_job_status_update(...)`
|
||
- `_create_job_event(...)`
|
||
- `_run_done_side_effects(...)`
|
||
- `_send_job_notification(...)`
|
||
|
||
Method names may differ, but the responsibilities should remain this explicit.
|
||
|
||
Important:
|
||
|
||
- keep sequencing behavior unchanged
|
||
- preserve the current order where artifacts are saved before notifications when that order matters
|
||
|
||
### Phase 3 — Normalize submission helpers
|
||
|
||
Refactor repetitive submit methods without hiding behavior.
|
||
|
||
The submit methods should remain explicit, but repeated steps can be centralized where the code becomes clearer:
|
||
|
||
- resolve user output folder
|
||
- prepare actor kwargs
|
||
- create job + webhook
|
||
- enqueue actor
|
||
- persist broker reference
|
||
- build submit response
|
||
|
||
The final submit methods should still read as business flows, not as thin wrappers over a generic registry.
|
||
|
||
### Phase 4 — Clean actor envelope duplication
|
||
|
||
If still useful after earlier phases, standardize only the repeated envelope around actors:
|
||
|
||
- cancellation check
|
||
- running event emission
|
||
- success/failure reporting
|
||
|
||
This should stay simple and local.
|
||
|
||
If the helper makes actor behavior harder to follow, do not extract it.
|
||
|
||
### Phase 5 — Reassess, do not auto-expand scope
|
||
|
||
After the in-place cleanup is done, reassess whether a second phase is justified.
|
||
|
||
Only then consider:
|
||
|
||
- moving lifecycle semantics into `jobs`
|
||
- pushing result persistence into owning modules
|
||
|
||
Those are explicitly **not** part of the current plan.
|
||
|
||
## File-by-file plan
|
||
|
||
### `cpv3/modules/tasks/schemas.py`
|
||
|
||
- Add explicit internal payload schemas for job input/output.
|
||
- Keep public API request/response schemas stable.
|
||
- Prefer additive internal typing over schema churn.
|
||
|
||
### `cpv3/modules/tasks/service.py`
|
||
|
||
- Keep this file as the orchestration center for this phase.
|
||
- Decompose large methods into readable private steps.
|
||
- Replace raw internal dict usage with typed parsing/serialization where the meaning matters.
|
||
- Reduce repeated code only through local, explicit helpers.
|
||
|
||
### `cpv3/modules/tasks/router.py`
|
||
|
||
- Keep routes unchanged.
|
||
- Only adjust delegation or error handling if required to support clearer internal service boundaries.
|
||
|
||
### `cpv3/modules/jobs/router.py`
|
||
|
||
- Keep the current cancel entrypoint behavior.
|
||
- If touched, only make delegation clearer.
|
||
- Do not move cancel semantics into `jobs/service.py` in this phase.
|
||
|
||
### `cpv3/modules/captions/service.py`
|
||
|
||
- Keep current ownership.
|
||
- Do not add more orchestration here in phase 1.
|
||
- Only make supporting adjustments if typed payload work needs a clearer boundary.
|
||
|
||
### `cpv3/modules/media/service.py`
|
||
|
||
- Keep media processing helpers as-is unless task cleanup reveals a very small, obvious extraction need.
|
||
- Avoid expanding this file into a second orchestration center.
|
||
|
||
## Risk controls
|
||
|
||
### Preserve persisted JSON compatibility
|
||
|
||
Existing jobs may already contain payloads in the database.
|
||
|
||
Therefore:
|
||
|
||
- typed internal schemas must support current persisted shapes
|
||
- avoid renaming stored keys unless there is a very strong reason
|
||
- prefer backward-compatible parsing over cleanup for cleanup’s sake
|
||
|
||
### Preserve side-effect ordering
|
||
|
||
Current flow saves some artifacts before notification dispatch.
|
||
|
||
That ordering matters because clients may refetch immediately after notification.
|
||
|
||
Do not change this ordering unless there is a proven reason and test coverage for the new behavior.
|
||
|
||
### Avoid ownership churn during cleanup
|
||
|
||
This phase is about readability and safer edits, not about redistributing the architecture.
|
||
|
||
If a change starts to feel like "this really belongs in another module," mark it for reassessment after phase 1 instead of expanding scope immediately.
|
||
|
||
## Verification strategy
|
||
|
||
For implementation of this plan, verification should include:
|
||
|
||
- targeted unit tests for `TaskService`
|
||
- targeted unit tests for caption task actor behavior
|
||
- relevant integration tests for jobs/tasks endpoints
|
||
- Ruff checks for touched files
|
||
|
||
The most important verification is behavioral:
|
||
|
||
- task submission still works
|
||
- duplicate job reuse still works
|
||
- webhook updates still drive correct job state
|
||
- artifact persistence still happens before notification where required
|
||
- cancellation still behaves as before
|
||
|
||
## Definition of done for this phase
|
||
|
||
Phase 1 is complete when all of the following are true:
|
||
|
||
- internal job payload handling is typed where it matters
|
||
- `record_webhook_event()` is decomposed into explicit, readable steps
|
||
- repetitive submit flow is reduced without introducing hidden abstractions
|
||
- tests protect the main task orchestration seams
|
||
- public APIs remain effectively unchanged
|
||
- the backend is easier to change safely without rediscovering task flow each time
|
||
|
||
## Explicit non-goals
|
||
|
||
This plan does **not** include:
|
||
|
||
- moving lifecycle ownership into `jobs/service.py`
|
||
- redesigning the task system
|
||
- introducing a generic abstraction layer for tasks
|
||
- moving artifact persistence into every owning domain module
|
||
- broad route redesign
|
||
- frontend contract changes
|
||
- module structure changes
|
||
|
||
## Follow-up after this phase
|
||
|
||
Once this plan is implemented and stable, reassess whether there is enough remaining pressure to justify a second phase.
|
||
|
||
If yes, the next candidate should be:
|
||
|
||
- **Option 2** selectively, and only where phase 1 proves that lifecycle ownership is still the main source of friction.
|
||
|
||
Until then, keep the refactor intentionally conservative.
|