13 KiB
CODEX Refactor Plan
Chosen direction
This plan adopts Option 1: in-place cleanup inside current boundaries.
The goal is to make backend feature work safer and the code easier to read without moving far from the current implementation.
This means:
- keep the existing module structure (
models.py,schemas.py,repository.py,service.py,router.py) - keep public HTTP routes and response contracts stable
- avoid hidden abstractions, base classes, registries, or plugin-style indirection
- prefer explicit, local helpers over framework-like refactors
- optimize first for developer readability and safer future changes
Why this option
The biggest current readability and change-risk hotspot is cpv3/modules/tasks/service.py.
Right now it mixes several responsibilities in one place:
- broker/bootstrap concerns
- Dramatiq actor definitions
- task submission flow
- job lifecycle side effects
- webhook application
- artifact persistence
- notification sending
- cancellation side effects
The highest concentration of risk is in:
cpv3/modules/tasks/service.py— especiallyTaskServiceandrecord_webhook_event()cpv3/modules/jobs/router.py— cancellation path delegates into task orchestrationcpv3/modules/captions/service.py— already contains both preset CRUD and Remotion client logic
A bigger ownership move would be possible later, but it is not the safest first step.
Primary objectives
- Make task/job orchestration easier to understand.
- Reduce the blast radius of changes in
TaskService. - Replace raw internal JSON/dict handling with typed, explicit internal contracts.
- Keep behavior stable for current API consumers and existing module boundaries.
- Improve confidence with focused unit and integration coverage before and during refactoring.
Constraints
Must keep
- Current backend module structure.
- Current route layout and public API behavior.
- Current job types and overall task flow.
- Current ownership model at module level.
- Explicit code over clever abstractions.
Allowed
- Small internal contract cleanup.
- Small private helper extraction inside existing files.
- Better naming, smaller methods, and clearer sequencing.
- Add or expand tests.
Not allowed in this phase
- New module subdirectories or extra architecture layers.
- Generic handler registries.
- Base service hierarchies.
- Large movement of lifecycle ownership from
tasksintojobs. - Pushing more orchestration into
captions,media, ortranscriptionbefore task contracts are cleaned up. - Broad request/response or frontend-facing contract changes.
Refactor scope
In scope
cpv3/modules/tasks/service.pycpv3/modules/tasks/schemas.pycpv3/modules/tasks/router.pywhere needed for clarity onlycpv3/modules/jobs/router.pywhere needed for clearer delegation only- tests around task submission, webhook handling, cancellation, and caption task behavior
Nearby modules to touch carefully if needed
cpv3/modules/captions/service.pycpv3/modules/media/service.pycpv3/modules/transcription/repository.pycpv3/modules/jobs/service.py
These are supporting touches only. The refactor should remain centered in tasks.
Target end state
After this phase:
TaskServicestill owns task orchestration,- but each major step is explicit and small,
- stored job payloads are validated through typed internal schemas,
- webhook application is broken into readable stages,
- repeated submission logic is reduced through simple private helpers,
- actor flow is easier to scan,
- tests cover the critical seams that future feature work will depend on.
tasks/service.py may still be large after this phase, but it should stop being a "many responsibilities blurred together" file.
Main decisions
1. Keep orchestration in tasks for now
Do not move job lifecycle ownership into jobs/service.py in this phase.
Reason:
- it is a valid long-term direction,
- but it changes semantic ownership of cancellation and webhook application,
- which raises regression risk before the current flow is properly typed and decomposed.
2. Add typed internal payload schemas
Add explicit internal schemas in cpv3/modules/tasks/schemas.py for per-job stored payloads.
These schemas are for internal readability and validation only.
They should:
- represent the current JSON shape stored in
job.input_data/job.output_data - remain compatible with existing persisted data
- avoid forcing a database migration in this phase
- be explicit per job type rather than generic
Examples of the kind of models expected:
- media probe input/output
- silence detect input/output
- silence apply input/output
- media convert input/output
- transcription generate input/output
- frame extract input/output
- captions generate input/output
Important: the database can still store JSON, but TaskService should stop passing raw dicts around when internal meaning matters.
3. Split record_webhook_event() into explicit stages
Refactor record_webhook_event() into small private methods with one purpose each.
Target shape:
- load and validate job state
- ignore terminal jobs when appropriate
- apply status update
- append job event row
- run job-type-specific done hooks
- send notification
- return updated job
This keeps behavior in one place while making the sequencing obvious.
4. Reduce duplication only where it is obvious
Allowed helper extraction:
- output folder resolution
- broker reference update
- source file lookup/create flow
- artifact persistence helpers
- notification dispatch helper
- common actor start/fail envelope if the resulting code is still explicit
Do not introduce a generic task framework.
5. Keep domain persistence mostly where it is today
Do not push transcription/media/captions result persistence back into their modules yet.
Reason:
- domain ownership may improve long-term,
- but doing it before task contracts are typed risks scattering orchestration across several large files.
This phase should first make the existing orchestration understandable.
Execution phases
Phase 0 — Lock behavior with tests
Before changing the flow, expand tests around the current seams.
Priority coverage:
tests/unit/test_task_service.py- duplicate active job reuse
- terminal job webhook ignore behavior
- webhook event update sequencing
- artifact persistence trigger points
- cancellation path behavior
tests/unit/test_caption_tasks.py- caption render fallback behavior
- callback confirmation behavior
- cancellation-related safety where relevant
- integration coverage where current behavior is externally visible
tests/integration/test_jobs_endpoints.pytests/integration/test_captions_endpoints.pyif affected
The goal is not broad test expansion. The goal is to pin the behavior of the exact code being refactored.
Phase 1 — Introduce typed internal task payloads
Add internal schemas in cpv3/modules/tasks/schemas.py for job payloads.
Implementation rules:
- keep schema names explicit per job type
- support current field names
- make optional fields tolerant enough for existing rows where necessary
- use these schemas at service boundaries where
input_dataandoutput_dataare read or produced
Expected outcome:
- internal payload meaning becomes visible from types instead of being inferred from raw dict access
- future changes become easier to review safely
Phase 2 — Decompose webhook application flow
Refactor TaskService.record_webhook_event() into a sequence of small private methods.
Recommended decomposition:
_load_job_for_webhook(...)_should_ignore_webhook(...)_apply_job_status_update(...)_create_job_event(...)_run_done_side_effects(...)_send_job_notification(...)
Method names may differ, but the responsibilities should remain this explicit.
Important:
- keep sequencing behavior unchanged
- preserve the current order where artifacts are saved before notifications when that order matters
Phase 3 — Normalize submission helpers
Refactor repetitive submit methods without hiding behavior.
The submit methods should remain explicit, but repeated steps can be centralized where the code becomes clearer:
- resolve user output folder
- prepare actor kwargs
- create job + webhook
- enqueue actor
- persist broker reference
- build submit response
The final submit methods should still read as business flows, not as thin wrappers over a generic registry.
Phase 4 — Clean actor envelope duplication
If still useful after earlier phases, standardize only the repeated envelope around actors:
- cancellation check
- running event emission
- success/failure reporting
This should stay simple and local.
If the helper makes actor behavior harder to follow, do not extract it.
Phase 5 — Reassess, do not auto-expand scope
After the in-place cleanup is done, reassess whether a second phase is justified.
Only then consider:
- moving lifecycle semantics into
jobs - pushing result persistence into owning modules
Those are explicitly not part of the current plan.
File-by-file plan
cpv3/modules/tasks/schemas.py
- Add explicit internal payload schemas for job input/output.
- Keep public API request/response schemas stable.
- Prefer additive internal typing over schema churn.
cpv3/modules/tasks/service.py
- Keep this file as the orchestration center for this phase.
- Decompose large methods into readable private steps.
- Replace raw internal dict usage with typed parsing/serialization where the meaning matters.
- Reduce repeated code only through local, explicit helpers.
cpv3/modules/tasks/router.py
- Keep routes unchanged.
- Only adjust delegation or error handling if required to support clearer internal service boundaries.
cpv3/modules/jobs/router.py
- Keep the current cancel entrypoint behavior.
- If touched, only make delegation clearer.
- Do not move cancel semantics into
jobs/service.pyin this phase.
cpv3/modules/captions/service.py
- Keep current ownership.
- Do not add more orchestration here in phase 1.
- Only make supporting adjustments if typed payload work needs a clearer boundary.
cpv3/modules/media/service.py
- Keep media processing helpers as-is unless task cleanup reveals a very small, obvious extraction need.
- Avoid expanding this file into a second orchestration center.
Risk controls
Preserve persisted JSON compatibility
Existing jobs may already contain payloads in the database.
Therefore:
- typed internal schemas must support current persisted shapes
- avoid renaming stored keys unless there is a very strong reason
- prefer backward-compatible parsing over cleanup for cleanup’s sake
Preserve side-effect ordering
Current flow saves some artifacts before notification dispatch.
That ordering matters because clients may refetch immediately after notification.
Do not change this ordering unless there is a proven reason and test coverage for the new behavior.
Avoid ownership churn during cleanup
This phase is about readability and safer edits, not about redistributing the architecture.
If a change starts to feel like "this really belongs in another module," mark it for reassessment after phase 1 instead of expanding scope immediately.
Verification strategy
For implementation of this plan, verification should include:
- targeted unit tests for
TaskService - targeted unit tests for caption task actor behavior
- relevant integration tests for jobs/tasks endpoints
- Ruff checks for touched files
The most important verification is behavioral:
- task submission still works
- duplicate job reuse still works
- webhook updates still drive correct job state
- artifact persistence still happens before notification where required
- cancellation still behaves as before
Definition of done for this phase
Phase 1 is complete when all of the following are true:
- internal job payload handling is typed where it matters
record_webhook_event()is decomposed into explicit, readable steps- repetitive submit flow is reduced without introducing hidden abstractions
- tests protect the main task orchestration seams
- public APIs remain effectively unchanged
- the backend is easier to change safely without rediscovering task flow each time
Explicit non-goals
This plan does not include:
- moving lifecycle ownership into
jobs/service.py - redesigning the task system
- introducing a generic abstraction layer for tasks
- moving artifact persistence into every owning domain module
- broad route redesign
- frontend contract changes
- module structure changes
Follow-up after this phase
Once this plan is implemented and stable, reassess whether there is enough remaining pressure to justify a second phase.
If yes, the next candidate should be:
- Option 2 selectively, and only where phase 1 proves that lifecycle ownership is still the main source of friction.
Until then, keep the refactor intentionally conservative.