Files
main_backend/CODEX_REFACTOR_PLAN.md
2026-03-17 18:11:23 +03:00

13 KiB
Raw Permalink Blame History

CODEX Refactor Plan

Chosen direction

This plan adopts Option 1: in-place cleanup inside current boundaries.

The goal is to make backend feature work safer and the code easier to read without moving far from the current implementation.

This means:

  • keep the existing module structure (models.py, schemas.py, repository.py, service.py, router.py)
  • keep public HTTP routes and response contracts stable
  • avoid hidden abstractions, base classes, registries, or plugin-style indirection
  • prefer explicit, local helpers over framework-like refactors
  • optimize first for developer readability and safer future changes

Why this option

The biggest current readability and change-risk hotspot is cpv3/modules/tasks/service.py.

Right now it mixes several responsibilities in one place:

  • broker/bootstrap concerns
  • Dramatiq actor definitions
  • task submission flow
  • job lifecycle side effects
  • webhook application
  • artifact persistence
  • notification sending
  • cancellation side effects

The highest concentration of risk is in:

  • cpv3/modules/tasks/service.py — especially TaskService and record_webhook_event()
  • cpv3/modules/jobs/router.py — cancellation path delegates into task orchestration
  • cpv3/modules/captions/service.py — already contains both preset CRUD and Remotion client logic

A bigger ownership move would be possible later, but it is not the safest first step.

Primary objectives

  1. Make task/job orchestration easier to understand.
  2. Reduce the blast radius of changes in TaskService.
  3. Replace raw internal JSON/dict handling with typed, explicit internal contracts.
  4. Keep behavior stable for current API consumers and existing module boundaries.
  5. Improve confidence with focused unit and integration coverage before and during refactoring.

Constraints

Must keep

  • Current backend module structure.
  • Current route layout and public API behavior.
  • Current job types and overall task flow.
  • Current ownership model at module level.
  • Explicit code over clever abstractions.

Allowed

  • Small internal contract cleanup.
  • Small private helper extraction inside existing files.
  • Better naming, smaller methods, and clearer sequencing.
  • Add or expand tests.

Not allowed in this phase

  • New module subdirectories or extra architecture layers.
  • Generic handler registries.
  • Base service hierarchies.
  • Large movement of lifecycle ownership from tasks into jobs.
  • Pushing more orchestration into captions, media, or transcription before task contracts are cleaned up.
  • Broad request/response or frontend-facing contract changes.

Refactor scope

In scope

  • cpv3/modules/tasks/service.py
  • cpv3/modules/tasks/schemas.py
  • cpv3/modules/tasks/router.py where needed for clarity only
  • cpv3/modules/jobs/router.py where needed for clearer delegation only
  • tests around task submission, webhook handling, cancellation, and caption task behavior

Nearby modules to touch carefully if needed

  • cpv3/modules/captions/service.py
  • cpv3/modules/media/service.py
  • cpv3/modules/transcription/repository.py
  • cpv3/modules/jobs/service.py

These are supporting touches only. The refactor should remain centered in tasks.

Target end state

After this phase:

  • TaskService still owns task orchestration,
  • but each major step is explicit and small,
  • stored job payloads are validated through typed internal schemas,
  • webhook application is broken into readable stages,
  • repeated submission logic is reduced through simple private helpers,
  • actor flow is easier to scan,
  • tests cover the critical seams that future feature work will depend on.

tasks/service.py may still be large after this phase, but it should stop being a "many responsibilities blurred together" file.

Main decisions

1. Keep orchestration in tasks for now

Do not move job lifecycle ownership into jobs/service.py in this phase.

Reason:

  • it is a valid long-term direction,
  • but it changes semantic ownership of cancellation and webhook application,
  • which raises regression risk before the current flow is properly typed and decomposed.

2. Add typed internal payload schemas

Add explicit internal schemas in cpv3/modules/tasks/schemas.py for per-job stored payloads.

These schemas are for internal readability and validation only.

They should:

  • represent the current JSON shape stored in job.input_data / job.output_data
  • remain compatible with existing persisted data
  • avoid forcing a database migration in this phase
  • be explicit per job type rather than generic

Examples of the kind of models expected:

  • media probe input/output
  • silence detect input/output
  • silence apply input/output
  • media convert input/output
  • transcription generate input/output
  • frame extract input/output
  • captions generate input/output

Important: the database can still store JSON, but TaskService should stop passing raw dicts around when internal meaning matters.

3. Split record_webhook_event() into explicit stages

Refactor record_webhook_event() into small private methods with one purpose each.

Target shape:

  • load and validate job state
  • ignore terminal jobs when appropriate
  • apply status update
  • append job event row
  • run job-type-specific done hooks
  • send notification
  • return updated job

This keeps behavior in one place while making the sequencing obvious.

4. Reduce duplication only where it is obvious

Allowed helper extraction:

  • output folder resolution
  • broker reference update
  • source file lookup/create flow
  • artifact persistence helpers
  • notification dispatch helper
  • common actor start/fail envelope if the resulting code is still explicit

Do not introduce a generic task framework.

5. Keep domain persistence mostly where it is today

Do not push transcription/media/captions result persistence back into their modules yet.

Reason:

  • domain ownership may improve long-term,
  • but doing it before task contracts are typed risks scattering orchestration across several large files.

This phase should first make the existing orchestration understandable.

Execution phases

Phase 0 — Lock behavior with tests

Before changing the flow, expand tests around the current seams.

Priority coverage:

  • tests/unit/test_task_service.py
    • duplicate active job reuse
    • terminal job webhook ignore behavior
    • webhook event update sequencing
    • artifact persistence trigger points
    • cancellation path behavior
  • tests/unit/test_caption_tasks.py
    • caption render fallback behavior
    • callback confirmation behavior
    • cancellation-related safety where relevant
  • integration coverage where current behavior is externally visible
    • tests/integration/test_jobs_endpoints.py
    • tests/integration/test_captions_endpoints.py if affected

The goal is not broad test expansion. The goal is to pin the behavior of the exact code being refactored.

Phase 1 — Introduce typed internal task payloads

Add internal schemas in cpv3/modules/tasks/schemas.py for job payloads.

Implementation rules:

  • keep schema names explicit per job type
  • support current field names
  • make optional fields tolerant enough for existing rows where necessary
  • use these schemas at service boundaries where input_data and output_data are read or produced

Expected outcome:

  • internal payload meaning becomes visible from types instead of being inferred from raw dict access
  • future changes become easier to review safely

Phase 2 — Decompose webhook application flow

Refactor TaskService.record_webhook_event() into a sequence of small private methods.

Recommended decomposition:

  • _load_job_for_webhook(...)
  • _should_ignore_webhook(...)
  • _apply_job_status_update(...)
  • _create_job_event(...)
  • _run_done_side_effects(...)
  • _send_job_notification(...)

Method names may differ, but the responsibilities should remain this explicit.

Important:

  • keep sequencing behavior unchanged
  • preserve the current order where artifacts are saved before notifications when that order matters

Phase 3 — Normalize submission helpers

Refactor repetitive submit methods without hiding behavior.

The submit methods should remain explicit, but repeated steps can be centralized where the code becomes clearer:

  • resolve user output folder
  • prepare actor kwargs
  • create job + webhook
  • enqueue actor
  • persist broker reference
  • build submit response

The final submit methods should still read as business flows, not as thin wrappers over a generic registry.

Phase 4 — Clean actor envelope duplication

If still useful after earlier phases, standardize only the repeated envelope around actors:

  • cancellation check
  • running event emission
  • success/failure reporting

This should stay simple and local.

If the helper makes actor behavior harder to follow, do not extract it.

Phase 5 — Reassess, do not auto-expand scope

After the in-place cleanup is done, reassess whether a second phase is justified.

Only then consider:

  • moving lifecycle semantics into jobs
  • pushing result persistence into owning modules

Those are explicitly not part of the current plan.

File-by-file plan

cpv3/modules/tasks/schemas.py

  • Add explicit internal payload schemas for job input/output.
  • Keep public API request/response schemas stable.
  • Prefer additive internal typing over schema churn.

cpv3/modules/tasks/service.py

  • Keep this file as the orchestration center for this phase.
  • Decompose large methods into readable private steps.
  • Replace raw internal dict usage with typed parsing/serialization where the meaning matters.
  • Reduce repeated code only through local, explicit helpers.

cpv3/modules/tasks/router.py

  • Keep routes unchanged.
  • Only adjust delegation or error handling if required to support clearer internal service boundaries.

cpv3/modules/jobs/router.py

  • Keep the current cancel entrypoint behavior.
  • If touched, only make delegation clearer.
  • Do not move cancel semantics into jobs/service.py in this phase.

cpv3/modules/captions/service.py

  • Keep current ownership.
  • Do not add more orchestration here in phase 1.
  • Only make supporting adjustments if typed payload work needs a clearer boundary.

cpv3/modules/media/service.py

  • Keep media processing helpers as-is unless task cleanup reveals a very small, obvious extraction need.
  • Avoid expanding this file into a second orchestration center.

Risk controls

Preserve persisted JSON compatibility

Existing jobs may already contain payloads in the database.

Therefore:

  • typed internal schemas must support current persisted shapes
  • avoid renaming stored keys unless there is a very strong reason
  • prefer backward-compatible parsing over cleanup for cleanups sake

Preserve side-effect ordering

Current flow saves some artifacts before notification dispatch.

That ordering matters because clients may refetch immediately after notification.

Do not change this ordering unless there is a proven reason and test coverage for the new behavior.

Avoid ownership churn during cleanup

This phase is about readability and safer edits, not about redistributing the architecture.

If a change starts to feel like "this really belongs in another module," mark it for reassessment after phase 1 instead of expanding scope immediately.

Verification strategy

For implementation of this plan, verification should include:

  • targeted unit tests for TaskService
  • targeted unit tests for caption task actor behavior
  • relevant integration tests for jobs/tasks endpoints
  • Ruff checks for touched files

The most important verification is behavioral:

  • task submission still works
  • duplicate job reuse still works
  • webhook updates still drive correct job state
  • artifact persistence still happens before notification where required
  • cancellation still behaves as before

Definition of done for this phase

Phase 1 is complete when all of the following are true:

  • internal job payload handling is typed where it matters
  • record_webhook_event() is decomposed into explicit, readable steps
  • repetitive submit flow is reduced without introducing hidden abstractions
  • tests protect the main task orchestration seams
  • public APIs remain effectively unchanged
  • the backend is easier to change safely without rediscovering task flow each time

Explicit non-goals

This plan does not include:

  • moving lifecycle ownership into jobs/service.py
  • redesigning the task system
  • introducing a generic abstraction layer for tasks
  • moving artifact persistence into every owning domain module
  • broad route redesign
  • frontend contract changes
  • module structure changes

Follow-up after this phase

Once this plan is implemented and stable, reassess whether there is enough remaining pressure to justify a second phase.

If yes, the next candidate should be:

  • Option 2 selectively, and only where phase 1 proves that lifecycle ownership is still the main source of friction.

Until then, keep the refactor intentionally conservative.