faglo/main_backend

Daniil a25bf623ea feature: create multitasking

2026-02-04 02:19:50 +03:00

AGENTS.md - AI Coding Guidelines for CofeeProject Backend

This document provides guidelines and best practices for AI agents working with this codebase.

Core Principles

1. Code Should Be Simple, Readable, and Well Supported

Write code that humans can understand at first glance
Prefer explicit over implicit behavior
Use clear control flow patterns (avoid deeply nested conditions)
Add docstrings for public functions, classes, and modules
Keep functions short and focused (ideally under 30 lines)

2. Less Overhead Is Better

Avoid unnecessary abstractions and over-engineering
Don't add layers of indirection without clear benefit
Prefer direct solutions over clever ones
Minimize dependencies where possible
Use built-in Python features before reaching for external libraries

3. No Magic Values

Define constants with meaningful names at module level
Use enums or Literal types for fixed sets of values (see ArtifactTypeEnum pattern)
Configuration values belong in Settings class with explicit defaults
Never hardcode timeouts, limits, or thresholds inline

# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...
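The enum and Literal bullets above can be sketched as follows. The members of ArtifactTypeEnum and the MediaKind alias are assumptions made for illustration; check the real definitions in the codebase before relying on them.

```python
from enum import Enum
from typing import Literal, get_args


class ArtifactTypeEnum(str, Enum):
    """Hypothetical members; the real enum in the codebase may differ."""

    TRANSCRIPT = "transcript"
    WAVEFORM = "waveform"


# Literal alias for a small fixed set of string values (illustrative only).
MediaKind = Literal["audio", "video"]
ALLOWED_MEDIA_KINDS: tuple[str, ...] = get_args(MediaKind)


def is_supported_media_kind(value: str) -> bool:
    """Return True if value is one of the MediaKind literals."""
    return value in ALLOWED_MEDIA_KINDS
```

Deriving ALLOWED_MEDIA_KINDS from the Literal keeps the runtime check and the type hint in one place, so a new value is added only once.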

4. One Function Should Implement One Purpose

Each function should do exactly one thing
If a function needs "and" in its description, split it
Extract helper functions for distinct subtasks
Keep side effects isolated and predictable

# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
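A minimal runnable sketch of the same idea, using only the standard library. The allow-list of suffixes and the helper names build_output_path and plan_processing are hypothetical; only validate_media_format mirrors a name from the example above.

```python
from pathlib import Path

# Hypothetical allow-list; the real project may support other formats.
SUPPORTED_MEDIA_SUFFIXES = frozenset({".mp4", ".mov", ".wav"})


def validate_media_format(file_path: str) -> bool:
    """Pure check: no I/O and no side effects."""
    return Path(file_path).suffix.lower() in SUPPORTED_MEDIA_SUFFIXES


def build_output_path(file_path: str) -> str:
    """Derive the processed file name from the source path."""
    source = Path(file_path)
    return str(source.with_name(f"{source.stem}_processed{source.suffix}"))


def plan_processing(file_path: str) -> str:
    """Orchestrate the helpers; each step stays independently testable."""
    if not validate_media_format(file_path):
        raise ValueError(f"Unsupported media format: {file_path}")
    return build_output_path(file_path)
```

The orchestrating function contains no logic of its own beyond sequencing, which keeps each helper describable without the word "and".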

5. All Variable Names Should Have Meaning Based on Context

Use descriptive names that explain purpose, not type
Avoid single-letter variables (except for trivial loops)
Prefix boolean variables with is_, has_, can_, should_
Use domain terminology consistently

# BAD
x = await repo.get(id)
flag = x.is_deleted

# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted

Project Architecture

Layer Structure

cpv3/
├── api/v1/          # API version routing
├── common/          # Shared schemas and utilities
├── db/              # Database base classes and session
├── infrastructure/  # Cross-cutting concerns (auth, storage, settings)
└── modules/         # Feature modules (domain logic)
    └── <module>/
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints

Module Responsibilities

Layer	Responsibility	Dependencies
`router.py`	HTTP request/response handling, validation	schemas, service, repository
`service.py`	Business logic, orchestration	repository, external services
`repository.py`	Database queries, CRUD operations	models, session
`schemas.py`	Data transfer objects, validation	pydantic
`models.py`	Database table definitions	SQLAlchemy
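The split of responsibilities in the table can be sketched without any framework. This is an illustration under stated assumptions: a dict stands in for the database session, a plain async function stands in for a FastAPI endpoint, and every name here (InMemoryMediaFileRepository, MediaFileService, handle_get_duration) is hypothetical.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass

SECONDS_PER_MINUTE = 60
HTTP_OK = 200
HTTP_NOT_FOUND = 404


@dataclass
class MediaFile:
    """Stand-in for the SQLAlchemy model."""

    id: uuid.UUID
    duration_seconds: float
    is_deleted: bool = False


class InMemoryMediaFileRepository:
    """Data access only; a dict replaces the database session."""

    def __init__(self, rows: dict[uuid.UUID, MediaFile]) -> None:
        self._rows = rows

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        media_file = self._rows.get(media_file_id)
        if media_file is None or media_file.is_deleted:
            return None
        return media_file


class MediaFileService:
    """Business logic only; knows nothing about HTTP."""

    def __init__(self, repository: InMemoryMediaFileRepository) -> None:
        self._repository = repository

    async def get_duration_minutes(self, media_file_id: uuid.UUID) -> float | None:
        media_file = await self._repository.get_by_id(media_file_id)
        if media_file is None:
            return None
        return media_file.duration_seconds / SECONDS_PER_MINUTE


async def handle_get_duration(
    service: MediaFileService, media_file_id: uuid.UUID
) -> tuple[int, dict[str, object]]:
    """Router-like layer: translate the service result into status and body."""
    minutes = await service.get_duration_minutes(media_file_id)
    if minutes is None:
        return HTTP_NOT_FOUND, {"detail": "Not found"}
    return HTTP_OK, {"duration_minutes": minutes}
```

Because each layer depends only on the one below it, the service can be exercised in a unit test with the in-memory repository and no HTTP client.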

Coding Standards

Python Version & Style

Python 3.11+ required
Use from __future__ import annotations for forward references
Line length: 100 characters (configured in ruff)
Use type hints for all function signatures
Async-first approach for I/O operations

Imports

# Standard library
from __future__ import annotations
import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.schemas import MediaFileRead
from cpv3.modules.media.repository import MediaFileRepository

Pydantic Schemas

Inherit from cpv3.common.schemas.Schema for consistent config
Use Literal types for enums with string values
Suffix schema names: *Create, *Update, *Read

from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema

class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime

SQLAlchemy Models

Inherit from Base and BaseModelMixin
Use explicit column types
Add indexes for frequently queried fields
Use soft deletes (is_deleted flag)

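import uuid

from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID  # assumes the PostgreSQL dialect type
from sqlalchemy.orm import Mapped, mapped_column

from cpv3.db.base import Base, BaseModelMixin

class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)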

Repository Pattern

One repository per model
Accept AsyncSession in constructor
Methods should be atomic and focused
Filter soft-deleted records by default

from sqlalchemy import select

class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        """Return the media file, or None if it is missing or soft-deleted."""
        result = await self._session.execute(
            select(MediaFile).where(
                MediaFile.id == media_file_id,
                MediaFile.is_deleted.is_(False),  # filter soft-deleted rows in the query
            )
        )
        return result.scalar_one_or_none()

FastAPI Endpoints

Use dependency injection for DB session, auth, and services
Return typed response models
Use appropriate HTTP status codes
Handle errors with HTTPException

@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)

Configuration & Settings

Environment Variables

All configuration through Settings class in infrastructure/settings.py
Use Field(default=..., alias="ENV_VAR_NAME") pattern
Provide sensible defaults for local development
Never commit secrets to repository

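from pydantic import Field
from pydantic_settings import BaseSettings  # Pydantic 2.x ships BaseSettings separately

class Settings(BaseSettings):
    # Development-only default; real secrets come from the environment
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")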

Accessing Settings

from cpv3.infrastructure.settings import get_settings

settings = get_settings()  # Cached via @lru_cache
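How the @lru_cache accessor behaves can be sketched with the standard library alone. This is not the project's Settings class: DemoSettings is a hypothetical dataclass that stands in for BaseSettings, and get_demo_settings is an illustrative name.

```python
import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_JWT_ALGORITHM = "HS256"
DEFAULT_JWT_ACCESS_TTL_MINUTES = 60


@dataclass(frozen=True)
class DemoSettings:
    """Stdlib stand-in for a BaseSettings subclass."""

    jwt_algorithm: str
    jwt_access_ttl_minutes: int


@lru_cache
def get_demo_settings() -> DemoSettings:
    """Read the environment once; later calls reuse the cached object."""
    return DemoSettings(
        jwt_algorithm=os.environ.get("JWT_ALGORITHM", DEFAULT_JWT_ALGORITHM),
        jwt_access_ttl_minutes=int(
            os.environ.get("JWT_ACCESS_TTL_MINUTES", str(DEFAULT_JWT_ACCESS_TTL_MINUTES))
        ),
    )
```

Because the result is cached, environment changes made after the first call are not picked up. In tests that modify environment variables, call get_demo_settings.cache_clear() before reading settings again.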

Testing Guidelines

Test Structure

tests/
├── conftest.py          # Shared fixtures
├── unit/                # Unit tests (isolated)
└── integration/         # Integration tests (with DB/services)

Fixtures

Use pytest-asyncio for async tests
Create isolated database sessions per test
Mock external services (storage, APIs)

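import pytest_asyncio

# pytest_asyncio.fixture works in both strict and auto mode;
# a plain @pytest.fixture on an async function only works in auto mode
@pytest_asyncio.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user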
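Mocking an external service can be sketched with unittest.mock.AsyncMock from the standard library. The upload_thumbnail function and the storage client's upload method are hypothetical names, and asyncio.run is used only to keep the sketch self-contained; in this project the test would normally be an async test run by pytest-asyncio.

```python
import asyncio
from unittest.mock import AsyncMock


async def upload_thumbnail(storage, file_key: str, payload: bytes) -> str:
    """Hypothetical service function that delegates to a storage client."""
    await storage.upload(file_key, payload)
    return f"thumbnails/{file_key}"


def test_upload_thumbnail_with_valid_payload_calls_storage_once() -> None:
    storage = AsyncMock()  # replaces the real storage client
    result = asyncio.run(upload_thumbnail(storage, "abc.png", b"data"))
    # Verify the interaction instead of touching real storage.
    storage.upload.assert_awaited_once_with("abc.png", b"data")
    assert result == "thumbnails/abc.png"
```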

Test Naming

# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...

Common Patterns

Error Handling

# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
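What "re-raise with context" preserves can be seen in a framework-free sketch. ServiceUnavailableError is a hypothetical stand-in for HTTPException with status 502, and call_external_service is an illustrative function that always fails.

```python
class ExternalError(Exception):
    """Hypothetical low-level failure from a third-party client."""


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for HTTPException(status_code=502)."""


def call_external_service() -> str:
    """Always fails, to demonstrate the error path."""
    raise ExternalError("connection reset")


def fetch_transcript() -> str:
    try:
        return call_external_service()
    except ExternalError as error:
        # "from error" stores the original exception on __cause__,
        # so the traceback shows both the low-level and the translated error.
        raise ServiceUnavailableError("External service unavailable") from error
```

The caller sees only the translated error, while the original cause stays available for logging and debugging.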

Async Operations

# For blocking or CPU-heavy synchronous work: run it in a worker thread
# so the event loop stays responsive
import anyio

result = await anyio.to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls
import asyncio

proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
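A self-contained version of both patterns that runs without anyio or ffprobe installed. Two substitutions are made and should be read as assumptions: asyncio.to_thread from the standard library replaces anyio.to_thread.run_sync, and the child process is the current Python interpreter rather than ffprobe. The helper names compute_checksum and run_python_snippet are hypothetical.

```python
import asyncio
import hashlib
import sys


def compute_checksum(payload: bytes) -> str:
    """Blocking helper standing in for heavier synchronous work."""
    return hashlib.sha256(payload).hexdigest()


async def run_python_snippet(code: str) -> tuple[int, str]:
    """Run a child interpreter and return (exit code, stripped stdout)."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    assert proc.returncode is not None  # set once communicate() has returned
    return proc.returncode, stdout.decode().strip()


async def main() -> tuple[str, tuple[int, str]]:
    # The worker thread keeps the event loop free while the helper runs.
    checksum = await asyncio.to_thread(compute_checksum, b"demo")
    child = await run_python_snippet("print('ok')")
    return checksum, child
```

Always check the exit code before trusting stdout; a non-zero code usually means the useful information is in stderr.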

Temporary Files

from pathlib import Path
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
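If the same create-use-delete sequence is needed in several places, it can be wrapped once in a context manager. This is a sketch, not an existing project helper; the name temporary_media_path is hypothetical.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile


@contextmanager
def temporary_media_path(suffix: str) -> Iterator[Path]:
    """Yield the path of a closed temp file and delete it on exit."""
    with NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        yield tmp_path
    finally:
        # Runs even if the caller's block raises.
        tmp_path.unlink(missing_ok=True)
```

The file is closed before the path is handed out, so external tools such as ffmpeg can open it on platforms that forbid sharing an open handle.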

Do's and Don'ts

✅ DO

Use type hints everywhere
Write async code for I/O operations
Use dependency injection
Keep modules self-contained
Write tests for new features
Use meaningful commit messages
Follow existing patterns in the codebase

❌ DON'T

Use global mutable state
Put business logic in routers
Hardcode configuration values
Ignore type checker warnings
Write overly clever code
Skip error handling
Mix sync and async DB operations

Quick Reference

Task	Location
Add new endpoint	`modules/<module>/router.py`
Add database model	`modules/<module>/models.py`
Add validation schema	`modules/<module>/schemas.py`
Add business logic	`modules/<module>/service.py`
Add database query	`modules/<module>/repository.py`
Add configuration	`infrastructure/settings.py`
Add shared utility	`common/`
Add migration	Run `alembic revision --autogenerate`

Package Management

This project uses uv as the package manager - a fast Python package installer and resolver written in Rust.

Common Commands

# Install all dependencies
uv sync

# Add a new dependency
uv add <package-name>

# Add a dev dependency
uv add --group dev <package-name>

# Run a command in the virtual environment
uv run <command>

# Run the development server
uv run uvicorn cpv3.main:app --reload

# Run tests
uv run pytest

Why uv?

Speed - According to the uv project, 10-100x faster than pip
Reliable - Deterministic dependency resolution
Compatible - Works with standard pyproject.toml

Dependencies

Key dependencies used in this project:

FastAPI - Web framework
SQLAlchemy 2.0 - ORM (async mode)
Pydantic 2.x - Data validation
asyncpg - PostgreSQL async driver
Alembic - Database migrations
pytest-asyncio - Async testing
boto3 - AWS S3 storage
pydub - Audio processing
openai-whisper - Transcription
Dramatiq - Background task queue (with Redis broker)

Common AI Agent Mistakes to Avoid

This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.

1. Over-Engineering and Breaking Module Structure

What happened: When asked to implement background tasks, the agent created excessive files:

# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py           # ❌ Non-standard
├── base.py             # ❌ Non-standard
├── db_helpers.py       # ❌ Non-standard
├── webhook_dispatch.py # ❌ Non-standard
├── handlers/           # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py    # DTOs only
├── service.py    # All business logic including actors
└── router.py     # Endpoints only

Why it's wrong:

Ignored existing module patterns in the codebase
Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
Created cognitive overhead for maintainers

Advice:

ALWAYS examine existing modules first before creating new ones
Match the existing file naming conventions exactly
Standard module files: __init__.py, models.py, schemas.py, repository.py, service.py, router.py
Only create files from this list; consolidate everything else into service.py

2. Misinterpreting "Make It Flexible" or "Apply SRP"

What happened: When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:

Abstract base classes (BaseTaskHandler, BaseTaskSubmitter)
A registry pattern with dynamic handler registration
Separate files for each handler implementation
Complex inheritance hierarchies

Why it's wrong:

SRP doesn't mean "one class per file" or "maximum abstraction"
Flexibility doesn't mean "prepare for every possible future change"
This violates the project's core principle: "Less Overhead Is Better"

Advice:

SRP = one function does one thing, NOT one file per concept
"Flexible" = easy to modify, NOT infinitely extensible
When in doubt, keep it in one file and refactor later if needed
Abstract base classes are rarely needed; prefer composition

# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...
    @abstractmethod
    async def execute(self, job_id): ...
    @abstractmethod
    async def on_complete(self, result): ...

class MediaProbeHandler(BaseTaskHandler):
    ...

# GOOD - Simple and direct
import dramatiq

@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...

3. Not Reading AGENTS.md Before Starting

What happened: The agent proceeded with implementation without fully considering the documented principles, particularly:

"Avoid unnecessary abstractions and over-engineering"
"Don't add layers of indirection without clear benefit"
"Prefer direct solutions over clever ones"

Advice:

Read AGENTS.md completely before any implementation
Re-read relevant sections when making architectural decisions
When the user's request conflicts with AGENTS.md principles, ask for clarification

4. Creating Files Without Checking Existing Patterns

What happened: The agent created handlers/ subdirectory and multiple utility files without checking how other modules handle similar needs.

Advice:

Before creating ANY new file, run: ls cpv3/modules/<similar_module>/
Check if the functionality can fit into existing standard files
If you need a helper function, put it in service.py, not a new file
Subdirectories within modules are almost never appropriate
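The manual ls check above can also be automated. The following is a sketch of one possible check, not an existing project script; the function name find_nonstandard_entries and the decision to ignore __pycache__ are assumptions.

```python
from pathlib import Path

STANDARD_MODULE_FILES = frozenset(
    {"__init__.py", "models.py", "schemas.py", "repository.py", "service.py", "router.py"}
)
IGNORED_ENTRIES = frozenset({"__pycache__"})


def find_nonstandard_entries(module_dir: Path) -> list[str]:
    """Return sorted names in module_dir that are not standard module files."""
    return sorted(
        entry.name
        for entry in module_dir.iterdir()
        if entry.name not in STANDARD_MODULE_FILES and entry.name not in IGNORED_ENTRIES
    )
```

Running it against a module directory before and after a change shows whether the change introduced files or subdirectories outside the standard list.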

5. Ignoring the "Quick Reference" Table

AGENTS.md already contains a clear reference for file placement:

Task	Location
Add new endpoint	`modules/<module>/router.py`
Add database model	`modules/<module>/models.py`
Add validation schema	`modules/<module>/schemas.py`
Add business logic	`modules/<module>/service.py`
Add database query	`modules/<module>/repository.py`

Advice:

Use this table as the ONLY guide for file placement
If something doesn't fit these categories, it probably belongs in service.py
Cross-cutting concerns go in infrastructure/, not in module subdirectories

Summary: The Golden Rules

Check existing patterns first - Look at 2-3 similar modules before creating anything
Standard files only - __init__.py, models.py, schemas.py, repository.py, service.py, router.py
No subdirectories in modules - Everything fits in the standard files
Consolidate, don't split - When unsure, put it in service.py
Simple > Clever - Direct code beats abstract patterns
YAGNI - Don't build for hypothetical future requirements

Last updated: February 2026
