main_backend/AGENTS.md
2026-02-04 02:19:50 +03:00


AGENTS.md - AI Coding Guidelines for CofeeProject Backend

This document provides guidelines and best practices for AI agents working with this codebase.


Core Principles

1. Code Should Be Simple, Readable, and Well Supported

  • Write code that humans can understand at first glance
  • Prefer explicit over implicit behavior
  • Use clear control flow patterns (avoid deeply nested conditions)
  • Add docstrings for public functions, classes, and modules
  • Keep functions short and focused (ideally under 30 lines)
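For example, here is a minimal sketch of replacing nested conditions with guard clauses (early returns). The MediaFile dataclass, MAX_DURATION_SECONDS, and both functions are hypothetical. They only illustrate the shape:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical limit, named instead of hardcoded inline
MAX_DURATION_SECONDS = 3600.0


@dataclass
class MediaFile:
    """Hypothetical stand-in for the media model, for illustration only."""

    owner_id: str
    is_deleted: bool
    duration_seconds: float


# BAD - nested conditions hide the main path
def can_process_nested(media_file: MediaFile | None, user_id: str) -> bool:
    if media_file is not None:
        if not media_file.is_deleted:
            if media_file.owner_id == user_id:
                if media_file.duration_seconds <= MAX_DURATION_SECONDS:
                    return True
    return False


# GOOD - guard clauses reject invalid cases first, keeping the happy path flat
def can_process(media_file: MediaFile | None, user_id: str) -> bool:
    if media_file is None:
        return False
    if media_file.is_deleted:
        return False
    if media_file.owner_id != user_id:
        return False
    return media_file.duration_seconds <= MAX_DURATION_SECONDS
```

Both versions return the same results. The second can be read top to bottom, one rule per line.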

2. Less Overhead Is Better

  • Avoid unnecessary abstractions and over-engineering
  • Don't add layers of indirection without clear benefit
  • Prefer direct solutions over clever ones
  • Minimize dependencies where possible
  • Use built-in Python features before reaching for external libraries

3. No Magic Values

  • Define constants with meaningful names at module level
  • Use enums or Literal types for fixed sets of values (see ArtifactTypeEnum pattern)
  • Configuration values belong in Settings class with explicit defaults
  • Never hardcode timeouts, limits, or thresholds inline
# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...

4. One Function Should Implement One Purpose

  • Each function should do exactly one thing
  • If a function needs "and" in its description, split it
  • Extract helper functions for distinct subtasks
  • Keep side effects isolated and predictable
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
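The split functions still need one caller that runs them in order. Here is a sketch of that orchestration with stub bodies. handle_media_upload and SUPPORTED_EXTENSIONS are hypothetical. Also, download_media returns a plain path string here instead of the TempFile type named above:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from pathlib import Path

# Hypothetical set of accepted extensions, for illustration only
SUPPORTED_EXTENSIONS = frozenset({".mp4", ".mov", ".wav"})


@dataclass
class MediaResult:
    file_path: str
    is_processed: bool


async def download_media(file_key: str) -> str:
    """Stub: a real version would fetch the object from storage into a temp file."""
    await asyncio.sleep(0)
    return f"/tmp/{file_key}"


def validate_media_format(file_path: str) -> bool:
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


async def process_media(file_path: str) -> MediaResult:
    """Stub: a real version would run the processing pipeline."""
    await asyncio.sleep(0)
    return MediaResult(file_path=file_path, is_processed=True)


async def handle_media_upload(file_key: str) -> MediaResult:
    """Run the single-purpose steps in order; each one stays testable on its own."""
    file_path = await download_media(file_key)
    if not validate_media_format(file_path):
        raise ValueError(f"Unsupported media format: {file_path}")
    return await process_media(file_path)
```

The orchestrator contains no work of its own. It only sequences the steps and decides what happens when validation fails.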

5. All Variable Names Should Have Meaning Based on Context

  • Use descriptive names that explain purpose, not type
  • Avoid single-letter variables (except for trivial loops)
  • Prefix boolean variables with is_, has_, can_, should_
  • Use domain terminology consistently
# BAD
x = await repo.get(id)
flag = x.is_deleted

# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted

Project Architecture

Layer Structure

cpv3/
├── api/v1/          # API version routing
├── common/          # Shared schemas and utilities
├── db/              # Database base classes and session
├── infrastructure/  # Cross-cutting concerns (auth, storage, settings)
└── modules/         # Feature modules (domain logic)
    └── <module>/
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints

Module Responsibilities

Layer          Responsibility                                Dependencies
router.py      HTTP request/response handling, validation    schemas, service, repository
service.py     Business logic, orchestration                 repository, external services
repository.py  Database queries, CRUD operations             models, session
schemas.py     Data transfer objects, validation             pydantic
models.py      Database table definitions                    SQLAlchemy
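The table places business rules in service.py, between router and repository. Below is a minimal stdlib-only sketch of that split; all names are hypothetical:

- get_owned_media_file is a service function.
- MediaFileNotFoundError is a domain error that a router would translate to HTTP 404.
- InMemoryMediaFileRepository is a test double that follows the same get_by_id contract as the MediaFileRepository described under Repository Pattern.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass
from typing import Protocol


@dataclass
class MediaFile:
    """Hypothetical stand-in for the SQLAlchemy model."""

    id: uuid.UUID
    owner_id: uuid.UUID
    is_deleted: bool = False


class MediaFileNotFoundError(Exception):
    """Raised by the service; the router maps it to HTTP 404."""


class MediaFileReader(Protocol):
    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None: ...


async def get_owned_media_file(
    repository: MediaFileReader, media_file_id: uuid.UUID, owner_id: uuid.UUID
) -> MediaFile:
    """Business rule: users only see their own, non-deleted media files."""
    media_file = await repository.get_by_id(media_file_id)
    if media_file is None or media_file.owner_id != owner_id:
        raise MediaFileNotFoundError(str(media_file_id))
    return media_file


class InMemoryMediaFileRepository:
    """Test double with the same get_by_id contract as the real repository."""

    def __init__(self, media_files: list[MediaFile]) -> None:
        self._media_files = {media_file.id: media_file for media_file in media_files}

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        media_file = self._media_files.get(media_file_id)
        if media_file is None or media_file.is_deleted:
            return None
        return media_file
```

Treating "not yours" the same as "not found" avoids revealing which IDs exist. Keeping that rule in the service means every router that needs it gets identical behavior.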

Coding Standards

Python Version & Style

  • Python 3.11+ required
  • Use from __future__ import annotations for forward references
  • Line length: 100 characters (configured in ruff)
  • Use type hints for all function signatures
  • Async-first approach for I/O operations

Imports

from __future__ import annotations  # must precede all other imports

# Standard library
import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.repository import MediaFileRepository
from cpv3.modules.media.schemas import MediaFileRead

Pydantic Schemas

  • Inherit from cpv3.common.schemas.Schema for consistent config
  • Use Literal types for enums with string values
  • Suffix schema names: *Create, *Update, *Read
from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema

class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime

SQLAlchemy Models

  • Inherit from Base and BaseModelMixin
  • Use explicit column types
  • Add indexes for frequently queried fields
  • Use soft deletes (is_deleted flag)
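import uuid

from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column

from cpv3.db.base import Base, BaseModelMixin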

class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)

Repository Pattern

  • One repository per model
  • Accept AsyncSession in constructor
  • Methods should be atomic and focused
  • Filter soft-deleted records by default
import uuid

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from cpv3.modules.media.models import MediaFile

class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        # Exclude soft-deleted rows in the query itself
        result = await self._session.execute(
            select(MediaFile).where(
                MediaFile.id == media_file_id,
                MediaFile.is_deleted.is_(False),
            )
        )
        return result.scalar_one_or_none()

FastAPI Endpoints

  • Use dependency injection for DB session, auth, and services
  • Return typed response models
  • Use appropriate HTTP status codes
  • Handle errors with HTTPException
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)

Configuration & Settings

Environment Variables

  • All configuration through Settings class in infrastructure/settings.py
  • Use Field(default=..., alias="ENV_VAR_NAME") pattern
  • Provide sensible defaults for local development
  • Never commit secrets to repository
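from pydantic import Field
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Local-development default only; set JWT_SECRET_KEY in every deployed environment
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")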

Accessing Settings

from cpv3.infrastructure.settings import get_settings

settings = get_settings()  # Cached via @lru_cache
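The real Settings class is built on pydantic-settings. The stdlib-only sketch below mimics only its shape, with a frozen dataclass and os.environ standing in for BaseSettings. It shows how an @lru_cache accessor behaves:

- The environment is read once per process.
- get_settings.cache_clear() forces a re-read, which is useful in tests that change environment variables.

```python
import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_JWT_ALGORITHM = "HS256"
DEFAULT_JWT_ACCESS_TTL_MINUTES = 60


@dataclass(frozen=True)
class Settings:
    """Stand-in for the pydantic-settings class, for illustration only."""

    jwt_algorithm: str
    jwt_access_ttl_minutes: int


@lru_cache
def get_settings() -> Settings:
    """Read the environment once; later calls return the same cached object."""
    return Settings(
        jwt_algorithm=os.environ.get("JWT_ALGORITHM", DEFAULT_JWT_ALGORITHM),
        jwt_access_ttl_minutes=int(
            os.environ.get("JWT_ACCESS_TTL_MINUTES", DEFAULT_JWT_ACCESS_TTL_MINUTES)
        ),
    )
```

Because the object is cached, changing an environment variable after the first call has no effect until the cache is cleared.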

Testing Guidelines

Test Structure

tests/
├── conftest.py          # Shared fixtures
├── unit/                # Unit tests (isolated)
└── integration/         # Integration tests (with DB/services)

Fixtures

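  • Use pytest-asyncio for async tests; in its default strict mode, mark async tests with @pytest.mark.asyncio and declare async fixtures with @pytest_asyncio.fixture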
  • Create isolated database sessions per test
  • Mock external services (storage, APIs)
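import uuid

import pytest_asyncio
from sqlalchemy.ext.asyncio import AsyncSession

@pytest_asyncio.fixture  # works for async fixtures in both strict and auto mode
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user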

Test Naming

# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...

Common Patterns

Error Handling

# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Wrap lower-level errors and chain the original exception with "from e"
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
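The `from e` clause stores the original exception as the new one's `__cause__`. Tracebacks and error trackers then show both the high-level error and the underlying failure.

Here is a stdlib-only sketch of the same idea at the service layer, before a router maps the domain error to HTTP 502. StorageError, MediaUnavailableError, and both functions are hypothetical:

```python
class StorageError(Exception):
    """Hypothetical low-level error raised by a storage client."""


class MediaUnavailableError(Exception):
    """Hypothetical domain error a router could map to HTTP 502."""


def fetch_object(file_key: str) -> bytes:
    # Stand-in for a storage call that always fails
    raise StorageError(f"timeout while reading {file_key}")


def load_media_bytes(file_key: str) -> bytes:
    try:
        return fetch_object(file_key)
    except StorageError as error:
        # "from error" keeps the original exception as __cause__ for logs
        raise MediaUnavailableError(f"media {file_key} is unavailable") from error
```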

Async Operations

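import asyncio

import anyio

# For blocking or CPU-heavy sync functions in async context: runs the call in a
# worker thread so the event loop is not blocked
result = await anyio.to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls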
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
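A reusable wrapper around create_subprocess_exec can add a timeout and treat a non-zero exit status as an error, as in the sketch below. run_command, CommandFailedError, and the 30-second default are hypothetical; adjust them to the project's needs:

```python
import asyncio

COMMAND_TIMEOUT_SECONDS = 30.0


class CommandFailedError(Exception):
    """Raised when a subprocess exits with a non-zero status."""


async def run_command(*args: str, timeout_seconds: float = COMMAND_TIMEOUT_SECONDS) -> str:
    """Run a command without a shell, enforce a timeout, and return decoded stdout."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Kill the child and reap it so no zombie process is left behind
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise CommandFailedError(stderr.decode(errors="replace").strip())
    return stdout.decode()
```

A call site would then read, for example, `output = await run_command("ffprobe", "-v", "error", file_path)`.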

Temporary Files

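from pathlib import Path
from tempfile import NamedTemporaryFile

# delete=False keeps the file after the "with" block so other tools can open it by path
with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # Use tmp_path
    ...
finally:
    # Clean up even if processing fails
    Path(tmp_path).unlink(missing_ok=True)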
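When several code paths need this pattern, it can be wrapped once in a context manager. temporary_file_path is a hypothetical helper, not an existing project utility. It keeps the same delete=False plus unlink approach:

```python
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile


@contextmanager
def temporary_file_path(suffix: str) -> Iterator[Path]:
    """Yield the path of a closed temp file and always delete it afterwards."""
    with NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        yield tmp_path
    finally:
        tmp_path.unlink(missing_ok=True)
```

Callers then write `with temporary_file_path(".mp4") as tmp_path: ...` and cannot forget the cleanup step.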

Do's and Don'ts

DO

  • Use type hints everywhere
  • Write async code for I/O operations
  • Use dependency injection
  • Keep modules self-contained
  • Write tests for new features
  • Use meaningful commit messages
  • Follow existing patterns in the codebase

DON'T

  • Use global mutable state
  • Put business logic in routers
  • Hardcode configuration values
  • Ignore type checker warnings
  • Write overly clever code
  • Skip error handling
  • Mix sync and async DB operations

Quick Reference

Task                   Location
Add new endpoint       modules/<module>/router.py
Add database model     modules/<module>/models.py
Add validation schema  modules/<module>/schemas.py
Add business logic     modules/<module>/service.py
Add database query     modules/<module>/repository.py
Add configuration      infrastructure/settings.py
Add shared utility     common/
Add migration          Run alembic revision --autogenerate

Package Management

This project uses uv as the package manager - a fast Python package installer and resolver written in Rust.

Common Commands

# Install all dependencies
uv sync

# Add a new dependency
uv add <package-name>

# Add a dev dependency
uv add --group dev <package-name>

# Run a command in the virtual environment
uv run <command>

# Run the development server
uv run uvicorn cpv3.main:app --reload

# Run tests
uv run pytest

Why uv?

  • Speed - 10-100x faster than pip, according to the uv project's own benchmarks
  • Reliable - Deterministic installs from the uv.lock lockfile
  • Compatible - Works with standard pyproject.toml

Dependencies

Key dependencies used in this project:

  • FastAPI - Web framework
  • SQLAlchemy 2.0 - ORM (async mode)
  • Pydantic 2.x - Data validation
  • asyncpg - PostgreSQL async driver
  • Alembic - Database migrations
  • pytest-asyncio - Async testing
  • boto3 - AWS S3 storage
  • pydub - Audio processing
  • openai-whisper - Transcription
  • Dramatiq - Background task queue (with Redis broker)

Common AI Agent Mistakes to Avoid

This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.

1. Over-Engineering and Breaking Module Structure

What happened: When asked to implement background tasks, the agent created excessive files:

# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py           # ❌ Non-standard
├── base.py             # ❌ Non-standard
├── db_helpers.py       # ❌ Non-standard
├── webhook_dispatch.py # ❌ Non-standard
├── handlers/           # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py    # DTOs only
├── service.py    # All business logic including actors
└── router.py     # Endpoints only

Why it's wrong:

  • Ignored existing module patterns in the codebase
  • Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
  • Created cognitive overhead for maintainers

Advice:

  • ALWAYS examine existing modules first before creating new ones
  • Match the existing file naming conventions exactly
  • Standard module files: __init__.py, models.py, schemas.py, repository.py, service.py, router.py
  • Only create files from this list; consolidate everything else into service.py

2. Misinterpreting "Make It Flexible" or "Apply SRP"

What happened: When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:

  • Abstract base classes (BaseTaskHandler, BaseTaskSubmitter)
  • A registry pattern with dynamic handler registration
  • Separate files for each handler implementation
  • Complex inheritance hierarchies

Why it's wrong:

  • SRP doesn't mean "one class per file" or "maximum abstraction"
  • Flexibility doesn't mean "prepare for every possible future change"
  • This violates the project's core principle: "Less Overhead Is Better"

Advice:

  • SRP = one function does one thing, NOT one file per concept
  • "Flexible" = easy to modify, NOT infinitely extensible
  • When in doubt, keep it in one file and refactor later if needed
  • Abstract base classes are rarely needed; prefer composition
# BAD - Over-abstracted
from abc import ABC, abstractmethod

class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...
    @abstractmethod
    async def execute(self, job_id): ...
    @abstractmethod
    async def on_complete(self, result): ...

class MediaProbeHandler(BaseTaskHandler):
    ...

# GOOD - Simple and direct
import dramatiq

@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
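One simple form of composition without a base class is to pass a collaborator in as a parameter. In this sketch, run_ffprobe, probe_duration_seconds, and the metadata dictionary shape are all hypothetical. A test can swap in a fake runner without subclassing or mocking:

```python
from collections.abc import Callable

ProbeRunner = Callable[[str], dict[str, float]]


def run_ffprobe(file_path: str) -> dict[str, float]:
    """Stand-in for the real ffprobe call; not implemented in this sketch."""
    raise NotImplementedError(file_path)


def probe_duration_seconds(file_path: str, run_probe: ProbeRunner = run_ffprobe) -> float:
    """Behavior is composed by passing a function in, not by subclassing."""
    metadata = run_probe(file_path)
    return metadata["duration"]
```

Production code calls `probe_duration_seconds(path)` and gets the default runner. No handler class or registry is involved.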

3. Not Reading AGENTS.md Before Starting

What happened: The agent proceeded with implementation without fully considering the documented principles, particularly:

  • "Avoid unnecessary abstractions and over-engineering"
  • "Don't add layers of indirection without clear benefit"
  • "Prefer direct solutions over clever ones"

Advice:

  • Read AGENTS.md completely before any implementation
  • Re-read relevant sections when making architectural decisions
  • When the user's request conflicts with AGENTS.md principles, ask for clarification

4. Creating Files Without Checking Existing Patterns

What happened: The agent created handlers/ subdirectory and multiple utility files without checking how other modules handle similar needs.

Advice:

  • Before creating ANY new file, run: ls cpv3/modules/<similar_module>/
  • Check if the functionality can fit into existing standard files
  • If you need a helper function, put it in service.py, not a new file
  • Subdirectories within modules are almost never appropriate

5. Ignoring the "Quick Reference" Table

The Quick Reference section of this file already maps tasks to locations:

Task                   Location
Add new endpoint       modules/<module>/router.py
Add database model     modules/<module>/models.py
Add validation schema  modules/<module>/schemas.py
Add business logic     modules/<module>/service.py
Add database query     modules/<module>/repository.py

Advice:

  • Use this table as the ONLY guide for file placement
  • If something doesn't fit these categories, it probably belongs in service.py
  • Cross-cutting concerns go in infrastructure/, not in module subdirectories

Summary: The Golden Rules

  1. Check existing patterns first - Look at 2-3 similar modules before creating anything
  2. Standard files only - __init__.py, models.py, schemas.py, repository.py, service.py, router.py
  3. No subdirectories in modules - Everything fits in the standard files
  4. Consolidate, don't split - When unsure, put it in service.py
  5. Simple > Clever - Direct code beats abstract patterns
  6. YAGNI - Don't build for hypothetical future requirements

Last updated: February 2026