# AGENTS.md - AI Coding Guidelines for CofeeProject Backend

This document provides guidelines and best practices for AI agents working with this codebase.

---

## Core Principles

### 1. Code Should Be Simple, Readable, and Well Supported

- Write code that humans can understand at first glance
- Prefer explicit over implicit behavior
- Use clear control flow patterns (avoid deeply nested conditions)
- Add docstrings for public functions, classes, and modules
- Keep functions short and focused (ideally under 30 lines)

### 2. Less Overhead Is Better

- Avoid unnecessary abstractions and over-engineering
- Don't add layers of indirection without clear benefit
- Prefer direct solutions over clever ones
- Minimize dependencies where possible
- Use built-in Python features before reaching for external libraries

### 3. No Magic Values

- Define constants with meaningful names at module level
- Use enums or `Literal` types for fixed sets of values (see `ArtifactTypeEnum` pattern)
- Configuration values belong in `Settings` class with explicit defaults
- Never hardcode timeouts, limits, or thresholds inline
- Store user-facing error messages as module-level constants with `ERROR_` prefix
  - Example: `ERROR_NO_AUDIO_STREAM = "Файл не содержит аудиодорожки"` (in English: "The file contains no audio track")

```python
# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...
```

### 4. One Function Should Implement One Purpose

- Each function should do exactly one thing
- If a function needs "and" in its description, split it
- Extract helper functions for distinct subtasks
- Keep side effects isolated and predictable

```python
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
```

### 5. All Variable Names Should Have Meaning Based on Context

- Use descriptive names that explain purpose, not type
- Avoid single-letter variables (except for trivial loops)
- Prefix boolean variables with `is_`, `has_`, `can_`, `should_`
- Use domain terminology consistently

```python
# BAD
x = await repo.get(id)
flag = x.is_deleted

# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted
```

---

## Project Architecture

### Layer Structure

```
cpv3/
├── api/v1/              # API version routing
├── common/              # Shared schemas and utilities
├── db/                  # Database base classes and session
├── infrastructure/      # Cross-cutting concerns (auth, storage, settings)
└── modules/             # Feature modules (domain logic)
    └── /
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints
```

### Module Responsibilities

| Layer           | Responsibility                             | Dependencies                  |
| --------------- | ------------------------------------------ | ----------------------------- |
| `router.py`     | HTTP request/response handling, validation | schemas, service, repository  |
| `service.py`    | Business logic, orchestration              | repository, external services |
| `repository.py` | Database queries, CRUD operations          | models, session               |
| `schemas.py`    | Data transfer objects, validation          | pydantic                      |
| `models.py`     | Database table definitions                 | SQLAlchemy                    |

---

## Coding Standards

### Python Version & Style

- **Python 3.11+** required
- Use `from __future__ import annotations` for forward references
- Line length: **100 characters** (configured in ruff)
- Use type hints for all function signatures
- Async-first approach for I/O operations

### Imports

```python
# Standard library
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.schemas import MediaFileRead
from cpv3.modules.media.repository import MediaFileRepository
```

### Pydantic Schemas

- Inherit from `cpv3.common.schemas.Schema` for consistent config
- Use `Literal` types for enums with string values
- Suffix schema names: `*Create`, `*Update`, `*Read`

```python
from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema


class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime
```

### SQLAlchemy Models

- Inherit from `Base` and `BaseModelMixin`
- Use explicit column types
- Add indexes for frequently queried fields
- Use soft deletes (`is_deleted` flag)

```python
import uuid

from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID  # Assumes the PostgreSQL dialect type
from sqlalchemy.orm import Mapped, mapped_column

from cpv3.db.base import Base, BaseModelMixin


class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)
```

### Repository Pattern

- One repository per model
- Accept `AsyncSession` in constructor
- Methods should be atomic and focused
- Filter soft-deleted records by default

```python
import uuid

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        result = await self._session.execute(
            select(MediaFile).where(MediaFile.id == media_file_id)
        )
        media_file = result.scalar_one_or_none()
        # Treat soft-deleted records as missing
        if media_file is None or media_file.is_deleted:
            return None
        return media_file
```

### FastAPI Endpoints

- Use dependency injection for DB session, auth, and services
- Return typed response models
- Use appropriate HTTP status codes
- Handle errors with `HTTPException`

```python
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)
```

---

## Configuration & Settings

### Environment Variables

- All configuration through `Settings` class in `infrastructure/settings.py`
- Use `Field(default=..., alias="ENV_VAR_NAME")` pattern
- Provide sensible defaults for local development
- Never commit secrets to repository

```python
from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")
```

### Accessing Settings

```python
from cpv3.infrastructure.settings import get_settings

settings = get_settings()  # Cached via @lru_cache
```

---

## Testing Guidelines

### Test Structure

```
tests/
├── conftest.py    # Shared fixtures
├── unit/          # Unit tests (isolated)
└── integration/   # Integration tests (with DB/services)
```

### Fixtures

- Use `pytest-asyncio` for async tests
- Create isolated database sessions per test
- Mock external services (storage, APIs)

```python
@pytest.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user
```

### Test Naming

```python
# Pattern: test___
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...
```

---

## Common Patterns

### Error Handling

```python
# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
```

### Async Operations

```python
import asyncio

from anyio import to_thread

# For blocking or CPU-bound work in async context (runs in a worker thread)
result = await to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
```

### Temporary Files

```python
from pathlib import Path
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name

try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
```

---

## Do's and Don'ts

### ✅ DO

- Use type hints everywhere
- Write async code for I/O operations
- Use dependency injection
- Keep modules self-contained
- Write tests for new features
- Use meaningful commit messages
- Follow existing patterns in the codebase

### ❌ DON'T

- Use global mutable state
- Put business logic in routers
- Hardcode configuration values
- Ignore type checker warnings
- Write overly clever code
- Skip error handling
- Mix sync and async DB operations

---

## Quick Reference

| Task                  | Location                              |
| --------------------- | ------------------------------------- |
| Add new endpoint      | `modules//router.py`                  |
| Add database model    | `modules//models.py`                  |
| Add validation schema | `modules//schemas.py`                 |
| Add business logic    | `modules//service.py`                 |
| Add database query    | `modules//repository.py`              |
| Add configuration     | `infrastructure/settings.py`          |
| Add shared utility    | `common/`                             |
| Add migration         | Run `alembic revision --autogenerate` |

---

## Package Management

This project uses **[uv](https://docs.astral.sh/uv/)** as the package manager - a fast Python package installer and resolver written in Rust.

### Common Commands

```bash
# Install all dependencies
uv sync

# Add a new dependency
uv add

# Add a dev dependency
uv add --group dev

# Run a command in the virtual environment
uv run

# Run the development server
uv run uvicorn cpv3.main:app --reload

# Run tests
uv run pytest
```

### Why uv?

- **Speed** - 10-100x faster than pip
- **Reliable** - Deterministic dependency resolution
- **Compatible** - Works with standard `pyproject.toml`

---

## Dependencies

Key dependencies used in this project:

- **FastAPI** - Web framework
- **SQLAlchemy 2.0** - ORM (async mode)
- **Pydantic 2.x** - Data validation
- **asyncpg** - PostgreSQL async driver
- **Alembic** - Database migrations
- **pytest-asyncio** - Async testing
- **boto3** - AWS S3 storage
- **pydub** - Audio processing
- **openai-whisper** - Transcription
- **Dramatiq** - Background task queue (with Redis broker)

---

## Common AI Agent Mistakes to Avoid

This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.

### 1. Over-Engineering and Breaking Module Structure

**What happened:** When asked to implement background tasks, the agent created excessive files:

```
# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py            # ❌ Non-standard
├── base.py              # ❌ Non-standard
├── db_helpers.py        # ❌ Non-standard
├── webhook_dispatch.py  # ❌ Non-standard
├── handlers/            # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py   # DTOs only
├── service.py   # All business logic including actors
└── router.py    # Endpoints only
```

**Why it's wrong:**

- Ignored existing module patterns in the codebase
- Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
- Created cognitive overhead for maintainers

**Advice:**

- **ALWAYS examine existing modules first** before creating new ones
- **Match the existing file naming conventions exactly**
- Standard module files: `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
- Only create files from this list; consolidate everything else into `service.py`

---

### 2. Misinterpreting "Make It Flexible" or "Apply SRP"

**What happened:** When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:

- Abstract base classes (`BaseTaskHandler`, `BaseTaskSubmitter`)
- A registry pattern with dynamic handler registration
- Separate files for each handler implementation
- Complex inheritance hierarchies

**Why it's wrong:**

- SRP doesn't mean "one class per file" or "maximum abstraction"
- Flexibility doesn't mean "prepare for every possible future change"
- This violates the project's core principle: **"Less Overhead Is Better"**

**Advice:**

- SRP = one function does one thing, NOT one file per concept
- "Flexible" = easy to modify, NOT infinitely extensible
- When in doubt, keep it in one file and refactor later if needed
- Abstract base classes are rarely needed; prefer composition

```python
from abc import ABC, abstractmethod

import dramatiq


# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...

    @abstractmethod
    async def execute(self, job_id): ...

    @abstractmethod
    async def on_complete(self, result): ...


class MediaProbeHandler(BaseTaskHandler):
    ...


# GOOD - Simple and direct
@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
```

---

### 3. Not Reading AGENTS.md Before Starting

**What happened:** The agent proceeded with implementation without fully considering the documented principles, particularly:

- "Avoid unnecessary abstractions and over-engineering"
- "Don't add layers of indirection without clear benefit"
- "Prefer direct solutions over clever ones"

**Advice:**

- **Read AGENTS.md completely before any implementation**
- Re-read relevant sections when making architectural decisions
- When the user's request conflicts with AGENTS.md principles, ask for clarification

---

### 4. Creating Files Without Checking Existing Patterns

**What happened:** The agent created a `handlers/` subdirectory and multiple utility files without checking how other modules handle similar needs.

**Advice:**

- Before creating ANY new file, run: `ls cpv3/modules//`
- Check if the functionality can fit into existing standard files
- If you need a helper function, put it in `service.py`, not a new file
- Subdirectories within modules are almost never appropriate

---

### 5. Ignoring the "Quick Reference" Table

AGENTS.md contains a clear reference:

| Task                  | Location                 |
| --------------------- | ------------------------ |
| Add new endpoint      | `modules//router.py`     |
| Add database model    | `modules//models.py`     |
| Add validation schema | `modules//schemas.py`    |
| Add business logic    | `modules//service.py`    |
| Add database query    | `modules//repository.py` |

**Advice:**

- Use this table as the ONLY guide for file placement
- If something doesn't fit these categories, it probably belongs in `service.py`
- Cross-cutting concerns go in `infrastructure/`, not in module subdirectories

---

### Summary: The Golden Rules

1. **Check existing patterns first** - Look at 2-3 similar modules before creating anything
2. **Standard files only** - `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
3. **No subdirectories in modules** - Everything fits in the standard files
4. **Consolidate, don't split** - When unsure, put it in `service.py`
5. **Simple > Clever** - Direct code beats abstract patterns
6. **YAGNI** - Don't build for hypothetical future requirements

---

_Last updated: February 2026_
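
The "standard files only" and "no subdirectories in modules" rules above can be checked mechanically before committing a new module. The following is a minimal sketch, not part of the codebase: the function name `find_nonstandard_entries`, the constant names, and the choice to ignore `__pycache__` are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Standard module files, as listed in the golden rules.
STANDARD_MODULE_FILES = frozenset(
    {"__init__.py", "models.py", "schemas.py", "repository.py", "service.py", "router.py"}
)
# Assumption: interpreter cache directories do not count as module content.
IGNORED_ENTRIES = frozenset({"__pycache__"})


def find_nonstandard_entries(module_dir: Path) -> list[str]:
    """Return names in module_dir that break the standard module layout.

    Subdirectories are reported with a trailing slash.
    """
    nonstandard: list[str] = []
    for entry in sorted(module_dir.iterdir(), key=lambda path: path.name):
        if entry.name in IGNORED_ENTRIES:
            continue
        if entry.is_dir():
            # Any subdirectory violates "no subdirectories in modules".
            nonstandard.append(f"{entry.name}/")
        elif entry.name not in STANDARD_MODULE_FILES:
            nonstandard.append(entry.name)
    return nonstandard
```

Run against the over-engineered `cpv3/modules/tasks/` layout from mistake 1, this sketch would report `actors.py`, `base.py`, `db_helpers.py`, `handlers/`, and `webhook_dispatch.py`; an empty list means the module matches the standard structure.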