AGENTS.md - AI Coding Guidelines for CofeeProject Backend
This document provides guidelines and best practices for AI agents working with this codebase.
Core Principles
1. Code Should Be Simple, Readable, and Well Supported
- Write code that humans can understand at first glance
- Prefer explicit over implicit behavior
- Use clear control flow patterns (avoid deeply nested conditions)
- Add docstrings for public functions, classes, and modules
- Keep functions short and focused (ideally under 30 lines)
2. Less Overhead Is Better
- Avoid unnecessary abstractions and over-engineering
- Don't add layers of indirection without clear benefit
- Prefer direct solutions over clever ones
- Minimize dependencies where possible
- Use built-in Python features before reaching for external libraries
3. No Magic Values
- Define constants with meaningful names at module level
- Use enums or Literal types for fixed sets of values (see ArtifactTypeEnum pattern)
- Configuration values belong in Settings class with explicit defaults
- Never hardcode timeouts, limits, or thresholds inline
# BAD
if silence_db > 16:
    ...
# GOOD
SILENCE_THRESHOLD_DB = 16
if silence_db > SILENCE_THRESHOLD_DB:
    ...
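The enum and Literal guideline above could look roughly like the following minimal sketch. The names ArtifactKind and ArtifactType and their members are hypothetical stand-ins chosen for illustration; the project's actual ArtifactTypeEnum may define different values.

```python
from enum import Enum
from typing import Literal, get_args

# Hypothetical fixed set of values; the real ArtifactTypeEnum may differ.
ArtifactKind = Literal["audio", "video", "transcript"]


class ArtifactType(str, Enum):
    """Enum alternative for when the values need iteration or methods."""

    AUDIO = "audio"
    VIDEO = "video"
    TRANSCRIPT = "transcript"


def is_known_artifact_kind(value: str) -> bool:
    """Return True if value is one of the allowed literal strings."""
    return value in get_args(ArtifactKind)
```

A Literal alias keeps schemas lightweight, while an Enum is a reasonable choice when the same values are compared or iterated in several places.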
4. One Function Should Implement One Purpose
- Each function should do exactly one thing
- If a function needs "and" in its description, split it
- Extract helper functions for distinct subtasks
- Keep side effects isolated and predictable
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...
# GOOD
async def download_media(file_key: str) -> TempFile:
    ...
def validate_media_format(file_path: str) -> bool:
    ...
async def process_media(file_path: str) -> MediaResult:
    ...
5. All Variable Names Should Have Meaning Based on Context
- Use descriptive names that explain purpose, not type
- Avoid single-letter variables (except for trivial loops)
- Prefix boolean variables with is_, has_, can_, should_
- Use domain terminology consistently
# BAD
x = await repo.get(id)
flag = x.is_deleted
# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted
Project Architecture
Layer Structure
cpv3/
├── api/v1/ # API version routing
├── common/ # Shared schemas and utilities
├── db/ # Database base classes and session
├── infrastructure/ # Cross-cutting concerns (auth, storage, settings)
└── modules/ # Feature modules (domain logic)
    └── <module>/
        ├── models.py # SQLAlchemy models
        ├── schemas.py # Pydantic DTOs
        ├── repository.py # Database access layer
        ├── service.py # Business logic
        └── router.py # FastAPI endpoints
Module Responsibilities
| Layer | Responsibility | Dependencies |
|---|---|---|
| router.py | HTTP request/response handling, validation | schemas, service, repository |
| service.py | Business logic, orchestration | repository, external services |
| repository.py | Database queries, CRUD operations | models, session |
| schemas.py | Data transfer objects, validation | pydantic |
| models.py | Database table definitions | SQLAlchemy |
Coding Standards
Python Version & Style
- Python 3.11+ required
- Use from __future__ import annotations for forward references
- Line length: 100 characters (configured in ruff)
- Use type hints for all function signatures
- Async-first approach for I/O operations
Imports
# Standard library
from __future__ import annotations
import uuid
from datetime import datetime, timezone
from typing import Literal
# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field
# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.schemas import MediaFileRead
from cpv3.modules.media.repository import MediaFileRepository
Pydantic Schemas
- Inherit from cpv3.common.schemas.Schema for consistent config
- Use Literal types for enums with string values
- Suffix schema names: *Create, *Update, *Read
from datetime import datetime
from uuid import UUID
from cpv3.common.schemas import Schema
class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime
SQLAlchemy Models
- Inherit from Base and BaseModelMixin
- Use explicit column types
- Add indexes for frequently queried fields
- Use soft deletes (is_deleted flag)
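import uuid
from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column
from cpv3.db.base import Base, BaseModelMixin
class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"
    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)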
Repository Pattern
- One repository per model
- Accept AsyncSession in constructor
- Methods should be atomic and focused
- Filter soft-deleted records by default
from sqlalchemy import select
class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session
    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        result = await self._session.execute(
            select(MediaFile).where(MediaFile.id == media_file_id)
        )
        media_file = result.scalar_one_or_none()
        if media_file is None or media_file.is_deleted:
            return None
        return media_file
FastAPI Endpoints
- Use dependency injection for DB session, auth, and services
- Return typed response models
- Use appropriate HTTP status codes
- Handle errors with HTTPException
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)
Configuration & Settings
Environment Variables
- All configuration through Settings class in infrastructure/settings.py
- Use Field(default=..., alias="ENV_VAR_NAME") pattern
- Provide sensible defaults for local development
- Never commit secrets to repository
from pydantic import Field
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")
Accessing Settings
from cpv3.infrastructure.settings import get_settings
settings = get_settings() # Cached via @lru_cache
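The caching behaviour noted in the comment above can be illustrated with the standard library alone. This is a sketch, not the project's implementation: SketchSettings, get_sketch_settings, and the DEFAULT_* constants are hypothetical, and a frozen dataclass plus os.environ stand in for the pydantic-based Settings class.

```python
import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_JWT_ALGORITHM = "HS256"
DEFAULT_JWT_ACCESS_TTL_MINUTES = 60


@dataclass(frozen=True)
class SketchSettings:
    """Stand-in for the real Settings class."""

    jwt_algorithm: str
    jwt_access_ttl_minutes: int


@lru_cache
def get_sketch_settings() -> SketchSettings:
    """Read the environment once; later calls return the cached instance."""
    return SketchSettings(
        jwt_algorithm=os.environ.get("JWT_ALGORITHM", DEFAULT_JWT_ALGORITHM),
        jwt_access_ttl_minutes=int(
            os.environ.get("JWT_ACCESS_TTL_MINUTES", DEFAULT_JWT_ACCESS_TTL_MINUTES)
        ),
    )
```

Because the result is cached, a test that changes environment variables would need to call get_sketch_settings.cache_clear() before reading the settings again.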
Testing Guidelines
Test Structure
tests/
├── conftest.py # Shared fixtures
├── unit/ # Unit tests (isolated)
└── integration/ # Integration tests (with DB/services)
Fixtures
- Use pytest-asyncio for async tests
- Create isolated database sessions per test
- Mock external services (storage, APIs)
# Assumes pytest-asyncio runs in auto mode; in strict mode use @pytest_asyncio.fixture
@pytest.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user
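For the "mock external services" guideline, one possible approach is unittest.mock.AsyncMock from the standard library. In this sketch, upload_transcript and the shape of the storage client (an async upload method) are assumptions made for illustration, and asyncio.run is used only to keep the example free of plugins; with pytest-asyncio the test itself would be an async function.

```python
import asyncio
from unittest.mock import AsyncMock


async def upload_transcript(storage, file_key: str, text: str) -> str:
    """Hypothetical service function that depends on an external storage client."""
    await storage.upload(file_key, text.encode("utf-8"))
    return file_key


def test_upload_transcript_with_mocked_storage_calls_upload_once() -> None:
    # AsyncMock makes storage.upload awaitable without touching real storage.
    storage = AsyncMock()
    result = asyncio.run(upload_transcript(storage, "transcripts/a.txt", "hello"))
    assert result == "transcripts/a.txt"
    storage.upload.assert_awaited_once_with("transcripts/a.txt", b"hello")
```

Passing the storage client as an argument keeps the function testable without patching module globals.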
Test Naming
# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...
async def test_create_mediafile_with_valid_data_returns_201():
    ...
Common Patterns
Error Handling
# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)
# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
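What "re-raise with context" preserves can be seen in a framework-free sketch. The exception classes and functions here are hypothetical; GatewayError merely stands in for an HTTPException with status 502.

```python
class ExternalError(Exception):
    """Hypothetical error raised by an external client."""


class GatewayError(Exception):
    """Stand-in for HTTPException(status_code=502, ...)."""


def call_external_service() -> str:
    """Simulate a failing external call."""
    raise ExternalError("connection reset")


def fetch_with_context() -> str:
    try:
        return call_external_service()
    except ExternalError as error:
        # "from error" stores the original exception on __cause__,
        # so logs and tracebacks still show the root failure.
        raise GatewayError("External service unavailable") from error
```

The client only sees the generic message, while the chained cause stays available for logging and debugging.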
Async Operations
# For blocking or CPU-bound work in async context (runs in a worker thread)
import anyio
result = await anyio.to_thread.run_sync(cpu_intensive_function, arg1, arg2)
# For subprocess calls
import asyncio
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
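A self-contained variant of both patterns is sketched below so it can be run without extra dependencies. Note the substitutions: asyncio.to_thread replaces anyio.to_thread.run_sync, the Python interpreter replaces ffprobe, and hash_payload is a hypothetical stand-in for cpu_intensive_function.

```python
import asyncio
import hashlib
import sys


def hash_payload(payload: bytes) -> str:
    """Blocking helper; stands in for cpu_intensive_function."""
    return hashlib.sha256(payload).hexdigest()


async def run_offloaded() -> tuple[str, str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    digest = await asyncio.to_thread(hash_payload, b"media-bytes")

    # Spawn a child process without blocking the loop
    # (the Python interpreter replaces ffprobe here).
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('probe-ok')",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return digest, stdout.decode().strip()
```

A worker thread keeps the event loop responsive, but pure-Python CPU-bound code still contends for the GIL, so very heavy computation may be better suited to a subprocess or a background worker.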
Temporary Files
from pathlib import Path
from tempfile import NamedTemporaryFile
with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
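If the same create-then-clean-up sequence appears in several places, it could be wrapped once in a context manager. The helper name temporary_media_path is hypothetical, and this is only a sketch of the idea rather than an existing project utility.

```python
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Iterator


@contextmanager
def temporary_media_path(suffix: str) -> Iterator[Path]:
    """Yield the path of a closed temp file and delete it on exit, even on error."""
    with NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        yield tmp_path
    finally:
        tmp_path.unlink(missing_ok=True)
```

A caller would then write `with temporary_media_path(".mp4") as media_path:` and use media_path inside the block.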
Do's and Don'ts
✅ DO
- Use type hints everywhere
- Write async code for I/O operations
- Use dependency injection
- Keep modules self-contained
- Write tests for new features
- Use meaningful commit messages
- Follow existing patterns in the codebase
❌ DON'T
- Use global mutable state
- Put business logic in routers
- Hardcode configuration values
- Ignore type checker warnings
- Write overly clever code
- Skip error handling
- Mix sync and async DB operations
Quick Reference
| Task | Location |
|---|---|
| Add new endpoint | modules/<module>/router.py |
| Add database model | modules/<module>/models.py |
| Add validation schema | modules/<module>/schemas.py |
| Add business logic | modules/<module>/service.py |
| Add database query | modules/<module>/repository.py |
| Add configuration | infrastructure/settings.py |
| Add shared utility | common/ |
| Add migration | Run alembic revision --autogenerate |
Package Management
This project uses uv as the package manager - a fast Python package installer and resolver written in Rust.
Common Commands
# Install all dependencies
uv sync
# Add a new dependency
uv add <package-name>
# Add a dev dependency
uv add --group dev <package-name>
# Run a command in the virtual environment
uv run <command>
# Run the development server
uv run uvicorn cpv3.main:app --reload
# Run tests
uv run pytest
Why uv?
- Speed - 10-100x faster than pip
- Reliable - Deterministic dependency resolution
- Compatible - Works with standard pyproject.toml
Dependencies
Key dependencies used in this project:
- FastAPI - Web framework
- SQLAlchemy 2.0 - ORM (async mode)
- Pydantic 2.x - Data validation
- asyncpg - PostgreSQL async driver
- Alembic - Database migrations
- pytest-asyncio - Async testing
- boto3 - AWS S3 storage
- pydub - Audio processing
- openai-whisper - Transcription
- Dramatiq - Background task queue (with Redis broker)
Common AI Agent Mistakes to Avoid
This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.
1. Over-Engineering and Breaking Module Structure
What happened: When asked to implement background tasks, the agent created excessive files:
# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py # ❌ Non-standard
├── base.py # ❌ Non-standard
├── db_helpers.py # ❌ Non-standard
├── webhook_dispatch.py # ❌ Non-standard
├── handlers/ # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py
# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py # DTOs only
├── service.py # All business logic including actors
└── router.py # Endpoints only
Why it's wrong:
- Ignored existing module patterns in the codebase
- Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
- Created cognitive overhead for maintainers
Advice:
- ALWAYS examine existing modules first before creating new ones
- Match the existing file naming conventions exactly
- Standard module files: __init__.py, models.py, schemas.py, repository.py, service.py, router.py
- Only create files from this list; consolidate everything else into service.py
2. Misinterpreting "Make It Flexible" or "Apply SRP"
What happened: When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:
- Abstract base classes (BaseTaskHandler, BaseTaskSubmitter)
- A registry pattern with dynamic handler registration
- Separate files for each handler implementation
- Complex inheritance hierarchies
Why it's wrong:
- SRP doesn't mean "one class per file" or "maximum abstraction"
- Flexibility doesn't mean "prepare for every possible future change"
- This violates the project's core principle: "Less Overhead Is Better"
Advice:
- SRP = one function does one thing, NOT one file per concept
- "Flexible" = easy to modify, NOT infinitely extensible
- When in doubt, keep it in one file and refactor later if needed
- Abstract base classes are rarely needed; prefer composition
# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...
    @abstractmethod
    async def execute(self, job_id): ...
    @abstractmethod
    async def on_complete(self, result): ...
class MediaProbeHandler(BaseTaskHandler):
    ...
# GOOD - Simple and direct
@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
3. Not Reading AGENTS.md Before Starting
What happened: The agent proceeded with implementation without fully considering the documented principles, particularly:
- "Avoid unnecessary abstractions and over-engineering"
- "Don't add layers of indirection without clear benefit"
- "Prefer direct solutions over clever ones"
Advice:
- Read AGENTS.md completely before any implementation
- Re-read relevant sections when making architectural decisions
- When the user's request conflicts with AGENTS.md principles, ask for clarification
4. Creating Files Without Checking Existing Patterns
What happened: The agent created a handlers/ subdirectory and multiple utility files without checking how other modules handle similar needs.
Advice:
- Before creating ANY new file, run: ls cpv3/modules/<similar_module>/
- Check if the functionality can fit into existing standard files
- If you need a helper function, put it in service.py, not a new file
- Subdirectories within modules are almost never appropriate
5. Ignoring the "Quick Reference" Table
The AGENTS.md contains a clear reference:
| Task | Location |
|---|---|
| Add new endpoint | modules/<module>/router.py |
| Add database model | modules/<module>/models.py |
| Add validation schema | modules/<module>/schemas.py |
| Add business logic | modules/<module>/service.py |
| Add database query | modules/<module>/repository.py |
Advice:
- Use this table as the ONLY guide for file placement
- If something doesn't fit these categories, it probably belongs in service.py
- Cross-cutting concerns go in infrastructure/, not in module subdirectories
Summary: The Golden Rules
- Check existing patterns first - Look at 2-3 similar modules before creating anything
- Standard files only - __init__.py, models.py, schemas.py, repository.py, service.py, router.py
- No subdirectories in modules - Everything fits in the standard files
- Consolidate, don't split - When unsure, put it in service.py
- Simple > Clever - Direct code beats abstract patterns
- YAGNI - Don't build for hypothetical future requirements
Last updated: February 2026