# AGENTS.md - AI Coding Guidelines for CofeeProject Backend

This document provides guidelines and best practices for AI agents working with this codebase.

---

## Core Principles

### 1. Code Should Be Simple, Readable, and Well Supported

- Write code that humans can understand at first glance
- Prefer explicit over implicit behavior
- Use clear control flow patterns (avoid deeply nested conditions)
- Add docstrings for public functions, classes, and modules
- Keep functions short and focused (ideally under 30 lines)

### 2. Less Overhead Is Better

- Avoid unnecessary abstractions and over-engineering
- Don't add layers of indirection without clear benefit
- Prefer direct solutions over clever ones
- Minimize dependencies where possible
- Use built-in Python features before reaching for external libraries

### 3. No Magic Values

- Define constants with meaningful names at module level
- Use enums or `Literal` types for fixed sets of values (see `ArtifactTypeEnum` pattern)
- Configuration values belong in the `Settings` class with explicit defaults
- Never hardcode timeouts, limits, or thresholds inline

```python
# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...
```
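
For fixed sets of values, a string-backed enum or a `Literal` alias keeps the allowed values in one place. The following is a minimal sketch: `JobStatus`, `AudioFormat`, and `is_terminal_status` are hypothetical and do not reflect the actual `ArtifactTypeEnum` definition.

```python
from enum import Enum
from typing import Literal, get_args


class JobStatus(str, Enum):
    """Hypothetical enum; real members in the codebase may differ."""

    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Literal alias for a small fixed set that mainly needs static checking.
AudioFormat = Literal["mp3", "wav", "flac"]
SUPPORTED_AUDIO_FORMATS: tuple[str, ...] = get_args(AudioFormat)

TERMINAL_STATUSES = frozenset({JobStatus.DONE, JobStatus.FAILED})


def is_terminal_status(status: JobStatus) -> bool:
    """Return True when a job can no longer change state."""
    return status in TERMINAL_STATUSES
```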

### 4. One Function Should Implement One Purpose

- Each function should do exactly one thing
- If a function needs "and" in its description, split it
- Extract helper functions for distinct subtasks
- Keep side effects isolated and predictable

```python
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
```
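
Once the steps are split, a thin orchestrating function can compose them. This is a sketch under simplifying assumptions: the three functions are stubbed, return plain `str` and `dict` values instead of `TempFile` and `MediaResult`, and `ALLOWED_SUFFIXES` and `run_media_pipeline` are hypothetical names.

```python
import asyncio
from pathlib import Path

ALLOWED_SUFFIXES = frozenset({".mp4", ".mov"})  # hypothetical allow-list


async def download_media(file_key: str) -> str:
    """Stub: pretend to download and return a local path."""
    return f"/tmp/{file_key}"


def validate_media_format(file_path: str) -> bool:
    """Pure check with no side effects."""
    return Path(file_path).suffix.lower() in ALLOWED_SUFFIXES


async def process_media(file_path: str) -> dict[str, str]:
    """Stub: pretend to process the file and return a result."""
    return {"source": file_path, "status": "processed"}


async def run_media_pipeline(file_key: str) -> dict[str, str]:
    """Orchestrate the steps; each step stays independently testable."""
    file_path = await download_media(file_key)
    if not validate_media_format(file_path):
        raise ValueError(f"Unsupported media format: {file_path}")
    return await process_media(file_path)
```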

### 5. All Variable Names Should Have Meaning Based on Context

- Use descriptive names that explain purpose, not type
- Avoid single-letter variables (except for trivial loops)
- Prefix boolean variables with `is_`, `has_`, `can_`, `should_`
- Use domain terminology consistently

```python
# BAD
x = await repo.get(id)
flag = x.is_deleted

# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted
```

---

## Project Architecture

### Layer Structure

```
cpv3/
├── api/v1/          # API version routing
├── common/          # Shared schemas and utilities
├── db/              # Database base classes and session
├── infrastructure/  # Cross-cutting concerns (auth, storage, settings)
└── modules/         # Feature modules (domain logic)
    └── <module>/
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints
```

### Module Responsibilities

| Layer           | Responsibility                             | Dependencies                  |
| --------------- | ------------------------------------------ | ----------------------------- |
| `router.py`     | HTTP request/response handling, validation | schemas, service, repository  |
| `service.py`    | Business logic, orchestration              | repository, external services |
| `repository.py` | Database queries, CRUD operations          | models, session               |
| `schemas.py`    | Data transfer objects, validation          | pydantic                      |
| `models.py`     | Database table definitions                 | SQLAlchemy                    |

---

## Coding Standards

### Python Version & Style

- **Python 3.11+** required
- Use `from __future__ import annotations` for forward references
- Line length: **100 characters** (configured in ruff)
- Use type hints for all function signatures
- Async-first approach for I/O operations

### Imports

```python
# Standard library
from __future__ import annotations
import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.schemas import MediaFileRead
from cpv3.modules.media.repository import MediaFileRepository
```

### Pydantic Schemas

- Inherit from `cpv3.common.schemas.Schema` for consistent config
- Use `Literal` types for enums with string values
- Suffix schema names: `*Create`, `*Update`, `*Read`

```python
from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema

class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime
```

### SQLAlchemy Models

- Inherit from `Base` and `BaseModelMixin`
- Use explicit column types
- Add indexes for frequently queried fields
- Use soft deletes (`is_deleted` flag)

```python
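import uuid

from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID  # assumes PostgreSQL (asyncpg driver)
from sqlalchemy.orm import Mapped, mapped_column

from cpv3.db.base import Base, BaseModelMixin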

class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)
```

### Repository Pattern

- One repository per model
- Accept `AsyncSession` in constructor
- Methods should be atomic and focused
- Filter soft-deleted records by default

```python
import uuid

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from cpv3.modules.media.models import MediaFile


class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        # Exclude soft-deleted rows in the query itself.
        result = await self._session.execute(
            select(MediaFile).where(
                MediaFile.id == media_file_id,
                MediaFile.is_deleted.is_(False),
            )
        )
        return result.scalar_one_or_none()
```

### FastAPI Endpoints

- Use dependency injection for DB session, auth, and services
- Return typed response models
- Use appropriate HTTP status codes
- Handle errors with `HTTPException`

```python
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)
```

---

## Configuration & Settings

### Environment Variables

- All configuration goes through the `Settings` class in `infrastructure/settings.py`
- Use the `Field(default=..., alias="ENV_VAR_NAME")` pattern
- Provide sensible defaults for local development
- Never commit secrets to the repository

```python
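from pydantic import Field
from pydantic_settings import BaseSettings  # Pydantic 2.x: separate pydantic-settings package


class Settings(BaseSettings):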
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")
```

### Accessing Settings

```python
from cpv3.infrastructure.settings import get_settings

settings = get_settings()  # Cached via @lru_cache
```
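
The caching behaviour noted in the comment can be sketched with the standard library alone. This is an illustrative stand-in, not the project's `Settings` class: it uses a frozen dataclass instead of `BaseSettings`, and the `DEFAULT_*` constants are hypothetical.

```python
import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_JWT_ALGORITHM = "HS256"
DEFAULT_JWT_ACCESS_TTL_MINUTES = 60


@dataclass(frozen=True)
class Settings:
    """Simplified stand-in for the pydantic-based Settings class."""

    jwt_algorithm: str
    jwt_access_ttl_minutes: int


@lru_cache
def get_settings() -> Settings:
    """Read the environment once; later calls return the same cached object."""
    return Settings(
        jwt_algorithm=os.environ.get("JWT_ALGORITHM", DEFAULT_JWT_ALGORITHM),
        jwt_access_ttl_minutes=int(
            os.environ.get("JWT_ACCESS_TTL_MINUTES", DEFAULT_JWT_ACCESS_TTL_MINUTES)
        ),
    )
```

Because the result is cached, a test that changes environment variables would need to call `get_settings.cache_clear()` before reading settings again.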

---

## Testing Guidelines

### Test Structure

```
tests/
├── conftest.py          # Shared fixtures
├── unit/                # Unit tests (isolated)
└── integration/         # Integration tests (with DB/services)
```

### Fixtures

- Use `pytest-asyncio` for async tests
- Create isolated database sessions per test
- Mock external services (storage, APIs)

```python
@pytest.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user
```
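
For the guideline on mocking external services, the standard library's `unittest.mock.AsyncMock` is one option. A minimal sketch, assuming a hypothetical `StorageClient` protocol with an async `upload` method and a hypothetical `store_transcript` function under test:

```python
import asyncio
from typing import Protocol
from unittest.mock import AsyncMock

TRANSCRIPT_KEY_PREFIX = "transcripts/"  # hypothetical key layout


class StorageClient(Protocol):
    """Hypothetical interface of an external storage service."""

    async def upload(self, key: str, data: bytes) -> None: ...


async def store_transcript(
    storage_client: StorageClient, media_file_id: str, text: str
) -> str:
    """Hypothetical function under test: upload text, return the storage key."""
    storage_key = f"{TRANSCRIPT_KEY_PREFIX}{media_file_id}.txt"
    await storage_client.upload(storage_key, text.encode("utf-8"))
    return storage_key


async def test_store_transcript_with_valid_text_uploads_once() -> None:
    # AsyncMock makes `upload` awaitable and records how it was awaited.
    storage_client = AsyncMock()

    storage_key = await store_transcript(storage_client, "abc123", "hello")

    assert storage_key == "transcripts/abc123.txt"
    storage_client.upload.assert_awaited_once_with("transcripts/abc123.txt", b"hello")
```

No network call is made, so the test stays fast and deterministic.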

### Test Naming

```python
# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...
```

---

## Common Patterns

### Error Handling

```python
# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
```
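
The effect of `raise ... from e` can be shown without FastAPI. In this sketch `GatewayError` is a hypothetical stand-in for `HTTPException`, and `ExternalError`, `call_external_service`, and `fetch_remote_metadata` are likewise invented for illustration.

```python
from http import HTTPStatus


class ExternalError(Exception):
    """Hypothetical low-level failure from an external client."""


class GatewayError(Exception):
    """Hypothetical stand-in for HTTPException."""

    def __init__(self, status_code: HTTPStatus, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def call_external_service() -> str:
    """Stub that always fails, to exercise the error path."""
    raise ExternalError("connection reset")


def fetch_remote_metadata() -> str:
    """Translate a low-level failure into an API-level error."""
    try:
        return call_external_service()
    except ExternalError as error:
        # `from error` stores the original exception on __cause__,
        # so tracebacks and logs still show the root failure.
        raise GatewayError(
            HTTPStatus.BAD_GATEWAY, "External service unavailable"
        ) from error
```

The client sees only the generic detail message, while the original exception remains available for debugging.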

### Async Operations

```python
import asyncio

from anyio import to_thread

# For blocking or CPU-bound work in an async context (runs in a worker thread)
result = await to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
```
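
The subprocess snippet above does not inspect the exit status. A minimal sketch of a wrapper that does, assuming a hypothetical `run_command` helper and a hypothetical `CommandFailedError`:

```python
import asyncio
import sys


class CommandFailedError(RuntimeError):
    """Hypothetical error raised when a subprocess exits with a non-zero code."""


async def run_command(*command: str) -> str:
    """Run a command without blocking the event loop and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    # A non-zero exit code means the tool failed; surface stderr to the caller.
    if proc.returncode != 0:
        raise CommandFailedError(stderr.decode(errors="replace").strip())
    return stdout.decode(errors="replace").strip()
```

For a quick check without external tools, the Python interpreter can stand in for `ffprobe`, for example `await run_command(sys.executable, "-c", "print('ok')")`.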

### Temporary Files

```python
from pathlib import Path
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
```
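
If the same pattern appears in several places, it could be wrapped in a small context manager. This is a sketch only; `temporary_file_path` is a hypothetical helper, not an existing utility in the codebase.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile


@contextmanager
def temporary_file_path(suffix: str) -> Iterator[Path]:
    """Yield the path of a closed temporary file and delete it on exit."""
    # delete=False keeps the file on disk after the handle is closed,
    # so other tools can open it by path.
    with NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        yield tmp_path
    finally:
        # Runs on success and on error, so the file is not left behind.
        tmp_path.unlink(missing_ok=True)
```

Usage would look like `with temporary_file_path(".mp4") as media_path: ...`, with cleanup handled when the block exits.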

---

## Do's and Don'ts

### ✅ DO

- Use type hints everywhere
- Write async code for I/O operations
- Use dependency injection
- Keep modules self-contained
- Write tests for new features
- Use meaningful commit messages
- Follow existing patterns in the codebase

### ❌ DON'T

- Use global mutable state
- Put business logic in routers
- Hardcode configuration values
- Ignore type checker warnings
- Write overly clever code
- Skip error handling
- Mix sync and async DB operations

---

## Quick Reference

| Task                  | Location                              |
| --------------------- | ------------------------------------- |
| Add new endpoint      | `modules/<module>/router.py`          |
| Add database model    | `modules/<module>/models.py`          |
| Add validation schema | `modules/<module>/schemas.py`         |
| Add business logic    | `modules/<module>/service.py`         |
| Add database query    | `modules/<module>/repository.py`      |
| Add configuration     | `infrastructure/settings.py`          |
| Add shared utility    | `common/`                             |
| Add migration         | Run `alembic revision --autogenerate` |

---

## Package Management

This project uses **[uv](https://docs.astral.sh/uv/)** as the package manager - a fast Python package installer and resolver written in Rust.

### Common Commands

```bash
# Install all dependencies
uv sync

# Add a new dependency
uv add <package-name>

# Add a dev dependency
uv add --group dev <package-name>

# Run a command in the virtual environment
uv run <command>

# Run the development server
uv run uvicorn cpv3.main:app --reload

# Run tests
uv run pytest
```

### Why uv?

- **Fast** - 10-100x faster than pip, according to the uv documentation
- **Reliable** - Deterministic dependency resolution
- **Compatible** - Works with standard `pyproject.toml`

---

## Dependencies

Key dependencies used in this project:

- **FastAPI** - Web framework
- **SQLAlchemy 2.0** - ORM (async mode)
- **Pydantic 2.x** - Data validation
- **asyncpg** - PostgreSQL async driver
- **Alembic** - Database migrations
- **pytest-asyncio** - Async testing
- **boto3** - AWS S3 storage
- **pydub** - Audio processing
- **openai-whisper** - Transcription
- **Dramatiq** - Background task queue (with Redis broker)

---

## Common AI Agent Mistakes to Avoid

This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.

### 1. Over-Engineering and Breaking Module Structure

**What happened:** When asked to implement background tasks, the agent created excessive files:

```
# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py           # ❌ Non-standard
├── base.py             # ❌ Non-standard
├── db_helpers.py       # ❌ Non-standard
├── webhook_dispatch.py # ❌ Non-standard
├── handlers/           # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py    # DTOs only
├── service.py    # All business logic including actors
└── router.py     # Endpoints only
```

**Why it's wrong:**

- Ignored existing module patterns in the codebase
- Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
- Created cognitive overhead for maintainers

**Advice:**

- **ALWAYS examine existing modules first** before creating new ones
- **Match the existing file naming conventions exactly**
- Standard module files: `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
- Only create files from this list; consolidate everything else into `service.py`

---

### 2. Misinterpreting "Make It Flexible" or "Apply SRP"

**What happened:** When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:

- Abstract base classes (`BaseTaskHandler`, `BaseTaskSubmitter`)
- A registry pattern with dynamic handler registration
- Separate files for each handler implementation
- Complex inheritance hierarchies

**Why it's wrong:**

- SRP doesn't mean "one class per file" or "maximum abstraction"
- Flexibility doesn't mean "prepare for every possible future change"
- This violates the project's core principle: **"Less Overhead Is Better"**

**Advice:**

- SRP = one function does one thing, NOT one file per concept
- "Flexible" = easy to modify, NOT infinitely extensible
- When in doubt, keep it in one file and refactor later if needed
- Abstract base classes are rarely needed; prefer composition

```python
# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...
    @abstractmethod
    async def execute(self, job_id): ...
    @abstractmethod
    async def on_complete(self, result): ...

class MediaProbeHandler(BaseTaskHandler):
    ...

# GOOD - Simple and direct
@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
```
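
Where behaviour does need to vary, composition can be as small as passing a function as a parameter. A minimal sketch with hypothetical names (`is_playable`, `fake_read_duration_seconds`, `MIN_VALID_DURATION_SECONDS`):

```python
from collections.abc import Callable

MIN_VALID_DURATION_SECONDS = 0.0  # hypothetical threshold


def is_playable(
    file_path: str, read_duration_seconds: Callable[[str], float]
) -> bool:
    """Return True when the file reports a positive duration."""
    # The duration reader is passed in, so no base class or registry is needed.
    return read_duration_seconds(file_path) > MIN_VALID_DURATION_SECONDS


def fake_read_duration_seconds(file_path: str) -> float:
    """Test double; a real reader might shell out to ffprobe."""
    return 12.5 if file_path.endswith(".mp4") else 0.0
```

Swapping the reader in a test requires only a different function argument.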

---

### 3. Not Reading AGENTS.md Before Starting

**What happened:** The agent proceeded with implementation without fully considering the documented principles, particularly:

- "Avoid unnecessary abstractions and over-engineering"
- "Don't add layers of indirection without clear benefit"
- "Prefer direct solutions over clever ones"

**Advice:**

- **Read AGENTS.md completely before any implementation**
- Re-read relevant sections when making architectural decisions
- When the user's request conflicts with AGENTS.md principles, ask for clarification

---

### 4. Creating Files Without Checking Existing Patterns

**What happened:** The agent created a `handlers/` subdirectory and multiple utility files without checking how other modules handle similar needs.

**Advice:**

- Before creating ANY new file, run: `ls cpv3/modules/<similar_module>/`
- Check if the functionality can fit into existing standard files
- If you need a helper function, put it in `service.py`, not a new file
- Subdirectories within modules are almost never appropriate

---

### 5. Ignoring the "Quick Reference" Table

The Quick Reference section of this document already maps each common task to a file location:

| Task                  | Location                         |
| --------------------- | -------------------------------- |
| Add new endpoint      | `modules/<module>/router.py`     |
| Add database model    | `modules/<module>/models.py`     |
| Add validation schema | `modules/<module>/schemas.py`    |
| Add business logic    | `modules/<module>/service.py`    |
| Add database query    | `modules/<module>/repository.py` |

**Advice:**

- Use this table as the ONLY guide for file placement
- If something doesn't fit these categories, it probably belongs in `service.py`
- Cross-cutting concerns go in `infrastructure/`, not in module subdirectories

---

### Summary: The Golden Rules

1. **Check existing patterns first** - Look at 2-3 similar modules before creating anything
2. **Standard files only** - `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
3. **No subdirectories in modules** - Everything fits in the standard files
4. **Consolidate, don't split** - When unsure, put it in `service.py`
5. **Simple > Clever** - Direct code beats abstract patterns
6. **YAGNI** - Don't build for hypothetical future requirements

---

_Last updated: February 2026_