# AGENTS.md - AI Coding Guidelines for CofeeProject Backend

This document provides guidelines and best practices for AI agents working with this codebase.

---

## Core Principles

### 1. Code Should Be Simple, Readable, and Well Supported

- Write code that humans can understand at first glance
- Prefer explicit over implicit behavior
- Use clear control flow patterns (avoid deeply nested conditions)
- Add docstrings for public functions, classes, and modules
- Keep functions short and focused (ideally under 30 lines)

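As a small illustration of the control-flow and docstring bullets above, here is a minimal sketch that uses guard clauses instead of nested conditions. The function name, the size limit, and the accepted suffixes are hypothetical and not taken from the codebase.

```python
from __future__ import annotations

from pathlib import PurePath

# Hypothetical values, for illustration only.
MAX_UPLOAD_SIZE_BYTES = 500 * 1024 * 1024
ALLOWED_MEDIA_SUFFIXES = frozenset({".mp3", ".mp4", ".wav"})


def find_upload_problem(file_name: str, size_bytes: int) -> str | None:
    """Return a human-readable problem with the upload, or None if it is acceptable."""
    # Each guard clause handles one failure case and returns early,
    # so the happy path stays at the lowest indentation level.
    if size_bytes <= 0:
        return "File is empty"
    if size_bytes > MAX_UPLOAD_SIZE_BYTES:
        return "File is too large"
    if PurePath(file_name).suffix.lower() not in ALLOWED_MEDIA_SUFFIXES:
        return "Unsupported file type"
    return None
```
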
### 2. Less Overhead Is Better

- Avoid unnecessary abstractions and over-engineering
- Don't add layers of indirection without clear benefit
- Prefer direct solutions over clever ones
- Minimize dependencies where possible
- Use built-in Python features before reaching for external libraries

### 3. No Magic Values

- Define constants with meaningful names at module level
- Use enums or `Literal` types for fixed sets of values (see `ArtifactTypeEnum` pattern)
- Configuration values belong in the `Settings` class with explicit defaults
- Never hardcode timeouts, limits, or thresholds inline

```python
# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...
```

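For the enum and `Literal` bullet, a fixed set of values can be expressed either way. The sketch below is illustrative only: `ExportFormat`, `LogLevelName`, and `content_type_for` are hypothetical names, and it does not reproduce the project's `ArtifactTypeEnum`.

```python
from __future__ import annotations

from enum import Enum
from typing import Literal


class ExportFormat(str, Enum):
    """Hypothetical fixed set of export formats."""

    MP3 = "mp3"
    WAV = "wav"


# A Literal alias is enough when only type checking is needed, not runtime members.
LogLevelName = Literal["debug", "info", "warning"]

CONTENT_TYPE_BY_FORMAT: dict[ExportFormat, str] = {
    ExportFormat.MP3: "audio/mpeg",
    ExportFormat.WAV: "audio/wav",
}


def content_type_for(export_format: ExportFormat) -> str:
    """Look up the MIME type instead of comparing against inline strings."""
    return CONTENT_TYPE_BY_FORMAT[export_format]
```
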
### 4. One Function Should Implement One Purpose

- Each function should do exactly one thing
- If a function needs "and" in its description, split it
- Extract helper functions for distinct subtasks
- Keep side effects isolated and predictable

```python
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
```

### 5. All Variable Names Should Have Meaning Based on Context

- Use descriptive names that explain purpose, not type
- Avoid single-letter variables (except for trivial loops)
- Prefix boolean variables with `is_`, `has_`, `can_`, `should_`
- Use domain terminology consistently

```python
# BAD
x = await repo.get(id)
flag = x.is_deleted

# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted
```

---

## Project Architecture

### Layer Structure

```
cpv3/
├── api/v1/              # API version routing
├── common/              # Shared schemas and utilities
├── db/                  # Database base classes and session
├── infrastructure/      # Cross-cutting concerns (auth, storage, settings)
└── modules/             # Feature modules (domain logic)
    └── <module>/
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints
```

### Module Responsibilities

| Layer           | Responsibility                             | Dependencies                  |
| --------------- | ------------------------------------------ | ----------------------------- |
| `router.py`     | HTTP request/response handling, validation | schemas, service, repository  |
| `service.py`    | Business logic, orchestration              | repository, external services |
| `repository.py` | Database queries, CRUD operations          | models, session               |
| `schemas.py`    | Data transfer objects, validation          | pydantic                      |
| `models.py`     | Database table definitions                 | SQLAlchemy                    |

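The dependency direction in the table can be sketched without any framework. Everything below is hypothetical (an in-memory repository and a made-up `rename_media_file` use case); it only shows a service function depending on a repository while staying unaware of HTTP.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class MediaFile:
    """Stand-in for a database model."""

    id: uuid.UUID
    title: str


class InMemoryMediaFileRepository:
    """Data access only: no business rules live here."""

    def __init__(self) -> None:
        self._rows: dict[uuid.UUID, MediaFile] = {}

    async def add(self, media_file: MediaFile) -> None:
        self._rows[media_file.id] = media_file

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        return self._rows.get(media_file_id)


async def rename_media_file(
    repository: InMemoryMediaFileRepository, media_file_id: uuid.UUID, new_title: str
) -> MediaFile | None:
    """Service-level function: applies the business rule, then delegates storage."""
    cleaned_title = new_title.strip()
    if not cleaned_title:
        raise ValueError("Title must not be empty")
    media_file = await repository.get_by_id(media_file_id)
    if media_file is None:
        return None
    media_file.title = cleaned_title
    return media_file
```

In this sketch a router would call `rename_media_file` and translate a `None` result into a 404 response, keeping HTTP concerns out of the service.
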
---

## Coding Standards

### Python Version & Style

- **Python 3.11+** required
- Use `from __future__ import annotations` for forward references
- Line length: **100 characters** (configured in ruff)
- Use type hints for all function signatures
- Async-first approach for I/O operations

### Imports

```python
# Standard library
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.repository import MediaFileRepository
from cpv3.modules.media.schemas import MediaFileRead
```

### Pydantic Schemas

- Inherit from `cpv3.common.schemas.Schema` for consistent config
- Use `Literal` types for enums with string values
- Suffix schema names: `*Create`, `*Update`, `*Read`

```python
from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema


class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime
```

### SQLAlchemy Models

- Inherit from `Base` and `BaseModelMixin`
- Use explicit column types
- Add indexes for frequently queried fields
- Use soft deletes (`is_deleted` flag)

```python
import uuid

from sqlalchemy import Boolean, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column

from cpv3.db.base import Base, BaseModelMixin


class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)
```

### Repository Pattern

- One repository per model
- Accept `AsyncSession` in constructor
- Methods should be atomic and focused
- Filter soft-deleted records by default

```python
import uuid

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        # Soft-deleted rows are excluded in the query itself.
        result = await self._session.execute(
            select(MediaFile).where(
                MediaFile.id == media_file_id,
                MediaFile.is_deleted.is_(False),
            )
        )
        return result.scalar_one_or_none()
```

### FastAPI Endpoints

- Use dependency injection for DB session, auth, and services
- Return typed response models
- Use appropriate HTTP status codes
- Handle errors with `HTTPException`

```python
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)
```

---

## Configuration & Settings

### Environment Variables

- All configuration goes through the `Settings` class in `infrastructure/settings.py`
- Use the `Field(default=..., alias="ENV_VAR_NAME")` pattern
- Provide sensible defaults for local development
- Never commit secrets to the repository

```python
from pydantic import Field
from pydantic_settings import BaseSettings  # Pydantic 2.x keeps BaseSettings here


class Settings(BaseSettings):
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")
```

### Accessing Settings

```python
from cpv3.infrastructure.settings import get_settings

settings = get_settings()  # Cached via @lru_cache
```

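How a cached accessor of this kind behaves can be sketched with the standard library alone. This is not the project's `Settings` class: it assumes a frozen dataclass instead of Pydantic, and the `APP_REQUEST_TIMEOUT_SECONDS` variable is hypothetical.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_REQUEST_TIMEOUT_SECONDS = 30


@dataclass(frozen=True)
class Settings:
    """Simplified stand-in for the real settings object."""

    request_timeout_seconds: int


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read the environment once; later calls return the same cached instance."""
    raw_timeout = os.environ.get(
        "APP_REQUEST_TIMEOUT_SECONDS", str(DEFAULT_REQUEST_TIMEOUT_SECONDS)
    )
    return Settings(request_timeout_seconds=int(raw_timeout))
```

Because the result is cached, tests that change environment variables need to call `get_settings.cache_clear()` before reading settings again.
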
---

## Testing Guidelines

### Test Structure

```
tests/
├── conftest.py          # Shared fixtures
├── unit/                # Unit tests (isolated)
└── integration/         # Integration tests (with DB/services)
```

### Fixtures

- Use `pytest-asyncio` for async tests
- Create isolated database sessions per test
- Mock external services (storage, APIs)

```python
# With pytest-asyncio in strict mode, use @pytest_asyncio.fixture instead.
@pytest.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user
```

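For the bullet on mocking external services, `unittest.mock.AsyncMock` from the standard library is one way to replace an async client. The `upload_transcript` function and the `put_object` method on the storage client are hypothetical, chosen only to keep the sketch self-contained.

```python
from __future__ import annotations

from typing import Any
from unittest.mock import AsyncMock


async def upload_transcript(storage_client: Any, file_key: str, text: str) -> str:
    """Hypothetical code under test: stores a transcript and returns its key."""
    await storage_client.put_object(key=file_key, body=text.encode("utf-8"))
    return file_key


async def test_upload_transcript_with_valid_text_calls_storage_once() -> None:
    storage_client = AsyncMock()  # No network access; calls are only recorded.

    returned_key = await upload_transcript(storage_client, "transcripts/1.txt", "hello")

    assert returned_key == "transcripts/1.txt"
    storage_client.put_object.assert_awaited_once_with(
        key="transcripts/1.txt", body=b"hello"
    )
```
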
### Test Naming

```python
# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...
```

---

## Common Patterns

### Error Handling

```python
# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
```

### Async Operations

```python
import asyncio

import anyio

# For blocking or CPU-bound sync work: run it in a worker thread
# so the event loop stays responsive
result = await anyio.to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
```

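The subprocess snippet above does not inspect the exit status. A minimal sketch of a helper that does so follows; `run_command` and `CommandFailedError` are hypothetical names and not part of the codebase.

```python
from __future__ import annotations

import asyncio


class CommandFailedError(RuntimeError):
    """Raised when a subprocess exits with a non-zero status."""


async def run_command(*command: str) -> str:
    """Run a command without blocking the event loop and return its stdout."""
    process = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await process.communicate()
    if process.returncode != 0:
        error_text = stderr.decode(errors="replace").strip()
        raise CommandFailedError(
            f"{command[0]} exited with {process.returncode}: {error_text}"
        )
    return stdout.decode()
```

A caller could then write `await run_command("ffprobe", "-v", "error", file_path)` and handle `CommandFailedError` in one place.
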
### Temporary Files

```python
from pathlib import Path
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
```

---

## Do's and Don'ts

### ✅ DO

- Use type hints everywhere
- Write async code for I/O operations
- Use dependency injection
- Keep modules self-contained
- Write tests for new features
- Use meaningful commit messages
- Follow existing patterns in the codebase

### ❌ DON'T

- Use global mutable state
- Put business logic in routers
- Hardcode configuration values
- Ignore type checker warnings
- Write overly clever code
- Skip error handling
- Mix sync and async DB operations

---

## Quick Reference

| Task                  | Location                              |
| --------------------- | ------------------------------------- |
| Add new endpoint      | `modules/<module>/router.py`          |
| Add database model    | `modules/<module>/models.py`          |
| Add validation schema | `modules/<module>/schemas.py`         |
| Add business logic    | `modules/<module>/service.py`         |
| Add database query    | `modules/<module>/repository.py`      |
| Add configuration     | `infrastructure/settings.py`          |
| Add shared utility    | `common/`                             |
| Add migration         | Run `alembic revision --autogenerate` |

---

## Package Management

This project uses **[uv](https://docs.astral.sh/uv/)** as the package manager - a fast Python package installer and resolver written in Rust.

### Common Commands

```bash
# Install all dependencies
uv sync

# Add a new dependency
uv add <package-name>

# Add a dev dependency
uv add --group dev <package-name>

# Run a command in the virtual environment
uv run <command>

# Run the development server
uv run uvicorn cpv3.main:app --reload

# Run tests
uv run pytest
```

### Why uv?

- **Speed** - 10-100x faster than pip
- **Reliable** - Deterministic dependency resolution
- **Compatible** - Works with standard `pyproject.toml`

---

## Dependencies

Key dependencies used in this project:

- **FastAPI** - Web framework
- **SQLAlchemy 2.0** - ORM (async mode)
- **Pydantic 2.x** - Data validation
- **asyncpg** - PostgreSQL async driver
- **Alembic** - Database migrations
- **pytest-asyncio** - Async testing
- **boto3** - AWS S3 storage
- **pydub** - Audio processing
- **openai-whisper** - Transcription
- **Dramatiq** - Background task queue (with Redis broker)

---

## Common AI Agent Mistakes to Avoid

This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.

### 1. Over-Engineering and Breaking Module Structure

**What happened:** When asked to implement background tasks, the agent created excessive files:

```
# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py              # ❌ Non-standard
├── base.py                # ❌ Non-standard
├── db_helpers.py          # ❌ Non-standard
├── webhook_dispatch.py    # ❌ Non-standard
├── handlers/              # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py             # DTOs only
├── service.py             # All business logic including actors
└── router.py              # Endpoints only
```

**Why it's wrong:**

- Ignored existing module patterns in the codebase
- Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
- Created cognitive overhead for maintainers

**Advice:**

- **ALWAYS examine existing modules first** before creating new ones
- **Match the existing file naming conventions exactly**
- Standard module files: `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
- Only create files from this list; consolidate everything else into `service.py`

---

### 2. Misinterpreting "Make It Flexible" or "Apply SRP"

**What happened:** When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:

- Abstract base classes (`BaseTaskHandler`, `BaseTaskSubmitter`)
- A registry pattern with dynamic handler registration
- Separate files for each handler implementation
- Complex inheritance hierarchies

**Why it's wrong:**

- SRP (the Single Responsibility Principle) doesn't mean "one class per file" or "maximum abstraction"
- Flexibility doesn't mean "prepare for every possible future change"
- This violates the project's core principle: **"Less Overhead Is Better"**

**Advice:**

- SRP = one function does one thing, NOT one file per concept
- "Flexible" = easy to modify, NOT infinitely extensible
- When in doubt, keep it in one file and refactor later if needed
- Abstract base classes are rarely needed; prefer composition

```python
from abc import ABC, abstractmethod

import dramatiq


# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...

    @abstractmethod
    async def execute(self, job_id): ...

    @abstractmethod
    async def on_complete(self, result): ...


class MediaProbeHandler(BaseTaskHandler):
    ...


# GOOD - Simple and direct
@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
```

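When several tasks genuinely share steps, composition with plain functions usually covers it. The sketch below is hypothetical (`run_job`, `JobOutcome`, and `probe_duration` do not exist in the codebase) and leaves out Dramatiq entirely; it only shows a shared wrapper that takes the task-specific step as an argument instead of relying on a base class.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class JobOutcome:
    """Result of one job run."""

    job_id: str
    is_successful: bool
    detail: str


def run_job(job_id: str, step: Callable[[], str]) -> JobOutcome:
    """Shared bookkeeping for any task; the task-specific work is passed in."""
    try:
        detail = step()
    except Exception as error:  # Broad on purpose: any failure marks the job as failed.
        return JobOutcome(job_id=job_id, is_successful=False, detail=str(error))
    return JobOutcome(job_id=job_id, is_successful=True, detail=detail)


def probe_duration() -> str:
    """Stand-in for a real task step."""
    return "duration=12.5"
```

Adding another task in this style means writing one more function and passing it to `run_job`, with no new class or file.
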
---

### 3. Not Reading AGENTS.md Before Starting

**What happened:** The agent proceeded with implementation without fully considering the documented principles, particularly:

- "Avoid unnecessary abstractions and over-engineering"
- "Don't add layers of indirection without clear benefit"
- "Prefer direct solutions over clever ones"

**Advice:**

- **Read AGENTS.md completely before any implementation**
- Re-read relevant sections when making architectural decisions
- When the user's request conflicts with AGENTS.md principles, ask for clarification

---

### 4. Creating Files Without Checking Existing Patterns

**What happened:** The agent created a `handlers/` subdirectory and multiple utility files without checking how other modules handle similar needs.

**Advice:**

- Before creating ANY new file, run: `ls cpv3/modules/<similar_module>/`
- Check if the functionality can fit into existing standard files
- If you need a helper function, put it in `service.py`, not a new file
- Subdirectories within modules are almost never appropriate

---

### 5. Ignoring the "Quick Reference" Table

This document already contains a clear reference for where module code belongs (see the Quick Reference section):

| Task                  | Location                         |
| --------------------- | -------------------------------- |
| Add new endpoint      | `modules/<module>/router.py`     |
| Add database model    | `modules/<module>/models.py`     |
| Add validation schema | `modules/<module>/schemas.py`    |
| Add business logic    | `modules/<module>/service.py`    |
| Add database query    | `modules/<module>/repository.py` |

**Advice:**

- Use this table as the ONLY guide for file placement
- If something doesn't fit these categories, it probably belongs in `service.py`
- Cross-cutting concerns go in `infrastructure/`, not in module subdirectories

---

### Summary: The Golden Rules

1. **Check existing patterns first** - Look at 2-3 similar modules before creating anything
2. **Standard files only** - `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
3. **No subdirectories in modules** - Everything fits in the standard files
4. **Consolidate, don't split** - When unsure, put it in `service.py`
5. **Simple > Clever** - Direct code beats abstract patterns
6. **YAGNI** - Don't build for hypothetical future requirements

---

_Last updated: February 2026_