# AGENTS.md - AI Coding Guidelines for CofeeProject Backend
This document provides guidelines and best practices for AI agents working with this codebase.

---
## Core Principles
### 1. Code Should Be Simple, Readable, and Well Supported
- Write code that humans can understand at first glance
- Prefer explicit over implicit behavior
- Use clear control flow patterns (avoid deeply nested conditions)
- Add docstrings for public functions, classes, and modules
- Keep functions short and focused (ideally under 30 lines)
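
As a small illustration of "clear control flow", guard clauses can replace nested conditions. This is a sketch only: `MediaUpload`, `MAX_UPLOAD_SIZE_BYTES`, `ALLOWED_EXTENSIONS`, and `describe_upload_problem` are hypothetical names invented for the example, not part of the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical limits, named at module level instead of inlined.
MAX_UPLOAD_SIZE_BYTES = 500 * 1024 * 1024
ALLOWED_EXTENSIONS = frozenset({".mp4", ".mov", ".wav"})


@dataclass(frozen=True)
class MediaUpload:
    """Hypothetical upload descriptor used only for this example."""

    file_name: str
    size_bytes: int


def describe_upload_problem(upload: MediaUpload) -> str | None:
    """Return a human-readable problem, or None when the upload is acceptable."""
    # Each guard handles one failure case and returns early,
    # so the happy path stays at the top indentation level.
    if upload.size_bytes <= 0:
        return "File is empty"
    if upload.size_bytes > MAX_UPLOAD_SIZE_BYTES:
        return "File is too large"
    has_allowed_extension = any(
        upload.file_name.lower().endswith(extension) for extension in ALLOWED_EXTENSIONS
    )
    if not has_allowed_extension:
        return "Unsupported file type"
    return None
```
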
### 2. Less Overhead Is Better
- Avoid unnecessary abstractions and over-engineering
- Don't add layers of indirection without clear benefit
- Prefer direct solutions over clever ones
- Minimize dependencies where possible
- Use built-in Python features before reaching for external libraries
### 3. No Magic Values
- Define constants with meaningful names at module level
- Use enums or `Literal` types for fixed sets of values (see `ArtifactTypeEnum` pattern)
- Configuration values belong in `Settings` class with explicit defaults
- Never hardcode timeouts, limits, or thresholds inline
```python
# BAD
if silence_db > 16:
    ...

# GOOD
SILENCE_THRESHOLD_DB = 16

if silence_db > SILENCE_THRESHOLD_DB:
    ...
```
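For fixed sets of values, an enum or a `Literal` type plays the same role as a named constant. The sketch below uses hypothetical names (`JobStatus`, `ExportFormat`, `TERMINAL_STATUSES`); it does not reproduce the project's `ArtifactTypeEnum`, whose members are not shown in this document.

```python
from __future__ import annotations

from enum import Enum
from typing import Literal, get_args

# Hypothetical names for illustration; not taken from the codebase.
ExportFormat = Literal["mp4", "wav"]


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


TERMINAL_STATUSES = frozenset({JobStatus.DONE, JobStatus.FAILED})


def is_terminal(status: JobStatus) -> bool:
    """Return True when no further state changes are expected."""
    return status in TERMINAL_STATUSES


def is_supported_export_format(value: str) -> bool:
    """Check a raw string against the values allowed by the Literal type."""
    return value in get_args(ExportFormat)
```

Comparing against `JobStatus.DONE` instead of the string `"done"` lets a type checker or a failing import catch typos that a raw string would hide.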
### 4. One Function Should Implement One Purpose
- Each function should do exactly one thing
- If a function needs "and" in its description, split it
- Extract helper functions for distinct subtasks
- Keep side effects isolated and predictable
```python
# BAD
async def get_and_validate_and_process_media(file_key: str) -> MediaResult:
    ...

# GOOD
async def download_media(file_key: str) -> TempFile:
    ...

def validate_media_format(file_path: str) -> bool:
    ...

async def process_media(file_path: str) -> MediaResult:
    ...
```
### 5. All Variable Names Should Have Meaning Based on Context
- Use descriptive names that explain purpose, not type
- Avoid single-letter variables (except for trivial loops)
- Prefix boolean variables with `is_`, `has_`, `can_`, `should_`
- Use domain terminology consistently
```python
# BAD
x = await repo.get(id)
flag = x.is_deleted
# GOOD
media_file = await media_repository.get_by_id(media_file_id)
is_soft_deleted = media_file.is_deleted
```
---
## Project Architecture
### Layer Structure
```
cpv3/
├── api/v1/            # API version routing
├── common/            # Shared schemas and utilities
├── db/                # Database base classes and session
├── infrastructure/    # Cross-cutting concerns (auth, storage, settings)
└── modules/           # Feature modules (domain logic)
    └── <module>/
        ├── models.py      # SQLAlchemy models
        ├── schemas.py     # Pydantic DTOs
        ├── repository.py  # Database access layer
        ├── service.py     # Business logic
        └── router.py      # FastAPI endpoints
```
### Module Responsibilities
| Layer | Responsibility | Dependencies |
| --------------- | ------------------------------------------ | ----------------------------- |
| `router.py` | HTTP request/response handling, validation | schemas, service, repository |
| `service.py` | Business logic, orchestration | repository, external services |
| `repository.py` | Database queries, CRUD operations | models, session |
| `schemas.py` | Data transfer objects, validation | pydantic |
| `models.py` | Database table definitions | SQLAlchemy |
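
To make the dependency direction in the table concrete, here is a dependency-free sketch in which a service function receives its repository as an argument. All names (`MediaFileRecord`, `InMemoryMediaFileRepository`, `MediaFileNotFoundError`, `get_media_duration_seconds`) are hypothetical; the real repositories wrap an `AsyncSession` rather than a dict.

```python
from __future__ import annotations

import asyncio
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFileRecord:
    """Hypothetical stand-in for a database model row."""

    id: uuid.UUID
    duration_seconds: float
    is_deleted: bool = False


class MediaFileNotFoundError(LookupError):
    """Domain error; a router would translate it into a 404 response."""


class InMemoryMediaFileRepository:
    """Data access only: no business rules beyond hiding soft-deleted rows."""

    def __init__(self, records: list[MediaFileRecord]) -> None:
        self._records_by_id = {record.id: record for record in records}

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFileRecord | None:
        record = self._records_by_id.get(media_file_id)
        if record is None or record.is_deleted:
            return None
        return record


async def get_media_duration_seconds(
    repository: InMemoryMediaFileRepository, media_file_id: uuid.UUID
) -> float:
    """Service layer: business logic that relies on the repository for data."""
    media_file = await repository.get_by_id(media_file_id)
    if media_file is None:
        raise MediaFileNotFoundError(str(media_file_id))
    return media_file.duration_seconds


active_file = MediaFileRecord(id=uuid.uuid4(), duration_seconds=12.5)
deleted_file = MediaFileRecord(id=uuid.uuid4(), duration_seconds=3.0, is_deleted=True)
repository = InMemoryMediaFileRepository([active_file, deleted_file])
duration_seconds = asyncio.run(get_media_duration_seconds(repository, active_file.id))
```
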
---
## Coding Standards
### Python Version & Style
- **Python 3.11+** required
- Use `from __future__ import annotations` for forward references
- Line length: **100 characters** (configured in ruff)
- Use type hints for all function signatures
- Async-first approach for I/O operations
### Imports
```python
# Standard library
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Literal

# Third-party
from fastapi import APIRouter, Depends, HTTPException, status
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

# Local imports (absolute paths)
from cpv3.infrastructure.auth import get_current_user
from cpv3.modules.media.repository import MediaFileRepository
from cpv3.modules.media.schemas import MediaFileRead
```
### Pydantic Schemas
- Inherit from `cpv3.common.schemas.Schema` for consistent config
- Use `Literal` types for enums with string values
- Suffix schema names: `*Create`, `*Update`, `*Read`
```python
from datetime import datetime
from uuid import UUID

from cpv3.common.schemas import Schema


class MediaFileRead(Schema):
    id: UUID
    owner_id: UUID
    duration_seconds: float
    is_deleted: bool
    created_at: datetime
```
### SQLAlchemy Models
- Inherit from `Base` and `BaseModelMixin`
- Use explicit column types
- Add indexes for frequently queried fields
- Use soft deletes (`is_deleted` flag)
```python
from cpv3.db.base import Base, BaseModelMixin


class MediaFile(Base, BaseModelMixin):
    __tablename__ = "media_files"

    owner_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="RESTRICT"), index=True
    )
    is_deleted: Mapped[bool] = mapped_column(Boolean, default=False)
```
### Repository Pattern
- One repository per model
- Accept `AsyncSession` in constructor
- Methods should be atomic and focused
- Filter soft-deleted records by default
```python
class MediaFileRepository:
    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def get_by_id(self, media_file_id: uuid.UUID) -> MediaFile | None:
        result = await self._session.execute(
            select(MediaFile).where(MediaFile.id == media_file_id)
        )
        media_file = result.scalar_one_or_none()
        if media_file is None or media_file.is_deleted:
            return None
        return media_file
```
### FastAPI Endpoints
- Use dependency injection for DB session, auth, and services
- Return typed response models
- Use appropriate HTTP status codes
- Handle errors with `HTTPException`
```python
@router.get("/mediafiles/{media_file_id}", response_model=MediaFileRead)
async def get_mediafile(
    media_file_id: uuid.UUID,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> MediaFileRead:
    repo = MediaFileRepository(db)
    media_file = await repo.get_by_id(media_file_id)
    if media_file is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Not found")
    return MediaFileRead.model_validate(media_file)
```
---
## Configuration & Settings
### Environment Variables
- All configuration through `Settings` class in `infrastructure/settings.py`
- Use `Field(default=..., alias="ENV_VAR_NAME")` pattern
- Provide sensible defaults for local development
- Never commit secrets to repository
```python
class Settings(BaseSettings):
    jwt_secret_key: str = Field(default="dev-secret", alias="JWT_SECRET_KEY")
    jwt_algorithm: str = Field(default="HS256", alias="JWT_ALGORITHM")
    jwt_access_ttl_minutes: int = Field(default=60, alias="JWT_ACCESS_TTL_MINUTES")
```
### Accessing Settings
```python
from cpv3.infrastructure.settings import get_settings
settings = get_settings() # Cached via @lru_cache
```
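Because `get_settings()` is cached, every caller shares one settings instance, and environment changes made after the first call are not picked up until the cache is cleared. The stdlib-only sketch below imitates that behavior; `DemoSettings` and `get_demo_settings` are hypothetical stand-ins for the real pydantic-based `Settings` and `get_settings`, which may differ in detail.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache

DEFAULT_JWT_ACCESS_TTL_MINUTES = 60


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in for the real Settings class."""

    jwt_access_ttl_minutes: int


@lru_cache
def get_demo_settings() -> DemoSettings:
    """Read the environment once and reuse the result on later calls."""
    raw_ttl = os.environ.get("JWT_ACCESS_TTL_MINUTES")
    ttl_minutes = int(raw_ttl) if raw_ttl else DEFAULT_JWT_ACCESS_TTL_MINUTES
    return DemoSettings(jwt_access_ttl_minutes=ttl_minutes)


# The cached value survives an environment change until cache_clear()
# is called, which is useful in tests that override settings.
os.environ["JWT_ACCESS_TTL_MINUTES"] = "15"
get_demo_settings.cache_clear()
first = get_demo_settings()

os.environ["JWT_ACCESS_TTL_MINUTES"] = "30"
cached = get_demo_settings()  # still the instance built with 15

get_demo_settings.cache_clear()
refreshed = get_demo_settings()  # rebuilt from the updated environment
```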
---
## Testing Guidelines
### Test Structure
```
tests/
├── conftest.py # Shared fixtures
├── unit/ # Unit tests (isolated)
└── integration/ # Integration tests (with DB/services)
```
### Fixtures
- Use `pytest-asyncio` for async tests
- Create isolated database sessions per test
- Mock external services (storage, APIs)
```python
@pytest.fixture
async def test_user(test_db_session: AsyncSession) -> User:
    user = User(
        id=uuid.uuid4(),
        username="testuser",
        email="test@example.com",
        password_hash=hash_password("testpassword"),
        is_active=True,
    )
    test_db_session.add(user)
    await test_db_session.commit()
    return user
```
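For the "mock external services" guideline, `unittest.mock.AsyncMock` from the standard library is often enough. The sketch below rests on assumptions: `StorageClient`, `upload_bytes`, `store_thumbnail`, and the example URL are hypothetical, and the check is driven with `asyncio.run` so it runs without pytest. Under `pytest-asyncio` the same body would live in an `async def test_...` function.

```python
from __future__ import annotations

import asyncio
from typing import Protocol
from unittest.mock import AsyncMock

THUMBNAIL_KEY_PREFIX = "thumbnails/"


class StorageClient(Protocol):
    """Hypothetical storage interface; the real client may look different."""

    async def upload_bytes(self, key: str, data: bytes) -> str: ...


async def store_thumbnail(storage: StorageClient, media_file_id: str, data: bytes) -> str:
    """Upload thumbnail bytes and return the URL reported by storage."""
    key = f"{THUMBNAIL_KEY_PREFIX}{media_file_id}.jpg"
    return await storage.upload_bytes(key, data)


async def run_store_thumbnail_with_mock() -> tuple[str, AsyncMock]:
    # Attributes of an AsyncMock are awaitable, so no real network call happens.
    storage_mock = AsyncMock()
    storage_mock.upload_bytes.return_value = "https://storage.example/thumbnails/abc.jpg"
    url = await store_thumbnail(storage_mock, "abc", b"\xff\xd8")
    return url, storage_mock


thumbnail_url, storage_mock = asyncio.run(run_store_thumbnail_with_mock())

# Verify both the returned value and how the dependency was called.
storage_mock.upload_bytes.assert_awaited_once_with("thumbnails/abc.jpg", b"\xff\xd8")
```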
### Test Naming
```python
# Pattern: test_<action>_<condition>_<expected_result>
async def test_get_mediafile_when_not_found_returns_404():
    ...

async def test_create_mediafile_with_valid_data_returns_201():
    ...
```
---
## Common Patterns
### Error Handling
```python
# Use specific HTTP exceptions
raise HTTPException(
    status_code=status.HTTP_404_NOT_FOUND,
    detail="Media file not found"
)

# Re-raise with context
try:
    result = await external_service.call()
except ExternalError as e:
    raise HTTPException(
        status_code=status.HTTP_502_BAD_GATEWAY,
        detail="External service unavailable"
    ) from e
```
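`raise ... from e` keeps the original exception reachable as `__cause__`, so logs and debuggers still show the root failure even though the client only sees the translated error. A stdlib-only sketch with hypothetical names: `ExternalError` here is a local stand-in, and `ServiceUnavailableError` plays the role of `HTTPException` so the example has no dependencies.

```python
from __future__ import annotations


class ExternalError(Exception):
    """Stand-in for an error raised by an external client library."""


class ServiceUnavailableError(Exception):
    """Stand-in for HTTPException with a 502 status."""


def call_external_service() -> str:
    raise ExternalError("connection reset by peer")


def fetch_transcript() -> str:
    try:
        return call_external_service()
    except ExternalError as error:
        # "from error" records the original exception as __cause__.
        raise ServiceUnavailableError("External service unavailable") from error


try:
    fetch_transcript()
except ServiceUnavailableError as translated_error:
    root_cause = translated_error.__cause__
```

Without the `from` clause the original error would still appear as `__context__`, but the explicit form documents that the translation is intentional.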
### Async Operations
```python
import asyncio

import anyio

# For blocking or CPU-bound work in an async context (runs in a worker thread)
result = await anyio.to_thread.run_sync(cpu_intensive_function, arg1, arg2)

# For subprocess calls
proc = await asyncio.create_subprocess_exec(
    "ffprobe", "-v", "error", file_path,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
```
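The subprocess snippet above does not check the exit status or bound the run time. A fuller sketch under stated assumptions: `run_command`, `CommandFailedError`, and the 30-second default are hypothetical, and the demonstration invokes the current Python interpreter rather than `ffprobe` so that it runs anywhere.

```python
from __future__ import annotations

import asyncio
import sys

DEFAULT_COMMAND_TIMEOUT_SECONDS = 30.0


class CommandFailedError(RuntimeError):
    """Raised when a subprocess exits with a non-zero status."""


async def run_command(
    *args: str, timeout_seconds: float = DEFAULT_COMMAND_TIMEOUT_SECONDS
) -> str:
    """Run a command, returning decoded stdout or raising on failure."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Do not leave an orphaned process behind when the deadline passes.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise CommandFailedError(stderr.decode(errors="replace").strip())
    return stdout.decode(errors="replace").strip()


output = asyncio.run(run_command(sys.executable, "-c", "print('ok')"))
```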
### Temporary Files
```python
from pathlib import Path
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp_path = tmp.name

try:
    # Use tmp_path
    ...
finally:
    # Clean up
    Path(tmp_path).unlink(missing_ok=True)
```
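If the same create-use-delete sequence shows up in several places, it can be wrapped once in a context manager. This is a sketch, not an existing helper: `temporary_file_path` is a hypothetical name, and whether the extra function is worth it depends on how often the pattern repeats.

```python
from __future__ import annotations

from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path
from tempfile import NamedTemporaryFile


@contextmanager
def temporary_file_path(suffix: str) -> Iterator[Path]:
    """Yield the path of a closed temporary file and delete it afterwards."""
    with NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        yield tmp_path
    finally:
        # Runs on normal exit and when the body raises.
        tmp_path.unlink(missing_ok=True)


with temporary_file_path(".mp4") as media_path:
    media_path.write_bytes(b"fake media bytes")
    existed_inside_block = media_path.exists()

exists_after_block = media_path.exists()
```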
---
## Do's and Don'ts
### ✅ DO
- Use type hints everywhere
- Write async code for I/O operations
- Use dependency injection
- Keep modules self-contained
- Write tests for new features
- Use meaningful commit messages
- Follow existing patterns in the codebase
### ❌ DON'T
- Use global mutable state
- Put business logic in routers
- Hardcode configuration values
- Ignore type checker warnings
- Write overly clever code
- Skip error handling
- Mix sync and async DB operations
---
## Quick Reference
| Task | Location |
| --------------------- | ------------------------------------- |
| Add new endpoint | `modules/<module>/router.py` |
| Add database model | `modules/<module>/models.py` |
| Add validation schema | `modules/<module>/schemas.py` |
| Add business logic | `modules/<module>/service.py` |
| Add database query | `modules/<module>/repository.py` |
| Add configuration | `infrastructure/settings.py` |
| Add shared utility | `common/` |
| Add migration | Run `alembic revision --autogenerate` |
---
## Package Management
This project uses **[uv](https://docs.astral.sh/uv/)** as the package manager - a fast Python package installer and resolver written in Rust.
### Common Commands
```bash
# Install all dependencies
uv sync
# Add a new dependency
uv add <package-name>
# Add a dev dependency
uv add --group dev <package-name>
# Run a command in the virtual environment
uv run <command>
# Run the development server
uv run uvicorn cpv3.main:app --reload
# Run tests
uv run pytest
```
### Why uv?
- **Fast** - 10-100x faster than pip, according to the uv documentation
- **Reliable** - Deterministic dependency resolution
- **Compatible** - Works with standard `pyproject.toml`
---
## Dependencies
Key dependencies used in this project:
- **FastAPI** - Web framework
- **SQLAlchemy 2.0** - ORM (async mode)
- **Pydantic 2.x** - Data validation
- **asyncpg** - PostgreSQL async driver
- **Alembic** - Database migrations
- **pytest-asyncio** - Async testing
- **boto3** - AWS S3 storage
- **pydub** - Audio processing
- **openai-whisper** - Transcription
- **Dramatiq** - Background task queue (with Redis broker)
---
## Common AI Agent Mistakes to Avoid
This section documents real errors made during AI-assisted development sessions. Learn from these mistakes.
### 1. Over-Engineering and Breaking Module Structure
**What happened:** When asked to implement background tasks, the agent created excessive files:
```
# BAD - What was created
cpv3/modules/tasks/
├── __init__.py
├── actors.py            # ❌ Non-standard
├── base.py              # ❌ Non-standard
├── db_helpers.py        # ❌ Non-standard
├── webhook_dispatch.py  # ❌ Non-standard
├── handlers/            # ❌ Non-standard directory
│   ├── __init__.py
│   ├── base.py
│   ├── media_probe.py
│   ├── silence_remove.py
│   └── ...
├── schemas.py
├── service.py
└── router.py

# GOOD - Standard module structure
cpv3/modules/tasks/
├── __init__.py
├── schemas.py  # DTOs only
├── service.py  # All business logic including actors
└── router.py   # Endpoints only
```
**Why it's wrong:**
- Ignored existing module patterns in the codebase
- Added unnecessary abstraction layers (BaseTaskHandler, registry pattern)
- Created cognitive overhead for maintainers
**Advice:**
- **ALWAYS examine existing modules first** before creating new ones
- **Match the existing file naming conventions exactly**
- Standard module files: `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
- Only create files from this list; consolidate everything else into `service.py`
---
### 2. Misinterpreting "Make It Flexible" or "Apply SRP"
**What happened:** When asked to "make tasks module more flexible with SRP compliance", the agent interpreted this as creating:
- Abstract base classes (`BaseTaskHandler`, `BaseTaskSubmitter`)
- A registry pattern with dynamic handler registration
- Separate files for each handler implementation
- Complex inheritance hierarchies
**Why it's wrong:**
- SRP doesn't mean "one class per file" or "maximum abstraction"
- Flexibility doesn't mean "prepare for every possible future change"
- This violates the project's core principle: **"Less Overhead Is Better"**
**Advice:**
- SRP = one function does one thing, NOT one file per concept
- "Flexible" = easy to modify, NOT infinitely extensible
- When in doubt, keep it in one file and refactor later if needed
- Abstract base classes are rarely needed; prefer composition
```python
# BAD - Over-abstracted
class BaseTaskHandler(ABC):
    @abstractmethod
    async def validate(self, request): ...

    @abstractmethod
    async def execute(self, job_id): ...

    @abstractmethod
    async def on_complete(self, result): ...


class MediaProbeHandler(BaseTaskHandler):
    ...


# GOOD - Simple and direct
@dramatiq.actor
def media_probe_actor(job_id: str, media_file_id: str) -> None:
    """Probe media file for metadata."""
    # All logic here, no inheritance needed
    ...
```
---
### 3. Not Reading AGENTS.md Before Starting
**What happened:** The agent proceeded with implementation without fully considering the documented principles, particularly:
- "Avoid unnecessary abstractions and over-engineering"
- "Don't add layers of indirection without clear benefit"
- "Prefer direct solutions over clever ones"
**Advice:**
- **Read AGENTS.md completely before any implementation**
- Re-read relevant sections when making architectural decisions
- When the user's request conflicts with AGENTS.md principles, ask for clarification
---
### 4. Creating Files Without Checking Existing Patterns
**What happened:** The agent created `handlers/` subdirectory and multiple utility files without checking how other modules handle similar needs.
**Advice:**
- Before creating ANY new file, run: `ls cpv3/modules/<similar_module>/`
- Check if the functionality can fit into existing standard files
- If you need a helper function, put it in `service.py`, not a new file
- Subdirectories within modules are almost never appropriate
---
### 5. Ignoring the "Quick Reference" Table
The Quick Reference table in this document already states where each kind of change belongs:
| Task | Location |
| --------------------- | -------------------------------- |
| Add new endpoint | `modules/<module>/router.py` |
| Add database model | `modules/<module>/models.py` |
| Add validation schema | `modules/<module>/schemas.py` |
| Add business logic | `modules/<module>/service.py` |
| Add database query | `modules/<module>/repository.py` |
**Advice:**
- Use this table as the ONLY guide for file placement
- If something doesn't fit these categories, it probably belongs in `service.py`
- Cross-cutting concerns go in `infrastructure/`, not in module subdirectories
---
### Summary: The Golden Rules
1. **Check existing patterns first** - Look at 2-3 similar modules before creating anything
2. **Standard files only** - `__init__.py`, `models.py`, `schemas.py`, `repository.py`, `service.py`, `router.py`
3. **No subdirectories in modules** - Everything fits in the standard files
4. **Consolidate, don't split** - When unsure, put it in `service.py`
5. **Simple > Clever** - Direct code beats abstract patterns
6. **YAGNI** - Don't build for hypothetical future requirements
---
_Last updated: February 2026_