# Docker Infrastructure Hardening — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Harden all Docker infrastructure across the monorepo — security, build optimization, service organization, health checks, and networking.

**Architecture:** 4-phase approach: quick config fixes first (no code changes), then Dockerfile improvements, then health endpoints + networking, then resource limits. Each phase produces a working stack.

**Tech Stack:** Docker, Docker Compose, FastAPI (Python), ElysiaJS (Bun/TypeScript), PostgreSQL, Redis, MinIO

---
### Task 1: Add .env to .gitignore files

**Files:**
- Modify: `cofee_backend/.gitignore`
- Modify: `cofee_frontend/.gitignore`

- [ ] **Step 1: Add .env exclusion to backend .gitignore**

Append to `cofee_backend/.gitignore`:
```
# Environment
.env
.env.*
```

- [ ] **Step 2: Add .env exclusion to frontend .gitignore**

The frontend `.gitignore` has `.env*.local` but not `.env` itself. Add before the `# local env files` section in `cofee_frontend/.gitignore`:
```
# Environment
.env
```
Note: keep the existing `.env*.local` line as well.
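The difference between the two patterns can be sanity-checked with Python's `fnmatch` (an approximation — gitignore matching has extra rules, but it agrees for these simple patterns):

```python
from fnmatch import fnmatch

candidates = [".env", ".env.local", ".env.production", "config.env"]

for pattern in (".env", ".env.*"):
    matched = [name for name in candidates if fnmatch(name, pattern)]
    print(f"{pattern!r} matches {matched}")
```

A bare `.env` entry does not cover `.env.local` or `.env.production` — hence the backend uses both `.env` and `.env.*`, while the frontend relies on its existing `.env*.local` line for the local variants.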
- [ ] **Step 3: Verify .env files are not tracked**

Run: `git ls-files | grep '\.env'`
Expected: no output. If any .env files are tracked, run `git rm --cached <file>` for each.

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/.gitignore cofee_frontend/.gitignore
git commit -m "fix(infra): add .env to backend and frontend .gitignore"
```

---
### Task 2: Add .env to backend .dockerignore

**Files:**
- Modify: `cofee_backend/.dockerignore`

- [ ] **Step 1: Add .env exclusion**

Add to `cofee_backend/.dockerignore`:
```
.env
.env.*
```

- [ ] **Step 2: Commit**

```bash
git add cofee_backend/.dockerignore
git commit -m "fix(infra): exclude .env from backend Docker build context"
```

---
### Task 3: DRY up docker-compose env vars with YAML anchor

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

The `api` and `worker` services share 14 identical env vars. Extract them into an `x-backend-env` anchor. This also adds the missing `JWT_SECRET_KEY` to the worker.

- [ ] **Step 1: Add x-backend-env anchor and refactor services**

Replace the entire `cofee_backend/docker-compose.yml` with:
```yaml
x-backend-image: &backend-image
  image: cpv3-backend:dev
  build:
    context: .
    dockerfile: Dockerfile
    target: dev

x-backend-env: &backend-env
  DEBUG: ${DEBUG:-1}
  JWT_SECRET_KEY: ${JWT_SECRET_KEY:-dev-secret}

  POSTGRES_USER: ${POSTGRES_USER:-postgres}
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
  POSTGRES_HOST: db
  POSTGRES_PORT: 5432
  POSTGRES_DATABASE: ${POSTGRES_DATABASE:-coffee_project_db}

  STORAGE_BACKEND: ${STORAGE_BACKEND:-S3}

  S3_ACCESS_KEY: ${MINIO_ROOT_USER:-minioadmin}
  S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-minioadmin}
  S3_BUCKET_NAME: ${S3_BUCKET_NAME:-coffee-bucket}
  S3_ENDPOINT_URL_INTERNAL: http://minio:9000
  S3_ENDPOINT_URL_PUBLIC: http://localhost:9000

  REDIS_URL: redis://redis:6379/0
  WEBHOOK_BASE_URL: http://api:8000

  REMOTION_SERVICE_URL: ${REMOTION_SERVICE_URL:-http://remotion:3001}

services:
  db:
    container_name: cpv3_postgres
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
      POSTGRES_DB: ${POSTGRES_DATABASE:-coffee_project_db}
    ports:
      - "127.0.0.1:5332:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DATABASE:-coffee_project_db}"]
      interval: 5s
      timeout: 3s
      retries: 20
    volumes:
      - cpv3_db:/var/lib/postgresql/data

  minio:
    container_name: cpv3_minio
    image: minio/minio:RELEASE.2024-11-07T00-52-20Z
    restart: unless-stopped
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minioadmin}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minioadmin}
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - cpv3_minio:/data

  redis:
    container_name: cpv3_redis
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    volumes:
      - cpv3_redis:/data

  api:
    container_name: cpv3_api
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - ./cpv3:/app/cpv3
      - ./alembic:/app/alembic
      - ./alembic.ini:/app/alembic.ini

  worker:
    container_name: cpv3_worker
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    command: >
      watchfiles --filter python 'dramatiq cpv3.modules.tasks.service --processes 1 --threads 2' /app/cpv3
    volumes:
      - ./cpv3:/app/cpv3

volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:
```
Key changes in this file:
- `x-backend-env` anchor with all shared env vars (DRY)
- `JWT_SECRET_KEY` added to worker (was missing)
- `restart: unless-stopped` on all services
- All ports bound to `127.0.0.1` (not `0.0.0.0`)
- MinIO pinned to `RELEASE.2024-11-07T00-52-20Z`
- MinIO health check added (`curl` on `/minio/health/live`)
- Inline comments removed for cleanliness
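The `<<: *backend-env` lines use YAML merge keys. Conceptually they behave like Python dict unpacking: the anchored mapping is merged in, and any key written locally in the service wins. A minimal sketch (names and values below are illustrative, truncated from the real anchor):

```python
# Stand-in for the x-backend-env anchor (values truncated for brevity).
backend_env = {
    "DEBUG": "1",
    "JWT_SECRET_KEY": "dev-secret",
    "REDIS_URL": "redis://redis:6379/0",
}

# api / worker: `<<: *backend-env` with no local overrides — a plain merge.
api_env = {**backend_env}

# Hypothetical service that overrides one anchor key: the local value wins,
# just as keys written next to `<<:` override the merged anchor in YAML.
custom_env = {**backend_env, "DEBUG": "0"}

print(api_env["REDIS_URL"])
print(custom_env["DEBUG"])
```

This is why the anchor is safe to share: a service can still specialize any single variable without duplicating the other thirteen.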
- [ ] **Step 2: Validate compose syntax**

Run: `cd cofee_backend && docker compose config > /dev/null`
Expected: no errors.

- [ ] **Step 3: Test stack starts**

Run: `cd cofee_backend && docker compose up -d`
Wait 30s, then: `docker compose ps`
Expected: all services `Up` or `Up (healthy)`.

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/docker-compose.yml
git commit -m "refactor(infra): DRY env vars, pin images, bind localhost, add restart policies"
```

---
### Task 4: Move build-essential out of base stage in backend Dockerfile

**Files:**
- Modify: `cofee_backend/Dockerfile`

`build-essential` is only needed during `uv sync` (compiling C extensions). Moving it from `base` to `deps` saves roughly 200MB in the prod image: `prod` inherits from `base` and copies only the prebuilt `.venv` from `deps`, so the compiled artifacts come along without the system build packages.

- [ ] **Step 1: Restructure Dockerfile stages**

Replace the entire `cofee_backend/Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1.7

# ---------------------------------------------------------------------------
# Stage 1: base — minimal runtime dependencies (shared by dev and prod)
# ---------------------------------------------------------------------------
FROM python:3.11-slim AS base

COPY --from=ghcr.io/astral-sh/uv:0.8.15 /uv /uvx /bin/

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/app/.venv/bin:${PATH}"

WORKDIR /app

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# ---------------------------------------------------------------------------
# Stage 2: deps — install Python dependencies (build-essential here only)
# ---------------------------------------------------------------------------
FROM base AS deps

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# ---------------------------------------------------------------------------
# Stage 3: dev — development target (used by docker-compose)
# ---------------------------------------------------------------------------
FROM deps AS dev

ENV PYTHONPATH=/app

EXPOSE 8000

CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir /app/cpv3"]

# ---------------------------------------------------------------------------
# Stage 4: prod — production target (no build-essential, non-root user)
# ---------------------------------------------------------------------------
FROM base AS prod

RUN groupadd --gid 1000 app && \
    useradd --uid 1000 --gid app --create-home app

COPY --from=deps /app/.venv /app/.venv
COPY pyproject.toml uv.lock ./

ENV UV_LINK_MODE=copy

COPY cpv3 ./cpv3
COPY alembic ./alembic
COPY alembic.ini ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

RUN chown -R app:app /app
USER app

EXPOSE 8000

CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000"]
```
Key changes:
- `build-essential` moved from `base` to `deps` — prod image is ~200MB smaller
- `prod` stage inherits from `base` (not `deps`) — no compiler in production
- `prod` copies only `.venv` from the `deps` stage — gets compiled packages without build tools
- Non-root `app` user (uid 1000) added to `prod` stage
- `dev` stage still inherits from `deps` (has build-essential for ad-hoc installs)
- [ ] **Step 2: Build and verify prod stage**

Run: `cd cofee_backend && docker build --target prod -t cpv3-backend:prod-test .`
Expected: builds successfully.

- [ ] **Step 3: Build and verify dev stage**

Run: `cd cofee_backend && docker build --target dev -t cpv3-backend:dev-test .`
Expected: builds successfully.

- [ ] **Step 4: Verify dev stack still works**

Run: `cd cofee_backend && docker compose up -d --build`
Wait 30s, then: `docker compose ps`
Expected: all services running.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/Dockerfile
git commit -m "perf(infra): move build-essential to deps stage, add non-root user to prod"
```

---
### Task 5: Add BuildKit cache mounts and non-root user to Remotion Dockerfile

**Files:**
- Modify: `remotion_service/Dockerfile`

- [ ] **Step 1: Update Remotion Dockerfile**

Replace the entire `remotion_service/Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1.7-labs
FROM oven/bun:1.3.10 AS base

ENV APP_HOME=/app \
    PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 \
    REMOTION_PUPPETEER_NO_SANDBOX=1 \
    NODE_ENV=production

WORKDIR ${APP_HOME}

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    ffmpeg \
    chromium \
    libglib2.0-0 \
    libnss3 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libxkbcommon0 \
    libgbm1 \
    fonts-noto-color-emoji \
    curl \
    && rm -rf /var/lib/apt/lists/*

FROM base AS deps
WORKDIR ${APP_HOME}
COPY package.json bun.lock ./
RUN NODE_ENV=development bun install --frozen-lockfile

FROM base AS runner
WORKDIR ${APP_HOME}

RUN groupadd --gid 1000 app && \
    useradd --uid 1000 --gid app --create-home app

COPY --from=deps ${APP_HOME}/node_modules ./node_modules
COPY package.json bun.lock ./
COPY tsconfig.json remotion.config.ts ./
COPY public ./public
COPY src ./src
COPY server ./server

RUN mkdir -p out && chown -R app:app /app

USER app

EXPOSE 3001

CMD ["bun", "run", "server"]
```
Key changes:
- BuildKit apt cache mounts added (matches the backend pattern)
- Non-root `app` user (uid 1000) in the runner stage
- `chown` before `USER app` so the app user owns all files, including `out/`
- [ ] **Step 2: Build and verify**

Run: `cd remotion_service && docker build --target runner -t remotion:test .`
Expected: builds successfully.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/Dockerfile
git commit -m "perf(infra): add BuildKit cache mounts and non-root user to Remotion Dockerfile"
```

---
### Task 6: Add resource limits and cap_drop to Remotion docker-compose

**Files:**
- Modify: `remotion_service/docker-compose.yml`

- [ ] **Step 1: Update Remotion docker-compose.yml**

Replace the entire `remotion_service/docker-compose.yml` with:
```yaml
services:
  remotion:
    build:
      context: .
      dockerfile: Dockerfile
      target: runner
    command: >
      sh -lc "NODE_ENV=development bun install --frozen-lockfile && bun run server"
    restart: unless-stopped
    env_file: .env
    environment:
      S3_ENDPOINT_URL: http://minio:9000
      REDIS_URL: redis://redis:6379/0
    ports:
      - "127.0.0.1:3001:3001"
    deploy:
      resources:
        limits:
          memory: 4g
          cpus: "2"
        reservations:
          memory: 1g
          cpus: "0.5"
    cap_drop:
      - ALL
    cap_add:
      - SYS_ADMIN
    volumes:
      - .:/app:cached
      - remotion_node_modules:/app/node_modules
    networks:
      - backend
    stdin_open: true
    tty: true

volumes:
  remotion_node_modules:

networks:
  backend:
    external: true
    name: cofee_backend_default
```
Key changes:
- `restart: unless-stopped`
- Port bound to `127.0.0.1`
- Resource limits: 4GB memory / 2 CPUs (Chromium + FFmpeg need this)
- Resource reservations: 1GB / 0.5 CPU (scheduling guarantees)
- `cap_drop: ALL` + `cap_add: SYS_ADMIN` (SYS_ADMIN is needed for the Chromium sandbox)
- [ ] **Step 2: Validate compose syntax**

Run: `cd remotion_service && docker compose config > /dev/null`
Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/docker-compose.yml
git commit -m "fix(infra): add resource limits, cap_drop, restart policy to Remotion compose"
```

---
### Task 7: Add resource limits and cap_drop to backend docker-compose

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

- [ ] **Step 1: Add deploy and cap_drop sections to each service**
Add to the `db` service after `volumes`:
```yaml
cap_drop:
  - ALL
cap_add:
  - CHOWN
  - DAC_OVERRIDE
  - FOWNER
  - SETGID
  - SETUID
```

Add to the `minio` service after `volumes`:
```yaml
cap_drop:
  - ALL
cap_add:
  - CHOWN
  - DAC_OVERRIDE
  - FOWNER
  - SETGID
  - SETUID
```

Add to the `redis` service after `volumes`:
```yaml
cap_drop:
  - ALL
```

Add to the `api` service after `volumes`:
```yaml
deploy:
  resources:
    limits:
      memory: 512m
      cpus: "1"
cap_drop:
  - ALL
```

Add to the `worker` service after `volumes`:
```yaml
deploy:
  resources:
    limits:
      memory: 1g
      cpus: "1"
cap_drop:
  - ALL
```

- [ ] **Step 2: Validate compose syntax**

Run: `cd cofee_backend && docker compose config > /dev/null`
Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/docker-compose.yml
git commit -m "fix(infra): add resource limits and capability dropping to backend compose"
```

---
### Task 8: Add health check endpoint to backend API

**Files:**
- Modify: `cofee_backend/cpv3/modules/system/router.py`

The existing `/api/ping/` only returns a static response. We need a `/api/health/` endpoint that checks DB connectivity for Docker health checks.

- [ ] **Step 1: Add health endpoint to system router**

Replace the contents of `cofee_backend/cpv3/modules/system/router.py` with:
```python
from __future__ import annotations

from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from cpv3.db.session import get_db
from cpv3.infrastructure.settings import get_settings

router = APIRouter(prefix="/api", tags=["System"])

_settings = get_settings()


@router.get("/ping/")
async def ping() -> dict[str, str]:
    return {"status": "ok"}


@router.get("/health/")
async def health(db: AsyncSession = Depends(get_db)) -> dict[str, str]:
    """Health check for Docker/K8s probes. Verifies DB connectivity."""
    try:
        await db.execute(text("SELECT 1"))
        db_status = "connected"
    except Exception:
        db_status = "disconnected"

    status = "ok" if db_status == "connected" else "degraded"
    return {"status": status, "database": db_status}
```
- [ ] **Step 2: Run linter**

Run: `cd cofee_backend && uv run ruff check cpv3/modules/system/router.py`
Expected: no errors.

- [ ] **Step 3: Run existing tests**

Run: `cd cofee_backend && uv run pytest -x -q 2>&1 | tail -10`
Expected: all tests pass (the health endpoint is additive, no breaking changes).

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/cpv3/modules/system/router.py
git commit -m "feat(backend): add /api/health/ endpoint for Docker health checks"
```

---
### Task 9: Add health check endpoint to Remotion service

**Files:**
- Modify: `remotion_service/server/index.ts`

- [ ] **Step 1: Add /health endpoint before app.listen**

Add before the `app.listen(...)` line (around line 138) in `remotion_service/server/index.ts`:

```typescript
app.get("/health", async () => {
  return { status: "ok" };
});
```

Note: the route is declared as `/health`, but because the Elysia instance is created with `prefix: "/api"`, it is served at `GET /api/health`.

- [ ] **Step 2: Type check**

Run: `cd remotion_service && bunx tsc --noEmit`
Expected: no new errors.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/server/index.ts
git commit -m "feat(remotion): add /api/health endpoint for Docker health checks"
```

---
### Task 10: Add health checks for api, worker, and remotion in compose files

**Files:**
- Modify: `cofee_backend/docker-compose.yml`
- Modify: `remotion_service/docker-compose.yml`

- [ ] **Step 1: Add healthcheck to api service**

Add to the `api` service in `cofee_backend/docker-compose.yml` (after `depends_on`):

```yaml
healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health/')"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
```
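The `python -c` one-liner above works because an unhandled exception exits non-zero. Unpacked into an equivalent standalone probe, the failure handling becomes explicit (a sketch; the endpoint path assumes Task 8's `/api/health/`):

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with an HTTP success status.

    Docker only looks at the healthcheck command's exit code, so every
    failure mode (refused connection, timeout, HTTP 4xx/5xx) must become
    a clean False rather than an unhandled exception.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False
```

Shipped into the image (say, as a hypothetical `/app/healthcheck.py` that wraps `probe()` in a `sys.exit`), the compose `test` could shrink to `["CMD", "python", "/app/healthcheck.py"]`; the inline one-liner in Step 1 simply avoids adding an extra file.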
- [ ] **Step 2: Add healthcheck to worker service**

The worker has no HTTP port, so use a process check. Add to the `worker` service:

```yaml
healthcheck:
  test: ["CMD-SHELL", "pgrep -f dramatiq || exit 1"]
  interval: 15s
  timeout: 5s
  retries: 3
```

- [ ] **Step 3: Add healthcheck to remotion service**

Add to the `remotion` service in `remotion_service/docker-compose.yml` (after `environment`):

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3001/api/health"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 15s
```

- [ ] **Step 4: Validate both compose files**

Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`
Expected: no errors.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add health checks to api, worker, and remotion services"
```

---
### Task 11: Add network segmentation to backend compose

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

Currently all services share one flat network. Separate them into `db-net` (data stores) and `app-net` (application services). This prevents Remotion from reaching DB/Redis directly.

- [ ] **Step 1: Add networks to compose**

Add at the bottom of `cofee_backend/docker-compose.yml`, replacing the existing `volumes:` section:
```yaml
volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:

networks:
  db-net:
    driver: bridge
  app-net:
    driver: bridge
```
- [ ] **Step 2: Add network assignments to each service**
|
||||
|
||||
Add to `db`:
|
||||
```yaml
|
||||
networks:
|
||||
- db-net
|
||||
```
|
||||
|
||||
Add to `redis`:
|
||||
```yaml
|
||||
networks:
|
||||
- db-net
|
||||
```
|
||||
|
||||
Add to `minio`:
|
||||
```yaml
|
||||
networks:
|
||||
- db-net
|
||||
- app-net
|
||||
```
|
||||
|
||||
Add to `api`:
|
||||
```yaml
|
||||
networks:
|
||||
- db-net
|
||||
- app-net
|
||||
```
|
||||
|
||||
Add to `worker`:
|
||||
```yaml
|
||||
networks:
|
||||
- db-net
|
||||
- app-net
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Update Remotion compose to use app-net**
|
||||
|
||||
In `remotion_service/docker-compose.yml`, change the networks section:
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
backend:
|
||||
external: true
|
||||
name: cofee_backend_app-net
|
||||
```
|
||||
|
||||
This ensures Remotion can reach MinIO and API (on `app-net`) but NOT PostgreSQL or Redis (on `db-net`).
|
||||
|
||||
- [ ] **Step 4: Validate both compose files**
|
||||
|
||||
Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`
|
||||
Expected: no errors.
|
||||
|
||||
- [ ] **Step 5: Test full stack connectivity**
|
||||
|
||||
Run:
|
||||
```bash
|
||||
cd cofee_backend && docker compose down && docker compose up -d
|
||||
# Wait for healthy
|
||||
cd ../remotion_service && docker compose down && docker compose up -d
|
||||
```
|
||||
|
||||
Verify API can reach DB, Redis, MinIO. Verify Remotion can reach MinIO but NOT DB.
|
||||
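The reachability claims in Step 5 can be checked mechanically with a small TCP probe run inside each container via `docker exec` (a sketch; hostnames and ports come from the compose files, and the remotion container's name depends on the compose project name):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Expected results after segmentation:
#   inside the api container:  db:5432, redis:6379, minio:9000 -> all True
#   inside remotion:           minio:9000 -> True, db:5432 -> False (db-net isolated)
# e.g. docker exec cpv3_api python -c "<paste can_connect>; print(can_connect('db', 5432))"
```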
- [ ] **Step 6: Commit**

```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add network segmentation — db-net and app-net isolation"
```

---
### Task 12: Final verification

- [ ] **Step 1: Bring down everything**

```bash
cd cofee_backend && docker compose down
cd ../remotion_service && docker compose down
```

- [ ] **Step 2: Clean build**

```bash
cd cofee_backend && docker compose build --no-cache
cd ../remotion_service && docker compose build --no-cache
```

- [ ] **Step 3: Start backend stack**

```bash
cd cofee_backend && docker compose up -d
```

Wait for: `docker compose ps` shows all services healthy.

- [ ] **Step 4: Start Remotion stack**

```bash
cd remotion_service && docker compose up -d
```

Wait for: `docker compose ps` shows remotion healthy.

- [ ] **Step 5: Test API health**

Run: `curl http://127.0.0.1:8000/api/health/`
Expected: `{"status":"ok","database":"connected"}`

- [ ] **Step 6: Test Remotion health**

Run: `curl http://127.0.0.1:3001/api/health`
Expected: `{"status":"ok"}`

- [ ] **Step 7: Verify port binding**

Run: `docker compose -f cofee_backend/docker-compose.yml ps --format '{{.Name}} {{.Ports}}'`
Expected: all ports show `127.0.0.1:XXXX->YYYY/tcp` (not `0.0.0.0`).

- [ ] **Step 8: Verify resource limits**

Run: `docker inspect cpv3_api --format '{{.HostConfig.Memory}}'`
Expected: `536870912` (512MB).

Run: `docker inspect $(docker compose -f remotion_service/docker-compose.yml ps -q remotion) --format '{{.HostConfig.Memory}}'`
Expected: `4294967296` (4GB).
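The expected numbers are simply the compose limits from Tasks 6 and 7 expressed in binary units; a quick arithmetic check (the helper name is illustrative):

```python
def mem_to_bytes(limit: str) -> int:
    """Convert a compose-style memory string ('512m', '4g') to bytes."""
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
    return int(limit[:-1]) * units[limit[-1].lower()]

print(mem_to_bytes("512m"))  # 536870912 — the api limit from Task 7
print(mem_to_bytes("4g"))    # 4294967296 — the remotion limit from Task 6
```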