# Docker Infrastructure Hardening — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Harden all Docker infrastructure across the monorepo — security, build optimization, service organization, health checks, and networking.

**Architecture:** 4-phase approach: quick config fixes first (no code changes), then Dockerfile improvements, then health endpoints + networking, then resource limits. Each phase produces a working stack.

**Tech Stack:** Docker, Docker Compose, FastAPI (Python), ElysiaJS (Bun/TypeScript), PostgreSQL, Redis, MinIO

---
### Task 1: Add .env to .gitignore files

**Files:**
- Modify: `cofee_backend/.gitignore`
- Modify: `cofee_frontend/.gitignore`

- [ ] **Step 1: Add .env exclusion to backend .gitignore**

Append to `cofee_backend/.gitignore`:

```
# Environment
.env
.env.*
```

- [ ] **Step 2: Add .env exclusion to frontend .gitignore**

The frontend `.gitignore` has `.env*.local` but not `.env` itself. Add before the `# local env files` section in `cofee_frontend/.gitignore`:

```
# Environment
.env
```

Note: Keep the existing `.env*.local` line too.

- [ ] **Step 3: Verify .env files are not tracked**

Run: `git ls-files | grep '\.env'`

Expected: no output. If any .env files are tracked, run `git rm --cached <file>` for each.

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/.gitignore cofee_frontend/.gitignore
git commit -m "fix(infra): add .env to backend and frontend .gitignore"
```

---
### Task 2: Add .env to backend .dockerignore

**Files:**
- Modify: `cofee_backend/.dockerignore`

- [ ] **Step 1: Add .env exclusion**

Add to `cofee_backend/.dockerignore`:

```
.env
.env.*
```

- [ ] **Step 2: Commit**

```bash
git add cofee_backend/.dockerignore
git commit -m "fix(infra): exclude .env from backend Docker build context"
```
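The intent of these two patterns can be sanity-checked offline. This is a rough sketch only: `.gitignore` (git glob rules) and `.dockerignore` (Go's `filepath.Match`) each have their own matching semantics, but for patterns this simple Python's stdlib `fnmatch` behaves the same way:

```python
from fnmatch import fnmatch

# The two patterns added in Tasks 1 and 2.
patterns = [".env", ".env.*"]

def excluded(name: str) -> bool:
    """True if any ignore pattern matches the file name."""
    return any(fnmatch(name, p) for p in patterns)

# Secrets files are excluded...
assert excluded(".env")
assert excluded(".env.production")
# ...but similarly named tooling files are not.
assert not excluded(".envrc")
assert not excluded("env.py")
```

Note that `.env*.local` (already in the frontend `.gitignore`) is not covered by these two patterns alone, which is why the plan keeps that line.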
---
### Task 3: DRY up docker-compose env vars with YAML anchor

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

The `api` and `worker` services share 14 identical env vars. Extract them into an `x-backend-env` anchor. This also adds the missing `JWT_SECRET_KEY` to the worker.

- [ ] **Step 1: Add x-backend-env anchor and refactor services**

Replace the entire `cofee_backend/docker-compose.yml` with:

```yaml
x-backend-image: &backend-image
  image: cpv3-backend:dev
  build:
    context: .
    dockerfile: Dockerfile
    target: dev

x-backend-env: &backend-env
  DEBUG: ${DEBUG:-1}
  JWT_SECRET_KEY: ${JWT_SECRET_KEY:-dev-secret}

  POSTGRES_USER: ${POSTGRES_USER:-postgres}
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
  POSTGRES_HOST: db
  POSTGRES_PORT: 5432
  POSTGRES_DATABASE: ${POSTGRES_DATABASE:-coffee_project_db}

  STORAGE_BACKEND: ${STORAGE_BACKEND:-S3}

  S3_ACCESS_KEY: ${MINIO_ROOT_USER:-minioadmin}
  S3_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-minioadmin}
  S3_BUCKET_NAME: ${S3_BUCKET_NAME:-coffee-bucket}
  S3_ENDPOINT_URL_INTERNAL: http://minio:9000
  S3_ENDPOINT_URL_PUBLIC: http://localhost:9000

  REDIS_URL: redis://redis:6379/0
  WEBHOOK_BASE_URL: http://api:8000

  REMOTION_SERVICE_URL: ${REMOTION_SERVICE_URL:-http://remotion:3001}

services:
  db:
    container_name: cpv3_postgres
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
      POSTGRES_DB: ${POSTGRES_DATABASE:-coffee_project_db}
    ports:
      - "127.0.0.1:5332:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DATABASE:-coffee_project_db}"]
      interval: 5s
      timeout: 3s
      retries: 20
    volumes:
      - cpv3_db:/var/lib/postgresql/data

  minio:
    container_name: cpv3_minio
    image: minio/minio:RELEASE.2024-11-07T00-52-20Z
    restart: unless-stopped
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minioadmin}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minioadmin}
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - cpv3_minio:/data

  redis:
    container_name: cpv3_redis
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    volumes:
      - cpv3_redis:/data

  api:
    container_name: cpv3_api
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - ./cpv3:/app/cpv3
      - ./alembic:/app/alembic
      - ./alembic.ini:/app/alembic.ini

  worker:
    container_name: cpv3_worker
    <<: *backend-image
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      <<: *backend-env
    command: >
      watchfiles --filter python 'dramatiq cpv3.modules.tasks.service --processes 1 --threads 2' /app/cpv3
    volumes:
      - ./cpv3:/app/cpv3

volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:
```

Key changes in this file:
- `x-backend-env` anchor with all shared env vars (DRY)
- `JWT_SECRET_KEY` added to worker (was missing)
- `restart: unless-stopped` on all services
- All ports bound to `127.0.0.1` (not `0.0.0.0`)
- MinIO pinned to `RELEASE.2024-11-07T00-52-20Z`
- MinIO health check added (`curl` on `/minio/health/live`)
- Postgres health check interpolates `${POSTGRES_DATABASE}`, the variable name used everywhere else in this file (not `${POSTGRES_DB}`, which is only the in-container name)
- Removed inline comments for cleanliness

- [ ] **Step 2: Validate compose syntax**

Run: `cd cofee_backend && docker compose config > /dev/null`

Expected: no errors.

- [ ] **Step 3: Test stack starts**

Run: `cd cofee_backend && docker compose up -d`

Wait 30s, then: `docker compose ps`

Expected: all services `Up` or `Up (healthy)`.

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/docker-compose.yml
git commit -m "refactor(infra): DRY env vars, pin images, bind localhost, add restart policies"
```
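The `x-backend-env` block relies on standard YAML anchor and merge-key (`<<`) semantics, which Compose resolves before interpolation. A minimal sketch with PyYAML (assuming `pyyaml` is installed; the keys here are illustrative) showing that both services end up with the same shared map:

```python
import yaml

doc = """
x-backend-env: &backend-env
  DEBUG: "1"
  REDIS_URL: redis://redis:6379/0

services:
  api:
    environment:
      <<: *backend-env
  worker:
    environment:
      <<: *backend-env
      WORKER_ONLY: "1"
"""

cfg = yaml.safe_load(doc)
api_env = cfg["services"]["api"]["environment"]
worker_env = cfg["services"]["worker"]["environment"]

# Both services inherit the shared keys from the anchor...
assert api_env["REDIS_URL"] == worker_env["REDIS_URL"] == "redis://redis:6379/0"
# ...and a service can still add keys of its own without affecting the other.
assert "WORKER_ONLY" in worker_env and "WORKER_ONLY" not in api_env
```

This is why adding `JWT_SECRET_KEY` to the anchor fixes the worker for free: every service that merges `*backend-env` picks it up.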
---
### Task 4: Move build-essential out of base stage in backend Dockerfile

**Files:**
- Modify: `cofee_backend/Dockerfile`

`build-essential` is only needed during `uv sync` (compiling C extensions). Moving it from `base` to `deps` saves ~200MB in the prod image: the `prod` stage inherits from `base` and copies only the compiled `.venv` from `deps`, so the system build tools never reach production.

- [ ] **Step 1: Restructure Dockerfile stages**

Replace the entire `cofee_backend/Dockerfile` with:

```dockerfile
# syntax=docker/dockerfile:1.7

# ---------------------------------------------------------------------------
# Stage 1: base — minimal runtime dependencies (shared by dev and prod)
# ---------------------------------------------------------------------------
FROM python:3.11-slim AS base

COPY --from=ghcr.io/astral-sh/uv:0.8.15 /uv /uvx /bin/

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PATH="/app/.venv/bin:${PATH}"

WORKDIR /app

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# ---------------------------------------------------------------------------
# Stage 2: deps — install Python dependencies (build-essential here only)
# ---------------------------------------------------------------------------
FROM base AS deps

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# ---------------------------------------------------------------------------
# Stage 3: dev — development target (used by docker-compose)
# ---------------------------------------------------------------------------
FROM deps AS dev

ENV PYTHONPATH=/app

EXPOSE 8000

CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000 --reload --reload-dir /app/cpv3"]

# ---------------------------------------------------------------------------
# Stage 4: prod — production target (no build-essential, non-root user)
# ---------------------------------------------------------------------------
FROM base AS prod

RUN groupadd --gid 1000 app && \
    useradd --uid 1000 --gid app --create-home app

COPY --from=deps /app/.venv /app/.venv
COPY pyproject.toml uv.lock ./

ENV UV_LINK_MODE=copy

COPY cpv3 ./cpv3
COPY alembic ./alembic
COPY alembic.ini ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

RUN chown -R app:app /app
USER app

EXPOSE 8000

CMD ["sh", "-c", "alembic upgrade head && uvicorn cpv3.main:app --host 0.0.0.0 --port 8000"]
```

Key changes:
- `build-essential` moved from `base` to `deps` — prod image is ~200MB smaller
- `prod` stage inherits from `base` (not `deps`) — no compiler in production
- `prod` copies only `.venv` from the `deps` stage — gets compiled packages without build tools
- Non-root `app` user (uid 1000) added to `prod` stage
- `dev` stage still inherits from `deps` (has build-essential for potential ad-hoc installs)

- [ ] **Step 2: Build and verify prod stage**

Run: `cd cofee_backend && docker build --target prod -t cpv3-backend:prod-test .`

Expected: builds successfully.

- [ ] **Step 3: Build and verify dev stage**

Run: `cd cofee_backend && docker build --target dev -t cpv3-backend:dev-test .`

Expected: builds successfully.

- [ ] **Step 4: Verify dev stack still works**

Run: `cd cofee_backend && docker compose up -d --build`

Wait 30s, then: `docker compose ps`

Expected: all services running.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/Dockerfile
git commit -m "perf(infra): move build-essential to deps stage, add non-root user to prod"
```
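The size win comes entirely from the stage inheritance graph above. As a sketch, the `FROM` relationships can be modeled to confirm that `prod`'s layer chain never passes through the stage that installs `build-essential`:

```python
# FROM-parent of each Dockerfile stage, as defined above.
parents = {"base": None, "deps": "base", "dev": "deps", "prod": "base"}

def ancestry(stage: str) -> list[str]:
    """Chain of stages whose layers end up in the given stage's image."""
    chain = []
    while stage is not None:
        chain.append(stage)
        stage = parents[stage]
    return chain

# dev inherits the compiler layers via deps...
assert "deps" in ancestry("dev")
# ...but prod does not: it only receives /app/.venv via COPY --from=deps,
# which copies files, not the apt layers that produced them.
assert "deps" not in ancestry("prod")
```

This is the general multi-stage pattern: heavyweight toolchains live in a side branch, and the final stage takes artifacts from it with `COPY --from`.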
---
### Task 5: Add BuildKit cache mounts and non-root user to Remotion Dockerfile

**Files:**
- Modify: `remotion_service/Dockerfile`

- [ ] **Step 1: Update Remotion Dockerfile**

Replace the entire `remotion_service/Dockerfile` with:

```dockerfile
# syntax=docker/dockerfile:1.7-labs
FROM oven/bun:1.3.10 AS base

ENV APP_HOME=/app \
    PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 \
    REMOTION_PUPPETEER_NO_SANDBOX=1 \
    NODE_ENV=production

WORKDIR ${APP_HOME}

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    ca-certificates \
    ffmpeg \
    chromium \
    libglib2.0-0 \
    libnss3 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libxkbcommon0 \
    libgbm1 \
    fonts-noto-color-emoji \
    curl \
    && rm -rf /var/lib/apt/lists/*

FROM base AS deps
WORKDIR ${APP_HOME}
COPY package.json bun.lock ./
RUN NODE_ENV=development bun install --frozen-lockfile

FROM base AS runner
WORKDIR ${APP_HOME}

RUN groupadd --gid 1000 app && \
    useradd --uid 1000 --gid app --create-home app

COPY --from=deps ${APP_HOME}/node_modules ./node_modules
COPY package.json bun.lock ./
COPY tsconfig.json remotion.config.ts ./
COPY public ./public
COPY src ./src
COPY server ./server

RUN mkdir -p out && chown -R app:app /app

USER app

EXPOSE 3001

CMD ["bun", "run", "server"]
```

Key changes:
- BuildKit apt cache mounts added (matches backend pattern)
- Non-root `app` user (uid 1000) in runner stage
- `chown` before `USER app` so the `app` user owns all files, including `out/`

- [ ] **Step 2: Build and verify**

Run: `cd remotion_service && docker build --target runner -t remotion:test .`

Expected: builds successfully.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/Dockerfile
git commit -m "perf(infra): add BuildKit cache mounts and non-root user to Remotion Dockerfile"
```

---
### Task 6: Add resource limits and cap_drop to Remotion docker-compose

**Files:**
- Modify: `remotion_service/docker-compose.yml`

- [ ] **Step 1: Update Remotion docker-compose.yml**

Replace the entire `remotion_service/docker-compose.yml` with:

```yaml
services:
  remotion:
    build:
      context: .
      dockerfile: Dockerfile
      target: runner
    command: >
      sh -lc "NODE_ENV=development bun install --frozen-lockfile && bun run server"
    restart: unless-stopped
    env_file: .env
    environment:
      S3_ENDPOINT_URL: http://minio:9000
      REDIS_URL: redis://redis:6379/0
    ports:
      - "127.0.0.1:3001:3001"
    deploy:
      resources:
        limits:
          memory: 4g
          cpus: "2"
        reservations:
          memory: 1g
          cpus: "0.5"
    cap_drop:
      - ALL
    cap_add:
      - SYS_ADMIN
    volumes:
      - .:/app:cached
      - remotion_node_modules:/app/node_modules
    networks:
      - backend
    stdin_open: true
    tty: true

volumes:
  remotion_node_modules:

networks:
  backend:
    external: true
    name: cofee_backend_default
```

Key changes:
- `restart: unless-stopped`
- Port bound to `127.0.0.1`
- Resource limits: 4GB memory / 2 CPUs (Chromium + FFmpeg need this)
- Resource reservations: 1GB / 0.5 CPU (scheduling guarantees)
- `cap_drop: ALL` + `cap_add: SYS_ADMIN` (SYS_ADMIN needed for Chromium sandbox)

- [ ] **Step 2: Validate compose syntax**

Run: `cd remotion_service && docker compose config > /dev/null`

Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/docker-compose.yml
git commit -m "fix(infra): add resource limits, cap_drop, restart policy to Remotion compose"
```

---
### Task 7: Add resource limits and cap_drop to backend docker-compose

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

- [ ] **Step 1: Add deploy and cap_drop sections to each service**

Add to the `db` service after `volumes`:

```yaml
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
```

Add to the `minio` service after `volumes`:

```yaml
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
```

Add to the `redis` service after `volumes`:

```yaml
    cap_drop:
      - ALL
```

Add to the `api` service after `volumes`:

```yaml
    deploy:
      resources:
        limits:
          memory: 512m
          cpus: "1"
    cap_drop:
      - ALL
```

Add to the `worker` service after `volumes`:

```yaml
    deploy:
      resources:
        limits:
          memory: 1g
          cpus: "1"
    cap_drop:
      - ALL
```

- [ ] **Step 2: Validate compose syntax**

Run: `cd cofee_backend && docker compose config > /dev/null`

Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add cofee_backend/docker-compose.yml
git commit -m "fix(infra): add resource limits and capability dropping to backend compose"
```

---
### Task 8: Add health check endpoint to backend API

**Files:**
- Modify: `cofee_backend/cpv3/modules/system/router.py`

The existing `/api/ping/` only returns a static response. We need a `/api/health/` endpoint that verifies DB connectivity, for use by Docker health checks.

- [ ] **Step 1: Add health endpoint to system router**

Replace the contents of `cofee_backend/cpv3/modules/system/router.py` with:

```python
from __future__ import annotations

from fastapi import APIRouter, Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from cpv3.db.session import get_db

router = APIRouter(prefix="/api", tags=["System"])


@router.get("/ping/")
async def ping() -> dict[str, str]:
    return {"status": "ok"}


@router.get("/health/")
async def health(db: AsyncSession = Depends(get_db)) -> dict[str, str]:
    """Health check for Docker/K8s probes. Verifies DB connectivity."""
    try:
        await db.execute(text("SELECT 1"))
        db_status = "connected"
    except Exception:
        db_status = "disconnected"

    status = "ok" if db_status == "connected" else "degraded"
    return {"status": status, "database": db_status}
```

- [ ] **Step 2: Run linter**

Run: `cd cofee_backend && uv run ruff check cpv3/modules/system/router.py`

Expected: no errors.

- [ ] **Step 3: Run existing tests**

Run: `cd cofee_backend && uv run pytest -x -q 2>&1 | tail -10`

Expected: all tests pass (health endpoint is additive, no breaking changes).

- [ ] **Step 4: Commit**

```bash
git add cofee_backend/cpv3/modules/system/router.py
git commit -m "feat(backend): add /api/health/ endpoint for Docker health checks"
```
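The endpoint's status logic generalizes cleanly if more dependency probes (e.g. Redis) are added later. A minimal, framework-free sketch of that aggregation pattern — the names here are illustrative, not from the codebase:

```python
from typing import Callable

def health_report(checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run each probe; report per-dependency state plus an overall status."""
    report: dict[str, str] = {}
    for name, probe in checks.items():
        try:
            ok = probe()
        except Exception:
            # A probe that raises counts as a failed dependency, not a 500.
            ok = False
        report[name] = "connected" if ok else "disconnected"
    all_ok = all(v == "connected" for v in report.values())
    report["status"] = "ok" if all_ok else "degraded"
    return report

assert health_report({"database": lambda: True}) == {"database": "connected", "status": "ok"}
assert health_report({"database": lambda: 1 / 0})["status"] == "degraded"
```

Returning "degraded" with HTTP 200 (as the endpoint above does) means Docker's probe command must inspect the body, not just the status code, if it should fail on a down database.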
---
### Task 9: Add health check endpoint to Remotion service

**Files:**
- Modify: `remotion_service/server/index.ts`

- [ ] **Step 1: Add /health endpoint before app.listen**

Add before the `app.listen(...)` line (around line 138) in `remotion_service/server/index.ts`:

```typescript
app.get("/health", async () => {
  return { status: "ok" };
});
```

Note: Because the Elysia instance is created with `prefix: "/api"`, this route is served at `GET /api/health`.

- [ ] **Step 2: Type check**

Run: `cd remotion_service && bunx tsc --noEmit`

Expected: no new errors.

- [ ] **Step 3: Commit**

```bash
git add remotion_service/server/index.ts
git commit -m "feat(remotion): add /api/health endpoint for Docker health checks"
```

---
### Task 10: Add health checks for api, worker, and remotion in compose files

**Files:**
- Modify: `cofee_backend/docker-compose.yml`
- Modify: `remotion_service/docker-compose.yml`

- [ ] **Step 1: Add healthcheck to api service**

Add to `api` service in `cofee_backend/docker-compose.yml` (after `depends_on`):

```yaml
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health/')"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```

- [ ] **Step 2: Add healthcheck to worker service**

The worker has no HTTP port. Use a process check. Add to `worker` service:

```yaml
    healthcheck:
      test: ["CMD-SHELL", "pgrep -f dramatiq || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
```

Note: `python:3.11-slim` images do not ship the `procps` package, so `pgrep` may be absent in the container. If this check fails for that reason, add `procps` to the apt packages installed in the Dockerfile's `base` stage.

- [ ] **Step 3: Add healthcheck to remotion service**

Add to `remotion` service in `remotion_service/docker-compose.yml` (after `environment`):

```yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
```

- [ ] **Step 4: Validate both compose files**

Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`

Expected: no errors.

- [ ] **Step 5: Commit**

```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add health checks to api, worker, and remotion services"
```
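When tuning these values, it helps to estimate how long a dead service can go unnoticed. Roughly: failures only start counting after `start_period`, the container turns unhealthy after `retries` consecutive failed probes, and each probe can take up to `interval + timeout`. A back-of-the-envelope sketch (an upper bound, not Docker's exact scheduler behavior) using the values above:

```python
def worst_case_unhealthy_seconds(
    interval: int, timeout: int, retries: int, start_period: int = 0
) -> int:
    """Rough upper bound on seconds from container start to 'unhealthy'."""
    return start_period + retries * (interval + timeout)

# api: interval=10s, timeout=5s, retries=5, start_period=30s
assert worst_case_unhealthy_seconds(10, 5, 5, 30) == 105
# worker: interval=15s, timeout=5s, retries=3
assert worst_case_unhealthy_seconds(15, 5, 3) == 60
```

If ~105s is too slow for the api, shrink `interval` or `retries` rather than `start_period`, which only guards the cold-start window.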
---
### Task 11: Add network segmentation to backend compose

**Files:**
- Modify: `cofee_backend/docker-compose.yml`

Currently all services share one flat network. Separate into `db-net` (data stores) and `app-net` (application services). This prevents Remotion from reaching DB/Redis directly. Caveat: the Remotion compose file from Task 6 sets `REDIS_URL`; if the Remotion service actually uses Redis, either drop that variable or attach `remotion` to `db-net` as well.

- [ ] **Step 1: Add networks to compose**

Add at the bottom of `cofee_backend/docker-compose.yml`, replacing the existing `volumes:` section:

```yaml
volumes:
  cpv3_db:
  cpv3_minio:
  cpv3_redis:

networks:
  db-net:
    driver: bridge
  app-net:
    driver: bridge
```

- [ ] **Step 2: Add network assignments to each service**

Add to `db`:

```yaml
    networks:
      - db-net
```

Add to `redis`:

```yaml
    networks:
      - db-net
```

Add to `minio`:

```yaml
    networks:
      - db-net
      - app-net
```

Add to `api`:

```yaml
    networks:
      - db-net
      - app-net
```

Add to `worker`:

```yaml
    networks:
      - db-net
      - app-net
```

- [ ] **Step 3: Update Remotion compose to use app-net**

In `remotion_service/docker-compose.yml`, change the networks section:

```yaml
networks:
  backend:
    external: true
    name: cofee_backend_app-net
```

This ensures Remotion can reach MinIO and API (on `app-net`) but NOT PostgreSQL or Redis (on `db-net`).

- [ ] **Step 4: Validate both compose files**

Run: `cd cofee_backend && docker compose config > /dev/null && cd ../remotion_service && docker compose config > /dev/null`

Expected: no errors.

- [ ] **Step 5: Test full stack connectivity**

Run:

```bash
cd cofee_backend && docker compose down && docker compose up -d
# Wait for healthy
cd ../remotion_service && docker compose down && docker compose up -d
```

Verify API can reach DB, Redis, MinIO. Verify Remotion can reach MinIO but NOT DB.

- [ ] **Step 6: Commit**

```bash
git add cofee_backend/docker-compose.yml remotion_service/docker-compose.yml
git commit -m "feat(infra): add network segmentation — db-net and app-net isolation"
```
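The intended reachability follows directly from the network assignments above: two Compose services can resolve and reach each other iff they share at least one network. A sketch encoding that matrix as a quick mental check:

```python
# Which networks each service joins, per Steps 2-3 above.
attachments = {
    "db": {"db-net"},
    "redis": {"db-net"},
    "minio": {"db-net", "app-net"},
    "api": {"db-net", "app-net"},
    "worker": {"db-net", "app-net"},
    "remotion": {"app-net"},  # joins app-net via the external cofee_backend_app-net
}

def can_reach(a: str, b: str) -> bool:
    """Two compose services can talk iff they share a network."""
    return bool(attachments[a] & attachments[b])

# Remotion keeps access to MinIO and the API...
assert can_reach("remotion", "minio") and can_reach("remotion", "api")
# ...but is cut off from the data stores.
assert not can_reach("remotion", "db") and not can_reach("remotion", "redis")
```

Putting MinIO on both networks is the deliberate exception: it is a data store that Remotion legitimately needs for reading and writing render assets.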
---
### Task 12: Final verification

- [ ] **Step 1: Bring down everything**

```bash
cd cofee_backend && docker compose down
cd ../remotion_service && docker compose down
```

- [ ] **Step 2: Clean build**

```bash
cd cofee_backend && docker compose build --no-cache
cd ../remotion_service && docker compose build --no-cache
```

- [ ] **Step 3: Start backend stack**

```bash
cd cofee_backend && docker compose up -d
```

Wait for: `docker compose ps` shows all services healthy.

- [ ] **Step 4: Start Remotion stack**

```bash
cd remotion_service && docker compose up -d
```

Wait for: `docker compose ps` shows remotion healthy.

- [ ] **Step 5: Test API health**

Run: `curl http://127.0.0.1:8000/api/health/`

Expected: `{"status":"ok","database":"connected"}`

- [ ] **Step 6: Test Remotion health**

Run: `curl http://127.0.0.1:3001/api/health`

Expected: `{"status":"ok"}`

- [ ] **Step 7: Verify port binding**

Run: `docker compose -f cofee_backend/docker-compose.yml ps --format '{{.Name}} {{.Ports}}'`

Expected: all ports show `127.0.0.1:XXXX->YYYY/tcp` (not `0.0.0.0`).

- [ ] **Step 8: Verify resource limits**

Run: `docker inspect cpv3_api --format '{{.HostConfig.Memory}}'`

Expected: `536870912` (512MB).

Run: `docker inspect "$(cd remotion_service && docker compose ps -q remotion)" --format '{{.HostConfig.Memory}}'`

Expected: `4294967296` (4GB).

(The Remotion service sets no `container_name`, so resolve its container ID through Compose instead of guessing the generated name.)
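The expected byte values in Step 8 follow from Docker interpreting the `m`/`g` memory suffixes as binary multiples (MiB/GiB), surfaced by `docker inspect` as raw bytes:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Compose "memory: 512m" and "memory: 4g" as reported by docker inspect.
assert 512 * MIB == 536870912
assert 4 * GIB == 4294967296
```

If `docker inspect` reports `0`, the limit was not applied — typically because the `deploy.resources` section is indented under the wrong key.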