memory: K3s migration day notes

Hoid 2026-02-18 13:47:36 +00:00
parent 837832c2d5
commit 07db17080f

@@ -151,3 +151,42 @@
- Targets: k3s-w1 (121365839) + k3s-w2 (121365840) — both healthy
- Services: TCP 80→80, TCP 443→443 with health checks
- Total infra cost: €11.67 (3x CAX11) + €5.39 (LB) = €17.06/mo
## K3s Migration — Completed (with issues)
### What's Done
- K3s cluster fully operational: 3 nodes, CNPG PostgreSQL 17.4, PgBouncer, Traefik, cert-manager
- Hetzner LB (46.225.37.135, ID 5834131) fronting both workers
- TLS cert auto-issued via Let's Encrypt after DNS propagation (~40 min TTL)
- DocFast deployed: 2 prod replicas, 1 staging replica
- Staged CI/CD: push to main→staging, git tag v*→prod
- SMTP fixed: nodemailer now uses env vars (SMTP_HOST/PORT/USER/PASS/FROM)
- SMTP credentials added to K8s secrets in both namespaces
- CEO skill updated with full K3s infrastructure knowledge
- Deployer SA with scoped RBAC (watch permission added after first CI failure)
- Forgejo container registry working (needed a PAT, since GITHUB_TOKEN doesn't have package access in Forgejo)
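Sketch of the staged trigger rules above as a pure function of the pushed git ref. Assumption: the workflows see full ref names like `refs/heads/main` / `refs/tags/v1.4.0` (GitHub-style `github.ref`); `target_environment` is a hypothetical helper for illustration, not code from `deploy.yml` or `promote.yml`.

```python
from fnmatch import fnmatchcase
from typing import Optional


def target_environment(ref: str) -> Optional[str]:
    """Map a pushed git ref to the environment it should deploy to.

    Mirrors the rules in these notes: main -> staging, tags matching v* -> prod.
    Returns None for refs that should not trigger a deploy.
    """
    if ref == "refs/heads/main":
        return "staging"
    tag_prefix = "refs/tags/"
    if ref.startswith(tag_prefix) and fnmatchcase(ref[len(tag_prefix):], "v*"):
        return "prod"
    return None


if __name__ == "__main__":
    for ref in ("refs/heads/main", "refs/tags/v1.4.0", "refs/heads/feature-x"):
        print(ref, "->", target_environment(ref))
```

Keeping prod behind an explicit tag means a bad merge to main only ever reaches staging.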
### Key Issues Found During Migration
- `ctr -n k3s.io images import` doesn't work for K3s, since plain `ctr` doesn't talk to K3s's embedded containerd; must use `k3s ctr images import --all-platforms`
- CNPG `tolerations: []` caused infinite pod restart loop — remove field entirely
- `pg_restore --no-owner` leaves restored tables owned by the restoring role; table ownership must be set explicitly afterwards
- Traefik auto-redirects HTTP→HTTPS when TLS is in the ingress spec, which broke the cert-manager HTTP-01 ACME challenge
  - Fix: add the `traefik.ingress.kubernetes.io/router.entrypoints: web,websecure` annotation
- cert-manager self-check resolves DNS externally — had to wait for DNS propagation
- Forgejo GITHUB_TOKEN lacks package write scope (known issue #3571) — need dedicated PAT
- Deployer SA needed `watch` verb for `kubectl rollout status`
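Sketch for the ownership fix: generate one `ALTER TABLE ... OWNER TO` per restored table. The role `docfast` and the table names below are placeholders (hypothetical); the real table list can come from `SELECT tablename FROM pg_tables WHERE schemaname = 'public';`.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ownership_statements(tables: Iterable[str], owner: str, schema: str = "public") -> List[str]:
    """Build ALTER TABLE ... OWNER TO statements for each table in `schema`."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} OWNER TO {quote_ident(owner)};"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical table names and role; replace with the output of the pg_tables query.
    for stmt in ownership_statements(["users", "api_keys"], "docfast"):
        print(stmt)
```

Output can be piped into `psql`. Other object types (views, standalone sequences, functions) would need their own `ALTER ... OWNER TO` statements.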
### Open Issues (CEO investigating)
- API returning "Invalid API key" for all requests — likely DATABASE_URL or DB connection issue
- Docs "Try it" page broken; CSP fix applied, but the root cause may be deeper
- CEO session 55 dispatched on Sonnet to investigate
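Quick sanity check for the DATABASE_URL theory: parse the connection string from the secret and confirm it targets the intended host, port, and database without printing the password. Sketch only; the URL shape (`postgresql://user:pass@host:port/db`) and the example host `pgbouncer.docfast.svc` are assumptions, not values from the cluster.

```python
from urllib.parse import unquote, urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts, hiding the password.

    Raises ValueError if the scheme, host, or database name is missing or wrong.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    database = unquote(parts.path.lstrip("/"))
    if not database:
        raise ValueError("missing database name")
    return {
        # urllib does not percent-decode user info, so decode it here.
        "user": unquote(parts.username) if parts.username else None,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL default when no port is given
        "database": database,
        "has_password": parts.password is not None,
    }


if __name__ == "__main__":
    # Hypothetical URL; inside the pod this would be os.environ["DATABASE_URL"].
    print(describe_database_url("postgresql://docfast:s3cret@pgbouncer.docfast.svc:5432/docfast"))
```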
### Infrastructure Costs
- 3x CAX11: €11.67/mo
- Hetzner LB (lb11): €5.39/mo
- Total K3s infra: €17.06/mo
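Cost arithmetic as a sketch, using `Decimal` so euro amounts don't pick up float rounding. The per-node price (€3.89) is derived by dividing the 3x CAX11 total above by three, assuming all nodes are priced the same; `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Derived from this note: €11.67 for 3x CAX11 -> €3.89 per node (assumes uniform pricing).
CAX11_PER_NODE = Decimal("11.67") / 3
# Hetzner LB (lb11) monthly price as recorded in this note.
LB11 = Decimal("5.39")


def monthly_cost(nodes: int, lb_count: int = 1) -> Decimal:
    """Total monthly infra cost in EUR for `nodes` CAX11 servers plus load balancers."""
    return CAX11_PER_NODE * nodes + LB11 * lb_count


if __name__ == "__main__":
    print(monthly_cost(3))  # current setup
    print(monthly_cost(4))  # hypothetical: one extra worker
```

`monthly_cost(3)` gives 17.06, matching the total above; one extra worker would bring it to 20.95.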
### Files
- CI workflow: `.forgejo/workflows/deploy.yml` (staging), `promote.yml` (prod)
- Deployer kubeconfig: `/tmp/deployer-kubeconfig.yaml` (on k3s-mgr)
- k3s-token removed from .credentials (lives in K8s secrets + Forgejo CI)