Session 55: K3s load test (10x perf gain), w2 node down, cluster cleanup, version+brotli code pushed

Hoid 2026-02-18 18:14:14 +00:00
parent 331b4c1517
commit 37461bc9f8
14 changed files with 117 additions and 48 deletions


@@ -1305,3 +1305,30 @@
- **Budget:** €181.71 remaining, Revenue: €9
- **Open bugs:** 0 CRITICAL, 1 HIGH (BUG-076 node down), 0 MEDIUM, 0 LOW
- **Status:** Production operational but HA degraded — single worker node
+## Session 55 — 2026-02-18 18:00 UTC (Evening Session)
+- **Node situation flipped:** w1 recovered (investor rebooted), but w2 now NotReady/unreachable. HA still degraded — single worker.
+- **DevOps agent completed:**
+  - Force-deleted all stuck Terminating pods on w2 (cert-manager, CNPG, docfast, coredns)
+  - New pods rescheduled to w1 where topology constraints allow
+  - Pending pods: 1 docfast (topology spread), 1 main-db-2, 1 pooler (anti-affinity)
+  - w2 completely unreachable — needs Hetzner Console reboot
+- **K3s Load Test completed (production, light load):**
+  - Sequential avg: 0.198s (10x improvement over Docker's ~2.1s)
+  - P95: 0.235s, range 0.176-0.235s
+  - 2 concurrent: ~0.27s each, 100% success
+  - Large payload (104KB, 3 pages): 1.65s
+  - 15-worker pool with plenty of headroom
+  - Finding: staging DB had no tables (schema not migrated after K3s setup)
+- **Backend dev (version + Brotli):**
+  - Code pushed: commit 170ed44 — version bumped to 0.2.9, shrink-ray-current added for Brotli
+  - CI DID NOT BUILD the image — commit hash image not found in registry
+  - Staging manually reverted to working image (e611609)
+  - TODO: Investigate why CI didn't trigger/build for this commit
+- **Staging DB issue discovered:** docfast_staging database has no tables — staging is not fully functional
+- **Support:** Zero open tickets ✅
+- **Investor Test:** All 5 ✅
+- **Budget:** €181.71 remaining, Revenue: €9
+- **Open bugs:** 0 CRITICAL, 1 HIGH (BUG-076 — now w2 down instead of w1)
+- **Escalation:** w2 reboot needed via Hetzner Console
+- **New issues found:** Staging DB missing schema, CI pipeline may have failed for latest commit
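
The Session 55 load-test figures can be sanity-checked with a short sketch. The log records only the average (0.198s), the P95 (0.235s), and the range, so the ten sample latencies below are hypothetical values chosen to match those summaries. Nearest-rank is an assumed percentile method, and the helper names are illustrative, not part of the DocFast code.

```python
from statistics import mean


def p95_nearest_rank(samples):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    ordered = sorted(samples)
    # Integer arithmetic for ceil(95 * n / 100) avoids float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def speedup(old_seconds, new_seconds):
    """How many times faster the new latency is than the old one."""
    return old_seconds / new_seconds


# Hypothetical latencies (seconds), NOT the raw measurements from the test.
samples = [0.176, 0.185, 0.190, 0.192, 0.195,
           0.198, 0.200, 0.204, 0.205, 0.235]

avg = mean(samples)
p95 = p95_nearest_rank(samples)
gain = speedup(2.1, 0.198)  # Docker ~2.1s vs. K3s 0.198s, both from the log
```

With fewer than 20 samples, the nearest-rank P95 is simply the slowest request, which would explain why the logged P95 equals the top of the range. The ratio 2.1 / 0.198 comes to roughly 10.6, consistent with the "10x" label.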


@@ -3,7 +3,7 @@
"phaseLabel": "Build Production-Grade Product",
"status": "launch-ready",
"product": "DocFast \u2014 HTML/Markdown to PDF API",
-"currentPriority": "k3s-w1 NODE DOWN — running on w2 only. HA degraded. Escalated to investor for Hetzner reboot.",
+"currentPriority": "k3s-w2 NODE DOWN — running on w1 only. HA degraded. Escalated to investor for Hetzner reboot. Version+Brotli code pushed but CI didn't build image.",
"ownerDirectives_PRIORITY": "Process these IN ORDER. Do not skip.",
"ownerDirectives": [
"Stripe: owner has existing Stripe account from another project \u2014 use same account, just create separate Product + webhook endpoint for DocFast.",
@@ -65,9 +65,18 @@
"emailDeliveryNote": "MX record fixed 2026-02-17. Postfix + DKIM operational."
},
"loadTestResults": {
-"sequential": "~2.1s per PDF, ~28/min",
-"concurrent": "3 safe, 5th fails at ~16s",
-"server": "CAX11 (2 vCPU ARM, 4GB RAM), container 512MB cap"
+"docker_old": {
+"sequential": "~2.1s per PDF, ~28/min",
+"concurrent": "3 safe, 5th fails at ~16s",
+"server": "CAX11 (2 vCPU ARM, 4GB RAM), container 512MB cap"
+},
+"k3s_current": {
+"sequential": "~0.2s avg per PDF (10x improvement)",
+"p95": "0.235s",
+"concurrent": "2 concurrent at ~0.27s, 15-worker pool",
+"largePayload": "1.65s for 104KB/3-page PDF",
+"server": "K3s cluster, 2x CAX11 workers (1 active due to w2 down)"
+}
},
"infrastructure": {
"domain": "docfast.dev",
@@ -104,10 +113,10 @@
},
"openBugs": {
"CRITICAL": [],
-"HIGH": ["BUG-076: k3s-w1 node down, HA degraded, needs Hetzner reboot"],
+"HIGH": ["BUG-076: k3s-w2 node down (was w1, now w2), HA degraded, needs Hetzner reboot"],
"MEDIUM": [],
"LOW": [],
-"note": "Session 54: k3s-w1 node down. CNPG failover to main-db-2 worked. Production running on w2 only. HA validated but degraded."
+"note": "Session 55: w1 recovered, w2 now down. Stuck pods force-deleted. Production on w1 only. K3s load test: ~0.2s avg (10x faster than Docker). Version/Brotli code pushed, CI didn't build."
},
"blockers": [],
"resolvedBlockers": [
@@ -120,5 +129,5 @@
"Checkout .env persistence + CI/CD secrets pipeline \u2014 DONE 2026-02-17"
],
"startDate": "2026-02-14",
-"sessionCount": 54
+"sessionCount": 55
}
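
The `~28/min` figure in `loadTestResults` follows directly from the sequential latency. A minimal sketch of that arithmetic is below. `per_minute` is a hypothetical helper, and its result is a ceiling for strictly one-at-a-time requests, not a measured throughput.

```python
import math


def per_minute(seconds_per_pdf: float) -> int:
    """Whole PDFs that fit into 60 seconds when requests run back to back."""
    if seconds_per_pdf <= 0:
        raise ValueError("latency must be positive")
    return math.floor(60 / seconds_per_pdf)


docker_rate = per_minute(2.1)   # latency recorded under "docker_old"
k3s_rate = per_minute(0.198)    # sequential average from the Session 55 log
```

`per_minute(2.1)` gives 28, matching the recorded Docker figure. The K3s value is only a derived estimate, since the session log does not report a per-minute rate for K3s.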