Session 55: K3s load test (10x perf gain), w2 node down, cluster cleanup, version+brotli code pushed
parent 331b4c1517
commit 37461bc9f8
14 changed files with 117 additions and 48 deletions
@@ -1305,3 +1305,30 @@
 - **Budget:** €181.71 remaining, Revenue: €9
 - **Open bugs:** 0 CRITICAL, 1 HIGH (BUG-076 node down), 0 MEDIUM, 0 LOW
 - **Status:** Production operational but HA degraded — single worker node
+
+## Session 55 — 2026-02-18 18:00 UTC (Evening Session)
+- **Node situation flipped:** w1 recovered (investor rebooted), but w2 now NotReady/unreachable. HA still degraded — single worker.
+- **DevOps agent completed:**
+  - Force-deleted all stuck Terminating pods on w2 (cert-manager, CNPG, docfast, coredns)
+  - New pods rescheduled to w1 where topology constraints allow
+  - Pending pods: 1 docfast (topology spread), 1 main-db-2, 1 pooler (anti-affinity)
+  - w2 completely unreachable — needs Hetzner Console reboot
+- **K3s Load Test completed (production, light load):**
+  - Sequential avg: 0.198s (10x improvement over Docker's ~2.1s)
+  - P95: 0.235s, range 0.176-0.235s
+  - 2 concurrent: ~0.27s each, 100% success
+  - Large payload (104KB, 3 pages): 1.65s
+  - 15-worker pool with plenty of headroom
+  - Finding: staging DB had no tables (schema not migrated after K3s setup)
+- **Backend dev (version + Brotli):**
+  - Code pushed: commit 170ed44 — version bumped to 0.2.9, shrink-ray-current added for Brotli
+  - CI DID NOT BUILD the image — commit hash image not found in registry
+  - Staging manually reverted to working image (e611609)
+  - TODO: Investigate why CI didn't trigger/build for this commit
+- **Staging DB issue discovered:** docfast_staging database has no tables — staging is not fully functional
+- **Support:** Zero open tickets ✅
+- **Investor Test:** All 5 ✅
+- **Budget:** €181.71 remaining, Revenue: €9
+- **Open bugs:** 0 CRITICAL, 1 HIGH (BUG-076 — now w2 down instead of w1)
+- **Escalation:** w2 reboot needed via Hetzner Console
+- **New issues found:** Staging DB missing schema, CI pipeline may have failed for latest commit
@@ -3,7 +3,7 @@
   "phaseLabel": "Build Production-Grade Product",
   "status": "launch-ready",
   "product": "DocFast \u2014 HTML/Markdown to PDF API",
-  "currentPriority": "k3s-w1 NODE DOWN — running on w2 only. HA degraded. Escalated to investor for Hetzner reboot.",
+  "currentPriority": "k3s-w2 NODE DOWN — running on w1 only. HA degraded. Escalated to investor for Hetzner reboot. Version+Brotli code pushed but CI didn't build image.",
   "ownerDirectives_PRIORITY": "Process these IN ORDER. Do not skip.",
   "ownerDirectives": [
     "Stripe: owner has existing Stripe account from another project \u2014 use same account, just create separate Product + webhook endpoint for DocFast.",
@@ -65,9 +65,18 @@
     "emailDeliveryNote": "MX record fixed 2026-02-17. Postfix + DKIM operational."
   },
   "loadTestResults": {
-    "sequential": "~2.1s per PDF, ~28/min",
-    "concurrent": "3 safe, 5th fails at ~16s",
-    "server": "CAX11 (2 vCPU ARM, 4GB RAM), container 512MB cap"
+    "docker_old": {
+      "sequential": "~2.1s per PDF, ~28/min",
+      "concurrent": "3 safe, 5th fails at ~16s",
+      "server": "CAX11 (2 vCPU ARM, 4GB RAM), container 512MB cap"
+    },
+    "k3s_current": {
+      "sequential": "~0.2s avg per PDF (10x improvement)",
+      "p95": "0.235s",
+      "concurrent": "2 concurrent at ~0.27s, 15-worker pool",
+      "largePayload": "1.65s for 104KB/3-page PDF",
+      "server": "K3s cluster, 2x CAX11 workers (1 active due to w2 down)"
+    }
   },
   "infrastructure": {
     "domain": "docfast.dev",
@@ -104,10 +113,10 @@
   },
   "openBugs": {
     "CRITICAL": [],
-    "HIGH": ["BUG-076: k3s-w1 node down, HA degraded, needs Hetzner reboot"],
+    "HIGH": ["BUG-076: k3s-w2 node down (was w1, now w2), HA degraded, needs Hetzner reboot"],
     "MEDIUM": [],
     "LOW": [],
-    "note": "Session 54: k3s-w1 node down. CNPG failover to main-db-2 worked. Production running on w2 only. HA validated but degraded."
+    "note": "Session 55: w1 recovered, w2 now down. Stuck pods force-deleted. Production on w1 only. K3s load test: ~0.2s avg (10x faster than Docker). Version/Brotli code pushed, CI didn't build."
   },
   "blockers": [],
   "resolvedBlockers": [
@@ -120,5 +129,5 @@
     "Checkout .env persistence + CI/CD secrets pipeline \u2014 DONE 2026-02-17"
   ],
   "startDate": "2026-02-14",
-  "sessionCount": 54
+  "sessionCount": 55
 }
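
The "10x improvement" recorded in the Session 55 log and in `loadTestResults` can be sanity-checked from the two logged sequential latencies. The sketch below only re-derives ratios from figures already in the diff (~2.1s for Docker, 0.198s for K3s). The helper names `speedup` and `per_minute` are illustrative and not part of the DocFast codebase, and the per-minute figure is a theoretical sequential ceiling, not a measured throughput.

```python
# Figures copied from the Session 55 log; nothing here is newly measured.
DOCKER_SEQ_S = 2.1   # "~2.1s per PDF" on the old Docker setup
K3S_SEQ_S = 0.198    # "Sequential avg: 0.198s" on K3s


def speedup(old_s: float, new_s: float) -> float:
    """Return how many times lower the new latency is than the old one."""
    return old_s / new_s


def per_minute(latency_s: float) -> float:
    """Return the ceiling on strictly sequential requests per minute."""
    return 60.0 / latency_s


if __name__ == "__main__":
    print(f"speedup: {speedup(DOCKER_SEQ_S, K3S_SEQ_S):.1f}x")
    print(f"Docker sequential ceiling: {per_minute(DOCKER_SEQ_S):.1f}/min")
    print(f"K3s sequential ceiling: {per_minute(K3S_SEQ_S):.1f}/min")
```

Under these inputs the ratio is a little above 10, consistent with the logged "10x", and the Docker ceiling of 60 / 2.1 is in line with the "~28/min" stored under `docker_old`. The K3s ceiling is an extrapolation: the log reports latency only, so it should be treated as an upper bound until a sustained-throughput test confirms it.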