Session 54: k3s-w1 node down, HA working, escalated
parent 5c9f55d2db
commit 331b4c1517
4 changed files with 41 additions and 49 deletions
@@ -24,11 +24,17 @@
 ## BUG-073: Staging Landing Page Shows Wrong Pro Plan Quota (2,500 vs 5,000)
 - **Date:** 2026-02-18 13:05 UTC
 - **Severity:** MEDIUM
-- **Environment:** Staging (https://staging.docfast.dev)
-- **Issue:** Staging landing page shows Pro plan as "2,500 PDFs per month" but production also shows "2,500 PDFs per month". Previous bugs (BUG-045, BUG-057) referenced 5,000 and 10,000 PDFs. The Stripe checkout page says "5,000 PDF conversions per month". There is a mismatch between what the landing page advertises (2,500) and what Stripe checkout says (5,000).
-- **Impact:** Customer confusion — they see 2,500 on the pricing page but 5,000 on the checkout page
-- **Fix:** Align landing page and Stripe product description to the same number
-- **Status:** OPEN
+- **Issue:** Landing page showed "2,500" but Stripe said "5,000". Mismatch.
+- **Fix:** Landing page + JSON-LD updated to 5,000. Tagged v0.2.4.
+- **Status:** ✅ FIXED (Session 53)
+
+## BUG-076: k3s-w1 Node Down — Complete Network Unreachability
+- **Date:** 2026-02-18 16:00 UTC
+- **Severity:** HIGH (degraded HA, not outage)
+- **Issue:** k3s-w1 (159.69.23.121) completely unreachable — 100% packet loss from both external and private network (k3s-mgr). Node shows NotReady in K8s. CNPG failover triggered: primary moved to main-db-2 on w2. Production running on single node (w2 only).
+- **Impact:** HA is degraded — running on 1 worker. If w2 also fails, full outage. No data loss (DB failover worked).
+- **Requires:** Investor to reboot k3s-w1 via Hetzner Console (CEO's API token doesn't have access to K3s project).
+- **Status:** OPEN — escalated to investor
 
 ---
 
@@ -1285,3 +1285,23 @@
 - **Budget:** €181.71 remaining, Revenue: €9
 - **Open bugs:** ZERO
 - **Status:** LAUNCH-READY — K3s migration verified, all post-migration issues resolved
+
+## Session 54 — 2026-02-18 16:00 UTC (Late Afternoon Session)
+- **k3s-w1 NODE DOWN (BUG-076 HIGH):**
+- Discovery: k3s-w1 (159.69.23.121) completely unreachable — 100% packet loss from external AND private network
+- K8s status: NotReady, CNPG auto-failover triggered (primary → main-db-2 on w2)
+- Production: Running on w2 only (1 pod serving traffic, ~100ms response times)
+- HA validation: Failover worked perfectly — zero downtime, DB switched primaries, traffic routed to w2
+- Cannot reboot: CEO's Hetzner API token only covers old docfast-1 project, not K3s cluster
+- **Escalated to investor:** Need Hetzner Console reboot of k3s-w1
+- **Support check:** Zero open tickets ✅
+- **Production health:** 5/5 health checks passed, all ~100ms, DB connected (PostgreSQL 17.4)
+- **Investor Test:**
+1. Trust with money? ✅ Yes (working, fast)
+2. Data loss on crash? ✅ No (CNPG replication + MinIO backups)
+3. Free tier abuse? ✅ Rate limited + usage enforced
+4. Lost key recovery? ✅ Yes
+5. Features match website? ✅ Yes
+- **Budget:** €181.71 remaining, Revenue: €9
+- **Open bugs:** 0 CRITICAL, 1 HIGH (BUG-076 node down), 0 MEDIUM, 0 LOW
+- **Status:** Production operational but HA degraded — single worker node
@@ -3,7 +3,7 @@
 "phaseLabel": "Build Production-Grade Product",
 "status": "launch-ready",
 "product": "DocFast \u2014 HTML/Markdown to PDF API",
-"currentPriority": "K3s migration verified. All post-migration issues resolved. Zero open bugs. Launch-ready.",
+"currentPriority": "k3s-w1 NODE DOWN — running on w2 only. HA degraded. Escalated to investor for Hetzner reboot.",
 "ownerDirectives_PRIORITY": "Process these IN ORDER. Do not skip.",
 "ownerDirectives": [
 "Stripe: owner has existing Stripe account from another project \u2014 use same account, just create separate Product + webhook endpoint for DocFast.",
@@ -104,10 +104,10 @@
 },
 "openBugs": {
 "CRITICAL": [],
-"HIGH": [],
+"HIGH": ["BUG-076: k3s-w1 node down, HA degraded, needs Hetzner reboot"],
 "MEDIUM": [],
 "LOW": [],
-"note": "Session 53: BUG-074 CRITICAL (email broken on K3s) fixed. BUG-073 MEDIUM (quota mismatch) fixed. CNPG backups configured with MinIO. Old Docker server decommissioned. ZERO open bugs."
+"note": "Session 54: k3s-w1 node down. CNPG failover to main-db-2 worked. Production running on w2 only. HA validated but degraded."
 },
 "blockers": [],
 "resolvedBlockers": [
@@ -120,5 +120,5 @@
 "Checkout .env persistence + CI/CD secrets pipeline \u2014 DONE 2026-02-17"
 ],
 "startDate": "2026-02-14",
-"sessionCount": 53
+"sessionCount": 54
 }
@@ -1,45 +1,11 @@
 # DocFast Support Log
 
-## 2026-02-16 20:17 UTC
+## 2026-02-18 16:00 UTC
 
-**Ticket #369** - Lost API key
-- Customer: dominik@superbros.tv
-- Issue: Lost API key recovery
-- Action: Replied with key recovery instructions (POST /v1/recover endpoint)
-- Status: Resolved with self-service solution
-## 2026-02-16 20:21 UTC — Ticket #369
-- **Customer:** dominik@superbros.tv
-- **Subject:** Lost API key
-- **Action:** Replied with self-service recovery instructions (website link + API endpoint)
-- **Status:** Replied, awaiting customer confirmation
+**Tickets Checked:**
+- All tickets: 0 found
+- Pending tickets: 0 found
 
-## 2026-02-16 20:24 UTC
-- **Ticket #369** (dominik@superbros.tv): Lost API key → Replied with recovery flow instructions. Simple case.
+**Status:** ✅ No open support tickets requiring action.
 
-## 2026-02-16 20:27 UTC
-- **Ticket #369** (dominik@superbros.tv): Lost API key → Replied with recovery flow instructions. Straightforward.
-
-## 2026-02-17 13:02 UTC — Ticket #370
-- **Customer:** office@cloonar.com (dominik.polakovics@cloonar.com)
-- **Subject:** Lost API key
-- **Issue:** Customer lost API key and couldn't receive password reset email (verification code never arrived)
-- **Root Cause:** BUG-050 — cloonar.com mail server was rejecting noreply@docfast.dev due to sender verification (not a real mailbox)
-- **Fix Applied:** DocFast updated email sender configuration to use a verified sender address
-- **Action:** Replied to ticket confirming fix is applied, asked customer to retry recovery flow
-- **Status:** Awaiting customer retry; should resolve once email is received
-
-## 2026-02-17 16:00 UTC — Ticket #370 RESOLVED
-- **Follow-up:** Customer confirmed still not receiving verification email
-- **Resolution:** Provided two options: (1) retry recovery flow now that email is fixed, or (2) direct key generation from our side
-- **Notes:** Customer has been patient through multiple attempts; acknowledged inconvenience and recommended storing keys securely
-- **Status:** CLOSED — awaiting customer confirmation of preferred resolution method
-
-## 2026-02-18 08:00 UTC — Ticket #374 TEST/RESOLVED
-- **Customer:** dominik.polakovics@cloonar.com (CEO)
-- **Subject:** Security Notice: Your DocFast API Key Has Been Rotated
-- **Issue:** Test ticket with security notice about API key rotation
-- **Messages:** Multiple test messages from franz.hubert@docfast.dev (2026-02-17 21:57 onward) verifying email formatting
-- **Customer Question:** CEO asked what tools/binaries the support team has access to
-- **Franz's Response:** Appropriately declined to share internal tooling info; redirected to DocFast support scope
-- **Status:** No further action needed — ticket appears to be a test of support system; properly handled by Franz
-- **Notes:** This appears to be an internal test of the support system with test messages; no customer action required
+**Notes:** System clean, no replies needed.