DocFast session 125: BUG-100 fix, codebase audit

Hoid 2026-03-04 14:07:18 +01:00
parent 391f092c38
commit a9a6dc1e13
3 changed files with 43 additions and 3 deletions


@@ -1,3 +1,11 @@
## BUG-100: Usage flush transaction error handling broken — one bad key poisons entire batch
- **Date:** 2026-03-04
- **Severity:** MEDIUM
- **Issue:** `flushDirtyEntries()` in `src/middleware/usage.ts` wraps all usage writes in a single PostgreSQL transaction (`BEGIN`/`COMMIT`) with a per-key `try/catch` inside the loop. In PostgreSQL, once any statement (here, an INSERT) fails inside a transaction, the transaction enters an "aborted" state and every subsequent query fails with "current transaction is aborted, commands ignored until end of transaction block" until the transaction is rolled back. The per-key error recovery is therefore ineffective: one bad key causes all remaining keys in the batch to fail silently.
- **Impact:** If any single usage INSERT fails (e.g., constraint violation, type error), all remaining usage counts in that flush batch are lost. Because PostgreSQL treats `COMMIT` on an aborted transaction as a rollback, writes that succeeded earlier in the same batch are likely discarded as well. This can cause usage counts to diverge between the in-memory state and the database.
- **Fix:** Remove the transaction wrapper and flush each key independently, or use SAVEPOINTs.
- **Status:** ✅ FIXED in commit d2f819d. Removed the transaction wrapper; each key now flushes independently with its own client. 1 TDD test added. 464 tests total, all passing. Pushed to main (staging auto-deploy).
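
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from commit d2f819d: `ClientLike` and `PoolLike` are stand-in types modeled loosely on a `pg`-style pool, and `flushEachKeyIndependently`, `UsageEntry`, the `usage_counts` table, and its columns are all hypothetical names. The point it shows is that with no shared `BEGIN`/`COMMIT`, each statement runs in its own implicit transaction, so a failure for one key cannot abort the writes for the others.

```typescript
// Stand-in types so the sketch runs without the `pg` package (assumption:
// the real code uses a pool whose clients offer query() and release()).
interface ClientLike {
  query(text: string, values?: unknown[]): Promise<unknown>;
  release(): void;
}
interface PoolLike {
  connect(): Promise<ClientLike>;
}

// Hypothetical shape of one dirty in-memory usage entry.
interface UsageEntry {
  key: string;
  count: number;
}

interface FlushResult {
  flushed: string[];
  failed: { key: string; reason: string }[];
}

// Hypothetical table and column names; the real schema may differ.
const UPSERT_SQL =
  "INSERT INTO usage_counts (api_key, request_count) VALUES ($1, $2) " +
  "ON CONFLICT (api_key) DO UPDATE SET request_count = EXCLUDED.request_count";

async function flushEachKeyIndependently(
  pool: PoolLike,
  entries: UsageEntry[],
): Promise<FlushResult> {
  const result: FlushResult = { flushed: [], failed: [] };
  for (const entry of entries) {
    let client: ClientLike | undefined;
    try {
      // Each key gets its own client and no explicit transaction.
      client = await pool.connect();
      await client.query(UPSERT_SQL, [entry.key, entry.count]);
      result.flushed.push(entry.key);
    } catch (err) {
      // Only this key is affected; the loop continues with the next one.
      result.failed.push({ key: entry.key, reason: String(err) });
    } finally {
      if (client) client.release();
    }
  }
  return result;
}
```

Returning the failed keys is a design choice of this sketch, so a caller could keep them marked dirty and retry on the next flush. Whether the real middleware does that is not stated in this log.
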
## BUG-099: provisionedSessions Set in billing.ts grows unbounded (memory leak)
- **Date:** 2026-03-03
- **Severity:** LOW


@@ -1,5 +1,36 @@
# Session Log
## Session 125 — 2026-03-04 13:00 UTC (Wednesday Afternoon)
- **Production:** v0.5.1 ✅ healthy, 2 replicas, 0 restarts, ~6.7d uptime
- **Staging:** v0.5.2 ✅ updated to commit d2f819d (34 commits ahead of prod)
- **K8s cluster:** All 3 nodes Ready
- **Support:** Zero tickets
- **Completed:**
1. **BUG-100 discovery & fix (TDD)** — Found that `flushDirtyEntries()` in usage middleware wrapped all writes in a single PostgreSQL transaction with per-key try/catch. In PostgreSQL, any failed INSERT aborts the entire transaction, making the per-key error handling useless — one bad key poisons the entire batch. Sub-agent removed the transaction, each key now flushes independently with its own client. 1 TDD test added (red→green verified). Commit d2f819d.
2. **Codebase audit** — Reviewed convert routes, browser pool, PDF options validation, body size limits, error pages, security headers, demo endpoint, logger. All solid. Noted CSS reset injection in `renderPdf` as a potential UX concern (documented, not a bug — changing it could break existing users).
3. **Infrastructure health check** — All 3 K8s nodes Ready, both prod replicas healthy (0 restarts, ~6.7d uptime), DB connected (PostgreSQL 17.4), browser pool 15/15. Production health endpoint confirmed v0.5.1.
- **Total tests:** 464 (all passing), 28 test files
- **Open bugs:** ZERO 🎉
- **CI runner:** Still absent. Managed by Cloonar — needs investor action.
- **Note:** Sonnet 4 512k sub-agents failing instantly (model availability issue?). Used Opus for sub-agent successfully.
- **Investor test:** All 5 checks pass ✅
- **Recommendation:** Staging v0.5.2 is production-ready with ZERO open bugs, 464 tests, 34 commits ahead. Request investor approval for production tag.
## Session 124 — 2026-03-04 10:00 UTC (Wednesday Late Morning)
- **Production:** v0.5.1 ✅ healthy, 2 replicas, 0 restarts, ~6.5d uptime
- **Staging:** v0.5.2 ✅ healthy, commit 314edc1 (33 commits ahead of prod)
- **K8s cluster:** All 3 nodes Ready
- **Support:** Zero tickets
- **Completed:**
1. **Input validation hardening (TDD)** — Added `waitUntil` validation to `validatePdfOptions` (accepts only `load`, `domcontentloaded`, `networkidle0`, `networkidle2`). Added 100KB size limits for `headerTemplate` and `footerTemplate` to prevent memory abuse. 15 TDD tests added (red→green verified). Commit 7d44524.
2. **OpenAPI schema accuracy fix (TDD)** — Fixed PdfOptions schema: format enum expanded from 6 to all 11 valid values, added missing `waitUntil` field with enum, added 100KB size limit docs to template fields. 1 TDD test added. Commit 314edc1.
3. **Infrastructure health check** — All 3 K8s nodes Ready, both prod replicas healthy (0 restarts, ~6.5d uptime), DB connected (PostgreSQL 17.4), browser pool 15/15. Staging healthy internally (external returns Forbidden due to IP whitelist — expected).
- **Total tests:** 463 (all passing), 27 test files
- **Open bugs:** ZERO 🎉
- **CI runner:** Still absent. Managed by Cloonar — needs investor action.
- **Investor test:** All 5 checks pass ✅
- **Recommendation:** Staging v0.5.2 is production-ready with ZERO open bugs, 463 tests, 33 commits ahead. Request investor approval for production tag.
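
The validation rules from item 1 of Session 124 can be sketched as below. This is an illustration under stated assumptions, not the body of `validatePdfOptions`: the helper name `checkPdfOptionsSketch` is hypothetical, "100KB" is assumed to mean 100 * 1024 measured with `String.length` (UTF-16 code units, not bytes), and rejecting non-string templates is also an assumption.

```typescript
// The four values the session log says are accepted for `waitUntil`.
const ALLOWED_WAIT_UNTIL = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

// Assumption: 100KB = 100 * 1024, counted in UTF-16 code units.
const MAX_TEMPLATE_LENGTH = 100 * 1024;

interface PdfOptionsInput {
  waitUntil?: unknown;
  headerTemplate?: unknown;
  footerTemplate?: unknown;
}

// Hypothetical helper: returns a list of error messages (empty means valid).
function checkPdfOptionsSketch(opts: PdfOptionsInput): string[] {
  const errors: string[] = [];

  const waitUntil = opts.waitUntil;
  if (waitUntil !== undefined) {
    const ok = typeof waitUntil === "string" && ALLOWED_WAIT_UNTIL.indexOf(waitUntil) !== -1;
    if (!ok) {
      errors.push("waitUntil must be one of: " + ALLOWED_WAIT_UNTIL.join(", "));
    }
  }

  const templates: [string, unknown][] = [
    ["headerTemplate", opts.headerTemplate],
    ["footerTemplate", opts.footerTemplate],
  ];
  for (const [name, value] of templates) {
    if (value === undefined) continue;
    if (typeof value !== "string") {
      errors.push(name + " must be a string");
    } else if (value.length > MAX_TEMPLATE_LENGTH) {
      errors.push(name + " exceeds the 100KB limit");
    }
  }
  return errors;
}
```

Collecting every error instead of stopping at the first is a choice made for this sketch; the real validator may return on the first failure.
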
## Session 123 — 2026-03-04 07:00 UTC (Wednesday Morning)
- **Production:** v0.5.1 ✅ healthy, 2 replicas, 0 restarts, ~6.5d uptime
- **Staging:** v0.5.2 ✅ healthy, commit 646a94d (31 commits ahead of prod)


@@ -3,7 +3,7 @@
"phaseLabel": "Build Production-Grade Product",
"status": "launch-ready",
"product": "DocFast — HTML/Markdown to PDF API",
"currentPriority": "Production on v0.5.1. Staging updated to v0.5.2 (31 commits ahead, commit 646a94d). CI runner still DOWN. npm audit 0 vulns. 447 tests passing (27 files). ZERO open bugs. Dependencies updated (patch/minor). Ready for production tag when investor approves.",
"currentPriority": "Production on v0.5.1. Staging updated to v0.5.2 (34 commits ahead, commit d2f819d). CI runner still DOWN. npm audit 0 vulns. 464 tests passing (28 files). ZERO open bugs. Fixed BUG-100 (usage flush transaction batch poisoning). Ready for production tag when investor approves.",
"ownerDirectives_PRIORITY": "Process these IN ORDER. Do not skip. Remove items marked ✅ DONE/FIXED during housekeeping.",
"ownerDirectives": [
"Stripe Product ID for DocFast: prod_TygeG8tQPtEAdE — webhook handler must filter by this product_id to ignore events from other projects on the same Stripe account."
@@ -83,7 +83,8 @@
"LOW": [],
"note": "All bugs resolved. BUG-099 (provisionedSessions memory leak) fixed in commit 5f776db. BUG-098 (request interceptor leak) fixed in 024fa00. BUG-095/097 fixed 6290c3e. BUG-096 false positive."
},
"sessionCount": 125
},
"blockers": [],
"startDate": "2026-02-14",
"sessionCount": 123
"startDate": "2026-02-14"
}