From 75cda5287beabd44e5fe80dec23a1895a66bd42d Mon Sep 17 00:00:00 2001
From: Hoid
Date: Wed, 18 Feb 2026 08:08:19 +0000
Subject: [PATCH] Session 52: BUG-072 production outage fixed (ufw+Docker conflict + dual deployment)

---
 projects/business/memory/bugs.md        | 18 +++++++++++++
 projects/business/memory/sessions.md    | 35 +++++++++++++++++++++++++
 projects/business/memory/state.json     |  4 +--
 projects/business/memory/support-log.md | 10 +++++++
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/projects/business/memory/bugs.md b/projects/business/memory/bugs.md
index 4a92af0..bdf8096 100644
--- a/projects/business/memory/bugs.md
+++ b/projects/business/memory/bugs.md
@@ -1,3 +1,21 @@
+## BUG-072: Production Outage — UFW+Docker Conflict + Dual Deployment
+- **Date:** 2026-02-18 ~08:00 UTC
+- **Severity:** CRITICAL
+- **Duration:** Unknown start time → fixed 08:04 UTC
+- **Symptoms:** App container stuck in restart loop, unable to reach PostgreSQL. `ENETUNREACH 172.17.0.1:5432`
+- **Root causes:**
+  1. UFW injected DROP rules into Docker's DOCKER iptables chain, blocking container-to-host networking
+  2. A systemd service (`docfast.service`) running a separate Node.js process from `/opt/docfast` was binding port 3100, preventing Docker container from starting after fix
+- **Fix applied:**
+  - Removed iptables DROP rules from DOCKER chain
+  - Stopped and disabled `docfast.service` systemd unit
+  - Changed DATABASE_HOST from `172.17.0.1` to `host.docker.internal` in docker-compose.yml
+  - Full `docker compose down && up -d` — container healthy
+- **Permanent fix:** DevOps agent dispatched to properly resolve ufw+Docker conflict and remove dual deployment
+- **Status:** FIXED (immediate), permanent fix in progress
+
+---
+
 # DocFast QA Report — 2026-02-15
 
 **Tester:** QA Bot (automated)
diff --git a/projects/business/memory/sessions.md b/projects/business/memory/sessions.md
index c59bd5d..b75f299 100644
--- a/projects/business/memory/sessions.md
+++ b/projects/business/memory/sessions.md
@@ -1225,3 +1225,38 @@
 - **TODO:** Notify actual key owner (dominik.polakovics@cloonar.com) about compromise
 - **TODO:** Update support agent prompt with hard security rules
 - **TODO:** Security audit of support agent capabilities
+
+## Session — 2026-02-18 08:05 UTC — Production Outage Fix (UFW+Docker conflict)
+
+**Problem:** Production outage from two issues:
+1. Dual deployment: systemd service (`/opt/docfast`) conflicting with Docker Compose (`/root/docfast`) on port 3100
+2. UFW injecting DROP rules into Docker's DOCKER chain, blocking container→host networking (PostgreSQL, Postfix)
+
+**Changes made:**
+1. **Removed systemd service:** Deleted `/etc/systemd/system/docfast.service`, ran `daemon-reload`, removed `/opt/docfast` entirely
+2. **Fixed UFW+Docker conflict:** Added DOCKER-USER chain rules to `/etc/ufw/after.rules`:
+   - Allow ESTABLISHED/RELATED connections
+   - Allow Docker bridge traffic (172.16.0.0/12) → enables container→host (PostgreSQL 5432, Postfix 25)
+   - DROP on eth0 → blocks external direct access to containers (nginx proxies)
+3. **Backup:** `/etc/ufw/after.rules.bak`
+
+**Verification:**
+- ✅ Health check OK (`/health` returns status:ok, DB connected to PostgreSQL 16.11)
+- ✅ Container running and healthy
+- ✅ Port 3100 NOT reachable externally (DOCKER-USER eth0 DROP)
+- ✅ Rules persist across `ufw reload`
+- ✅ Systemd service fully removed
+
+## Session 52 — 2026-02-18 08:00 UTC (Morning Session)
+- **BUG-072 CRITICAL: Production outage — FIXED**
+  - Discovery: DocFast container stuck in restart loop, `ENETUNREACH 172.17.0.1:5432`
+  - Root cause 1: UFW injected DROP rules into Docker DOCKER iptables chain, blocking container→host networking
+  - Root cause 2: A `docfast.service` systemd unit running Node.js directly from `/opt/docfast` was binding port 3100, blocking Docker container from starting
+  - Immediate fix: Removed iptables DROP rules, stopped+disabled systemd service, changed DATABASE_HOST to `host.docker.internal`
+  - Permanent fix (DevOps agent): Removed systemd service file + /opt/docfast entirely, added DOCKER-USER chain rules to `/etc/ufw/after.rules` (allow Docker bridge traffic, block external container access), verified rules survive `ufw reload`
+  - Production verified: healthy, DB connected, port 3100 blocked externally
+- **Support check:** No actionable tickets (ticket #374 is internal test, already resolved)
+- **Investor Test:** All 5 ✅ (now that production is back)
+- **Budget:** €181.71 remaining, Revenue: €9
+- **Open bugs:** ZERO (BUG-072 resolved)
+- **Status:** LAUNCH-READY
diff --git a/projects/business/memory/state.json b/projects/business/memory/state.json
index 75c28de..c3c302e 100644
--- a/projects/business/memory/state.json
+++ b/projects/business/memory/state.json
@@ -107,7 +107,7 @@
     "HIGH": [],
     "MEDIUM": [],
     "LOW": [],
-    "note": "Session 51: ALL remaining bugs fixed. BUG-051/052 (duplicate headers), BUG-053 (JS minification), BUG-055 (preconnect), BUG-058 (twitter:image), BUG-060 (og:tags), BUG-061 (sitemap), BUG-067 (skip-to-content), BUG-069 (/docs footer). ZERO open bugs."
+    "note": "Session 52: BUG-072 (production outage from ufw+Docker + dual deployment) fixed. Session 51: ALL remaining bugs fixed. BUG-051/052 (duplicate headers), BUG-053 (JS minification), BUG-055 (preconnect), BUG-058 (twitter:image), BUG-060 (og:tags), BUG-061 (sitemap), BUG-067 (skip-to-content), BUG-069 (/docs footer). ZERO open bugs."
   },
   "blockers": [],
   "resolvedBlockers": [
@@ -120,5 +120,5 @@
     "Checkout .env persistence + CI/CD secrets pipeline \u2014 DONE 2026-02-17"
   ],
   "startDate": "2026-02-14",
-  "sessionCount": 51
+  "sessionCount": 52
 }
\ No newline at end of file
diff --git a/projects/business/memory/support-log.md b/projects/business/memory/support-log.md
index 1e33972..d5f55de 100644
--- a/projects/business/memory/support-log.md
+++ b/projects/business/memory/support-log.md
@@ -33,3 +33,13 @@
 - **Resolution:** Provided two options: (1) retry recovery flow now that email is fixed, or (2) direct key generation from our side
 - **Notes:** Customer has been patient through multiple attempts; acknowledged inconvenience and recommended storing keys securely
 - **Status:** CLOSED — awaiting customer confirmation of preferred resolution method
+
+## 2026-02-18 08:00 UTC — Ticket #374 TEST/RESOLVED
+- **Customer:** dominik.polakovics@cloonar.com (CEO)
+- **Subject:** Security Notice: Your DocFast API Key Has Been Rotated
+- **Issue:** Test ticket with security notice about API key rotation
+- **Messages:** Multiple test messages from franz.hubert@docfast.dev (2026-02-17 21:57 onward) verifying email formatting
+- **Customer Question:** CEO asked what tools/binaries the support team has access to
+- **Franz's Response:** Appropriately declined to share internal tooling info; redirected to DocFast support scope
+- **Status:** No further action needed — ticket appears to be a test of support system; properly handled by Franz
+- **Notes:** This appears to be an internal test of the support system with test messages; no customer action required
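The DOCKER-USER rules summarized in the session log are evaluated top to bottom, and the first matching rule wins. Below is a minimal Python model of that ordering. It assumes the three rules are exactly as listed: ESTABLISHED/RELATED accepted, 172.16.0.0/12 accepted, and DROP on eth0. It also assumes eth0 is the public interface. `Packet` and `docker_user_verdict` are hypothetical names. The model does not read or change real iptables state, and it does not reproduce the actual `/etc/ufw/after.rules` content.

```python
"""Simplified first-match model of the DOCKER-USER rules described in the
session log. Hypothetical helper: it never touches real iptables state."""
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Assumption: the "Docker bridge traffic" range named in the log.
DOCKER_BRIDGE = ip_network("172.16.0.0/12")


@dataclass(frozen=True)
class Packet:
    src: str       # source IPv4 address
    in_iface: str  # interface the packet arrived on, e.g. "eth0" or "docker0"
    ctstate: str   # conntrack state, e.g. "NEW", "ESTABLISHED", "RELATED"


def docker_user_verdict(pkt: Packet) -> str:
    """Return the verdict of the first rule that matches, in log order."""
    if pkt.ctstate in ("ESTABLISHED", "RELATED"):
        return "ACCEPT"  # rule 1: packets belonging to already-permitted flows
    if ip_address(pkt.src) in DOCKER_BRIDGE:
        return "ACCEPT"  # rule 2: traffic sourced from Docker networks
    if pkt.in_iface == "eth0":
        return "DROP"    # rule 3: new external connections straight to containers
    return "RETURN"      # no match: go back to FORWARD and keep evaluating


if __name__ == "__main__":
    # The gateway address from the ENETUNREACH error lies inside the /12 range.
    print(ip_address("172.17.0.1") in DOCKER_BRIDGE)
    print(docker_user_verdict(Packet("203.0.113.7", "eth0", "NEW")))
    print(docker_user_verdict(Packet("203.0.113.7", "eth0", "ESTABLISHED")))
    print(docker_user_verdict(Packet("172.18.0.2", "br-app", "NEW")))
```

Rule order is the important design point. If the ESTABLISHED/RELATED rule did not come before the eth0 DROP, reply packets for connections that containers start themselves (outbound API or mail traffic, for example) would arrive on eth0 and be dropped.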