Session 52: BUG-072 production outage fixed (ufw+Docker conflict + dual deployment)
parent f896561771
commit 75cda5287b
4 changed files with 65 additions and 2 deletions
@@ -1,3 +1,21 @@
## BUG-072: Production Outage — UFW+Docker Conflict + Dual Deployment
- **Date:** 2026-02-18 ~08:00 UTC
- **Severity:** CRITICAL
- **Duration:** Unknown start time → fixed 08:04 UTC
- **Symptoms:** App container stuck in a restart loop, unable to reach PostgreSQL: `ENETUNREACH 172.17.0.1:5432`
- **Root causes:**
  1. UFW injected DROP rules into Docker's `DOCKER` iptables chain, blocking container-to-host networking
  2. A systemd service (`docfast.service`) running a separate Node.js process from `/opt/docfast` was binding port 3100, preventing the Docker container from starting even after the firewall fix
- **Fix applied:**
  - Removed the iptables DROP rules from the `DOCKER` chain
  - Stopped and disabled the `docfast.service` systemd unit
  - Changed `DATABASE_HOST` from `172.17.0.1` to `host.docker.internal` in `docker-compose.yml`
  - Ran a full `docker compose down && docker compose up -d`; container healthy
- **Permanent fix:** DevOps agent dispatched to properly resolve the ufw+Docker conflict and remove the dual deployment
- **Status:** FIXED (immediate), permanent fix in progress
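
To confirm root cause 2 on a host, it helps to see which process is listening on port 3100 before restarting the container. The following is a minimal sketch, not the procedure used during the incident: `listeners_on_port` is a hypothetical helper that filters `ss -H -ltnp` style output read from stdin, assuming the local address and port sit in the fourth column (as they do when only TCP sockets are listed). On the host it could be run as `ss -H -ltnp | listeners_on_port 3100`; a `node` process in the output, rather than Docker's own proxy, would point at the systemd deployment.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of `ss -H -ltnp`-style input whose
# local address (4th column, "addr:port") ends in the given port.
listeners_on_port() {
    port="$1"
    awk -v suffix=":${port}" '{
        start = length($4) - length(suffix) + 1
        # Compare the tail of the column so that 3100 does not match 31000.
        if (start > 0 && substr($4, start) == suffix) print
    }'
}
```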

---

# DocFast QA Report — 2026-02-15

**Tester:** QA Bot (automated)

@@ -1225,3 +1225,38 @@
- **TODO:** Notify actual key owner (dominik.polakovics@cloonar.com) about compromise
- **TODO:** Update support agent prompt with hard security rules
- **TODO:** Security audit of support agent capabilities

## Session — 2026-02-18 08:05 UTC — Production Outage Fix (UFW+Docker conflict)

**Problem:** Production outage from two issues:
1. Dual deployment: systemd service (`/opt/docfast`) conflicting with Docker Compose (`/root/docfast`) on port 3100
2. UFW injecting DROP rules into Docker's `DOCKER` chain, blocking container→host networking (PostgreSQL, Postfix)

**Changes made:**
1. **Removed systemd service:** Deleted `/etc/systemd/system/docfast.service`, ran `systemctl daemon-reload`, removed `/opt/docfast` entirely
2. **Fixed UFW+Docker conflict:** Added DOCKER-USER chain rules to `/etc/ufw/after.rules`:
   - Allow ESTABLISHED/RELATED connections
   - Allow Docker bridge traffic (172.16.0.0/12) → enables container→host (PostgreSQL 5432, Postfix 25)
   - DROP on eth0 → blocks direct external access to containers (nginx proxies external traffic instead)
3. **Backup:** `/etc/ufw/after.rules.bak`
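
The commit does not include the rules themselves, so the following is only a sketch of what a DOCKER-USER block matching the three bullets above might look like. `docker_user_rules` is a hypothetical helper that prints the fragment in iptables-restore syntax. The `172.16.0.0/12` range and the `eth0` interface come from the notes above; the use of `RETURN` (hand the packet back to Docker's own chains) rather than `ACCEPT` is an assumption. The output would typically be appended to `/etc/ufw/after.rules` after its existing `COMMIT` line and applied with `ufw reload`.

```shell
#!/bin/sh
# Hypothetical helper: print a DOCKER-USER block for /etc/ufw/after.rules.
# $1 = Docker bridge CIDR, $2 = external interface (defaults from the notes).
docker_user_rules() {
    bridge_cidr="${1:-172.16.0.0/12}"
    wan_if="${2:-eth0}"
    cat <<EOF
*filter
:DOCKER-USER - [0:0]
# Let replies to already-established connections through.
-A DOCKER-USER -m conntrack --ctstate ESTABLISHED,RELATED -j RETURN
# Let traffic originating from the Docker bridge networks through.
-A DOCKER-USER -s ${bridge_cidr} -j RETURN
# Drop everything else arriving on the external interface.
-A DOCKER-USER -i ${wan_if} -j DROP
COMMIT
EOF
}

# Print the fragment with the default values for review before installing it.
docker_user_rules
```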

**Verification:**
- ✅ Health check OK (`/health` returns status:ok, DB connected to PostgreSQL 16.11)
- ✅ Container running and healthy
- ✅ Port 3100 NOT reachable externally (DOCKER-USER eth0 DROP)
- ✅ Rules persist across `ufw reload`
- ✅ Systemd service fully removed
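
The health check above lends itself to a small repeatable script. This is a sketch under assumptions: `health_ok` is a hypothetical helper, and it assumes `/health` returns a JSON body containing `"status":"ok"` (the notes only say it returns status:ok). On the host it might be used as `curl -fsS http://127.0.0.1:3100/health | health_ok`; the loopback URL is also an assumption, and the canned response body below is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the health response read from stdin
# contains "status": "ok" (whitespace around the colon is tolerated).
health_ok() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example with a canned response body; on a host, pipe curl output in instead.
if printf '%s\n' '{"status":"ok","database":"connected"}' | health_ok; then
    echo "healthy"
else
    echo "unhealthy"
fi
```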

## Session 52 — 2026-02-18 08:00 UTC (Morning Session)
- **BUG-072 CRITICAL: Production outage — FIXED**
  - Discovery: DocFast container stuck in restart loop, `ENETUNREACH 172.17.0.1:5432`
  - Root cause 1: UFW injected DROP rules into Docker's `DOCKER` iptables chain, blocking container→host networking
  - Root cause 2: A `docfast.service` systemd unit running Node.js directly from `/opt/docfast` was binding port 3100, blocking the Docker container from starting
  - Immediate fix: Removed iptables DROP rules, stopped and disabled the systemd service, changed `DATABASE_HOST` to `host.docker.internal`
  - Permanent fix (DevOps agent): Removed the systemd service file and `/opt/docfast` entirely, added DOCKER-USER chain rules to `/etc/ufw/after.rules` (allow Docker bridge traffic, block external container access), verified rules survive `ufw reload`
  - Production verified: healthy, DB connected, port 3100 blocked externally
- **Support check:** No actionable tickets (ticket #374 is internal test, already resolved)
- **Investor Test:** All 5 ✅ (now that production is back)
- **Budget:** €181.71 remaining, Revenue: €9
- **Open bugs:** ZERO (BUG-072 resolved)
- **Status:** LAUNCH-READY

@@ -107,7 +107,7 @@
 "HIGH": [],
 "MEDIUM": [],
 "LOW": [],
-"note": "Session 51: ALL remaining bugs fixed. BUG-051/052 (duplicate headers), BUG-053 (JS minification), BUG-055 (preconnect), BUG-058 (twitter:image), BUG-060 (og:tags), BUG-061 (sitemap), BUG-067 (skip-to-content), BUG-069 (/docs footer). ZERO open bugs."
+"note": "Session 52: BUG-072 (production outage from ufw+Docker + dual deployment) fixed. Session 51: ALL remaining bugs fixed. BUG-051/052 (duplicate headers), BUG-053 (JS minification), BUG-055 (preconnect), BUG-058 (twitter:image), BUG-060 (og:tags), BUG-061 (sitemap), BUG-067 (skip-to-content), BUG-069 (/docs footer). ZERO open bugs."
 },
 "blockers": [],
 "resolvedBlockers": [
@@ -120,5 +120,5 @@
 "Checkout .env persistence + CI/CD secrets pipeline \u2014 DONE 2026-02-17"
 ],
 "startDate": "2026-02-14",
-"sessionCount": 51
+"sessionCount": 52
 }

@@ -33,3 +33,13 @@
- **Resolution:** Provided two options: (1) retry recovery flow now that email is fixed, or (2) direct key generation from our side
- **Notes:** Customer has been patient through multiple attempts; acknowledged inconvenience and recommended storing keys securely
- **Status:** CLOSED — awaiting customer confirmation of preferred resolution method

## 2026-02-18 08:00 UTC — Ticket #374 TEST/RESOLVED
- **Customer:** dominik.polakovics@cloonar.com (CEO)
- **Subject:** Security Notice: Your DocFast API Key Has Been Rotated
- **Issue:** Test ticket with security notice about API key rotation
- **Messages:** Multiple test messages from franz.hubert@docfast.dev (2026-02-17 21:57 onward) verifying email formatting
- **Customer Question:** CEO asked what tools/binaries the support team has access to
- **Franz's Response:** Appropriately declined to share internal tooling info; redirected to DocFast support scope
- **Status:** No further action needed — ticket appears to be a test of the support system; properly handled by Franz
- **Notes:** This appears to be an internal test of the support system with test messages; no customer action required