Session 52: BUG-072 production outage fixed (ufw+Docker conflict + dual deployment)

Hoid 2026-02-18 08:08:19 +00:00
parent f896561771
commit 75cda5287b
4 changed files with 65 additions and 2 deletions


@@ -1225,3 +1225,38 @@
- **TODO:** Notify actual key owner (dominik.polakovics@cloonar.com) about compromise
- **TODO:** Update support agent prompt with hard security rules
- **TODO:** Security audit of support agent capabilities
## Session — 2026-02-18 08:05 UTC — Production Outage Fix (UFW+Docker conflict)
**Problem:** Production outage caused by two issues:
1. Dual deployment: a systemd service (`/opt/docfast`) and the Docker Compose stack (`/root/docfast`) both wanted port 3100
2. UFW had injected DROP rules into Docker's `DOCKER` iptables chain, blocking container→host networking (PostgreSQL, Postfix)
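A dual deployment like this can be caught before `docker compose up` by checking whether anything already holds the port. Below is a minimal Python sketch, assuming the conflict surfaces as an ordinary `EADDRINUSE` bind failure on the host. `port_in_use` is a hypothetical helper, not part of DocFast:

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if another process already holds host:port for TCP.

    Hypothetical helper: binds a throwaway socket and treats EADDRINUSE
    as "someone else (e.g. a leftover systemd-managed process) owns it".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission errors are a different problem
    return False  # bind succeeded; the probe socket is closed on exit
```

Run it on the host with the container stopped. If it reports the port as busy, `ss -ltnp 'sport = :3100'` shows which process holds it.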
**Changes made:**
1. **Removed systemd service:** Deleted `/etc/systemd/system/docfast.service`, ran `systemctl daemon-reload`, and removed `/opt/docfast` entirely
2. **Fixed UFW+Docker conflict:** Added `DOCKER-USER` chain rules to `/etc/ufw/after.rules`:
   - Allow ESTABLISHED/RELATED connections
   - Allow traffic from Docker bridge networks (172.16.0.0/12), restoring container→host access (PostgreSQL on 5432, Postfix on 25)
   - DROP traffic arriving on `eth0`, so containers cannot be reached directly from outside (public traffic goes through the nginx reverse proxy)
3. **Backup:** Previous rules saved to `/etc/ufw/after.rules.bak`
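The three `DOCKER-USER` rules are evaluated top to bottom, and the first match decides. The Python sketch below models only that ordering, not netfilter itself. The `Packet` type, the verdict strings (`ALLOW`, `DROP`, `CONTINUE` for "no match, fall through to Docker's own chains") and the example interface names are assumptions for illustration:

```python
import ipaddress
from dataclasses import dataclass

# 172.16.0.0/12 spans 172.16.0.0 - 172.31.255.255 (covers docker0 and compose bridges)
DOCKER_BRIDGES = ipaddress.ip_network("172.16.0.0/12")


@dataclass(frozen=True)
class Packet:
    src: str        # source IP address
    in_iface: str   # interface the packet arrived on
    ct_state: str   # conntrack state, e.g. "NEW", "ESTABLISHED", "RELATED"


def docker_user_verdict(pkt: Packet, wan_iface: str = "eth0") -> str:
    """Apply the three rules from the notes in order; first match wins."""
    # Rule 1: replies belonging to connections that were already allowed.
    if pkt.ct_state in ("ESTABLISHED", "RELATED"):
        return "ALLOW"
    # Rule 2: anything sourced from a Docker bridge network.
    if ipaddress.ip_address(pkt.src) in DOCKER_BRIDGES:
        return "ALLOW"
    # Rule 3: new traffic from the public interface never reaches containers.
    if pkt.in_iface == wan_iface:
        return "DROP"
    # No rule matched: hand the packet back to Docker's own chains.
    return "CONTINUE"
```

The order is what matters here. Established replies and bridge-sourced traffic are matched before the catch-all `eth0` DROP. Docker's default bridge (172.17.0.0/16, with gateway 172.17.0.1 as seen in the outage log) falls inside 172.16.0.0/12.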
**Verification:**
- ✅ Health check OK (`/health` returns `status: ok`; DB connected to PostgreSQL 16.11)
- ✅ Container running and healthy
- ✅ Port 3100 NOT reachable externally (DOCKER-USER eth0 DROP)
- ✅ Rules persist across `ufw reload`
- ✅ Systemd service fully removed
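The "port 3100 not reachable externally" check can be repeated from any outside machine. Below is a minimal Python sketch that distinguishes three outcomes: an accepted handshake, an active refusal, and silence. From the outside, an iptables DROP shows up as silence, i.e. a timeout. `probe` is a hypothetical helper, and the 3-second timeout is arbitrary:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as seen from the machine running this code."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"         # three-way handshake completed
    except ConnectionRefusedError:
        return "closed"           # RST or ICMP port-unreachable came back
    except (socket.timeout, TimeoutError):
        return "filtered"         # no reply at all, consistent with a silent DROP
    except OSError:
        return "unreachable"      # routing, DNS or other network error
```

Run it from a host outside the server, e.g. `probe("<server public IP>", 3100)`. Anything other than `"open"` means the port is not exposed, and with the `eth0` DROP rule in place the expected answer is `"filtered"`.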
## Session 52 — 2026-02-18 08:00 UTC (Morning Session)
- **BUG-072 CRITICAL: Production outage — FIXED**
  - Discovery: DocFast container stuck in a restart loop with `ENETUNREACH 172.17.0.1:5432`
  - Root cause 1: UFW had injected DROP rules into Docker's `DOCKER` iptables chain, blocking container→host networking
  - Root cause 2: a `docfast.service` systemd unit running Node.js directly from `/opt/docfast` held port 3100, preventing the Docker container from starting
  - Immediate fix: removed the iptables DROP rules, stopped and disabled the systemd service, and changed `DATABASE_HOST` to `host.docker.internal`
  - Permanent fix (DevOps agent): removed the systemd service file and `/opt/docfast` entirely, added `DOCKER-USER` chain rules to `/etc/ufw/after.rules` (allow Docker bridge traffic, block external access to containers), and verified the rules survive `ufw reload`
  - Production verified: healthy, DB connected, port 3100 blocked externally
- **Support check:** No actionable tickets (ticket #374 is an internal test and already resolved)
- **Investor Test:** All 5 ✅ (now that production is back)
- **Budget:** €181.71 remaining, Revenue: €9
- **Open bugs:** ZERO (BUG-072 resolved)
- **Status:** LAUNCH-READY
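The "healthy, DB connected" verification from this session can be scripted for future incidents. Below is a minimal Python sketch using only the standard library. It checks only the `status` field quoted in the notes, because the rest of the `/health` payload is not documented here. The default URL `http://127.0.0.1:3100/health` is an assumption about where the container's port is published:

```python
import http.client
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True if a /health payload is a JSON object with status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = "http://127.0.0.1:3100/health", timeout: float = 5.0) -> bool:
    """Fetch the (assumed) health endpoint; any HTTP or network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except (OSError, http.client.HTTPException):  # URLError/HTTPError are OSError subclasses
        return False
```

Because it returns `False` on every failure, it can drop into a cron job or deploy script without extra exception handling.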