Session 52: BUG-072 production outage fixed (ufw+Docker conflict + dual deployment)

Hoid 2026-02-18 08:08:19 +00:00
parent f896561771
commit 75cda5287b
4 changed files with 65 additions and 2 deletions

@@ -1,3 +1,21 @@
## BUG-072: Production Outage — UFW+Docker Conflict + Dual Deployment
- **Date:** 2026-02-18 ~08:00 UTC
- **Severity:** CRITICAL
- **Duration:** Unknown start time → fixed 08:04 UTC
- **Symptoms:** App container stuck in a restart loop, unable to reach PostgreSQL (`ENETUNREACH 172.17.0.1:5432`)
- **Root causes:**
1. UFW injected DROP rules into Docker's DOCKER iptables chain, blocking container-to-host networking
2. A systemd service (`docfast.service`) running a separate Node.js process from `/opt/docfast` was bound to port 3100, which kept the Docker container from starting even after the iptables rules were removed
- **Fix applied:**
- Removed iptables DROP rules from DOCKER chain
- Stopped and disabled `docfast.service` systemd unit
- Changed DATABASE_HOST from `172.17.0.1` to `host.docker.internal` in docker-compose.yml
- Full restart with `docker compose down && docker compose up -d`; container reported healthy
- **Permanent fix:** DevOps agent dispatched to properly resolve ufw+Docker conflict and remove dual deployment
- **Status:** FIXED (immediate), permanent fix in progress
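
Both root causes can be checked from the host before restarting the container. The following is a minimal sketch, not the procedure used during the incident: the helper names `find_drop_rules`, `port_in_use`, and `report` are hypothetical, and it assumes `iptables -S DOCKER` is available and run as root, and that the conflicting process listens on the loopback address.

```python
import socket
import subprocess


def find_drop_rules(rules_text: str) -> list[str]:
    """Return the rules from `iptables -S <chain>` output whose target is DROP."""
    drops = []
    for line in rules_text.splitlines():
        tokens = line.split()
        if "-j" in tokens:
            idx = tokens.index("-j")
            # The token right after -j is the rule's target.
            if tokens[idx + 1 : idx + 2] == ["DROP"]:
                drops.append(line.strip())
    return drops


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def report(port: int = 3100) -> None:
    """Print findings for both root causes (needs root for iptables)."""
    dump = subprocess.run(
        ["iptables", "-S", "DOCKER"],
        capture_output=True, text=True, check=True,
    ).stdout
    for rule in find_drop_rules(dump):
        print("DROP rule in DOCKER chain:", rule)
    if port_in_use(port):
        print(f"port {port} is already bound on the host")
```

Calling `report()` as root prints any DROP rules found in the DOCKER chain and flags a listener on port 3100. The parsing is kept separate from the `iptables` call so it can be exercised against saved rule dumps.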
---
# DocFast QA Report — 2026-02-15
**Tester:** QA Bot (automated)