config/memory/2026-02-18.md
2026-02-18 13:47:36 +00:00

2026-02-18 — Daily Log

DocFast Support Fixes

  • FreeScout needs-reply had two bugs: it assumed threads were in chronological order, but FreeScout returns them newest-first (index 0 = latest), and it skipped unassigned tickets. Both fixed.
  • FreeScout email formatting: the body (text) field expects HTML, not plain text — bare \n newlines get stripped. Fixed by converting paragraphs to <p> tags and newlines to <br> in the text field.
  • Support agent can now use light HTML (<b>, <ul><li>, <a href="">) in replies.
  • Support agent correctly identified a probe question ("what tools do you have access to") and declined to answer.
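
The paragraph/newline conversion above can be sketched with a small awk filter (a minimal sketch, not the actual support-agent code; real code would also escape &, <, > in the text):

```shell
# Minimal sketch of plain-text → FreeScout HTML conversion.
# awk paragraph mode (RS="") splits on blank lines; each line within a
# paragraph becomes a field, joined with <br>.
html_body() {
  awk 'BEGIN { RS=""; FS="\n" } {
    printf "<p>"
    for (i = 1; i <= NF; i++) printf "%s%s", $i, (i < NF ? "<br>" : "")
    printf "</p>"
  }'
}

printf 'Hi,\n\nYour fix is live.\nThanks for reporting it!' | html_body
# → <p>Hi,</p><p>Your fix is live.<br>Thanks for reporting it!</p>
```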

Calendar CLI Created

  • bin/calendar — CalDAV client for Nextcloud
  • Commands: today, tomorrow, week, next, month, date, range, search
  • Uses expand in CalDAV query to handle recurring events correctly
  • Had to strip &#13; (XML-encoded carriage returns) from response
  • Credentials in services.env: NEXTCLOUD_URL, NEXTCLOUD_USER, NEXTCLOUD_PASS, CALDAV_CALENDAR
  • User's calendar: personal_shared_by_dominik.polakovics@cloonar.com
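
The expand element is what makes recurring events work: it asks the server to expand RRULEs into concrete instances inside the requested window, so the client never has to parse recurrence rules itself. A sketch of the REPORT body bin/calendar presumably sends (dates illustrative):

```xml
<?xml version="1.0" encoding="utf-8"?>
<c:calendar-query xmlns:d="DAV:" xmlns:c="urn:ietf:params:xml:ns:caldav">
  <d:prop>
    <c:calendar-data>
      <!-- server-side expansion of recurring VEVENTs into instances -->
      <c:expand start="20260218T000000Z" end="20260219T000000Z"/>
    </c:calendar-data>
  </d:prop>
  <c:filter>
    <c:comp-filter name="VCALENDAR">
      <c:comp-filter name="VEVENT">
        <c:time-range start="20260218T000000Z" end="20260219T000000Z"/>
      </c:comp-filter>
    </c:comp-filter>
  </c:filter>
</c:calendar-query>
```

Sent as an HTTP REPORT with Depth: 1 against the calendar collection URL; the &#13; stripping happens on the response body afterwards.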

Product Research & SnapAPI

  • Research agent found 7 product ideas, saved to projects/ideas/product-ideas.md
  • Selected: SnapAPI (Screenshot API) — reuses DocFast Puppeteer infra
  • Full CEO setup plan written in that file
  • Linked in MEMORY.md

Coolify Container Platform Setup

  • Created skills/coolify-setup/ skill with full guide + API integration reference
  • Provisioned 2x CAX11 (ARM64) servers in Hetzner nbg1:
    • coolify-1: 188.34.201.101 (Manager + Worker)
    • coolify-2: 46.225.62.90 (Worker)
  • Private network: coolify-net (10.0.0.0/16, ID 11949384)
  • Firewall: coolify-fw (SSH, HTTP, HTTPS, 8000)
  • Coolify v4.0.0-beta.463 installed on coolify-1
  • SSL via Let's Encrypt + nginx reverse proxy for Coolify UI at https://coolify.cloonar.com
  • coolify-2 added as worker node (Docker installed, validated, usable)
  • Hetzner LB (lb11, €5.39/mo) created: IP 46.225.37.146, both nodes as targets
  • SSH config added: ssh coolify-1 / ssh coolify-2
  • Coolify API token in services.env as COOLIFY_API_TOKEN
  • Hetzner API token in services.env as COOLIFY_HETZNER_API_KEY
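
The SSH shortcuts above presumably map to ~/.ssh/config entries like the following (user and key are assumptions, IPs are from this log):

```
Host coolify-1
    HostName 188.34.201.101
    User root

Host coolify-2
    HostName 46.225.62.90
    User root
```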

DocFast Migration to Coolify — In Progress

  • Created project "DocFast" (uuid: ngwk4wgo80c0wgoo4cw4ssoc)
  • Created app "DocFast API" (uuid: vgkg0wscckwc8448sow8ko4c)
  • Created PostgreSQL DB (uuid: vcgksg88ss4sww00cowgc4k8) — Coolify-managed
  • App deployed successfully! But:
    • ⚠️ Fresh DB — needs data migration from old server (167.235.156.214)
    • ⚠️ Proxy conflict: nginx (Coolify UI SSL) vs Traefik (app routing)
    • ⚠️ Health check disabled — Dockerfile needs curl added for Coolify health checks
    • Build failed twice: first wrong branch (master→main), then health check (no curl in slim image)

HA Architecture Discussion

  • DNS failover: works but 1-5 min delay depending on TTL
  • Hetzner LB: instant failover, €5.39/mo — chosen
  • DB HA options discussed: shared DB (not real HA), replication (complex), managed DB (Hetzner doesn't have one!), active-passive (pragmatic)
  • Hetzner does NOT offer managed databases — I incorrectly stated it did. Alternatives: Ubicloud (~€12/mo on Hetzner infra), Aiven, Neon, Supabase.
  • 3-node setup (separate mgr) recommended for true HA with DB replication (~€17/mo)
  • User still deciding on HA approach

Wind-down

  • User slept at ~01:00 Vienna (Feb 17→18)
  • I failed to nudge after 20:12 — got caught up in DocFast work. User called it out at midnight.
  • Must be more aggressive with wind-down nudges tonight.

Hetzner Resource IDs

  • Server coolify-1: ID 121353705
  • Server coolify-2: ID 121353725
  • Network: ID 11949384
  • Firewall: ID 10553199
  • Load Balancer: ID 5833603, IP 46.225.37.146
  • SSH keys: dominik-nb01 (ID 107656266), openclaw-vm (ID 107656268)

Coolify 3-Node HA Setup Complete

Infrastructure

  • coolify-mgr (188.34.201.101, 10.0.1.1) — Coolify UI + etcd
  • coolify-w1 (46.225.62.90, 10.0.1.2) — Apps + etcd + Patroni PRIMARY + PgBouncer
  • coolify-w2 (46.224.208.205, 10.0.1.4) — Apps + etcd + Patroni REPLICA + PgBouncer
  • Hetzner server ID for w2: 121361614, Coolify UUID: mwccg08sokosk4wgw40g08ok

Components

  • etcd 3.5.17 on all 3 nodes (quay.io/coreos/etcd, ARM64 compatible)
  • Patroni + PostgreSQL 16 on workers (custom Docker image patroni:local)
  • PgBouncer (edoburu/pgbouncer) on workers — routes to current primary
  • Watcher (systemd timer, every 5s) updates PgBouncer config on failover
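
The watcher logic can be sketched as follows (a sketch under assumptions — the real update-primary.sh may differ; Patroni's REST API answers HTTP 200 on /primary only on the current leader, which is what makes this simple polling approach work):

```shell
#!/bin/sh
# Sketch of the PgBouncer failover watcher (the real /opt/infra/pgbouncer/
# update-primary.sh may differ; node IPs are from this log).
INI=${INI:-/opt/infra/pgbouncer/pgbouncer.ini}

current_host() {
  # host currently configured for the docfast entry in pgbouncer.ini
  awk -F'host=' '/^docfast/ { split($2, a, " "); print a[1] }' "$INI"
}

point_to() {
  # rewrite the ini to target the new primary
  sed -i "s/host=[0-9.]*/host=$1/" "$INI"
}

# Ask each Patroni node whether it is the leader (HTTP 200 on /primary).
for node in 10.0.1.2 10.0.1.4; do
  if curl -sf --max-time 2 -o /dev/null "http://$node:8008/primary"; then
    primary=$node
    break
  fi
done

if [ -n "$primary" ] && [ "$(current_host)" != "$primary" ]; then
  point_to "$primary"
  docker kill -s HUP pgbouncer   # SIGHUP makes PgBouncer reload its config
  echo "$(date) failover -> $primary" >> /opt/infra/pgbouncer/failover.log
fi
```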

Key Facts

  • Docker daemon.json on all nodes: 172.17.0.0/12 pool (fixes 10.0.x conflict with Hetzner private net)
  • Infra compose: /opt/infra/docker-compose.yml on each node
  • Patroni config: /opt/infra/patroni/patroni.yml
  • PgBouncer config: /opt/infra/pgbouncer/pgbouncer.ini
  • Watcher script: /opt/infra/pgbouncer/update-primary.sh
  • Failover log: /opt/infra/pgbouncer/failover.log
  • docfast database created and replicated
  • Failover tested: pg1→pg2 promotion + pg1 rejoin as replica
  • Switchover tested: pg2→pg1 clean switchover
  • Cost: €11.67/mo (3x CAX11)
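
The daemon.json change noted above would look roughly like this (a sketch — base and size are assumptions; the point is keeping Docker's auto-assigned bridge subnets out of the 10.0.0.0/16 range used by the Hetzner private network):

```json
{
  "default-address-pools": [
    { "base": "172.16.0.0/12", "size": 24 }
  ]
}
```

After editing /etc/docker/daemon.json, the Docker daemon must be restarted; the new pool only applies to networks created afterwards.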

Remaining Steps

  • Migrate DocFast data from 167.235.156.214 to Patroni cluster
  • Deploy DocFast app via Coolify on both workers
  • Set up BorgBackup on new nodes
  • Add docfast user SCRAM hash to PgBouncer userlist
  • Create project-scoped API tokens for CEO agents

K3s + CloudNativePG Setup Complete

Architecture

  • k3s-mgr (188.34.201.101, 10.0.1.5) — K3s control plane, Hetzner ID 121365837
  • k3s-w1 (159.69.23.121, 10.0.1.6) — Worker, Hetzner ID 121365839
  • k3s-w2 (46.225.169.60, 10.0.1.7) — Worker, Hetzner ID 121365840

Cluster Components

  • K3s v1.34.4 (Traefik DaemonSet on workers, servicelb disabled)
  • CloudNativePG 1.25.1 (operator in cnpg-system namespace)
  • cert-manager 1.17.2 (Let's Encrypt ClusterIssuer)
  • PostgreSQL 17.4 (CNPG managed, 2 instances, 1 primary + 1 replica)
  • PgBouncer Pooler (CNPG managed, 2 instances, transaction mode)
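
A sketch of what the CNPG resources probably look like (names inferred from the pooler DNS name main-db-pooler.postgres.svc used below; storage size assumed; not the actual manifests):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: main-db
  namespace: postgres
spec:
  instances: 2                # 1 primary + 1 streaming replica
  imageName: ghcr.io/cloudnative-pg/postgresql:17.4
  storage:
    size: 10Gi                # size assumed
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: main-db-pooler
  namespace: postgres
spec:
  cluster:
    name: main-db
  instances: 2
  type: rw                    # route to the current primary
  pgbouncer:
    poolMode: transaction
```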

Namespaces

  • postgres: CNPG cluster + pooler
  • docfast: DocFast app deployment
  • cnpg-system: CNPG operator
  • cert-manager: Certificate management

DocFast Deployment

  • 2 replicas, one per worker
  • Image: docker.io/library/docfast:latest (locally built + imported via k3s ctr)
  • DB: main-db-pooler.postgres.svc:5432
  • Health: /health on port 3100
  • 53 API keys migrated from old server

Key Learnings

  • Docker images must be imported with k3s ctr images import --all-platforms (not ctr -n k3s.io)
  • CNPG tolerations field caused infinite restart loop — removed to fix
  • DB table ownership must be set to app user after pg_restore with --no-owner
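
The ownership learning above, concretely: pg_restore --no-owner creates every object owned by whichever role runs the restore (typically postgres), so the app user cannot alter its own tables until ownership is reassigned. A hedged sketch (role and table names are assumptions):

```sql
-- After pg_restore --no-owner, objects belong to the restoring role.
-- Reassign everything that role owns to the app user in one statement:
REASSIGN OWNED BY postgres TO docfast;

-- Or per table, if the restoring role also owns unrelated objects:
ALTER TABLE api_keys OWNER TO docfast;
```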

Remaining

  • Switch DNS docfast.dev → worker IP (159.69.23.121 or 46.225.169.60)
  • TLS cert will auto-complete after DNS switch
  • Update Stripe webhook endpoint if needed
  • Set up CI/CD pipeline for automated deploys
  • Create CEO namespace RBAC
  • Decommission old server (167.235.156.214)
  • Clean up Docker from workers (only needed containerd/K3s)

Hetzner Load Balancer

  • ID: 5834131
  • Name: k3s-lb
  • Type: lb11 (€5.39/mo)
  • IPv4: 46.225.37.135
  • IPv6: 2a01:4f8:1c1f:7dbe::1
  • Targets: k3s-w1 (121365839) + k3s-w2 (121365840) — both healthy
  • Services: TCP 80→80, TCP 443→443 with health checks
  • Total infra cost: €11.67 (3x CAX11) + €5.39 (LB) = €17.06/mo

K3s Migration — Completed (with issues)

What's Done

  • K3s cluster fully operational: 3 nodes, CNPG PostgreSQL 17.4, PgBouncer, Traefik, cert-manager
  • Hetzner LB (46.225.37.135, ID 5834131) fronting both workers
  • TLS cert auto-issued via Let's Encrypt after DNS propagation (~40 min TTL)
  • DocFast deployed: 2 prod replicas, 1 staging replica
  • Staged CI/CD: push to main→staging, git tag v*→prod
  • SMTP fixed: nodemailer now uses env vars (SMTP_HOST/PORT/USER/PASS/FROM)
  • SMTP credentials added to K8s secrets in both namespaces
  • CEO skill updated with full K3s infrastructure knowledge
  • Deployer SA with scoped RBAC (watch permission added after first CI failure)
  • Forgejo container registry working (needed PAT, GITHUB_TOKEN doesn't have package access in Forgejo)
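
The staged CI/CD split above hangs off branch vs tag triggers. A minimal sketch of the staging workflow (registry host, secret names, namespace, and step bodies are assumptions, not the real files):

```yaml
# Sketch of .forgejo/workflows/deploy.yml — staging on push to main.
# $REGISTRY is a placeholder for the Forgejo registry host.
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - name: Build and push
        run: |
          # Forgejo needs a dedicated PAT here; GITHUB_TOKEN lacks package scope
          docker login "$REGISTRY" -u ci -p "${{ secrets.REGISTRY_PAT }}"
          docker build -t "$REGISTRY/docfast:${{ github.sha }}" .
          docker push "$REGISTRY/docfast:${{ github.sha }}"
      - name: Roll out
        run: |
          kubectl -n docfast-staging set image deploy/docfast \
            docfast="$REGISTRY/docfast:${{ github.sha }}"
          # requires the "watch" verb on the deployer SA
          kubectl -n docfast-staging rollout status deploy/docfast
```

promote.yml would differ mainly in its trigger (tags: ['v*']) and in targeting the prod namespace.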

Key Issues Found During Migration

  • ctr -n k3s.io images import doesn't work for K3s — must use k3s ctr images import --all-platforms
  • CNPG tolerations: [] caused infinite pod restart loop — remove field entirely
  • DB table ownership must be set after pg_restore with --no-owner
  • Traefik auto-redirects HTTP→HTTPS when TLS is in ingress spec — broke cert-manager ACME challenge
  • Fix: add traefik.ingress.kubernetes.io/router.entrypoints: web,websecure annotation
  • cert-manager self-check resolves DNS externally — had to wait for DNS propagation
  • Forgejo GITHUB_TOKEN lacks package write scope (known issue #3571) — need dedicated PAT
  • Deployer SA needed watch verb for kubectl rollout status
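
The Traefik/ACME fix above in manifest form (a sketch — resource names and the issuer name are assumptions; host and port are from this log):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docfast
  namespace: docfast
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    # keep the route reachable on plain HTTP so the ACME HTTP-01
    # challenge is not auto-redirected to HTTPS
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
spec:
  tls:
    - hosts: [docfast.dev]
      secretName: docfast-tls
  rules:
    - host: docfast.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: docfast
                port:
                  number: 3100
```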

Open Issues (CEO investigating)

  • API returning "Invalid API key" for all requests — likely DATABASE_URL or DB connection issue
  • Docs "Try it" page broken — CSP fix applied but may be deeper issue
  • CEO session 55 dispatched on Sonnet to investigate

Infrastructure Costs

  • 3x CAX11: €11.67/mo
  • Hetzner LB (lb11): €5.39/mo
  • Total K3s infra: €17.06/mo

Files

  • CI workflow: .forgejo/workflows/deploy.yml (staging), promote.yml (prod)
  • Deployer kubeconfig: /tmp/deployer-kubeconfig.yaml (on k3s-mgr)
  • k3s-token removed from .credentials (lives in K8s secrets + Forgejo CI)