Compare commits


No commits in common. "de56cbf220f4bd86efa110f71fa73787c59b2f74" and "3026420c9dea93dc8ea622e40120de744a28d3cf" have entirely different histories.

5 changed files with 13 additions and 354 deletions

kubectl (BIN, binary file not shown)


@@ -67,87 +67,3 @@
- Firewall: ID 10553199
- Load Balancer: ID 5833603, IP 46.225.37.146
- SSH keys: dominik-nb01 (ID 107656266), openclaw-vm (ID 107656268)
## Coolify 3-Node HA Setup Complete
### Infrastructure
- **coolify-mgr** (188.34.201.101, 10.0.1.1) — Coolify UI + etcd
- **coolify-w1** (46.225.62.90, 10.0.1.2) — Apps + etcd + Patroni PRIMARY + PgBouncer
- **coolify-w2** (46.224.208.205, 10.0.1.4) — Apps + etcd + Patroni REPLICA + PgBouncer
- Hetzner server ID for w2: 121361614, Coolify UUID: mwccg08sokosk4wgw40g08ok
### Components
- **etcd 3.5.17** on all 3 nodes (quay.io/coreos/etcd, ARM64 compatible)
- **Patroni + PostgreSQL 16** on workers (custom Docker image `patroni:local`)
- **PgBouncer** (edoburu/pgbouncer) on workers — routes to current primary
- **Watcher** (systemd timer, every 5s) updates PgBouncer config on failover
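The watcher's core logic can be sketched roughly as follows (a minimal sketch, not the real `update-primary.sh`; the `PATRONI_NODES` list and the ini template are assumptions):

```shell
#!/bin/bash
# Sketch of a failover watcher: find the current Patroni primary and
# rewrite the PgBouncer config to point at it. Node list is assumed.
PATRONI_NODES="10.0.1.2 10.0.1.4"

# Render a minimal pgbouncer.ini routing "docfast" to the given primary host.
render_pgbouncer_ini() {
  local primary="$1"
  cat <<EOF
[databases]
docfast = host=${primary} port=5432 dbname=docfast

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_file = /etc/pgbouncer/userlist.txt
EOF
}

# Ask each node's Patroni REST API for its role; print the primary's IP.
find_primary() {
  local node
  for node in $PATRONI_NODES; do
    # Patroni answers HTTP 200 on /primary only when the node is the leader.
    if curl -sf "http://${node}:8008/primary" >/dev/null 2>&1; then
      echo "$node"
      return 0
    fi
  done
  return 1
}
```

A systemd timer would call `find_primary` every 5 seconds, compare against the host currently in `pgbouncer.ini`, and on change write the new config and reload PgBouncer.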
### Key Facts
- Docker daemon.json on all nodes: `172.17.0.0/12` pool (fixes 10.0.x conflict with Hetzner private net)
- Infra compose: `/opt/infra/docker-compose.yml` on each node
- Patroni config: `/opt/infra/patroni/patroni.yml`
- PgBouncer config: `/opt/infra/pgbouncer/pgbouncer.ini`
- Watcher script: `/opt/infra/pgbouncer/update-primary.sh`
- Failover log: `/opt/infra/pgbouncer/failover.log`
- `docfast` database created and replicated
- Failover tested: pg1→pg2 promotion + pg1 rejoin as replica ✅
- Switchover tested: pg2→pg1 clean switchover ✅
- Cost: €11.67/mo (3x CAX11)
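The daemon.json fix above pins Docker's bridge networks away from 10.0.0.0/8 (a sketch; the subnet size of 24 is an assumption):

```shell
# Write a daemon.json that keeps Docker's auto-assigned bridge networks
# inside 172.17.0.0/12, so they cannot collide with the 10.0.x Hetzner
# private network.
write_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  cat > "$target" <<'EOF'
{
  "default-address-pools": [
    { "base": "172.17.0.0/12", "size": 24 }
  ]
}
EOF
}
# After writing on a node: systemctl restart docker
```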
### Remaining Steps
- [ ] Migrate DocFast data from 167.235.156.214 to Patroni cluster
- [ ] Deploy DocFast app via Coolify on both workers
- [ ] Set up BorgBackup on new nodes
- [ ] Add docfast user SCRAM hash to PgBouncer userlist
- [ ] Create project-scoped API tokens for CEO agents
## K3s + CloudNativePG Setup Complete
### Architecture
- **k3s-mgr** (188.34.201.101, 10.0.1.5) — K3s control plane, Hetzner ID 121365837
- **k3s-w1** (159.69.23.121, 10.0.1.6) — Worker, Hetzner ID 121365839
- **k3s-w2** (46.225.169.60, 10.0.1.7) — Worker, Hetzner ID 121365840
### Cluster Components
- K3s v1.34.4 (Traefik DaemonSet on workers, servicelb disabled)
- CloudNativePG 1.25.1 (operator in cnpg-system namespace)
- cert-manager 1.17.2 (Let's Encrypt ClusterIssuer)
- PostgreSQL 17.4 (CNPG managed, 2 instances, 1 primary + 1 replica)
- PgBouncer Pooler (CNPG managed, 2 instances, transaction mode)
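A CNPG-managed pooler of this shape is declared roughly like so (a sketch; the resource name `main-db-pooler` and cluster name `main-db` match the names used elsewhere in these notes):

```yaml
# Sketch of the CNPG Pooler resource: 2 instances, transaction mode,
# routing read-write traffic to the main-db cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: main-db-pooler
  namespace: postgres
spec:
  cluster:
    name: main-db
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
```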
### Namespaces
- postgres: CNPG cluster + pooler
- docfast: DocFast app deployment
- cnpg-system: CNPG operator
- cert-manager: Certificate management
### DocFast Deployment
- 2 replicas, one per worker
- Image: docker.io/library/docfast:latest (locally built + imported via k3s ctr)
- DB: main-db-pooler.postgres.svc:5432
- Health: /health on port 3100
- 53 API keys migrated from old server
### Key Learnings
- Docker images must be imported with `k3s ctr images import --all-platforms` (not `ctr -n k3s.io`)
- CNPG tolerations field caused infinite restart loop — removed to fix
- DB table ownership must be set to app user after pg_restore with --no-owner
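The image-import learning above, as a sketch (image name `docfast:latest` from this section; the tar path is an assumption):

```shell
# Sketch: move a locally built Docker image into k3s's containerd.
# k3s bundles its own ctr wrapper; plain `ctr -n k3s.io images import`
# can miss platforms in a multi-arch manifest, hence `k3s ctr` with
# --all-platforms.
k3s_image_import() {
  local image="${1:-docfast:latest}"
  local tar="${2:-/tmp/image.tar}"
  docker save "$image" -o "$tar"
  k3s ctr images import --all-platforms "$tar"
}
```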
### Remaining
- [ ] Switch DNS docfast.dev → worker IP (159.69.23.121 or 46.225.169.60)
- [ ] TLS cert will auto-complete after DNS switch
- [ ] Update Stripe webhook endpoint if needed
- [ ] Set up CI/CD pipeline for automated deploys
- [ ] Create CEO namespace RBAC
- [ ] Decommission old server (167.235.156.214)
- [ ] Clean up Docker from workers (only containerd/K3s is needed there)
### Hetzner Load Balancer
- ID: 5834131
- Name: k3s-lb
- Type: lb11 (€5.39/mo)
- IPv4: 46.225.37.135
- IPv6: 2a01:4f8:1c1f:7dbe::1
- Targets: k3s-w1 (121365839) + k3s-w2 (121365840) — both healthy
- Services: TCP 80→80, TCP 443→443 with health checks
- Total infra cost: €11.67 (3x CAX11) + €5.39 (LB) = €17.06/mo
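Target health for this LB can be checked through the Hetzner Cloud API (a sketch; requires `HETZNER_API_TOKEN` from the credentials env, and `jq`):

```shell
# Sketch: print the health status of each target of load balancer 5834131.
lb_target_health() {
  local lb_id="${1:-5834131}"
  curl -s -H "Authorization: Bearer $HETZNER_API_TOKEN" \
    "https://api.hetzner.cloud/v1/load_balancers/${lb_id}" |
    jq -r '.load_balancer.targets[] |
           "\(.server.id) \(.health_status[].status)"'
  # One line per target per service, e.g. server ID followed by a status.
}
```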


@@ -54,15 +54,10 @@
   "created": "2026-02-12T20:00:00Z",
   "lastUpdated": "2026-02-17T16:15:00Z",
-  "closingSnapshot": {
-    "date": "2026-02-18",
-    "DFNS": 57.80,
-    "portfolioValue": 1027.92,
-    "dailyPL": 1.89,
-    "totalReturn": 2.79
-  },
   "middayCheck": {
-    "date": "2026-02-18",
-    "DFNS": 57.80,
-    "action": "HOLD"
+    "date": "2026-02-17",
+    "DFNS": 57.01,
+    "portfolioValue": 1014.27,
+    "dailyPL": 0.07,
+    "totalReturn": 1.43
   }
 }


@@ -1,166 +0,0 @@
#!/bin/bash
# Coolify Infrastructure Setup via Hetzner Cloud API
set -euo pipefail
# Load API key from services.env
CRED_FILE="${HOME}/.openclaw/workspace/.credentials/services.env"
HETZNER_TOKEN=$(grep '^COOLIFY_HETZNER_API_KEY=' "$CRED_FILE" | cut -d= -f2- || true)
if [ -z "$HETZNER_TOKEN" ]; then
echo "ERROR: COOLIFY_HETZNER_API_KEY not found in services.env"
exit 1
fi
API="https://api.hetzner.cloud/v1"
AUTH="Authorization: Bearer $HETZNER_TOKEN"
hcloud() {
curl -s -H "$AUTH" -H "Content-Type: application/json" "$@"
}
echo "=== Coolify Infrastructure Setup ==="
# Step 1: Upload SSH keys
echo ""
echo "--- Step 1: SSH Keys ---"
# User's key
USER_KEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFshMhXwS0FQFPlITipshvNKrV8sA52ZFlnaoHd1thKg dominik@nb-01"
# OpenClaw key
OPENCLAW_KEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+i4Nn0Nc1ovqHXmbyekxCigT2Qn6RD1cdbKkW727Yl openclaw@openclaw-vm"
# Check if keys already exist
EXISTING_KEYS=$(hcloud "$API/ssh_keys" | jq -r '.ssh_keys[].name' 2>/dev/null || true)
upload_key() {
local name="$1" key="$2"
if echo "$EXISTING_KEYS" | grep -q "^${name}$"; then
hcloud "$API/ssh_keys" | jq -r ".ssh_keys[] | select(.name==\"$name\") | .id"
else
echo " Uploading key '$name'..." >&2
RESP=$(hcloud -X POST "$API/ssh_keys" -d "{\"name\":\"$name\",\"public_key\":\"$key\"}")
local kid
kid=$(echo "$RESP" | jq -r '.ssh_key.id')
if [ "$kid" = "null" ] || [ -z "$kid" ]; then
echo " ERROR: $(echo "$RESP" | jq -r '.error.message // "Unknown error"')" >&2
exit 1
fi
echo " Uploaded: ID $kid" >&2
echo "$kid"
fi
}
USER_KEY_ID=$(upload_key "dominik-nb01" "$USER_KEY")
OPENCLAW_KEY_ID=$(upload_key "openclaw-vm" "$OPENCLAW_KEY")
echo " User key ID: $USER_KEY_ID"
echo " OpenClaw key ID: $OPENCLAW_KEY_ID"
# Step 2: Create Network
echo ""
echo "--- Step 2: Private Network ---"
EXISTING_NET=$(hcloud "$API/networks" | jq -r '.networks[] | select(.name=="coolify-net") | .id' 2>/dev/null || echo "")
if [ -n "$EXISTING_NET" ]; then
echo " Network 'coolify-net' already exists: ID $EXISTING_NET"
NET_ID="$EXISTING_NET"
else
echo " Creating network 'coolify-net' (10.0.0.0/16)..."
RESP=$(hcloud -X POST "$API/networks" -d '{
"name": "coolify-net",
"ip_range": "10.0.0.0/16",
"subnets": [{"type": "cloud", "network_zone": "eu-central", "ip_range": "10.0.1.0/24"}]
}')
NET_ID=$(echo "$RESP" | jq '.network.id')
if [ "$NET_ID" = "null" ] || [ -z "$NET_ID" ]; then
echo " ERROR: $(echo "$RESP" | jq -r '.error.message // "Unknown error"')"
exit 1
fi
echo " Created: ID $NET_ID"
fi
# Step 3: Create Servers
echo ""
echo "--- Step 3: Servers ---"
create_server() {
local name="$1"
EXISTING=$(hcloud "$API/servers" | jq -r ".servers[] | select(.name==\"$name\") | .id" 2>/dev/null || echo "")
if [ -n "$EXISTING" ]; then
echo " Server '$name' already exists: ID $EXISTING"
IP=$(hcloud "$API/servers/$EXISTING" | jq -r '.server.public_net.ipv4.ip')
echo " IP: $IP"
return
fi
echo " Creating server '$name' (CAX11, ARM64, Ubuntu 24.04, fsn1)..."
RESP=$(hcloud -X POST "$API/servers" -d "{
\"name\": \"$name\",
\"server_type\": \"cax11\",
\"image\": \"ubuntu-24.04\",
\"location\": \"fsn1\",
\"ssh_keys\": [$USER_KEY_ID, $OPENCLAW_KEY_ID],
\"networks\": [$NET_ID],
\"public_net\": {\"enable_ipv4\": true, \"enable_ipv6\": true},
\"labels\": {\"project\": \"coolify\", \"role\": \"${name#coolify-}\"}
}")
SERVER_ID=$(echo "$RESP" | jq '.server.id')
IP=$(echo "$RESP" | jq -r '.server.public_net.ipv4.ip')
if [ "$SERVER_ID" = "null" ] || [ -z "$SERVER_ID" ]; then
echo " ERROR: $(echo "$RESP" | jq -r '.error.message // "Unknown error"')"
exit 1
fi
echo " Created: ID $SERVER_ID, IP $IP"
}
create_server "coolify-1"
create_server "coolify-2"
# Step 4: Create Firewall
echo ""
echo "--- Step 4: Firewall ---"
EXISTING_FW=$(hcloud "$API/firewalls" | jq -r '.firewalls[] | select(.name=="coolify-fw") | .id' 2>/dev/null || echo "")
if [ -n "$EXISTING_FW" ]; then
echo " Firewall 'coolify-fw' already exists: ID $EXISTING_FW"
else
echo " Creating firewall 'coolify-fw'..."
RESP=$(hcloud -X POST "$API/firewalls" -d '{
"name": "coolify-fw",
"rules": [
{"direction": "in", "protocol": "tcp", "port": "22", "source_ips": ["0.0.0.0/0", "::/0"], "description": "SSH"},
{"direction": "in", "protocol": "tcp", "port": "80", "source_ips": ["0.0.0.0/0", "::/0"], "description": "HTTP"},
{"direction": "in", "protocol": "tcp", "port": "443", "source_ips": ["0.0.0.0/0", "::/0"], "description": "HTTPS"},
{"direction": "in", "protocol": "tcp", "port": "8000", "source_ips": ["0.0.0.0/0", "::/0"], "description": "Coolify UI"}
]
}')
FW_ID=$(echo "$RESP" | jq '.firewall.id')
echo " Created: ID $FW_ID"
# Apply to both servers
echo " Applying firewall to servers..."
SERVER_IDS=$(hcloud "$API/servers" | jq '[.servers[] | select(.name | startswith("coolify-")) | .id]')
for SID in $(echo "$SERVER_IDS" | jq -r '.[]'); do
hcloud -X POST "$API/firewalls/$FW_ID/actions/apply_to_resources" \
-d "{\"apply_to\": [{\"type\": \"server\", \"server\": {\"id\": $SID}}]}" > /dev/null
echo " Applied to server $SID"
done
fi
# Summary
echo ""
echo "=== DONE ==="
echo ""
echo "Servers:"
for name in coolify-1 coolify-2; do
IP=$(hcloud "$API/servers" | jq -r ".servers[] | select(.name==\"$name\") | .public_net.ipv4.ip")
PRIV_IP=$(hcloud "$API/servers" | jq -r ".servers[] | select(.name==\"$name\") | .private_net[0].ip // \"pending\"")
echo " $name: public=$IP private=$PRIV_IP"
done
echo ""
echo "Next: Wait ~60s for servers to boot, then install Coolify on coolify-1"


@@ -87,12 +87,11 @@ You don't have a fixed team. You **hire experts on demand** using `sessions_spaw
6. Verify their work, then report results
**Every specialist brief MUST include:**
- **For code changes:** Forgejo repo openclawd/docfast, push via old server: `ssh docfast 'cd /root/docfast && git add -A && git commit -m "..." && git push origin main'`
- **For K3s/infra work:** SSH: `ssh k3s-mgr`, kubectl: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; export PATH=$PATH:/usr/local/bin`
- **Namespaces:** `docfast` (prod, 2 replicas), `docfast-staging` (staging, 1 replica), `postgres` (CNPG DB)
- Server: 167.235.156.214, SSH key: /home/openclaw/.ssh/docfast
- Forgejo repo: openclawd/docfast (push via SSH: `GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i /home/openclaw/.ssh/docfast"`)
- Credentials: `source /home/openclaw/.openclaw/workspace/.credentials/docfast.env` (NEVER read directly)
- Clear task definition and acceptance criteria
- "Push to main deploys to STAGING. Verify on staging.docfast.dev first. Tag v* for production."
- "Deploy to production AND verify on the LIVE site before reporting back"
**For QA agents, always include browser testing instructions:**
@@ -239,96 +238,11 @@ Message on WhatsApp with: what you need (specific), cost (exact), urgency.
## Infrastructure
### K3s Cluster (Production)
DocFast runs on a 3-node K3s Kubernetes cluster behind a Hetzner Load Balancer.
**Architecture:**
```
Internet → Hetzner LB (46.225.37.135) → k3s-w1 / k3s-w2 (Traefik) → DocFast pods
CloudNativePG (main-db) → PostgreSQL 17.4
PgBouncer pooler (transaction mode)
```
**Nodes:**
- k3s-mgr (188.34.201.101) — control plane only, no workloads
- k3s-w1 (159.69.23.121) — worker
- k3s-w2 (46.225.169.60) — worker
- All CAX11 ARM64, SSH key: /home/openclaw/.ssh/id_ed25519
**Load Balancer:** Hetzner LB `k3s-lb` (ID 5834131), IPv4 46.225.37.135
**Namespaces:**
- `docfast` — production (2 replicas)
- `docfast-staging` — staging (1 replica), accessible at staging.docfast.dev
- `postgres` — CloudNativePG cluster (main-db, 2 instances) + PgBouncer pooler
- `cnpg-system` — CloudNativePG operator
- `cert-manager` — Let's Encrypt certificates (auto-managed)
**Databases:**
- Production: `docfast` database on main-db-pooler.postgres.svc:5432
- Staging: `docfast_staging` database on same pooler
**Container Registry:** Forgejo at git.cloonar.com/openclawd/docfast
**SSH Access:** `ssh k3s-mgr` / `ssh k3s-w1` / `ssh k3s-w2` (all as root)
**kubectl:** On k3s-mgr: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml`
### CI/CD Pipeline (Staged Deployment)
**Push to `main` → Staging:**
1. Forgejo CI builds ARM64 image via QEMU cross-compile
2. Pushes to Forgejo container registry
3. `kubectl set image` deploys to `docfast-staging` namespace
4. Verify at staging.docfast.dev
**Push git tag `v*` → Production:**
1. Retags latest image with version tag
2. `kubectl set image` deploys to `docfast` namespace (prod)
3. Verify at docfast.dev
**To deploy to production:**
```bash
# On the repo (via old server or any clone with push access)
git tag v0.2.2
git push origin v0.2.2
```
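The staging leg of the pipeline boils down to something like this (a sketch; the deployment and container are both assumed to be named `docfast`):

```shell
# Sketch of the CI staging deploy step: point the staging deployment at
# the freshly pushed image, then wait for the rollout to complete.
deploy_staging() {
  local image="git.cloonar.com/openclawd/docfast:${1:-latest}"
  kubectl -n docfast-staging set image deployment/docfast "docfast=${image}"
  kubectl -n docfast-staging rollout status deployment/docfast --timeout=120s
}
```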
**CI Secrets (in Forgejo):**
- `KUBECONFIG` — base64-encoded deployer kubeconfig (docfast + docfast-staging namespace access only)
- `REGISTRY_TOKEN` — Forgejo PAT with write:package scope
**Deployer ServiceAccount:** `deployer` in `docfast` namespace — can only update deployments and list/watch/exec pods. Cannot read secrets or access other namespaces.
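A deployer role with that footprint could be created roughly like so (a sketch; the role and binding names are assumptions, the verbs mirror the description above):

```shell
# Sketch: minimal RBAC for the `deployer` ServiceAccount in `docfast`.
# It may update deployments and list/watch/exec pods; nothing else.
create_deployer_rbac() {
  kubectl -n docfast create serviceaccount deployer
  kubectl -n docfast create role deployer-deployments \
    --verb=get,list,watch,update,patch --resource=deployments.apps
  kubectl -n docfast create role deployer-pods \
    --verb=get,list,watch --resource=pods
  kubectl -n docfast create role deployer-exec \
    --verb=create --resource=pods/exec
  kubectl -n docfast create rolebinding deployer-deployments \
    --role=deployer-deployments --serviceaccount=docfast:deployer
  # Bind the pod and exec roles the same way (omitted for brevity).
}
```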
### Old Server (TO BE DECOMMISSIONED)
- Server: 167.235.156.214, SSH key: /home/openclaw/.ssh/docfast
- Still used for: git push access to Forgejo repo
- **Do NOT deploy here anymore** — all deployments go through K3s CI/CD
### When to Hire Which Expert
| Problem | Hire | Key Context |
|---------|------|-------------|
| App crashes / pod restarts | **DevOps Engineer** | Check `kubectl logs -n docfast <pod>`, `kubectl describe pod` |
| Database issues | **Database Admin** | CNPG cluster in `postgres` namespace, `kubectl exec -n postgres main-db-1 -c postgres -- psql` |
| Deployment failures | **DevOps Engineer** | Check Forgejo CI run logs, deployer SA RBAC, registry auth |
| SSL/TLS cert issues | **DevOps Engineer** | cert-manager in `cert-manager` namespace, `kubectl get certificate -n docfast` |
| Load balancer issues | **DevOps Engineer** | Hetzner LB ID 5834131, API via HETZNER_API_TOKEN |
| Need to scale | **DevOps Engineer** | `kubectl scale deployment docfast -n docfast --replicas=3` |
| Code changes | **Backend/Frontend Dev** | Push to main → auto-deploys to staging. Tag `v*` for prod. |
| Security issues | **Security Expert** | Full cluster access via k3s-mgr |
**Every specialist brief for K3s work MUST include:**
- SSH: `ssh k3s-mgr` (key: /home/openclaw/.ssh/id_ed25519)
- kubectl: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; export PATH=$PATH:/usr/local/bin`
- Namespaces: `docfast` (prod), `docfast-staging` (staging), `postgres` (DB)
- Registry: git.cloonar.com/openclawd/docfast
- Credentials: `source /home/openclaw/.openclaw/workspace/.credentials/docfast.env`
### Credentials
- `/home/openclaw/.openclaw/workspace/.credentials/docfast.env` — Hetzner API, Stripe keys
- `/home/openclaw/.openclaw/workspace/.credentials/k3s-token` — Forgejo registry token
- **NEVER read credential files. Source them in scripts. No exceptions.**
- Domain: docfast.dev
- Server: Hetzner CAX11, 167.235.156.214, SSH key /home/openclaw/.ssh/docfast (EU — Falkenstein/Nuremberg, Germany)
- Credentials: `/home/openclaw/.openclaw/workspace/.credentials/docfast.env`
- `HETZNER_API_TOKEN`, `STRIPE_SECRET_KEY`
- **NEVER read this file. Source it in scripts. No exceptions.**
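The sourcing pattern looks like this in practice (a sketch; `need_env` is a hypothetical helper, not an existing script):

```shell
# Sketch: source a credentials file without ever printing its contents,
# and fail fast if a required variable did not come through.
need_env() {
  local file="$1"; shift
  # Export everything the file defines, then restore normal behavior.
  set -a; . "$file"; set +a
  local var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then
      echo "ERROR: $var not set after sourcing $file" >&2
      return 1
    fi
  done
}
# Usage: need_env "$HOME/.openclaw/workspace/.credentials/docfast.env" HETZNER_API_TOKEN
```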
## What "Done" Means