# K3s Cluster Restore Guide

## Prerequisites

- Fresh Ubuntu 24.04 server (CAX11 ARM64, Hetzner)
- Borg backup repo access: `ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster`
- SSH key for Storage Box: `/root/.ssh/id_ed25519` (fingerprint: `k3s-mgr-backup`)
- Borg passphrase (stored in password manager)

## 1. Install Borg & Mount Backup

```bash
apt update && apt install -y borgbackup python3-pyfuse3

export BORG_RSH='ssh -p 23 -i /root/.ssh/id_ed25519'
export BORG_PASSPHRASE=''  # from the password manager

# List available archives
borg list ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster

# Mount the latest archive. Note: `repo::` with no archive name mounts the
# whole repository (one subdirectory per archive), so name the archive
# explicitly to get the /mnt/borg/... paths used below.
mkdir -p /mnt/borg
LATEST=$(borg list --last 1 --format '{archive}' ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster)
borg mount "ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster::${LATEST}" /mnt/borg
```

## 2. Install K3s (Control Plane)

```bash
# Restore the K3s token (needed for worker rejoin)
mkdir -p /var/lib/rancher/k3s/server
cp /mnt/borg/var/lib/rancher/k3s/server/token /var/lib/rancher/k3s/server/token

# Install K3s server (tainted, so no workloads run on mgr)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.4+k3s1" sh -s - server \
  --node-taint CriticalAddonsOnly=true:NoSchedule \
  --flannel-iface enp7s0 \
  --cluster-cidr 10.42.0.0/16 \
  --service-cidr 10.43.0.0/16 \
  --tls-san 188.34.201.101 \
  --token "$(cat /var/lib/rancher/k3s/server/token)"
```

## 3. Rejoin Worker Nodes

On each worker (k3s-w1: 159.69.23.121, k3s-w2: 46.225.169.60):

```bash
# K3S_TOKEN: the cluster token restored on k3s-mgr in step 2
# (/var/lib/rancher/k3s/server/token)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.4+k3s1" \
  K3S_URL=https://188.34.201.101:6443 \
  K3S_TOKEN="" \
  sh -s - agent --flannel-iface enp7s0
```

## 4. Restore K3s Manifests & Config

```bash
cp /mnt/borg/etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.yaml
cp -r /mnt/borg/var/lib/rancher/k3s/server/manifests/* /var/lib/rancher/k3s/server/manifests/
```
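Before copying anything out of the mounted archive, it can be worth confirming it actually contains the paths that steps 2 and 4 read from. A minimal sketch; `check_restore_paths` is a made-up helper name, and the path list is taken from the commands above:

```shell
# check_restore_paths (hypothetical helper): verify the borg mount contains
# every path that the restore steps copy out of it.
check_restore_paths() {
  local root=${1:-/mnt/borg} missing=0
  for p in \
    var/lib/rancher/k3s/server/token \
    etc/rancher/k3s/k3s.yaml \
    var/lib/rancher/k3s/server/manifests; do
    [ -e "$root/$p" ] || { echo "MISSING: $root/$p"; missing=1; }
  done
  return $missing
}

check_restore_paths /mnt/borg || echo "archive incomplete -- do not proceed"
```

Running this before step 2 turns a silent `cp` failure into an explicit stop signal.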
## 5. Install Operators

```bash
# cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.2/cert-manager.yaml

# CloudNativePG
kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.1.yaml

# Traefik (via Helm)
helm repo add traefik https://traefik.github.io/charts
helm install traefik traefik/traefik -n kube-system \
  --set deployment.kind=DaemonSet \
  --set nodeSelector."kubernetes\.io/os"=linux \
  --set tolerations={}
```

## 6. Restore K8s Manifests (Namespaces, Secrets, Services)

```bash
# Apply namespace manifests from backup
for ns in postgres docfast docfast-staging snapapi snapapi-staging cert-manager; do
  kubectl apply -f /mnt/borg/var/backup/manifests/${ns}.yaml
done
```

**Review before applying:** manifests contain secrets and may reference node-specific IPs.

## 7. Restore CNPG PostgreSQL

Option A: let CNPG create a fresh cluster, then restore from SQL dumps:

```bash
# After the CNPG cluster is healthy:
PRIMARY=$(kubectl -n postgres get pods -l cnpg.io/cluster=main-db,role=primary -o name | head -1)

for db in docfast docfast_staging snapapi snapapi_staging; do
  # Create database if needed
  kubectl -n postgres exec "$PRIMARY" -- psql -U postgres -c "CREATE DATABASE ${db};" 2>/dev/null || true
  # Restore dump
  kubectl -n postgres exec -i "$PRIMARY" -- psql -U postgres "${db}" < /mnt/borg/var/backup/postgresql/${db}.sql
done
```

Option B: restore CNPG from backup (if configured with barman/S3; not currently used).

## 8. Restore Application Deployments

Deployments are included in the namespace manifests, but it is cleaner to redeploy from CI/CD:

```bash
# DocFast: push to main (staging) or tag v* (prod) on Forgejo
# SnapAPI: same workflow
```

Or apply from backup manifests:

```bash
kubectl apply -f /mnt/borg/var/backup/manifests/docfast.yaml
kubectl apply -f /mnt/borg/var/backup/manifests/snapapi.yaml
```
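Before the step 7 loop pipes dumps into `psql`, a quick sanity check on the dump files can catch a failed or truncated backup. A sketch: `check_dumps` is a hypothetical helper name; the directory and database list come from step 7, and the header check assumes plain-format `pg_dump` output (which begins with a `-- PostgreSQL database dump` comment):

```shell
# check_dumps (hypothetical helper): confirm each expected SQL dump exists,
# is non-empty, and starts with the pg_dump plain-format header.
check_dumps() {
  local dir=$1 bad=0
  for db in docfast docfast_staging snapapi snapapi_staging; do
    f="$dir/${db}.sql"
    if [ -s "$f" ] && head -c 200 "$f" | grep -qi 'PostgreSQL database dump'; then
      echo "ok: $f"
    else
      echo "suspect: $f (missing, empty, or not a pg_dump file)"
      bad=1
    fi
  done
  return $bad
}

check_dumps /mnt/borg/var/backup/postgresql || echo "fix dumps before restoring"
```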
## 9. Verify

```bash
kubectl get nodes                          # All 3 nodes Ready
kubectl get pods -A                        # All pods Running
kubectl -n postgres get cluster main-db    # CNPG healthy
curl -k https://docfast.dev/health         # App responding
```

## 10. Post-Restore Checklist

- [ ] DNS: `docfast.dev` A record → 46.225.37.135 (Hetzner LB)
- [ ] Hetzner LB targets updated (w1 + w2 on ports 80/443)
- [ ] Let's Encrypt certs issued (cert-manager auto-renews)
- [ ] Stripe webhook endpoint updated if IP changed
- [ ] HA spread constraints re-applied (CoreDNS 3 replicas, CNPG operator 2 replicas, PgBouncer anti-affinity)
- [ ] Borg backup cron re-enabled: `30 3 * * * /root/k3s-backup.sh`
- [ ] Verify backup works: `borg-backup && borg-list`

## Cluster Info

| Node | IP (Public) | IP (Private) | Role |
|------|-------------|--------------|------|
| k3s-mgr | 188.34.201.101 | 10.0.1.5 | Control plane (tainted) |
| k3s-w1 | 159.69.23.121 | 10.0.1.6 | Worker |
| k3s-w2 | 46.225.169.60 | 10.0.1.7 | Worker |

| Resource | Detail |
|----------|--------|
| Hetzner LB | ID 5834131, IP 46.225.37.135 |
| Private Network | 10.0.0.0/16, ID 11949384 |
| Firewall | coolify-fw, ID 10553199 |
| Storage Box | u149513-sub10.your-backup.de:23 |
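The "verify backup works" checklist item can be made concrete by checking that the newest archive is recent. A sketch under stated assumptions: `archive_fresh_enough` is a made-up helper name, the 48-hour threshold is an arbitrary example, and parsing relies on GNU `date -d` accepting the timestamp string printed by `borg list --format '{time}'` (adjust if your format differs):

```shell
# archive_fresh_enough (hypothetical helper): succeed only if the given
# archive timestamp is younger than MAX_AGE_H hours.
archive_fresh_enough() {
  local ts=$1 max_h=${2:-48} now epoch
  now=$(date +%s)
  epoch=$(date -d "$ts" +%s 2>/dev/null) || return 2  # unparseable timestamp
  [ $(( now - epoch )) -le $(( max_h * 3600 )) ]
}

# Usage against the real repo (needs the BORG_RSH/BORG_PASSPHRASE exports
# from step 1; REPO is the Storage Box URL used throughout this guide):
# REPO='ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster'
# archive_fresh_enough "$(borg list --last 1 --format '{time}' "$REPO")" 48 \
#   && echo "backup fresh" || echo "backup stale -- check the 03:30 cron"
```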