# K3s Cluster Restore Guide

## Prerequisites

- Fresh Ubuntu 24.04 server (CAX11 ARM64, Hetzner)
- Borg backup repo access: `ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster`
- SSH key for Storage Box: `/root/.ssh/id_ed25519` (fingerprint: `k3s-mgr-backup`)
- Borg passphrase (stored in password manager)
## 1. Install Borg & Mount Backup

```bash
apt update && apt install -y borgbackup python3-pyfuse3
export BORG_RSH='ssh -p 23 -i /root/.ssh/id_ed25519'
export BORG_PASSPHRASE='<from password manager>'

# List available archives
borg list ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster

# Mount latest archive
mkdir -p /mnt/borg
borg mount ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster::<archive-name> /mnt/borg
```
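Rather than copying the archive name by hand, the newest archive can be picked automatically. A minimal sketch, assuming `borg list --short` prints one archive name per line in chronological order (oldest first); `latest_archive` is a hypothetical helper, not part of Borg:

```shell
# Hypothetical helper: pick the newest archive name from `borg list --short`
# output, assuming the default sort lists oldest archives first.
latest_archive() {
  tail -n 1
}

# Usage sketch (requires repo access):
#   REPO='ssh://u149513-sub10@u149513-sub10.your-backup.de/./k3s-cluster'
#   ARCHIVE=$(borg list --short "$REPO" | latest_archive)
#   borg mount "${REPO}::${ARCHIVE}" /mnt/borg
```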
## 2. Install K3s (Control Plane)

```bash
# Restore K3s token (needed for worker rejoin)
mkdir -p /var/lib/rancher/k3s/server
cp /mnt/borg/var/lib/rancher/k3s/server/token /var/lib/rancher/k3s/server/token

# Install K3s server (tainted, no workloads on mgr)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.4+k3s1" sh -s - server \
  --node-taint CriticalAddonsOnly=true:NoSchedule \
  --flannel-iface enp7s0 \
  --cluster-cidr 10.42.0.0/16 \
  --service-cidr 10.43.0.0/16 \
  --tls-san 188.34.201.101 \
  --token "$(cat /var/lib/rancher/k3s/server/token)"
```
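Since the worker rejoin in the next step depends on this token, it is worth failing fast if the copy did not work. A small guard, under the assumption that an empty or missing token file always means a broken restore (`require_token` is a hypothetical name):

```shell
# Hypothetical guard: abort early if the restored token file is missing or empty.
require_token() {
  local f="${1:-/var/lib/rancher/k3s/server/token}"
  if [ ! -s "$f" ]; then
    echo "ERROR: no K3s token at ${f}; restore it before installing" >&2
    return 1
  fi
}
```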
## 3. Rejoin Worker Nodes

On each worker (k3s-w1: 159.69.23.121, k3s-w2: 46.225.169.60):

```bash
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.4+k3s1" \
  K3S_URL=https://188.34.201.101:6443 \
  K3S_TOKEN="<token from step 2>" \
  sh -s - agent --flannel-iface enp7s0
```
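The per-worker command can also be driven from the control plane over SSH instead of logging into each node. A sketch, assuming root SSH access to both workers; `agent_install_cmd` is a hypothetical helper that only builds the command string, keeping the SSH transport separate:

```shell
# Hypothetical helper: build the agent install command for a given token.
agent_install_cmd() {
  local token="$1"
  printf '%s' "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.34.4+k3s1 K3S_URL=https://188.34.201.101:6443 K3S_TOKEN=${token} sh -s - agent --flannel-iface enp7s0"
}

# Usage sketch:
#   for w in 159.69.23.121 46.225.169.60; do
#     ssh "root@${w}" "$(agent_install_cmd "$(cat /var/lib/rancher/k3s/server/token)")"
#   done
```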
## 4. Restore K3s Manifests & Config

```bash
cp /mnt/borg/etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.yaml
cp -r /mnt/borg/var/lib/rancher/k3s/server/manifests/* /var/lib/rancher/k3s/server/manifests/
```
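The K3s install in step 2 will already have written a fresh `k3s.yaml`, so blindly overwriting it can hide drift. A sketch of a copy-with-diff guard (`restore_if_changed` is a hypothetical helper) that shows what changes before it copies:

```shell
# Hypothetical helper: copy a backup file over the live one only when they
# differ, printing the diff so the change is visible.
restore_if_changed() {
  local src="$1" dst="$2"
  if ! diff -u "$dst" "$src"; then
    cp "$src" "$dst"
  fi
}

# Usage sketch:
#   restore_if_changed /mnt/borg/etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.yaml
```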
## 5. Install Operators

```bash
# cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.2/cert-manager.yaml

# CloudNativePG
kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.1.yaml

# Traefik (via Helm)
helm repo add traefik https://traefik.github.io/charts
helm install traefik traefik/traefik -n kube-system \
  --set deployment.kind=DaemonSet \
  --set nodeSelector."kubernetes\.io/os"=linux \
  --set-json 'tolerations=[]'   # empty list; plain --set cannot express []
```
## 6. Restore K8s Manifests (Namespaces, Secrets, Services)

```bash
# Apply namespace manifests from backup
for ns in postgres docfast docfast-staging snapapi snapapi-staging cert-manager; do
  kubectl apply -f /mnt/borg/var/backup/manifests/${ns}.yaml
done
```

**Review before applying** — manifests contain secrets and may reference node-specific IPs.
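One way to do that review is to surface which backup manifests actually carry Secret objects before applying anything. A grep-based sketch; `list_secret_manifests` is a hypothetical name, and it assumes `kind:` appears at the start of a line in the dumped YAML:

```shell
# Hypothetical helper: print only the manifest files that define Secret objects.
list_secret_manifests() {
  grep -l '^kind: Secret$' "$@" 2>/dev/null || true
}

# Usage sketch:
#   list_secret_manifests /mnt/borg/var/backup/manifests/*.yaml
```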
## 7. Restore CNPG PostgreSQL

Option A: Let CNPG create a fresh cluster, then restore from SQL dumps:

```bash
# After CNPG cluster is healthy:
PRIMARY=$(kubectl -n postgres get pods -l cnpg.io/cluster=main-db,role=primary -o name | head -1)

for db in docfast docfast_staging snapapi snapapi_staging; do
  # Create database if needed
  kubectl -n postgres exec $PRIMARY -- psql -U postgres -c "CREATE DATABASE ${db};" 2>/dev/null || true
  # Restore dump
  kubectl -n postgres exec -i $PRIMARY -- psql -U postgres "${db}" < /mnt/borg/var/backup/postgresql/${db}.sql
done
```

Option B: Restore CNPG from backup (if configured with barman/S3 — not currently used).
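Before looping over databases, it is cheap to confirm that every expected dump actually made it into the backup. A sketch (`check_dumps` is a hypothetical helper; it assumes one `<db>.sql` file per database):

```shell
# Hypothetical guard: verify a non-empty <db>.sql dump exists for each database.
check_dumps() {
  local dir="$1"; shift
  local db rc=0
  for db in "$@"; do
    if [ ! -s "${dir}/${db}.sql" ]; then
      echo "missing or empty dump: ${db}.sql" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage sketch:
#   check_dumps /mnt/borg/var/backup/postgresql docfast docfast_staging snapapi snapapi_staging
```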
## 8. Restore Application Deployments

Deployments are in the namespace manifests, but it's cleaner to redeploy from CI/CD:

- DocFast: push to `main` (staging) or tag `v*` (prod) on Forgejo
- SnapAPI: same workflow

Or apply from backup manifests:

```bash
kubectl apply -f /mnt/borg/var/backup/manifests/docfast.yaml
kubectl apply -f /mnt/borg/var/backup/manifests/snapapi.yaml
```
## 9. Verify

```bash
kubectl get nodes                          # All 3 nodes Ready
kubectl get pods -A                        # All pods Running
kubectl -n postgres get cluster main-db    # CNPG healthy
curl -k https://docfast.dev/health         # App responding
```
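Right after a restore, pods may still be starting, so one-shot checks can fail spuriously. A small retry wrapper for the commands above (`retry` here is a hypothetical helper, not a standard tool):

```shell
# Hypothetical helper: retry a command up to N times, one second apart.
retry() {
  local attempts="$1"; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage sketch:
#   retry 30 curl -kfs https://docfast.dev/health
```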
## 10. Post-Restore Checklist

- [ ] DNS: `docfast.dev` A record → 46.225.37.135 (Hetzner LB)
- [ ] Hetzner LB targets updated (w1 + w2 on ports 80/443)
- [ ] Let's Encrypt certs issued (cert-manager auto-renews)
- [ ] Stripe webhook endpoint updated if IP changed
- [ ] HA spread constraints re-applied (CoreDNS 3 replicas, CNPG operator 2 replicas, PgBouncer anti-affinity)
- [ ] Borg backup cron re-enabled: `30 3 * * * /root/k3s-backup.sh`
- [ ] Verify backup works: `borg-backup && borg-list`
## Cluster Info

| Node | IP (Public) | IP (Private) | Role |
|------|-------------|--------------|------|
| k3s-mgr | 188.34.201.101 | 10.0.1.5 | Control plane (tainted) |
| k3s-w1 | 159.69.23.121 | 10.0.1.6 | Worker |
| k3s-w2 | 46.225.169.60 | 10.0.1.7 | Worker |

| Resource | Detail |
|----------|--------|
| Hetzner LB | ID 5834131, IP 46.225.37.135 |
| Private Network | 10.0.0.0/16, ID 11949384 |
| Firewall | coolify-fw, ID 10553199 |
| Storage Box | u149513-sub10.your-backup.de:23 |