feat(diag): extend read-only channel to web-02 fw guest microVM #91

Merged
dominik.polakovics merged 1 commit from afk/88 into main 2026-06-04 13:36:26 +02:00

What

Extends the read-only diag channel (ADR-0005) into the web-02 fw guest microVM, so the dev agent can SSH in read-only to diagnose the services web-02 hosts (Invidious + companion, Matrix synapse + mautrix bridges, MAS, n8n, phpldapadmin, mcp-forgejo, lab) instead of always falling back to a human.

The original channel only covered the bare-metal fleet; web-02 only authorised root logins (via utils/ssh-keys.nix) and had no diag user, so the agent's key was refused. The network path already worked (fw permits dev → vm-*, web-02 runs sshd on :22, split-horizon DNS resolves web-02.cloonar.com from dev); only authorisation was missing.

Changes

  • hosts/fw/vms/web/default.nix — import the shared utils/modules/diag module. Reuses the already-committed diag pubkey, so no new SOPS secret is added (the private key already lives on dev at /run/secrets/diag-ssh-key).
  • utils/home-manager/diag-ssh.nix — add web-02 / web-02.cloonar.com to both matchBlocks so ssh web-02 from dev uses User diag + the diag identity rather than offering the agent's personal key.
  • utils/modules/diag/wrapper.sh — widen the denylist so the cat/head/tail/ls allowances can't read web-02's on-disk secrets: /var/lib/{matrix-synapse,mautrix-*,mas,n8n,zammad} (joining the existing /var/lib/postgresql) and the SSH host private keys at their real /persist/etc/ssh/ssh_host_*_key location — web-02 keeps host keys under the impermanence /persist tree, which the bare /etc/ssh/... rule misses. Deny still wins over allow; bare-metal behaviour is unchanged (additive denies only).
  • docs/adr/0005-… — dated amendment recording the extension and that the systemd-journal/adm journal-read tradeoff now also covers web-02's sensitive services.
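The "deny still wins over allow" ordering can be illustrated with a small POSIX sh sketch. It is not the real wrapper.sh: the function names `is_denied` and `check_read` are hypothetical, it models only the cat/head/tail/ls file reads (not systemctl or journalctl), and returning 1 for a verb outside the allowlist is an arbitrary choice here, since the PR only documents exit 2 for denylist hits. The deny patterns mirror the paths listed above.

```shell
#!/bin/sh
# Hypothetical sketch of deny-before-allow checking, modelled on the
# behaviour described for utils/modules/diag/wrapper.sh. Not the real script.

# is_denied PATH: returns 0 when the argument matches a deny pattern.
# case patterns compare the literal string; nothing resolves symlinks or "..".
is_denied() {
  case "$1" in
    /run/secrets/*) return 0 ;;
    /etc/ssh/ssh_host_*_key) return 0 ;;
    /persist/etc/ssh/ssh_host_*_key) return 0 ;; # web-02 impermanence location
    /var/lib/postgresql | /var/lib/postgresql/*) return 0 ;;
    /var/lib/matrix-synapse | /var/lib/matrix-synapse/*) return 0 ;;
    /var/lib/mautrix-*) return 0 ;;
    /var/lib/mas | /var/lib/mas/*) return 0 ;;
    /var/lib/n8n | /var/lib/n8n/*) return 0 ;;
    /var/lib/zammad | /var/lib/zammad/*) return 0 ;;
  esac
  return 1
}

# check_read VERB PATH: decides whether the read would be allowed.
# Returns 0 (allowed), 2 (denylist hit) or 1 (verb not on the allowlist).
# The deny list is consulted first, so an allowed verb cannot reach a denied path.
check_read() {
  verb=$1
  path=$2
  if is_denied "$path"; then
    echo "diag: denied: $path" >&2
    return 2
  fi
  case "$verb" in
    cat | head | tail | ls) return 0 ;;
  esac
  echo "diag: not on allowlist: $verb" >&2
  return 1 # arbitrary code for this sketch
}
```

In this sketch the patterns match the literal argument, so `/etc/ssh/ssh_host_*_key` says nothing about the same key read through `/persist/etc/ssh/...`; that is why the real location needs its own rule. The trailing `_key` in the host-key patterns is also what keeps `*.pub` files readable.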

Verification (done)

  • Pre-commit hook dry-builds all 6 hosts, since the change touches the shared utils/ path. That includes fw, whose build covers web-02 with the diag module, and dev with the updated client config.
  • nixpkgs-fmt --check clean on both changed .nix files.
  • Denylist exercised end-to-end against the actual wrapper.sh: the new paths (synapse, every mautrix bridge, mas, n8n, zammad, the /persist host keys) reject with exit 2; existing denies (/etc/ssh/ssh_host_*_key, /run/secrets/*, /var/lib/postgresql) still reject; *.pub, /etc/hostname, systemctl status, and journalctl -u invidious remain allowed.

Post-merge manual check (agent can't self-perform — prod SSH is human-gated)

After this deploys to web-02, from dev confirm read-only access works and secrets/mutations are refused:

ssh web-02 'systemctl status matrix-synapse'     # ✅ succeeds, read-only
ssh web-02 'journalctl -u invidious -n 50'       # ✅ succeeds (the diagnostic this unblocks)
ssh web-02 'cat /var/lib/mas/<secret>'           # ❌ refused (exit 2, denylist)
ssh web-02 'cat /var/lib/n8n/database.sqlite'    # ❌ refused
ssh web-02 'systemctl restart matrix-synapse'    # ❌ refused (mutating, not on allowlist)
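The five checks above could be scripted so the person running them gets a pass/fail summary. This is a hedged POSIX sh sketch, assuming it is sourced on dev, where the diag matchBlocks make plain `ssh web-02` log in as diag. The `DIAG_SSH`/`DIAG_HOST` overrides, the `check_case`/`run_checks` names, and the use of `/persist/etc/ssh/ssh_host_ed25519_key` (standing in for the elided MAS secret) are illustrative assumptions. It relies on ssh returning the remote command's exit status and reserving 255 for its own errors, and it treats only exit 2 as a denylist refusal.

```shell
#!/bin/sh
# Hypothetical post-merge check runner for the web-02 diag channel.
# DIAG_SSH and DIAG_HOST are illustrative overrides read at call time.

# check_case allow|deny|refuse CMD
#   allow  - the command must exit 0
#   deny   - the denylist exit code (2) is expected
#   refuse - any non-zero exit other than ssh's own 255 counts as refused
check_case() {
  want=$1
  cmd=$2
  rc=0
  "${DIAG_SSH:-ssh}" "${DIAG_HOST:-web-02}" "$cmd" >/dev/null 2>&1 || rc=$?

  if [ "$rc" -eq 255 ]; then
    # 255 comes from ssh itself (connection or auth failure), so it says
    # nothing about what the diag wrapper decided.
    echo "ERROR $cmd (ssh exit 255)"
    failures=$((failures + 1))
    return 0
  fi

  case "$want:$rc" in
    allow:0 | deny:2) ok=1 ;;
    refuse:0) ok=0 ;;
    refuse:*) ok=1 ;;
    *) ok=0 ;;
  esac

  if [ "$ok" -eq 1 ]; then
    echo "ok    $cmd (exit $rc)"
  else
    echo "FAIL  $cmd (exit $rc, wanted $want)"
    failures=$((failures + 1))
  fi
}

# run_checks: runs every case, prints a summary, succeeds only if all pass.
run_checks() {
  failures=0
  check_case allow  'systemctl status matrix-synapse'
  check_case allow  'journalctl -u invidious -n 50'
  check_case deny   'cat /var/lib/n8n/database.sqlite'
  check_case deny   'cat /persist/etc/ssh/ssh_host_ed25519_key'
  check_case refuse 'systemctl restart matrix-synapse'
  echo "$failures failure(s)"
  [ "$failures" -eq 0 ]
}
```

Usage on dev: source the file, then call `run_checks`. An `allow` case only passes on exit 0, and `systemctl status` exits 3 for an inactive unit, so a FAIL on that line may mean synapse is down rather than that the wrapper refused the command; read the output before concluding either way.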

Closes #88

The diag channel (ADR-0005) only covered the bare-metal fleet, so every
service inside the web-02 fw guest microVM (Invidious + companion, Matrix
synapse + mautrix bridges, MAS, n8n, phpldapadmin, mcp-forgejo, lab) was
undiagnosable by the dev agent without falling back to a human (issue #88).

- import utils/modules/diag in hosts/fw/vms/web/default.nix — reuses the
  already-committed diag pubkey, so no new secret is added
- route web-02 / web-02.cloonar.com to User diag in the diag client ssh
  config so dev no longer offers the agent's personal key
- widen the wrapper denylist to keep web-02's on-disk secrets unreadable:
  /var/lib/{matrix-synapse,mautrix-*,mas,n8n,zammad} and the SSH host
  private keys at their real /persist/etc/ssh location (impermanence tree);
  deny still wins over allow, bare-metal behaviour unchanged
- amend ADR-0005 to record the extension and the journal-read tradeoff now
  covering web-02's sensitive services

Closes #88
Reference
Cloonar/nixos!91