feat(logging): fan out grafana-alloy to remaining hosts, delete promtail #125

Merged
dominik.polakovics merged 1 commit from afk/122 into main 2026-06-07 18:09:12 +02:00

Step 2 of #118 (promtail→alloy migration). The nas canary (step 1, #121 / PR #124) is merged and confirmed live in Grafana, so this switches the remaining four hosts off promtail and removes the module.

Changes

  • Swap ./utils/modules/promtail → ./utils/modules/alloy in fw, mail, web-arm, amzebs-01 configuration.nix.
  • Delete utils/modules/promtail/ (default.nix + secrets.yaml).
  • Remove the orphan utils/modules/promtail/… creation rule from .sops.yaml; the utils/modules/alloy/… rule already covers the same recipient set.
  • Drop the now-false "nas is the canary / other four stay on promtail until step 2" framing from the nas import and the alloy module header, and refresh the CLAUDE.md module list (promtail → alloy).
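The orphan-rule cleanup in .sops.yaml can be spot-checked mechanically. The Python below is a minimal sketch, not a script from this repo. The helper names are hypothetical. It reads `path_regex` values with a line regex rather than a YAML parser, and it approximates sops' matching by running `re.search` against repo-relative paths. A rule whose target files are gone is reported as orphaned. The promtail rule would have been in that state once utils/modules/promtail/ was deleted.

```python
import re
from pathlib import Path

# Assumes one "path_regex: <pattern>" per line; a real YAML parser would be
# more robust, but this keeps the sketch dependency-free.
PATH_REGEX_LINE = re.compile(r"^[ \t]*-?[ \t]*path_regex:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def creation_rule_patterns(sops_yaml_text: str) -> list[str]:
    """Return every path_regex value in a .sops.yaml document, with quotes stripped."""
    return [m.group(1).strip("'\"") for m in PATH_REGEX_LINE.finditer(sops_yaml_text)]


def repo_files(repo_root: Path) -> list[str]:
    """List repo-relative POSIX paths of all files, skipping anything under .git."""
    files = []
    for path in repo_root.rglob("*"):
        rel = path.relative_to(repo_root)
        if path.is_file() and ".git" not in rel.parts:
            files.append(rel.as_posix())
    return files


def orphan_rules(repo_root: Path) -> list[str]:
    """Return creation-rule patterns that match no file in the repo.

    Approximates sops' matching with an unanchored re.search on repo-relative
    paths; Python's re and Go's RE2 differ in some edge cases.
    """
    patterns = creation_rule_patterns((repo_root / ".sops.yaml").read_text())
    files = repo_files(repo_root)
    return [p for p in patterns if not any(re.search(p, f) for f in files)]
```

Run from the repo root, `orphan_rules(Path("."))` should return an empty list after this PR. Before the deletion was committed to .sops.yaml, it would have flagged the promtail pattern.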

Out of scope (intentionally untouched)

  • hosts/web-arm/modules/loki.nix and the promtail-nginx-password secret: that is the server-side Loki nginx basic-auth file. Alloy clients still authenticate to loki.cloonar.com as promtail@cloonar.com, and those credentials are validated against that file, so it stays. Renaming it would require re-encrypting secrets and is unrelated to this swap.

Verification

  • Pre-commit dry-build (eval) green for the whole fleet — amzebs-01 fw mail nas nb web-arm all OK (.sops.yaml + utils/ are touched, so the hook builds every host).
  • Post-deploy check (not catchable by eval or build): config.alloy ships as a static environment.etc file, so neither the pre-commit eval nor a build parses it. After deploy, confirm at least one 25.11 host (e.g. mail or amzebs-01) shows up in Grafana/Loki with the expected labels — the canary only proved grafana-alloy 1.16.0 (26.05); these four run 1.12.2 (25.11). The config uses only GA-stable components, so the version gap is low-risk. Also confirm promtail is no longer running anywhere.
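The Grafana/Loki half of the post-deploy check can also be scripted against Loki's HTTP API. This is a hedged sketch built on several assumptions. The PR does not list the labels alloy attaches, so the label name `host` is assumed. The base URL is also assumed, as is the query API sitting behind the same nginx basic auth (user promtail@cloonar.com). The sketch asks `/loki/api/v1/label/<name>/values` for the label's values over a recent window and reports which expected hosts are absent.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumptions, not taken from the PR: hosts are distinguished by a label named
# "host", and the query API is reachable at LOKI_URL with basic auth.
LOKI_URL = "https://loki.cloonar.com"
LABEL = "host"


def label_values_url(base: str, label: str, since_seconds: int, now_ns: int) -> str:
    """Build a /loki/api/v1/label/<name>/values URL covering the last since_seconds."""
    start_ns = now_ns - since_seconds * 1_000_000_000
    query = urllib.parse.urlencode({"start": start_ns, "end": now_ns})
    return f"{base}/loki/api/v1/label/{urllib.parse.quote(label)}/values?{query}"


def missing_hosts(response_body: str, expected: set[str]) -> set[str]:
    """Return the expected hosts that do not appear in a label-values response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Loki query failed: {payload}")
    return expected - set(payload.get("data") or [])


def fetch(url: str, user: str, password: str) -> str:
    """GET url with HTTP basic auth and return the body as text (network call)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()
```

Pass the result of `fetch(label_values_url(LOKI_URL, LABEL, 900, time.time_ns()), "promtail@cloonar.com", password)` to `missing_hosts(..., {"mail", "amzebs-01"})`. Once both 25.11 hosts are shipping logs, that call should return an empty set. Confirming promtail is gone still needs a per-host check.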

Closes #122

Step 2 of #118. The nas canary (step 1, #121) proved the shared alloy
module, so switch the remaining four hosts off utils/modules/promtail and
remove the now-unused module.

- Swap the import to ./utils/modules/alloy in fw, mail, web-arm and
  amzebs-01 configuration.nix.
- Delete utils/modules/promtail (default.nix + secrets.yaml) and its
  now-orphaned creation rule in .sops.yaml; alloy's rule already mirrors the
  same recipient set.
- Drop the obsolete 'nas is the canary / others on promtail until step 2'
  framing from the nas import and the alloy module header, and refresh the
  CLAUDE.md module list.

The web-arm Loki server's promtail-nginx-password is the server-side
basic-auth file and is untouched; alloy clients still authenticate as
promtail@cloonar.com.

Closes #122

This was generated by AI while landing a PR.

Validation: PASS

Verification signal relied on: the repo's commit-time gate — the pre-commit dry-build (eval) is green for all six hosts (this PR touches .sops.yaml + utils/, so the hook builds every host). Per the repo's gate model I did not re-run it.

Independently verified the one dimension eval/build cannot check — sops decryptability at deploy time. #124 exercised the alloy module only on nas, so this is the first time fw/mail/web-arm/amzebs-01 decrypt alloy-env. The age recipients embedded in utils/modules/alloy/secrets.yaml are byte-for-byte identical to the old utils/modules/promtail/secrets.yaml, and cover all five alloy hosts:

| host | key (anchor) | recipient? |
|---|---|---|
| fw | `&fw` | ✔ |
| mail | `&ldap-server-arm` (legacy anchor name) | ✔ |
| web-arm | `&web-arm` | ✔ |
| amzebs-01 | `&amzebs-01` | ✔ |
| nas | `&nas` | ✔ |
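The recipient comparison behind this table can be reproduced roughly as follows. This is only a sketch, and the helper names are hypothetical. It uses regexes to pull `recipient: age1...` lines from the `sops.age` metadata of an encrypted file and `&anchor age1...` keys from `.sops.yaml`. It then reports which host anchors have no matching recipient. Comparing recipient sets is weaker than the byte-for-byte comparison described above, because it ignores ordering and all other metadata.

```python
import re

# "recipient: age1..." entries that sops writes under sops.age in an encrypted file.
RECIPIENT = re.compile(r"^[ \t]*-?[ \t]*recipient:[ \t]*(age1[0-9a-z]+)[ \t]*$", re.MULTILINE)
# Anchored age keys in .sops.yaml, e.g. "- &fw age1...".
ANCHORED_KEY = re.compile(r"&([\w-]+)[ \t]+(age1[0-9a-z]+)")


def recipients(secrets_yaml_text: str) -> set[str]:
    """Return the set of age recipients embedded in a sops-encrypted YAML file."""
    return set(RECIPIENT.findall(secrets_yaml_text))


def anchor_keys(sops_config_text: str) -> dict[str, str]:
    """Map each YAML anchor name in .sops.yaml to its age public key."""
    return dict(ANCHORED_KEY.findall(sops_config_text))


def uncovered_hosts(sops_config_text: str, secrets_yaml_text: str, anchors: list[str]) -> list[str]:
    """Return the anchors whose key is not a recipient of the secrets file."""
    keys = anchor_keys(sops_config_text)
    present = recipients(secrets_yaml_text)
    return [a for a in anchors if keys.get(a) not in present]
```

`recipients(old_text) == recipients(new_text)` checks that the old and new files share a recipient set. For this PR, `uncovered_hosts(sops_text, new_text, ["fw", "ldap-server-arm", "web-arm", "amzebs-01", "nas"])` should come back empty.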

Other checks:

  • No dangling references to the deleted module — no host still imports ./utils/modules/promtail. The only remaining promtail strings are historical ADRs (immutable), the server-side Loki nginx basic-auth on web-arm (intentionally out of scope; promtail@cloonar.com is the Loki credential username and still matches), and explanatory comments in the alloy module.
  • .sops.yaml: removing the utils/modules/promtail/… creation rule is correct orphan cleanup (the path is deleted); the utils/modules/alloy/… rule remains and governs the kept secret.
  • No derivation src / *Hash change, so the eval-only-gate vendorHash caveat does not apply.
  • Conventions: Conventional-Commits title; no in-place secrets edit (only whole-file deletion of the removed module's secret); no system.stateVersion change; modules imported by explicit path.
  • AFK contract: head afk/122, body carries Closes #122, the branch number matches the issue, and #122 is the open step-2 target of #118.
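The first dangling-reference check comes down to searching `.nix` files for the old import path. The sketch below uses a hypothetical helper and ignores `#` comments, such as the explanatory ones in the alloy module. The comment stripping is naive: a `#` inside a Nix string is cut as well, and `/* */` block comments are not handled.

```python
import re
from pathlib import Path

# Naive Nix line-comment stripper: removes from "#" to end of line.
LINE_COMMENT = re.compile(r"#.*$", re.MULTILINE)


def nix_files_importing(root: Path, module_path: str) -> list[str]:
    """Return repo-relative .nix files whose non-comment text contains module_path."""
    hits = []
    for path in sorted(root.rglob("*.nix")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue
        code = LINE_COMMENT.sub("", path.read_text(errors="replace"))
        if module_path in code:
            hits.append(rel.as_posix())
    return hits
```

Run against the repo root, `nix_files_importing(Path("."), "./utils/modules/promtail")` should return an empty list after this PR.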

Residual (post-deploy, already documented in the PR): the static config.alloy is parsed only at runtime (unchanged from #124, already proven on nas), and these four hosts run alloy 1.12.2 (25.11) vs the canary's 1.16.0 (26.05). The config uses only GA-stable components, so the version gap is low-risk. After deploy, confirm a 25.11 host (mail/amzebs-01) appears in Grafana/Loki with the expected labels and that promtail is no longer running anywhere.

mergeable ✔ — no conflict resolution needed.

Reference
Cloonar/nixos!125