feat(logging): fan out grafana-alloy to remaining hosts + delete promtail (step 2) #122

Closed
opened 2026-06-07 14:33:21 +02:00 by dominik.polakovics · 1 comment

Step 2 of #118 (promtail→alloy migration). Do after the nas canary (step 1) is verified in Grafana.

Agent Brief

Category: enhancement

Summary: Once the nas canary proves the alloy config, switch the remaining four hosts (fw, mail, web-arm, amzebs-01) from utils/modules/promtail to utils/modules/alloy, and delete the now-unused promtail module.

Background / why: Completes the fleet migration started in step 1. The shared alloy module already exists and is verified on nas; the config is host-agnostic (it just reads the local journal), so the remaining hosts switch by swapping the import.

Tasks:

  • In each of hosts/{fw,mail,web-arm,amzebs-01}/configuration.nix: replace the ./utils/modules/promtail import with ./utils/modules/alloy.
  • Delete utils/modules/promtail (default.nix + secrets.yaml).
  • Remove the now-orphan utils/modules/promtail/... creation rule from .sops.yaml.
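
As a rough illustration of the import swap in the first task, a host file might end up looking like the excerpt below. This is a minimal sketch: the surrounding module arguments and any other imports are assumptions, and only the two module paths come from this issue.

```nix
# hosts/mail/configuration.nix (illustrative excerpt, other imports elided)
{ ... }:
{
  imports = [
    # Step 2: the promtail import is dropped ...
    # ./utils/modules/promtail
    # ... and the shared alloy module (already verified on nas) takes its place.
    ./utils/modules/alloy
  ];
}
```

The same one-line change would apply to fw, web-arm and amzebs-01, since the alloy config is host-agnostic.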

Acceptance:

  • pre-commit eval green for all hosts.
  • After deploy: all five hosts visible in Grafana Loki with the expected labels; promtail no longer running anywhere.

Note: fw and web-arm may still be on 25.11 with the temporary docker_29 pin — that's independent of this change. If their 26.05 bumps land first, even better. Part of #118.

Author
Owner

This was generated by AI during triage.

Triage → ready-for-agent

Step-2 fan-out is unblocked; prerequisites verified:

  • Step 1 (nas canary, #121 / PR #124) merged and confirmed live in Grafana.
  • Eval-unblock (docker_29, PR #123) is in main, so a shared-module change passes the pre-commit dry-build on fw/web-arm.
  • alloy-env secret already decrypts on all four targets. .sops.yaml's utils/modules/alloy rule mirrors promtail's recipient set exactly (web-arm, ldap-server-arm = mail, fw, nas, amzebs-01) — no secret rewiring needed.
  • services.alloy is present on 25.11. All four targets (fw, mail, web-arm, amzebs-01) are still on 25.11, but the module exists there, so the swap holds now and survives each host's later 26.05 bump.
  • No afk/122 claim — free to pick up.

Verify nuance (not caught by eval/build)

The four targets run grafana-alloy 1.12.2 (25.11); the nas canary proved 1.16.0 (26.05). config.alloy uses only GA-stable components (loki.source.journal, loki.process stages, loki.write, discovery.relabel, sys.env), so the version gap is low-risk. However, config.alloy is shipped as a static environment.etc file, so neither the pre-commit eval nor the build parses it. Post-deploy verification must therefore include at least one 25.11 host (e.g. mail or amzebs-01) in Grafana/Loki rather than leaning on the nas/1.16.0 precedent alone. Each host re-verifies alloy at its 26.05 bump (#106/#108/#110/#112).
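
Because the build never parses config.alloy, one way to narrow that gap could be a small derivation that runs the file through `alloy fmt`. This is only a sketch, not part of the brief. The file name `checks/alloy-config.nix` and the path to `config.alloy` inside the alloy module are assumptions. `alloy fmt` checks syntax only, so it would not catch component-level differences between 1.12.2 and 1.16.0.

```nix
# checks/alloy-config.nix (hypothetical file, not part of the repo as described)
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "alloy-config-syntax-check"
  {
    # grafana-alloy provides the `alloy` binary.
    nativeBuildInputs = [ pkgs.grafana-alloy ];
  }
  ''
    # `alloy fmt` prints the formatted config to stdout and fails on a
    # syntax error; it does not evaluate or start any components.
    # The config path below is an assumption about the module layout.
    alloy fmt ${../utils/modules/alloy/config.alloy} > /dev/null
    touch $out
  ''
```

Built with `nix-build checks/alloy-config.nix`, this would use whichever alloy version the chosen `pkgs` pins, so the post-deploy check on a 25.11 host described above is still needed.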

Reminder from the brief

When deleting utils/modules/promtail (default.nix + secrets.yaml), also remove its now-orphan creation rule from .sops.yaml (the utils/modules/promtail/... block) — keep the utils/modules/alloy rule.

Reference
Cloonar/nixos#122