feat(web-arm): fetch PowerSync sync rules live from Cloonar/fit #96

Merged
dominik.polakovics merged 1 commit from afk/95 into main 2026-06-04 22:39:53 +02:00

What

Source the deployed PowerSync sync rules live from Cloonar/fit:powersync/sync-rules.yaml@master instead of the vendored, hand-copied hosts/web-arm/modules/powersync/sync-rules.yaml. App-team rule changes now reach prod within ~5 min without a nixos edit; review of the fit PR is the gate. service.yaml stays nix-rendered (it carries the sops DSN / storage / JWKS).

How

  • powersync-syncrules-fetch.service (a root oneshot) plus a .timer that fires every 5 min and also at boot. It fetches over the Forgejo raw API (GET /api/v1/repos/Cloonar/fit/raw/powersync/sync-rules.yaml?ref=master, Authorization: token <T>) using the read-only reptide-powersync-syncrules-token, which is fed in as a systemd credential (the fueltide-backup pattern) rather than an on-disk env file.
  • Pre-swap gate: HTTP 200 → non-empty → parses as YAML → has the bucket_definitions top-level key (the last check rejects a Forgejo JSON error body, which is itself valid YAML).
  • Swap only if changed: snapshot the prior file to .prev, then systemctl try-restart podman-powersync.service (a no-op when the container isn't running, e.g. at boot; PowerSync re-reads rules only on restart).
  • Post-restart rollback: poll /probes/liveness; if PowerSync doesn't become healthy, restore .prev, restart again, and exit non-zero → page.
  • Boot / fresh host: podman-powersync has After= and Requires= on the fetch unit. A git.cloonar.com blip keeps the persisted file (exit 0, never blocks the container); only a truly fresh host with no usable file hard-fails, which blocks container startup and pages.
  • Mounts split: service.yaml is bind-mounted from the nix store and sync-rules.yaml from /var/lib/powersync, as two separate file mounts so the mutable file isn't nested inside a read-only store mount. In-container paths are unchanged.
  • Vendored sync-rules.yaml deleted, no seed file.
  • ADR-0015 added (amends ADR-0012); ADR-0012 gets a forward-pointer.
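The fetch request and the pre-swap gate above can be sketched in Python. This is a hedged illustration, not the unit's actual script (the PR does not show its language or code): the base URL https://git.cloonar.com is an assumption inferred from the git.cloonar.com blip case, the function names are hypothetical, and because the Python standard library has no YAML parser, the YAML step takes the parser as an argument (PyYAML's yaml.safe_load would fit).

```python
import urllib.parse
import urllib.request
from typing import Any, Callable

# Route and ref from the PR; the host is an assumption (inferred from git.cloonar.com).
FORGEJO = "https://git.cloonar.com"
REPO = "Cloonar/fit"
PATH = "powersync/sync-rules.yaml"
REF = "master"


def build_raw_request(token: str) -> urllib.request.Request:
    """Build (but do not send) GET /api/v1/repos/{repo}/raw/{path}?ref={ref}."""
    query = urllib.parse.urlencode({"ref": REF})
    url = f"{FORGEJO}/api/v1/repos/{REPO}/raw/{PATH}?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def passes_gate(status: int, body: bytes, parse_yaml: Callable[[str], Any]) -> bool:
    """Run the pre-swap checks in order; False means keep the current file."""
    if status != 200:                            # 1. HTTP 200
        return False
    if not body.strip():                         # 2. non-empty
        return False
    try:
        doc = parse_yaml(body.decode("utf-8"))   # 3. parses as YAML
    except Exception:                            # any decode or parse error fails the gate
        return False
    # 4. bucket_definitions must be a top-level key. A Forgejo JSON error body
    #    such as {"message": ...} is valid YAML, so only this check rejects it.
    return isinstance(doc, dict) and "bucket_definitions" in doc
```

With urllib.request.urlopen, 4xx and 5xx responses raise urllib.error.HTTPError instead of returning, so a caller would treat that exception as a failed gate as well.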

Decision raised (per the issue): paging via OnFailure=, not a Grafana metric rule

The issue listed two paging options. I used OnFailure= → Pushover (priority 1 = cp_dominik_normal) rather than a Grafana rule on node_systemd_unit_state{...,state="failed"} == 1, because web-arm's local node-exporter does not enable the systemd collector. Only the shared utils/modules/victoriametrics does; web-arm rolls its own victoriametrics.nix with just services.prometheus.exporters.node.enable = true. That series is therefore never scraped from web-arm:9100, so the proposed Grafana rule would silently never have fired. OnFailure= needs no metrics pipeline, fires immediately, and still reuses the existing Pushover account. (The journeyapps/powersync-service image also exposes no documented cheap sync-rules validate subcommand, so the YAML/structure gate plus the post-restart liveness rollback serve as the validation, as the issue allows.)
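For illustration, a hedged Python sketch of the kind of request an OnFailure= handler could send to Pushover. The PR does not show the handler itself; the function name, the message text, and how the failed unit's name reaches the handler are assumptions. The endpoint and the token, user, title, message, and priority form fields are from the public Pushover Messages API, where priority 1 is high priority, matching the PR.

```python
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_page(app_token: str, user_key: str, failed_unit: str) -> urllib.request.Request:
    """Build (but do not send) a Pushover message for a failed systemd unit."""
    form = {
        "token": app_token,        # Pushover application token
        "user": user_key,          # recipient user or group key
        "title": f"{failed_unit} failed",
        "message": f"{failed_unit} entered the failed state; see journalctl -u {failed_unit}",
        "priority": "1",           # 1 = Pushover high priority, as in the PR
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    # Supplying a body makes urllib send this as a form-encoded POST.
    return urllib.request.Request(PUSHOVER_URL, data=data)
```

This fires whenever the fetch unit exits non-zero (a fresh host with no usable file, or a rollback), with no scrape interval or metrics pipeline in between.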

Verification

  • Dry-build: the pre-commit hook (scripts/test-configuration web-arm) passes (:: web-arm OK). nixpkgs-fmt --check and nix-instantiate --parse are both clean.
  • Forgejo route confirmed live against the running Forgejo 11.0.14: GET /api/v1/repos/Cloonar/nixos/raw/CONTEXT.md?ref=main returns the raw file, confirming the /raw/{path}?ref= shape; the Authorization: token <T> format is the documented Forgejo API auth.
  • Deployed-host only (the dry-build cannot reach these; ADR-0012 records the same caveat): the HTTPS fetch and token auth against the private fit repo, the change-detection swap and container restart, the liveness rollback, and the Pushover page can only be exercised on web-arm after deploy. Suggested post-deploy checks:
    • Run systemctl start powersync-syncrules-fetch.service && journalctl -u powersync-syncrules-fetch.service and confirm a 200 fetch and either "unchanged" or a swap + restart.
    • Confirm cat /var/lib/powersync/sync-rules.yaml matches fit@master, and that a second run with no upstream change is a silent no-op.
    • Force a failure (e.g. clear the token) and confirm the Pushover page arrives and last-good is retained.
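The deployed-host-only paths above (change-detection swap, restart, liveness rollback, and the blip versus fresh-host decision) can at least be modelled offline. The Python sketch below is hypothetical, not the unit's actual script: restart stands in for systemctl try-restart podman-powersync.service and is assumed to report whether the container was running, probe stands in for a GET on /probes/liveness, and the timeout, poll interval, file mode, and "a non-empty file is usable" rule are all assumptions.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import Callable

RULES = Path("/var/lib/powersync/sync-rules.yaml")  # path from the PR


def prev_path(rules: Path) -> Path:
    return rules.with_name(rules.name + ".prev")


def swap_if_changed(new: bytes, rules: Path = RULES) -> bool:
    """Install new rules if they differ; return False for a no-op."""
    if rules.exists():
        if rules.read_bytes() == new:
            return False
        shutil.copy2(rules, prev_path(rules))  # snapshot last-good
    # Write a temp file beside the target, then rename it into place so a
    # reader never sees a half-written file. A single-file bind mount pins the
    # old inode, so this relies on the restart recreating the container.
    fd, tmp = tempfile.mkstemp(dir=rules.parent, prefix=".sync-rules.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; the container must read it
        os.replace(tmp, rules)
    except BaseException:
        os.unlink(tmp)
        raise
    return True


def wait_healthy(probe: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll probe until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not probe():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def apply_rules(new: bytes, restart: Callable[[], bool], probe: Callable[[], bool],
                rules: Path = RULES, timeout: float = 60.0, interval: float = 2.0) -> int:
    """Swap, restart, verify, and roll back; return the unit's exit code."""
    if not swap_if_changed(new, rules):
        return 0                   # unchanged upstream: silent no-op
    if not restart():
        return 0                   # container not running (e.g. boot): reads the file on start
    if wait_healthy(probe, timeout, interval):
        return 0
    if prev_path(rules).exists():  # unhealthy: restore last-good and restart
        os.replace(prev_path(rules), rules)
        restart()
    return 1                       # non-zero exit triggers OnFailure= (the page)


def on_fetch_failure(rules: Path = RULES) -> int:
    """Fetch or gate failed: keep a usable persisted file, else hard-fail."""
    usable = rules.exists() and rules.stat().st_size > 0
    return 0 if usable else 1
```

The PR does not say how a still-broken upstream is handled on the next timer tick; as written, this sketch would swap, fail liveness, roll back, and page again every 5 minutes until fit@master is fixed.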

Closes #95

powersync.reptide.eu mounted a vendored sync-rules.yaml hand-copied from
the app on every change. Source it live instead: a root oneshot + 5-min
timer (powersync-syncrules-fetch) pulls
Cloonar/fit:powersync/sync-rules.yaml@master over the Forgejo raw API with
a read-only token, validates it (HTTP 200, non-empty, parses as YAML, has
bucket_definitions), and swaps /var/lib/powersync/sync-rules.yaml only when
it changed, then try-restarts the container. It snapshots the prior file
to .prev and, if PowerSync fails liveness after the swap, restores
last-good, restarts, and exits non-zero. On a git.cloonar.com blip it keeps
the persisted file (exit 0); it hard-fails only on a truly fresh host with
no usable file.

- split the container mounts: service.yaml from the store, sync-rules.yaml
  from /var/lib/powersync (in-container paths unchanged)
- podman-powersync gets After= and Requires= on the fetch unit, which also
  runs at boot
- page on fetch failure via OnFailure -> Pushover (priority 1), because
  web-arm's local node-exporter does not enable the systemd collector, so a
  node_systemd_unit_state Grafana rule would never fire here
- delete the vendored sync-rules.yaml; no seed file
- ADR-0015 records this and amends ADR-0012

The token is wired via sops.secrets.reptide-powersync-syncrules-token
(already created) as a systemd credential; no secrets file is edited. The
HTTPS fetch, restart, liveness rollback, and Pushover page verify only on
the deployed host.
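Reading the token as a systemd credential looks roughly like the Python sketch below. systemd places credentials configured with LoadCredential= (or SetCredential=) as files in the directory named by $CREDENTIALS_DIRECTORY, readable only by the unit, so the token itself never sits in the environment or in an env file. The credential name syncrules-token is hypothetical; the PR only states that sops.secrets.reptide-powersync-syncrules-token is passed in this way.

```python
import os
from pathlib import Path


def read_credential(name: str) -> str:
    """Return the named systemd credential, without trailing whitespace."""
    directory = os.environ.get("CREDENTIALS_DIRECTORY")
    if not directory:
        # Unset outside a unit that declares LoadCredential=/SetCredential=.
        raise RuntimeError("CREDENTIALS_DIRECTORY is not set")
    # Hypothetical name, e.g. from LoadCredential=syncrules-token:<sops secret path>.
    return (Path(directory) / name).read_text().strip()
```

The returned string is what goes into the Authorization: token <T> header.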

Closes #95
Reference
Cloonar/nixos!96