Cloonar/nixos

Auto-update invidious-companion (+ decide Invidious) and harden companion env-file generation #89

Closed

opened 2026-06-04 09:40:24 +02:00 by dominik.polakovics · 1 comment

dominik.polakovics commented

2026-06-04 09:40:24 +02:00

Owner

Background

Diagnosed 2026-06-04: Yattee couldn't play videos; invidious-companion had been failing PO-token validation because its :latest image was frozen at a 2026-03-04 build for ~3 months. --pull=newer only re-pulls on container (re)creation, and nothing ever triggered that (no reboot / config change / timer). Manually pulling the current build (2026-06-03) + restarting restored playback (verified: /api/v1/videos/… returns adaptiveFormats, no error). That fix is runtime-only / uncommitted — this issue makes it permanent.

File: hosts/fw/vms/web/invidious.nix. Related: #88 (agent SSH access to the VMs).

1. Auto-update the companion container

Replace the ineffective --pull=newer with real auto-update:

- Add label io.containers.autoupdate=registry to virtualisation.oci-containers.containers.invidious-companion.
- Enable podman's auto-update timer (podman-auto-update.timer → runs podman auto-update daily). Note: there is no single virtualisation.podman.autoUpdate.enable option; wire the systemd timer/service (or enable podman's packaged podman-auto-update.timer).
- Recommended: set virtualisation.oci-containers.containers.invidious-companion.podman.sdnotify = "healthy" + a container healthcheck, so podman auto-update --rollback reverts automatically if a new :latest is broken.
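
A minimal sketch of the label plus timer wiring, assuming the self-defined oneshot-plus-timer route rather than podman's packaged unit. The unit name `invidious-companion-auto-update` is hypothetical, `pkgs.podman` is assumed to be the podman in use on the VM, and the container's existing attributes (image, hardening options, network) are omitted and assumed unchanged.

```nix
{ pkgs, ... }:
{
  # Opt the companion into registry-based auto-update. All other attributes
  # of this container are defined elsewhere and are not repeated here.
  virtualisation.oci-containers.containers.invidious-companion.labels = {
    "io.containers.autoupdate" = "registry";
  };

  # Hypothetical unit name: runs one auto-update pass and exits.
  systemd.services.invidious-companion-auto-update = {
    description = "Pull newer images for containers labelled for auto-update";
    after = [ "network-online.target" ];
    wants = [ "network-online.target" ];
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${pkgs.podman}/bin/podman auto-update --rollback";
    };
  };

  # Daily schedule; Persistent makes a run missed during downtime fire
  # shortly after the next boot.
  systemd.timers.invidious-companion-auto-update = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "daily";
      Persistent = true;
    };
  };
}
```

Note that `podman auto-update` acts on every container carrying the label, not only the companion, so the label is what scopes the behaviour. Whether this variant or the packaged timer is preferable should be settled by whichever dry-builds cleanly.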

2. (REQUIRED — must land with #1) Harden the companion env file

Today the env file (PORT/HOST/SERVER_SECRET_KEY) is written to volatile /run/invidious-companion/env by a boot-only oneshot + RemainAfterExit=true generator. During the last uptime the entire /run/invidious-companion/ directory vanished, so the restart failed with:

Error: parsing file "/run/invidious-companion/env": open /run/invidious-companion/env: no such file or directory

(Had to recreate the dir + env by hand.) This must be fixed before/with auto-update: podman auto-update restarts the container on every pull, so without this fix every auto-update would brick companion exactly as observed.

Fix: regenerate the env on every companion start (drop RemainAfterExit, or move generation into the container unit's ExecStartPre/preStart) and back it with systemd RuntimeDirectory=invidious-companion (+ RuntimeDirectoryPreserve) instead of a tmpfiles dir in /run. Then no restart can fail on a missing env file regardless of what cleans /run. (Exact cause of the dir's disappearance unconfirmed — likely a tmpfiles --remove/clean or RuntimeDirectory teardown — but the fix is robust either way.)
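
One possible shape for the fix, sketched under several assumptions: the container's generated unit is named `podman-invidious-companion` and does not already set `RuntimeDirectory`; the secret is exposed at `config.sops.secrets."invidious-companion-key".path`; and the `PORT`/`HOST` values shown are placeholders for whatever the current generator writes. The old generator unit and the tmpfiles rule would be removed separately.

```nix
{ config, ... }:
{
  # Assumed unit name for the oci-containers companion container.
  systemd.services.podman-invidious-companion = {
    serviceConfig = {
      # systemd creates /run/invidious-companion before any ExecStartPre runs.
      RuntimeDirectory = "invidious-companion";
      RuntimeDirectoryMode = "0700";
      # Keep the directory across restarts; the env file is rewritten anyway.
      RuntimeDirectoryPreserve = "restart";
    };

    # preStart runs on every start of the unit, so the env file is rewritten
    # before each `podman run`, including restarts triggered by auto-update.
    preStart = ''
      umask 077
      {
        echo "PORT=8282"     # placeholder value
        echo "HOST=0.0.0.0"  # placeholder value
        echo "SERVER_SECRET_KEY=$(cat ${config.sops.secrets."invidious-companion-key".path})"
      } > /run/invidious-companion/env
    '';
  };
}
```

Because the directory is declared on the same unit that consumes the env file, its lifecycle follows the service and no longer depends on a separate once-per-boot step having run.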

3. Auto-update Invidious itself — needs a decision

Invidious is not a container here — it's the native services.invidious nixpkgs module (which also wires http3-ytproxy, the nginx vhost, the DB, and admin-user init). So container auto-update doesn't apply. Options:

- (a) Keep native, track the channel: Invidious updates when the NixOS channel file is bumped + bento rebuild. Lowest risk, keeps module integration, but updates are manual and may lag upstream (and Invidious, like companion, needs frequent updates to track YouTube).
- (b) Containerize Invidious (quay.io/invidious/invidious:latest) for the same auto-update treatment as companion. True parity + fastest YouTube-tracking, but a significant migration: reimplement DB/nginx/TLS/http3-ytproxy/admin-init/companion wiring that the module does today, plus Postgres data migration.

Recommendation: ship #1+#2 now (clear, low-risk win); decide #3 separately — likely (a) + a discipline of regular channel bumps, unless we commit to the (b) migration.

Acceptance

- Companion image updates automatically (verify the timer pulls a newer digest over time).
- Companion survives restart / reboot / auto-update with no manual env recreation.
- A decision is recorded for the Invidious update strategy (and implemented if (b)).


dominik.polakovics added the

needs-triage

enhancement

labels

2026-06-04 09:40:24 +02:00

dominik.polakovics commented

2026-06-04 09:58:23 +02:00

Author

Owner

This was generated by AI during triage.

Agent Brief

Category: enhancement
Summary: Make invidious-companion auto-update its container image and regenerate its env file on every start; record the decision to keep Invidious itself native.

Current behavior:

- The virtualisation.oci-containers.containers.invidious-companion container (image quay.io/invidious/invidious-companion:latest) relies on --pull=newer, which only re-pulls on container (re)creation. Nothing triggers re-creation, so the image froze at a ~3-month-old build and broke PO-token validation / playback. Fixed manually at runtime; uncommitted.
- The companion env file (PORT/HOST/SERVER_SECRET_KEY) is written to /run/invidious-companion/env by a oneshot + RemainAfterExit=true generator (systemd.services.invidious-companion-env-generate) that runs once at boot and never again, backed by a tmpfiles /run directory. When /run/invidious-companion/ was cleaned, the env file vanished and the container failed to start (open /run/invidious-companion/env: no such file or directory), needing manual recreation.
- Invidious itself is the native services.invidious nixpkgs module (not a container).

Desired behavior:

- The companion image updates automatically on a recurring (~daily) schedule with no reboot/config change. A broken new image must not leave companion down; podman auto-update should roll back to the last working image.
- The env file is regenerated on every companion start, so no /run cleanup, restart, reboot, or auto-update pull can leave the container without it. Generation must no longer be a once-per-boot, RemainAfterExit-latched step.
- Invidious stays native and tracks the NixOS channel; this decision is recorded as an ADR.

Key interfaces / config shapes (durable Nix attributes):

- virtualisation.oci-containers.containers.invidious-companion: add the io.containers.autoupdate=registry label. Recommended: set .podman.sdnotify = "healthy" with a container healthcheck so rollback triggers on a functionally-broken image, but verify the companion image exposes a real health endpoint on :8282 first; if not, add a --health-cmd via extraOptions, or fall back to default sdnotify (rollback then only covers hard start failures). Do not invent a health endpoint.
- Auto-update timer: there is no virtualisation.podman.autoUpdate NixOS option (verified on nixos-25.11). Wire it whichever way dry-builds cleanly: enable podman's packaged podman-auto-update.timer, or define a small oneshot service running podman auto-update + a daily systemd.timer (OnCalendar=daily, Persistent=true).
- Env generator: drop RemainAfterExit=true (or move generation into the companion unit's ExecStartPre/preStart) so it runs on every start, and back the directory with systemd RuntimeDirectory=invidious-companion (+ RuntimeDirectoryPreserve) on the owning unit instead of the tmpfiles /run/invidious-companion rule, tying the dir's lifecycle to the service.
- The companion key still comes from the invidious-companion-key SOPS secret; don't change secret handling.
- Add a new ADR (next sequential number; current highest is ADR-0012) recording "Invidious stays native + channel-bump discipline," with the rejected alternative (containerize quay.io/invidious/invidious:latest) and its cost (reimplementing DB/nginx/TLS/http3-ytproxy/admin-init wiring + Postgres data migration).
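
For the recommended healthcheck variant, the config shape might look roughly like the sketch below. The health command is a loudly labelled placeholder, not something the companion image is known to provide, and the interval, retry, and start-period values are illustrative only. It applies only after the check on what the image actually exposes on :8282.

```nix
{
  virtualisation.oci-containers.containers.invidious-companion = {
    # Signal readiness to systemd only once the container healthcheck passes,
    # so an update to a functionally broken image counts as a failed restart.
    podman.sdnotify = "healthy";

    # Merged with the container's existing extraOptions (hardening flags etc.).
    extraOptions = [
      # PLACEHOLDER: replace with a command verified to exist in the image.
      "--health-cmd=/hypothetical/healthcheck"
      "--health-interval=30s"
      "--health-retries=3"
      "--health-start-period=20s"
    ];
  };
}
```

If no usable health command exists in the image, dropping this fragment entirely leaves the default sdnotify behaviour, in which rollback covers only hard start failures.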

Acceptance criteria:

- [ ] The companion container carries io.containers.autoupdate=registry and a recurring podman auto-update is armed (timer enabled/active).
- [ ] Env generation no longer uses RemainAfterExit=true as a once-per-boot latch; the env is regenerated on every start and its directory is provided by RuntimeDirectory (no /run tmpfiles rule for it).
- [ ] Companion starts cleanly after a /run-wipe + restart cycle with no manual env recreation. (Runtime check; see verification note.)
- [ ] A new ADR records the keep-native decision with the containerize alternative and its rejection.
- [ ] The fw host dry-build passes (the pre-commit gate).

Out of scope:

- Containerizing Invidious itself (option b): explicitly rejected; the ADR records why.
- Changing companion's hardening (--cap-drop=ALL, no-new-privileges, --read-only) or its invidious-net network.
- The native Invidious companion-config.json generator, the SOPS secret, and any nginx/http3-ytproxy/CORS config; only the companion env-file generator is in scope.

Verification note (ties to #88): This module lives in the fw web microVM, which agents currently can't SSH into (#88). Make the change and rely on the pre-commit dry-build as the gate; the runtime acceptance checks (survives /run-wipe restart without manual env recreation; timer pulls a newer digest over time) must be confirmed by a human/HITL on the VM after deploy. A dry-build-passing PR is the agent's completion bar; runtime sign-off is the maintainer's.


dominik.polakovics added

in-progress

and removed

needs-triage

labels

2026-06-04 09:58:23 +02:00

dominik.polakovics referenced this issue from a pull request that will close it,

2026-06-04 10:18:16 +02:00

feat(fw): auto-update invidious-companion and harden its env file #90

dominik.polakovics closed this issue

2026-06-04 13:36:17 +02:00