lab: AFK run lifecycle — auto-reap on PR success, fail on death/timeout #63

Closed
opened 2026-06-01 12:23:04 +02:00 by dominik.polakovics · 1 comment

Parent

#61

What to build

A watcher in lab's poll loop that completes AFK runs on its own, since claude --remote-control never exits by itself (it opens the PR, then idles holding its slot). A run is done when a PR with head afk/<N> exists; it is a failure when its session dies with no PR or it overruns a time budget. lab reaps accordingly and frees the slot — which also gives the auto-loop (Slice 3) an "in-flight" signal that actually clears.

Acceptance criteria

  • lab enumerates live AFK runs from session names <project>~<slot>~afk-<N> and tracks each run's start time (the budget clock; resetting it on a lab restart is acceptable).
  • A background watcher checks each live AFK run for a PR whose head branch is afk/<N>, matched client-side from tea pulls list --fields index,head,state.
  • Success (PR found): stop the tmux session and remove the worktree ~/.local/state/lab/worktrees/<project>~<N>; the branch + PR survive (the PR's Closes #N closes the issue on merge).
  • Failure (session died with no PR, OR run exceeded a ~45-min budget with no PR): stop the session if still alive and KEEP the worktree for inspection.
  • Reaping frees the instance slot — the run leaves the instances list and stops counting against the global cap.
  • User-initiated Stop is neutral: it keeps the worktree and leaves the issue in in-progress for manual requeue; it is not a success and (per Slice 4) must not count as a failure. Supersedes Slice 1's interim "Stop removes the worktree."
  • The 45-min budget is a single tunable constant.
  • Go unit tests cover the done/failure classification from (PR-exists, session-alive, run-age) inputs.

Blocked by

  • #62 (Slice 1 — the manual run that produces AFK sessions + worktrees)
Author
Owner

This was generated by AI during triage.

Agent Brief

Category: enhancement
Summary: Add a periodic background watcher to lab that completes ("reaps") AFK runs on its own — success when the run's afk/<N> PR appears, failure on session-death-without-PR or a budget overrun — freeing the instance slot in every terminal case.

Context:
claude --remote-control never self-exits: an AFK run opens its PR and then idles, holding its tmux session (and its instance slot) forever. lab therefore can't use "the session ended" as the done signal. This is Slice 2 of #61, building on the manual-run substrate from #62 (now merged). The design is locked in ADR-0007 (docs/adr/0007-lab-drives-afk-runs.md) and the AFK run / Instance glossary entries in CONTEXT.md — follow them; this brief only sharpens the slice boundary and bakes in two decisions taken at triage.

Current behavior:

  • A manual "Start AFK run" claims the lowest ready-for-agent issue (flipping it to in-progress), creates an isolated worktree on branch afk/<N>, and spawns a seeded --remote-control session named <project>~<slot>~afk-<N>, shown as an instance row badged AFK #N.
  • Nothing ever completes a run. The session idles after opening its PR and counts against the global instance cap indefinitely; the only end is a manual Stop.
  • The manual Stop handler (handleStop) currently also removes the worktree — interim Slice-1 behavior, marked in-code as superseded by this issue. The afk/<N> branch and pushed commits survive.
  • lab has no long-lived server-side background loop today — only transient per-session deep-link scrape goroutines (startCapture) and the browser's own ~4s fragment poll. This slice introduces lab's first periodic watcher goroutine, started once at process startup.

Desired behavior:

A single periodic watcher (one long-lived goroutine, ticking on a tunable interval of ~30s, within ADR-0007's 30–60s band) enumerates live AFK runs and classifies each from three inputs: does an afk/<N> PR exist, is the session still alive, and run age vs. budget. Each run lands in exactly one of four outcomes, three of them terminal:

  • Success — an afk/<N> PR exists (open or merged): stop the session if still alive, then remove its worktree. Branch, commits, and PR survive (the PR's Closes #N closes the issue on merge). A present PR means success regardless of session liveness — a run that opened its PR and then died is a success, not a death-failure.
  • Failure — death: the session is gone and no afk/<N> PR exists. Keep the worktree for inspection; the issue stays in-progress. (Nothing to stop; the worktree was never removed.)
  • Failure — timeout: the session is still alive, no afk/<N> PR exists, and the run has exceeded a ~45-minute budget. Stop the session; keep the worktree; the issue stays in-progress.
  • In progress: alive, no PR, under budget → leave it alone.
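The four outcomes above can be sketched as a pure function. This is a minimal sketch, not the merged implementation: the Outcome type, its constant names, and the classify name are illustrative assumptions. It treats "exceeded the budget" as age strictly greater than budget.

```go
package main

import "time"

// Outcome is the watcher's verdict for one AFK run.
// The type and constant names are hypothetical.
type Outcome int

const (
	InProgress    Outcome = iota // alive, no PR, under budget
	Success                      // an open or merged afk/<N> PR exists
	FailedDeath                  // session gone, no PR
	FailedTimeout                // session alive, no PR, budget exceeded
)

// classify is pure: it never touches tmux, tea, or the clock.
// The caller computes prPresent, sessionAlive, and age beforehand.
func classify(prPresent, sessionAlive bool, age, budget time.Duration) Outcome {
	switch {
	case prPresent:
		// A present PR wins regardless of session liveness.
		return Success
	case !sessionAlive:
		return FailedDeath
	case age > budget:
		return FailedTimeout
	default:
		return InProgress
	}
}
```

Checking prPresent first is what makes "opened its PR, then died" a success rather than a death-failure.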

PR matching is client-side from the project's own tracker (tea pulls list, head + state fields). A PR whose head is afk/<N> counts as done only when open or merged; a closed-and-unmerged afk/<N> PR is treated as no PR (so the run fails on death/timeout rather than being falsely reaped as success).
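A minimal sketch of that client-side match, assuming the Tracker seam returns rows shaped like the hypothetical PullRequest below. It also assumes the state values are the strings "open", "closed", and "merged"; check this against what tea actually prints. The head comparison is exact so that afk/6 never matches afk/63.

```go
package main

import "fmt"

// PullRequest is a hypothetical row shape for the pull list
// returned by the Tracker seam (index, head, state).
type PullRequest struct {
	Index int
	Head  string
	State string // assumed values: "open", "closed", "merged"
}

// hasDonePR reports whether an open or merged PR with head afk/<issue>
// exists. A closed-and-unmerged PR is treated as no PR.
func hasDonePR(prs []PullRequest, issue int) bool {
	want := fmt.Sprintf("afk/%d", issue)
	for _, pr := range prs {
		if pr.Head != want {
			continue
		}
		if pr.State == "open" || pr.State == "merged" {
			return true
		}
	}
	return false
}
```

If a closed-unmerged PR and a later open PR share the same head, the loop still reports success, because it keeps scanning past the closed one.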

Run start times (the budget clock) are held in memory, keyed by session name, stamped lazily the first time the watcher sees a run. Re-deriving the live-run set each tick from session names means a lab restart re-adopts in-flight runs with the budget clock reset to the restart — acceptable per ADR-0007.
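A minimal sketch of the session-name parsing and the lazily stamped budget clock. The names afkRun, parseAFKSession, and runClock are hypothetical. The parser assumes project and slot never contain "~", and it keeps the slot as an opaque string. The map has no mutex on the assumption that only the single watcher goroutine touches it.

```go
package main

import (
	"strconv"
	"strings"
	"time"
)

// afkRun is the identity recovered from a session name.
type afkRun struct {
	Project string
	Slot    string
	Issue   int
}

// parseAFKSession splits "<project>~<slot>~afk-<N>".
// It returns false for any session that is not an AFK run.
func parseAFKSession(name string) (afkRun, bool) {
	parts := strings.Split(name, "~")
	if len(parts) != 3 || !strings.HasPrefix(parts[2], "afk-") {
		return afkRun{}, false
	}
	n, err := strconv.Atoi(strings.TrimPrefix(parts[2], "afk-"))
	if err != nil || n <= 0 {
		return afkRun{}, false
	}
	return afkRun{Project: parts[0], Slot: parts[1], Issue: n}, true
}

// runClock holds start times in memory, keyed by session name.
type runClock struct {
	started map[string]time.Time
}

func newRunClock() *runClock {
	return &runClock{started: make(map[string]time.Time)}
}

// age stamps the run the first time it is seen, then returns its age at now.
func (c *runClock) age(session string, now time.Time) time.Duration {
	start, seen := c.started[session]
	if !seen {
		start = now
		c.started[session] = now
	}
	return now.Sub(start)
}

// forget drops a reaped run so the map does not grow without bound.
func (c *runClock) forget(session string) {
	delete(c.started, session)
}
```

Because the map starts empty on every process start, a restart naturally re-adopts in-flight runs with age zero, which is the accepted reset behavior.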

Every terminal case leaves the tmux session gone (killed by the reap, or already dead in the death case), so the run leaves the instances list and stops counting against the global cap with no extra slot bookkeeping (the cap counts live sessions).

The manual Stop becomes neutral, superseding the interim behavior: it keeps the worktree and leaves the issue in-progress for manual requeue. It is neither a success nor a failure. The coupling that makes handleStop remove an AFK worktree must go.

Key interfaces:

  • The Tracker seam (already wraps tea for issue queries, with a substitutable test fake) gains a way to list a project's pull requests with their head branch and state, so the watcher matches afk/<N> client-side. Mirror the existing ready-queue method and keep the tea shell-out inside this seam so it stays unit-testable via the fake.
  • A pure classification function over (prPresent bool, sessionAlive bool, age, budget) returning the outcome (in-progress / success / failure-with-reason). This is the unit-tested core; it must not touch tmux, tea, or the clock directly.
  • The existing AFK-worktree teardown helper (removeAFKWorktree) already has the correct success semantics — removes the worktree, preserves branch/commits/PR. Reuse it for the success path.
  • A single internal chokepoint that, given a run and its classified terminal outcome, performs the reap (stop-if-alive + remove-or-keep worktree) and logs the outcome. All terminal paths route through it. This is the seam Slice 4 (#65) will extend to count consecutive failures and pause a project — so this slice must funnel outcomes through one place, but must not itself implement or mutate any failure counter.
  • The watcher interval and the run budget are named tunable constants, defined together.
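A minimal sketch of how the paired constants, the ticker loop, and the single reap chokepoint could fit together. Every name here is an assumption for illustration. In particular, the removeWorktree field only stands in for removeAFKWorktree, whose real signature is not given in this brief. The side effects are injected as function fields so a test can substitute fakes.

```go
package main

import (
	"context"
	"log"
	"time"
)

// Tunables, defined together.
const (
	watchInterval = 30 * time.Second
	runBudget     = 45 * time.Minute
)

type outcome string

const (
	success       outcome = "success"
	failedDeath   outcome = "failure: death"
	failedTimeout outcome = "failure: timeout"
)

// reaper bundles the side effects so tests can inject fakes.
type reaper struct {
	stopSession    func(session string) error
	removeWorktree func(project string, issue int) error
}

// reap is the one place every terminal outcome passes through:
// stop the session if alive, remove the worktree only on success, log it.
func (r reaper) reap(session, project string, issue int, alive bool, out outcome) error {
	if alive {
		if err := r.stopSession(session); err != nil {
			return err
		}
	}
	if out == success {
		if err := r.removeWorktree(project, issue); err != nil {
			return err
		}
	}
	log.Printf("afk reap: session=%s issue=#%d outcome=%s", session, issue, out)
	return nil
}

// watch runs tick every watchInterval until ctx is cancelled.
// It is meant to be started once, in its own goroutine, at startup.
func watch(ctx context.Context, tick func()) {
	t := time.NewTicker(watchInterval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			tick()
		}
	}
}
```

Keeping the log line and the worktree decision inside reap gives Slice 4 (#65) one place to hook failure counting later, without this slice holding any counter itself.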

Acceptance criteria:

  • A background watcher enumerates live AFK runs from <project>~<slot>~afk-<N> session names and tracks each run's start time in memory (reset on lab restart is acceptable).
  • The watcher checks each run for a PR whose head branch is afk/<N>, matched client-side from the project's tea pull list (head + state).
  • Success (open/merged afk/<N> PR): the session is stopped and the run's worktree is removed; the afk/<N> branch and PR survive.
  • Failure (session died with no PR, OR run exceeded the budget with no PR): the session is stopped if still alive and the worktree is kept; the issue remains in-progress.
  • A reaped run leaves the instances list and stops counting against the global instance cap.
  • User-initiated Stop is neutral: it keeps the worktree and leaves the issue in-progress; it is not a success and not a failure (supersedes the interim "Stop removes the worktree").
  • The run budget is a single tunable constant; the watcher interval is likewise a named constant (~30s).
  • A closed-but-unmerged afk/<N> PR does not reap the run as success.
  • All terminal outcomes pass through one internal reap entry point (the seam #65 will hook), and this slice neither reads nor mutates any consecutive-failure counter.
  • Go unit tests cover the done/failure/in-progress classification from (PR-present, session-alive, run-age) inputs — including PR-present-but-session-dead = success, and closed-unmerged-PR = not-success.

Out of scope:

  • The per-project consecutiveFailures counter, the 3-strikes auto-pause, and the UI reset — Slice 4 (#65). This slice only exposes the outcome seam those hook onto.
  • The automatic scheduler / per-project auto toggle that launches runs — Slice 3 (#64). This slice only reaps; it may later share the scheduler goroutine, but launch logic is not built here.
  • Any (N ready) count hint — Slice 5 (#66).
  • Changing the claim/spawn flow, seed prompt, worktree layout, or session-naming scheme — inherited from #62 and ADR-0007.
  • Persisting in-flight runs across restarts (ADR-0007 accepts re-adoption with a reset budget clock).
  • Retrying failed runs (ADR-0007 rejects auto-retry).

Implementer note: lab's Go tests do not run in the repo pre-commit hook (it is eval-only). Run go test ./..., go vet, and go build for this module locally before opening the PR — a Go regression won't be caught by the hook.

Reference
Cloonar/nixos#63