lab: claim AFK-run issues via the afk/<N> branch, not an in-progress label #92

Closed
opened 2026-06-04 11:39:56 +02:00 by dominik.polakovics · 0 comments

This was generated by AI during triage.

Agent Brief

Category: enhancement
Summary: Make lab claim an AFK-run issue by creating its afk/<N> branch instead of flipping a tracker in-progress label, so the claim can't be clobbered by triage/humans and can't flap.

Current behavior:
lab claims a ready-for-agent issue by flipping its label ready-for-agent -> in-progress before spawning the run (rolling back to ready-for-agent if the spawn fails), per ADR-0007. The in-progress label is the sole claim record; the queue query (open + ready-for-agent) excludes claimed issues "for free". land-pr clears in-progress on merge.

Because the claim is a shared, mutable tracker label, any other actor can overwrite it. The triage skill (or a human) re-applying ready-for-agent to a claimed issue un-claims it. Worse, a failed run keeps its afk/<N> branch and worktree for inspection. If ready-for-agent is then re-added (the obvious "requeue" move), the next claim flips the label, the worktree-add fails because afk/<N> already exists, and the rollback flips the label back. The issue then flaps ready-for-agent <-> in-progress on every scheduler tick, even while the real run is still working it.

Desired behavior:
lab no longer reads or writes any claim label. An AFK run's claim is its afk/<N> branch -- a durable artifact that lab already creates and that survives a lab restart (it is a local ref on disk). Issue selection treats an open ready-for-agent issue that already has a local afk/<N> branch as already-claimed and skips it. As a result:

  • Triage and humans can freely (re)apply ready-for-agent; doing so can never clobber a claim or cause flapping, because the branch -- not the label -- is the source of truth. No coordination between triage and lab is required.
  • A failed run's surviving afk/<N> branch keeps its issue out of the claimable set (parked); the scheduler drains around it with no flapping. Requeue is a single action: delete the afk/<N> branch (and its worktree).
  • "What's being worked" lives in the lab UI and in the existence of afk/<N> branches/PRs, not in the tracker. This is an accepted trade-off (it loses the tracker-visible "in-progress with no PR = failed run" signature; the lab UI is the dashboard).

Selection signal: the local afk/<N> branch only. A merged run closes its issue, so an open-state query excludes it; a failed or PR-still-open run keeps its branch, which the reaper preserves. The one hole -- deleting an afk/<N> branch while its PR is still open -- is operator error already forbidden by convention. Do NOT add a PR-based exclusion to selection.
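
The selection rule above can be sketched as a pure filter followed by a lowest-number pick. This is a minimal illustration, not lab's actual code: the Issue type and the claimable and pickLowest helpers are hypothetical names, and the claimed set is assumed to have been built already from the local afk/<N> branches.

```go
package main

import "sort"

// Issue is a hypothetical stand-in for whatever ReadyIssues returns.
type Issue struct {
	Number int
	Title  string
}

// claimable returns the ready issues that have no local afk/<N> branch.
// claimed is the set of issue numbers for which such a branch exists.
func claimable(ready []Issue, claimed map[int]bool) []Issue {
	out := make([]Issue, 0, len(ready))
	for _, is := range ready {
		if claimed[is.Number] {
			continue // branch exists: in flight or parked, never re-claim
		}
		out = append(out, is)
	}
	return out
}

// pickLowest chooses the lowest-numbered claimable issue.
// ok is false when every ready issue is already claimed or parked.
func pickLowest(ready []Issue, claimed map[int]bool) (issue Issue, ok bool) {
	c := claimable(ready, claimed)
	if len(c) == 0 {
		return Issue{}, false
	}
	// c is a fresh slice, so sorting it leaves the caller's input untouched.
	sort.Slice(c, func(i, j int) bool { return c[i].Number < c[j].Number })
	return c[0], true
}
```

Under this sketch, len(claimable(ready, claimed)) is also the number a ready-count hint would report, so selection and the scheduler's count cannot disagree about what is claimable.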

Key interfaces (names may have shifted -- explore; do not trust paths):

  • Tracker interface -- remove the label-mutation methods (EnsureLabel, Relabel); keep ReadyIssues and ListPulls. The in-progress label name/color constants go away.
  • Git interface -- add a read-only method returning the set of issue numbers that have a local afk/<N> branch (e.g. via git for-each-ref refs/heads/afk/). One local call; restart-safe.
  • The shared AFK claim path (currently launchAFKRun) -- drop the label flip and its label-rollback; the claim becomes "worktree/branch created", and failed-spawn rollback tears down only the worktree+branch.
  • Issue selection (currently pickLowestIssue and its caller) -- filter ReadyIssues to the claimable set (no existing afk/<N> branch) before choosing the lowest.
  • The auto scheduler (currently scheduleAFKRuns) and the ready-count cache feeding the "(N ready)" hint and ReadyExists -- must count the claimable set, not the raw ready-for-agent count, so the hint stays honest and a project whose only ready issues are all parked does not loop.
  • The reaper (classifyAFKRun/reapAFKRun) and the consecutiveFailures/3-strikes/Reset machinery -- unchanged (already label-independent).
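
As a rough sketch of the read-only Git method described in the list above, the following lists refs under refs/heads/afk/ and keeps only those whose final component is a plain issue number. The names parseAFKRefs and localAFKIssues, and the repoDir parameter, are assumptions made for illustration; the real interface may be shaped differently.

```go
package main

import (
	"os/exec"
	"strconv"
	"strings"
)

// afkRefPrefix is the ref namespace that holds AFK-run branches.
const afkRefPrefix = "refs/heads/afk/"

// parseAFKRefs turns `git for-each-ref --format=%(refname)` output into
// the set of issue numbers that have a local afk/<N> branch. Anything
// that is not exactly afk/<positive integer> is ignored.
func parseAFKRefs(out string) map[int]bool {
	claimed := make(map[int]bool)
	for _, line := range strings.Split(out, "\n") {
		ref := strings.TrimSpace(line)
		if !strings.HasPrefix(ref, afkRefPrefix) {
			continue
		}
		rest := strings.TrimPrefix(ref, afkRefPrefix)
		n, err := strconv.Atoi(rest)
		// Reject non-numeric names, nested refs, signs and leading zeros.
		if err != nil || n <= 0 || strconv.Itoa(n) != rest {
			continue
		}
		claimed[n] = true
	}
	return claimed
}

// localAFKIssues runs one local git call in repoDir (hypothetical
// parameter) and returns the claimed set. It touches no remote, so the
// result is the same before and after a lab restart.
func localAFKIssues(repoDir string) (map[int]bool, error) {
	cmd := exec.Command("git", "for-each-ref", "--format=%(refname)", afkRefPrefix)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseAFKRefs(string(out)), nil
}
```

Keeping the parser separate from the git call lets the parsing rules be unit-tested without a repository on disk.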

Documentation (part of this issue):

  • Add ADR-0013 recording branch-as-claim, superseding only ADR-0007's claim sub-decision (the rest of ADR-0007 stands). Match the repo's ADR house style (strong sentence-title, prose, optional "Considered options"/"Consequences"). Add a "partially superseded by ADR-0013" pointer at the top of ADR-0007. Record the considered alternatives (keep the label; keep an advisory-only label that lab ignores; branch+PR selection) and the accepted trade-off (tracker-visibility loss).
  • Update docs/agents/triage-labels.md: remove the in-progress lifecycle-label section; state that lab claims via afk/<N> branches (invisible to triage) and triage manages only the five canonical state roles.
  • Update the land-pr skill (under utils/home-manager/claude-code/skills/): drop the "clear the in-progress claim label" step and the in-progress-trigger phrasing; keep the afk/<N> head-branch + Closes #N contract.

Acceptance criteria:

  • No code path creates, adds, or removes a claim label; the Tracker interface exposes no label-mutation method.
  • An AFK run is claimed by creating afk/<N>; a failed spawn rolls back by tearing down only the worktree/branch (no label restore).
  • Selection excludes any open ready-for-agent issue that has a local afk/<N> branch; the scheduler's ready count and ReadyExists use the same claimable set.
  • Re-applying ready-for-agent to a claimed or parked issue causes neither a re-claim nor any label change (no flapping) -- covered by a unit test.
  • A failed run's issue stays parked via its surviving branch; consecutiveFailures/3-strikes/Reset behave exactly as before.
  • land-pr performs no in-progress label removal; its afk/<N> + Closes #N validation is intact.
  • docs/agents/triage-labels.md no longer presents in-progress as a lifecycle label.
  • ADR-0013 exists and supersedes ADR-0007's claim sub-decision; ADR-0007 carries the superseded-by pointer.
  • The lab Go package passes go build ./..., go vet ./..., and go test ./... run locally -- the pre-commit hook is eval-only (nix-instantiate) and does NOT run Go tests.
  • The affected host dry-builds (pre-commit hook passes) and the change lands via a PR.
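
The claim and rollback criteria above might look roughly like the following. This is a sketch under stated assumptions: the Git interface here is a hypothetical two-method slice (AddWorktree, RemoveWorktree), claimAndSpawn is an illustrative name rather than the real launchAFKRun, and memGit is an in-memory fake intended only for unit tests.

```go
package main

import (
	"errors"
	"fmt"
)

// Git is a hypothetical slice of lab's Git interface.
type Git interface {
	AddWorktree(issue int) error    // creates afk/<N> and its worktree
	RemoveWorktree(issue int) error // tears down the worktree and the branch
}

var errBranchExists = errors.New("afk branch already exists")

// claimAndSpawn claims an issue by creating its branch, then spawns the
// run. No tracker label is read or written at any point.
func claimAndSpawn(g Git, issue int, spawn func(issue int) error) error {
	if err := g.AddWorktree(issue); err != nil {
		// Nothing was claimed by this call, so there is nothing to undo.
		// An existing branch belongs to an earlier run and must survive.
		return err
	}
	if err := spawn(issue); err != nil {
		// Roll back only what this call created: worktree and branch.
		if rbErr := g.RemoveWorktree(issue); rbErr != nil {
			return fmt.Errorf("spawn failed: %w (rollback also failed: %v)", err, rbErr)
		}
		return err
	}
	return nil
}

// memGit is an in-memory fake for tests.
type memGit struct{ branches map[int]bool }

func (m *memGit) AddWorktree(issue int) error {
	if m.branches[issue] {
		return errBranchExists
	}
	m.branches[issue] = true
	return nil
}

func (m *memGit) RemoveWorktree(issue int) error {
	delete(m.branches, issue)
	return nil
}
```

With a fake like this, a no-flapping unit test can call claimAndSpawn repeatedly for an already-claimed issue and assert that each attempt fails without disturbing the existing branch.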

Out of scope:

  • A lab UI for parked/failed runs (listing orphan afk/<N> branches with a requeue/discard action) -- file as a follow-up.
  • PR-based selection exclusion -- branch-only is the decision; the PR list stays the reaper's done-signal only.
  • CONTEXT.md edits -- the "lab claims the issue" concept is unchanged; only the mechanism changes.
  • Deleting the now-unused in-progress tracker label -- optional cleanup, not required here.
  • Any auto-retry/requeue automation for failed runs (ADR-0007 rejected auto-retry; unchanged).
Reference
Cloonar/nixos#92