Issue claim state machine
Full doc at docs/STATE_MACHINE.md. This page is the executive summary.
The problem
Section titled “The problem”Two actors can race on the same issue:
- Two agent firings (rare;
with_lockserializes per codename, but cross-codename collisions exist). - One agent + you pushing a manual branch.
- Two human users if you ever expand beyond solo (out of scope today, but the design accommodates it).
Without a coordination primitive, you get duplicate work. A real failure mode: one agent ships a quick PR for an issue in the morning, then you open a careful PR for the same issue later, neither aware of the other.
The mechanism
Section titled “The mechanism”State carried entirely on GitHub labels + structured HTML comments. No shared database, no shared filesystem, no Slack lock. GitHub is the synchronisation point. The lifecycle is single-host today, but the contract works the same way if you ever spread the fleet across machines.
Lifecycle
Section titled “Lifecycle”stateDiagram-v2
[*] --> agent_implement : drake / human files
agent_implement : agent:implement
agent_implement --> agent_in_flight : claim_issue()
agent_implement --> needs_human_scope : 3+ failed attempts
agent_in_flight : agent:in-flight
agent_in_flight --> plan_pending : Batman posts plan to Slack
agent_in_flight --> agent_implement : release(transition_to=None)
agent_in_flight --> agent_pr_open : release(transition_to=agent:pr-open)
agent_in_flight --> agent_implement : stale-claim sweep (>4h)
agent_in_flight --> race_yield : earlier claim detected
race_yield --> agent_implement : yield + post race-yielded comment
plan_pending : agent:plan-pending-approval
plan_pending --> agent_in_flight : approver reacts yes
plan_pending --> agent_implement : approver rejects or timeout
agent_pr_open : agent:pr-open
agent_pr_open --> agent_done : automerge / human merge
agent_pr_open --> agent_implement : PR closed without merge
agent_done : agent:done
needs_human_scope : needs:human-scope
do_not_pickup : do-not-pickup (sticky, orthogonal)
agent_done --> [*]
needs_human_scope --> [*]
Lifecycle labels (mutually exclusive)
Section titled “Lifecycle labels (mutually exclusive)”| Label | Meaning | Set by |
|---|---|---|
agent:implement | Eligible for autonomous pickup | Drake (or human) |
agent:in-flight | An agent is actively working it | claim_issue() |
agent:plan-pending-approval | Plan posted for approval, waiting on go/no-go | Slack approval gate (lib/slack_approval.py) before reaction resolves |
agent:pr-open | A PR exists for this issue | release_issue(transition_to=...) |
agent:done | Closed and shipped | external (PR merge handler) |
The agent:plan-pending-approval label is set by Batman when it posts a
plan to Slack and is waiting for an approval reaction. The
Slack approval gate
polls one message’s reactions, and on the approver’s reply the agent
either transitions back to agent:in-flight (approved, worker pickup) or
returns to agent:implement (rejected or timed out). See
docs/SLACK_APPROVAL.md
for the full setup walkthrough.
Sticky modifiers (orthogonal)
Section titled “Sticky modifiers (orthogonal)”| Label | Meaning |
|---|---|
do-not-pickup | Operator override; agents skip this issue |
needs:human-scope | Issue is too vague; not eligible for autonomous pickup |
Claim comments
Section titled “Claim comments”Posted alongside every label change so the audit trail survives manual label edits:
<!-- agent-claim:codename=lucius firing_id=20260501-194217-643a ts=2026-05-01T19:42:33Z --><!-- agent-release:codename=lucius firing_id=20260501-194217-643a outcome=success pr=https://github.com/foo/bar/pull/42 ts=2026-05-01T20:08:11Z -->find_stale_claims() reads these to decide who currently holds an in-flight claim and how old that claim is, without depending on label-event timestamps.
Race resolution
Section titled “Race resolution”claim_issue():
- Reads current label set; refuses if any blocker label is present.
- Atomically adds
agent:in-flight+ posts the claim comment. - Re-reads recent comments to detect any unreleased earlier claim.
- If an earlier claimant exists (by
createdAttimestamp), the loser:- Removes its own
agent:in-flightlabel - Restores
agent:implement - Posts a release comment with
outcome=race-yielded-to=<earlier_codename>:<earlier_firing_id>
- Removes its own
- The earlier claimant keeps the issue uncontested.
The loser exits the firing without burning a Claude turn on duplicate work. The race window collapses from ~20 minutes (between agent pick + PR open) to the sub-second gap between read-labels and add-label.
sequenceDiagram
participant L as Lucius (firing A)
participant gh as GitHub issue #303
participant B as Lucius (firing B)
L->>gh: read labels (agent:implement)
B->>gh: read labels (agent:implement)
L->>gh: add agent:in-flight + claim comment (ts=T1)
B->>gh: add agent:in-flight + claim comment (ts=T2)
L->>gh: re-read comments
Note over L: only my claim, T1 is earliest
L->>gh: keep issue, make worktree
B->>gh: re-read comments
Note over B: A's claim at T1 < my T2 -> I lost
B->>gh: remove my agent:in-flight, restore agent:implement
B->>gh: post release: outcome=race-yielded-to=lucius:A
Note over B: exit firing, 0 Claude turns spent
Stale-claim sweep
Section titled “Stale-claim sweep”A runner crashing between claim_issue and release_issue would normally leave an issue blocked indefinitely. find_stale_claims() reads claim comments and surfaces any in-flight claim with no matching release after max_age_hours (default 4). force_release_stale_claim() then transitions the issue back to agent:implement so the queue picks it up again.
Wire it into your fleet’s daily cleanup runner. The shipped bin/alfred-label-state.py binary exposes this as alfred-label-state sweep-claims [--max-age-hours N] [--dry-run]. deploy.sh copies it into $ALFRED_HOME/bin/ alongside the other alfred-* binaries.
Operator overrides
Section titled “Operator overrides”Two ways to take an issue manually without racing an agent:
# Mark a single issue do-not-pickupalfred-label-state claim your-org/your-backend#42# ... do your work ...alfred-label-state release your-org/your-backend#42# Take a whole repo offline from the fleetalfred-label-state repo pause your-backend# ... refactor in peace ...alfred-label-state repo resume your-backendsweep-claims reads LABEL_STATE_SWEEP_REPOS (comma-separated) for its default repo set:
LABEL_STATE_SWEEP_REPOS="your-backend,your-frontend,your-mobile" \ alfred-label-state sweep-claims --max-age-hours 4 --dry-runThe pre-push git hook (examples/git-hooks/pre-push) enforces this symmetrically. Push a branch whose commits reference Closes #N and that issue is currently in-flight or has a PR open, the push is refused.
Override per-push: git push --no-verify.
Override globally: LABEL_STATE_SKIP_DEDUP_CHECK=1 in your shell rc.
API reference
Section titled “API reference”# State transitionsclaim_issue(repo, num, *, codename, firing_id) -> boolrelease_issue(repo, num, *, codename, firing_id, outcome="success", transition_to=None, pr_url=None) -> bool
# Inspectionissue_dedup_check(repo, num) -> dictfind_stale_claims(repo, *, max_age_hours=4) -> list[dict]
# Recoveryforce_release_stale_claim(repo, num, *, sweep_id, released_codename=None, released_firing_id=None) -> bool
# Operator overridesis_repo_paused(repo) -> boollist_paused_repos() -> list[str]set_repo_paused(repo, paused) -> list[str]
# ConstantsLIFECYCLE_LABELS: list[tuple[str, str, str]]CLAIM_COMMENT_PREFIX: strRELEASE_COMMENT_PREFIX: strPAUSED_REPOS_FILE: PathSee agent_runner API reference for the full module surface.
Source-of-truth label constants
Section titled “Source-of-truth label constants”Every label string lives in lib/labels.py. Import from there rather than duplicating string literals:
from labels import ( IMPLEMENT, # "agent:implement" IN_FLIGHT, # "agent:in-flight" PR_OPEN, # "agent:pr-open" DONE, # "agent:done" DO_NOT_PICKUP, # "do-not-pickup" NEEDS_HUMAN_SCOPE, # "needs:human-scope" AUTHORED, # "agent:authored" LARGE_FEATURE, # "agent:large-feature" bundle_label, # builds "agent:bundle:<slug>" is_legal_transition, # documents the state-machine moves)Cross-repo PR chains and worktrees
Section titled “Cross-repo PR chains and worktrees”For multi-repo features (one issue, N PRs across N repos), Alfred ships:
lib/multi_worktree.py:MultiWorktree(requests, agent, feature_id)context manager that creates per-repo git worktrees with synchronised branch names and cleans them up on exit. Git interaction is injected via aGitRunnerProtocol so tests don’t touch real worktrees.lib/cross_repo_pr.py:CrossRepoPRChainplan/execute coordinator.chain.plan(...)returns aPlandataclass (pure, no I/O);chain.execute(plan)opens each PR, persists state to$ALFRED_HOME/state/pr-chains/<feature_id>.jsonatomically, and refreshes earlier PR bodies as later siblings open so the cross-links stay current.
Both modules use Protocol-based dependency injection so consumers can swap the default subprocess implementation for tests or alternative GitHub clients.