
Issue claim state machine

Full doc at docs/STATE_MACHINE.md. This page is the executive summary.

Two actors can race on the same issue:

  • Two agent firings (rare; with_lock serializes per codename, but cross-codename collisions exist).
  • One agent + the operator pushing a manual branch.
  • Two operators if you ever expand beyond solo (out of scope today, but the design accommodates it).

Without a coordination primitive, you get duplicate work. A real failure mode: one agent ships a quick PR for an issue in the morning, then the operator opens a careful PR for the same issue later, neither aware of the other.

State is carried entirely in GitHub labels and structured HTML comments. There is no shared database, no shared filesystem, and no Slack lock. GitHub is the synchronisation point.

| Label | Meaning | Set by |
| --- | --- | --- |
| agent:implement | Eligible for autonomous pickup | Drake (or human) |
| agent:in-flight | An agent is actively working it | claim_issue() |
| agent:pr-open | A PR exists for this issue | release_issue(transition_to=...) |
| agent:done | Closed and shipped | external (PR merge handler) |

| Label | Meaning |
| --- | --- |
| do-not-pickup | Operator override; agents skip this issue |
| needs:human-scope | Issue is too vague; not eligible for autonomous pickup |
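
How the two tables combine can be sketched as a single eligibility check. This is an illustration only: `BLOCKER_LABELS` and `eligible_for_pickup` are hypothetical names, and it assumes the blocker set is exactly the two labels in the second table.

```python
# Hypothetical names; assumes the blocker set is the two labels listed above.
BLOCKER_LABELS = frozenset({"do-not-pickup", "needs:human-scope"})


def eligible_for_pickup(labels):
    """True if the issue is marked agent:implement and carries no blocker label."""
    labels = set(labels)
    return "agent:implement" in labels and not (labels & BLOCKER_LABELS)
```

An issue labelled both `agent:implement` and `do-not-pickup` is skipped under this sketch, which matches the "operator override" meaning in the table.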

A structured HTML comment is posted alongside every label change so the audit trail survives manual label edits:

<!-- agent-claim:codename=lucius firing_id=20260501-194217-643a ts=2026-05-01T19:42:33Z -->
<!-- agent-release:codename=lucius firing_id=20260501-194217-643a outcome=success pr=https://github.com/foo/bar/pull/42 ts=2026-05-01T20:08:11Z -->

find_stale_claims() reads these to decide who currently holds an in-flight claim and how old that claim is, without depending on label-event timestamps.
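
A minimal sketch of parsing these comments, assuming the format is exactly what the two examples show: space-separated `key=value` pairs inside an HTML comment. `MARKER_RE` and `parse_marker` are hypothetical names, not part of the shipped module.

```python
import re
from datetime import datetime, timezone

# Assumed shape, inferred from the two example comments above.
MARKER_RE = re.compile(r"<!--\s*agent-(claim|release):(.*?)\s*-->", re.DOTALL)


def parse_marker(body):
    """Return a dict with 'kind' plus the key=value fields, or None if no marker."""
    match = MARKER_RE.search(body)
    if match is None:
        return None
    fields = {"kind": match.group(1)}
    for token in match.group(2).split():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    if "ts" in fields:
        fields["ts"] = datetime.strptime(
            fields["ts"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
    return fields
```

Splitting on the first `=` keeps a value such as `race-yielded-to=<codename>:<firing_id>` intact under the `outcome` key.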

claim_issue():

  1. Reads current label set; refuses if any blocker label is present.
  2. Atomically adds agent:in-flight + posts the claim comment.
  3. Re-reads recent comments to detect any unreleased earlier claim.
  4. If an earlier claimant exists (by createdAt timestamp), the loser:
    • Removes its own agent:in-flight label
    • Restores agent:implement
    • Posts a release comment with outcome=race-yielded-to=<earlier_codename>:<earlier_firing_id>
  5. The earlier claimant keeps the issue uncontested.

The loser exits the firing without burning a Claude turn on duplicate work. The race window collapses from ~20 minutes (the gap between an agent picking up the issue and opening its PR) to the sub-second gap between read-labels and add-label.
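
The "earliest unreleased claim wins" rule in steps 3 to 5 can be sketched as follows. This is not the shipped `claim_issue()` logic: the function names are hypothetical, the input is assumed to be `(createdAt, fields)` pairs with `createdAt` as a uniform UTC ISO-8601 string (so string comparison matches time order), and tie-breaking is left unspecified.

```python
def earliest_unreleased_claim(markers):
    """markers: iterable of (created_at, fields); fields has kind and firing_id.

    Returns the fields of the oldest claim with no matching release, or None.
    """
    markers = list(markers)
    released = {f["firing_id"] for _, f in markers if f["kind"] == "release"}
    open_claims = [
        (created_at, f)
        for created_at, f in markers
        if f["kind"] == "claim" and f["firing_id"] not in released
    ]
    if not open_claims:
        return None
    return min(open_claims, key=lambda item: item[0])[1]


def should_yield(markers, my_firing_id):
    """True if another firing holds an earlier unreleased claim."""
    winner = earliest_unreleased_claim(markers)
    return winner is not None and winner["firing_id"] != my_firing_id
```

A firing that gets `True` back would then remove its own label, restore agent:implement, and post the race-yielded release comment described above.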

A runner crashing between claim_issue and release_issue would normally leave an issue blocked indefinitely. find_stale_claims() reads claim comments and surfaces any in-flight claim with no matching release after max_age_hours (default 4). force_release_stale_claim() then transitions the issue back to agent:implement so the queue picks it up again.
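
The staleness test itself reduces to "claim with no matching release, older than the cutoff". A rough sketch, assuming markers have already been parsed into dicts with `kind`, `firing_id`, and a timezone-aware `ts`; `stale_claims` is a hypothetical stand-in, not the real `find_stale_claims()`.

```python
from datetime import datetime, timedelta, timezone


def stale_claims(markers, now, max_age_hours=4):
    """Return claim markers with no matching release that are older than the cutoff."""
    released = {m["firing_id"] for m in markers if m["kind"] == "release"}
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        m
        for m in markers
        if m["kind"] == "claim"
        and m["firing_id"] not in released
        and m["ts"] < cutoff
    ]
```

Matching on `firing_id` rather than on codename means a later, healthy firing by the same agent does not hide an earlier crashed one.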

Wire it into your fleet’s daily cleanup runner. The shipped examples/bin/label_state.py helper exposes this as label-state sweep-claims [--max-age-hours N] [--dry-run] once you copy or wrap it in your fleet.

Two ways to take an issue manually without racing an agent:

# Mark a single issue do-not-pickup
label-state claim <repo>#<N>
# ... do your work ...
label-state release <repo>#<N>

# Take a whole repo offline from the fleet
label-state repo pause <repo>
# ... refactor in peace ...
label-state repo resume <repo>

The pre-push git hook (examples/git-hooks/pre-push) enforces this symmetrically. If you push a branch whose commits reference Closes #N while that issue is in-flight or has a PR open, the push is refused.

Override per-push: git push --no-verify. Override globally: LABEL_STATE_SKIP_DEDUP_CHECK=1 in your shell rc.
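
The decision the hook makes can be sketched as below. This is not the shipped hook: the names are hypothetical, it assumes "in-flight or has a PR open" means the agent:in-flight and agent:pr-open labels, and it assumes case-insensitive matching of `Closes #N`.

```python
import re

# Assumptions: case-insensitive "Closes #N"; blocking state is read from labels.
CLOSES_RE = re.compile(r"\bcloses\s+#(\d+)", re.IGNORECASE)
BLOCKING_LABELS = {"agent:in-flight", "agent:pr-open"}


def referenced_issues(commit_messages):
    """Sorted, de-duplicated issue numbers mentioned as 'Closes #N'."""
    return sorted(
        {int(n) for msg in commit_messages for n in CLOSES_RE.findall(msg)}
    )


def push_blockers(commit_messages, labels_by_issue, skip=False):
    """Issue numbers that should cause the push to be refused.

    skip models the LABEL_STATE_SKIP_DEDUP_CHECK=1 override.
    """
    if skip:
        return []
    return [
        n
        for n in referenced_issues(commit_messages)
        if BLOCKING_LABELS & set(labels_by_issue.get(n, ()))
    ]
```

A caller would exit non-zero when `push_blockers(...)` is non-empty, which is what makes git refuse the push.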

# State transitions
claim_issue(repo, num, *, codename, firing_id) -> bool
release_issue(repo, num, *, codename, firing_id,
              outcome="success", transition_to=None, pr_url=None) -> bool

# Inspection
issue_dedup_check(repo, num) -> dict
find_stale_claims(repo, *, max_age_hours=4) -> list[dict]

# Recovery
force_release_stale_claim(repo, num, *, sweep_id,
                          released_codename=None,
                          released_firing_id=None) -> bool

# Operator overrides
is_repo_paused(repo) -> bool
list_paused_repos() -> list[str]
set_repo_paused(repo, paused) -> list[str]

# Constants
LIFECYCLE_LABELS: list[tuple[str, str, str]]
CLAIM_COMMENT_PREFIX: str
RELEASE_COMMENT_PREFIX: str
PAUSED_REPOS_FILE: Path

See agent_runner API reference for the full module surface.