
How it works

This page traces a single agent firing in detail. If Architecture is the why, this is the what-happens-in-order.

The example is Lucius, the feature-dev agent, because it exercises every primitive. Simpler agents (Echo in the tutorial, Gordon’s read-only health check) are subsets of this trace.

The host scheduler owns the schedule. On macOS that is launchd; on Linux it is systemd --user. A Lucius interval of 1200 seconds fires every 20 minutes and execs:

$ALFRED_HOME/bin/agent-launch lucius.py

agent-launch is a thin shell wrapper. Host schedulers do not source shell rc files, so the wrapper sources ~/.alfredrc at firing time, then execs $ALFRED_HOME/bin/lucius.py. The rendered scheduler unit has already set AGENT_CODENAME, LAUNCHD_LABEL, ALFRED_HOME, WORKSPACE_ROOT, and PATH.

There is no daemon. The process exists only for the duration of this one firing.

The runner walks a fixed sequence of cheap checks before it will spend a Claude turn. Any gate that trips exits the firing early.

flowchart TB
    start["lucius.py starts"]
    lock{"with_lock(AGENT)<br/>acquired?"}
    locked["print [LUCIUS-LOCKED]<br/>exit 0"]
    pre{"preflight(spec)<br/>passes?"}
    prefail["print [LUCIUS-PREFLIGHT-FAILED]<br/>exit 0"]
    doc{"doctor_mode()?"}
    docok["print [LUCIUS-DOCTOR-OK]<br/>exit 0"]
    gblock{"is_globally_blocked()?"}
    gb["print [LUCIUS-GLOBAL-BLOCKED]<br/>exit 0"]
    cap{"spend caps OK?"}
    capfail["Slack-post, alfred pause<br/>exit 0"]
    pick["pick_issue()"]

    start --> lock
    lock -- no --> locked
    lock -- yes --> pre
    pre -- no --> prefail
    pre -- yes --> doc
    doc -- yes --> docok
    doc -- no --> gblock
    gblock -- yes --> gb
    gblock -- no --> cap
    cap -- no --> capfail
    cap -- yes --> pick
  • with_lock(AGENT) is a mkdir-atomic per-codename mutex. If a previous Lucius firing is still running (a long claude -p), this firing exits rather than running two Luciuses at once.
  • preflight(spec) checks the PreflightSpec: required CLIs on PATH (gh, git, the engine binary), gh auth still valid, the watched repo checkouts present under WORKSPACE_ROOT. A gap prints [LUCIUS-PREFLIGHT-FAILED] naming each missing piece.
  • doctor_mode() reads ALFRED_DOCTOR. When bin/doctor.sh sets it to 1, the agent emits [LUCIUS-DOCTOR-OK] and exits before doing real work. This is how the host gets verified without burning turns.
  • is_globally_blocked() reads $ALFRED_HOME/state/global-blocked-until.json. If another Claude-backed agent tripped a Claude provider limit in the last hour, this firing prints [LUCIUS-GLOBAL-BLOCKED] and exits without posting to Slack.
  • Spend caps read SpendState(AGENT): turns_today, consecutive_failures. Over the cap, the agent Slack-posts the reason and pauses its own scheduler unit.
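
The mkdir-based mutex behind the first gate can be sketched as follows. This is a minimal illustration, not the project's code: the `lock_root` parameter and the `<codename>.lock` directory name are assumptions, and stale-lock recovery after a crash is not covered.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def with_lock(codename, lock_root):
    """Yield True if this process took the lock, False if another firing holds it.

    lock_root and the "<codename>.lock" name are hypothetical choices.
    """
    lock_dir = os.path.join(lock_root, f"{codename}.lock")
    try:
        os.mkdir(lock_dir)  # atomic: exactly one caller can create the directory
    except FileExistsError:
        yield False  # someone else is running; the caller should exit early
        return
    try:
        yield True
    finally:
        os.rmdir(lock_dir)  # release the lock when the firing ends


demo_root = tempfile.mkdtemp()
with with_lock("lucius", demo_root) as first:
    with with_lock("lucius", demo_root) as second:
        print(first, second)  # True False
```

The design relies on `mkdir` failing when the directory already exists, so the check and the acquisition happen in one step rather than as a separate test followed by a create.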

Every gate is a plain function call against a file on disk or a subprocess. No network round-trip is needed to decide “should this firing even run.”
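
As a rough sketch of how such file-backed gates might compose, the snippet below runs checks in order and stops at the first one that trips. The JSON schema (`{"blocked_until": <epoch seconds>}`) and the helper `first_tripped_gate` are assumptions made for illustration; the real state file format is not documented here.

```python
import json
import os
import tempfile
import time


def is_globally_blocked(state_path, now=None):
    """Return True while the timestamp in the state file lies in the future.

    Assumes a hypothetical schema {"blocked_until": <epoch seconds>}.
    A missing or unreadable file counts as "not blocked".
    """
    now = time.time() if now is None else now
    try:
        with open(state_path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    if not isinstance(data, dict):
        return False
    return float(data.get("blocked_until", 0)) > now


def first_tripped_gate(gates):
    """Run (sentinel, check) pairs in order; return the first sentinel whose check trips."""
    for sentinel, tripped in gates:
        if tripped():
            return sentinel
    return None


state_dir = tempfile.mkdtemp()
path = os.path.join(state_dir, "global-blocked-until.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump({"blocked_until": time.time() + 3600}, fh)

gates = [
    ("[LUCIUS-PREFLIGHT-FAILED]", lambda: False),  # pretend preflight passed
    ("[LUCIUS-GLOBAL-BLOCKED]", lambda: is_globally_blocked(path)),
]
print(first_tripped_gate(gates))  # [LUCIUS-GLOBAL-BLOCKED]
```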

pick_issue() queries GitHub for the oldest open issue labelled agent:implement across the agent’s watched repos, skipping any repo in the pause list (is_repo_paused).
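
The selection rule can be sketched as a pure function over already-fetched issue data. The input shape below (a dict of repo name to issue dicts with `number`, `createdAt`, `state`, and a flat list of label names) is a simplification assumed for the example, and the real pick_issue() may take different arguments.

```python
def pick_issue(issues_by_repo, paused_repos):
    """Return (repo, issue) for the oldest open agent:implement issue, or None.

    createdAt values are assumed to be ISO 8601 UTC strings, which sort
    chronologically when compared as text.
    """
    candidates = [
        (issue["createdAt"], repo, issue)
        for repo, issues in issues_by_repo.items()
        if repo not in paused_repos  # skip repos on the pause list
        for issue in issues
        if issue["state"] == "OPEN" and "agent:implement" in issue["labels"]
    ]
    if not candidates:
        return None
    # Oldest first; repo name and issue number only break ties deterministically.
    _, repo, issue = min(candidates, key=lambda c: (c[0], c[1], c[2]["number"]))
    return repo, issue


issues = {
    "org/api": [{"number": 12, "createdAt": "2024-03-01T10:00:00Z",
                 "state": "OPEN", "labels": ["agent:implement"]}],
    "org/web": [{"number": 7, "createdAt": "2024-02-01T10:00:00Z",
                 "state": "OPEN", "labels": ["agent:implement"]}],
}
repo, issue = pick_issue(issues, paused_repos={"org/web"})
print(repo, issue["number"])  # org/api 12
```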

Then claim_issue(repo, num, codename=AGENT, firing_id=...) runs the state machine handshake: it adds the agent:in-flight label, posts a structured claim comment, and re-reads recent comments to confirm it won the race. If an earlier claim exists, this firing yields and exits. If claim_issue returns False for any reason (already claimed, repo paused, blocker label present), the firing prints [LUCIUS-DEDUP-SKIP] and exits.
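
The handshake can be illustrated against an in-memory stand-in for a GitHub issue. `FakeIssue`, the comment dict shape, and the simplified signature are hypothetical; the real claim_issue takes `(repo, num, codename=..., firing_id=...)` and also checks the pause list and blocker labels, which this sketch omits.

```python
import dataclasses


@dataclasses.dataclass
class FakeIssue:
    """In-memory stand-in for a GitHub issue (hypothetical, for illustration)."""
    labels: set = dataclasses.field(default_factory=set)
    comments: list = dataclasses.field(default_factory=list)


def claim_issue(issue, codename, firing_id):
    """Label, post a claim, re-read; return True only if our claim is the earliest."""
    if "agent:in-flight" in issue.labels:
        return False  # already claimed by another firing
    issue.labels.add("agent:in-flight")
    issue.comments.append(
        {"kind": "claim", "codename": codename, "firing_id": firing_id}
    )
    # Re-read the comments: the earliest claim comment wins the race.
    claims = [c for c in issue.comments if c.get("kind") == "claim"]
    return claims[0]["firing_id"] == firing_id


issue = FakeIssue()
print(claim_issue(issue, "lucius", "firing-1"))  # True
print(claim_issue(issue, "lucius", "firing-2"))  # False
```

Re-reading after writing is what turns a label plus a comment into a usable mutex: two firings may both post a claim, but only the one whose comment sorts first proceeds.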

sequenceDiagram
    participant runner as lucius.py
    participant git as git
    participant engine as configured engine
    participant fs as worktree dir

    runner->>git: make_worktree(repo, agent, issue)
    git->>fs: git worktree add from origin/main
    runner->>runner: build prompt (issue body + repo context)
    runner->>engine: invoke prompt (Claude Code, Codex, or hybrid)
    engine->>fs: read, edit, write files, run tests
    engine-->>runner: AgentResult (success, turns, cost, session_id, text)
    runner->>git: git rev-list origin/main..HEAD
    git-->>runner: commit count

make_worktree creates a throwaway git worktree under $ALFRED_HOME/worktrees/eng-lucius-<repo>-<issue>-<ts>/, branched from a fresh origin/main. The claude -p subprocess runs with its cwd pinned to that worktree, so its edits land there rather than in your canonical checkout or another firing’s branch.
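
A sketch of the git commands such a helper might issue is shown below. It only builds the plan rather than running it; the branch name and the epoch-seconds timestamp format are assumptions, while `git worktree prune`, `git fetch`, and `git worktree add -b` are standard git subcommands.

```python
import os
import time


def worktree_plan(alfred_home, repo, issue, now=None):
    """Return (worktree_path, git_commands) for one firing.

    The timestamp format and the branch name are hypothetical.
    """
    ts = int(time.time() if now is None else now)
    name = f"eng-lucius-{repo}-{issue}-{ts}"
    path = os.path.join(alfred_home, "worktrees", name)
    branch = f"lucius/issue-{issue}-{ts}"  # hypothetical naming scheme
    commands = [
        ["git", "worktree", "prune"],          # drop records of orphaned worktrees
        ["git", "fetch", "origin", "main"],    # make origin/main fresh
        ["git", "worktree", "add", "-b", branch, path, "origin/main"],
    ]
    return path, commands


path, commands = worktree_plan("/tmp/alfred", "api", 42, now=1700000000)
print(path)
for cmd in commands:
    print(" ".join(cmd))
```

Each command would be run with `subprocess.run(cmd, cwd=<canonical checkout>, check=True)`, and the engine subprocess would then be started with `cwd=path`.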

The runner builds the prompt from the issue body plus repo context such as the repo’s CLAUDE.md, inlines it, and calls the configured engine with a hard max_turns cap and a hard timeout. Drake and code-map-aware review prompts may read $ALFRED_HOME/state/code-map.json when configured. The result comes back in the same shape whether the engine is Claude Code, Codex, or hybrid fallback: success, subtype, num_turns, cost_usd, session_id, result_text.
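
The common result shape could be modelled as a small dataclass. The field names come from the list above; the types and the helper `hit_turn_cap` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    """Engine-independent result of one invocation (field types are assumed)."""
    success: bool
    subtype: str       # e.g. "error_max_turns" when the turn cap was hit
    num_turns: int
    cost_usd: float
    session_id: str
    result_text: str


def hit_turn_cap(result):
    """True when the engine stopped because it ran out of turns."""
    return result.subtype == "error_max_turns"


example = AgentResult(
    success=False,
    subtype="error_max_turns",
    num_turns=30,          # illustrative values only
    cost_usd=1.25,
    session_id="abc123",
    result_text="",
)
print(hit_turn_cap(example))  # True
```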

The runner inspects the result and the git state, then takes exactly one exit path. The exit “codes” are sentinel strings printed to stdout for the scheduler log and Slack.

| Sentinel | When | What happens |
| --- | --- | --- |
| [OK] commit <sha> | claude -p succeeded and committed | Push, gh pr create, label agent:authored, release_issue(transition_to=agent:pr-open), Slack-post success at info |
| [ALREADY-IMPLEMENTED] | The work is already in the codebase | Comment on the issue, label done-already, close it. No PR |
| [PARTIAL] | Hit error_max_turns | Comment progress, leave the worktree, retry next firing. Not counted as a failure |
| [BLOCKED] | Claude could not resolve an error | Slack-post the reason at warn. Counted as a failure |
| [LUCIUS-NO-COMMIT] | Success returned but no commit landed | Inspect git status; salvage unstaged changes as a do-not-review draft PR, else count as failure |
| [SILENT] | No agent:implement issue matched | Exit 0, no Slack post. The non-event is the signal |

Whatever the path, release_issue runs so the issue never stays stuck in agent:in-flight, and remove_worktree cleans up the throwaway directory (the exception is [PARTIAL], which leaves the worktree in place for the next firing). Then the process exits and the host goes back to waiting for the next scheduler trigger.
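
The choice of exit path can be sketched as one function of the engine result and the commit count. How the runner actually detects the already-implemented and blocked cases is not stated here; this sketch assumes the engine echoes those sentinels in its result text. [SILENT] is decided earlier, when no issue matches, so it does not appear.

```python
def classify_exit(success, subtype, result_text, commit_count):
    """Map one firing's outcome to a sentinel.

    commit_count would come from `git rev-list --count origin/main..HEAD`.
    Detecting sentinels inside result_text is an assumption of this sketch.
    """
    if subtype == "error_max_turns":
        return "[PARTIAL]"
    if "[ALREADY-IMPLEMENTED]" in result_text:
        return "[ALREADY-IMPLEMENTED]"
    if not success or "[BLOCKED]" in result_text:
        return "[BLOCKED]"
    if commit_count == 0:
        return "[LUCIUS-NO-COMMIT]"
    return "[OK]"


# Only [BLOCKED] always counts toward consecutive_failures;
# [LUCIUS-NO-COMMIT] counts only when nothing can be salvaged.
ALWAYS_A_FAILURE = {"[BLOCKED]"}

print(classify_exit(True, "success", "implemented the feature", 2))  # [OK]
print(classify_exit(True, "success", "looks done", 0))               # [LUCIUS-NO-COMMIT]
print(classify_exit(False, "error_max_turns", "", 1))                # [PARTIAL]
```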

  • Idempotent. Every firing reads its inputs from scratch. A crash mid-run leaves no half-state to resume; the next firing’s make_worktree even prunes orphaned worktrees first.
  • Bounded. max_turns and the firing timeout cap the worst-case spend of any single firing. The schedule caps the worst-case spend of the day.
  • Observable. Every exit path prints a sentinel and posts anything that needs attention to Slack. Codex writes per-firing artifacts under $ALFRED_HOME/state/codex/; Claude transcript capture is planned, not written by the current runner.
  • Isolated. A bad firing trashes its own worktree and nothing else.
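
The "Bounded" property reduces to simple arithmetic. The sketch below uses the 1200-second interval from this page and a hypothetical max_turns of 30; the real cap is whatever the agent is configured with, and the daily spend cap may stop the agent well before this ceiling.

```python
def worst_case_turns_per_day(interval_seconds, max_turns):
    """Return (firings_per_day, upper bound on turns per day)."""
    firings = 86_400 // interval_seconds  # seconds per day / schedule interval
    return firings, firings * max_turns


# max_turns=30 is an illustrative value, not taken from the project.
print(worst_case_turns_per_day(1200, 30))  # (72, 2160)
```

Because the per-codename lock makes overlapping firings exit early, the true number of working firings per day can only be lower than this bound.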