How it works
This page traces a single agent firing in detail. If Architecture is the why, this is the what-happens-in-order.
The example is Lucius, the feature-dev agent, because it exercises every primitive. Simpler agents (Echo in the tutorial, Gordon’s read-only health check) are subsets of this trace.
Stage 1: the trigger
The host scheduler owns the schedule. On macOS that is launchd; on Linux it is
systemd --user. A Lucius interval of 1200 seconds fires every 20 minutes and
execs:
$ALFRED_HOME/bin/agent-launch lucius.py
agent-launch is a thin shell wrapper. Host schedulers do not source shell rc
files, so the wrapper sources ~/.alfredrc at firing time, then execs
$ALFRED_HOME/bin/lucius.py. The rendered scheduler unit has already set
AGENT_CODENAME, LAUNCHD_LABEL, ALFRED_HOME, WORKSPACE_ROOT, and PATH.
There is no daemon. The process exists only for the duration of this one firing.
Stage 2: gates before any spend
The runner walks a fixed sequence of cheap checks before it will spend a Claude turn. Any gate that trips exits the firing early.
flowchart TB
start["lucius.py starts"]
lock{"with_lock(AGENT)<br/>acquired?"}
locked["print [LUCIUS-LOCKED]<br/>exit 0"]
pre{"preflight(spec)<br/>passes?"}
prefail["print [LUCIUS-PREFLIGHT-FAILED]<br/>exit 0"]
doc{"doctor_mode()?"}
docok["print [LUCIUS-DOCTOR-OK]<br/>exit 0"]
gblock{"is_globally_blocked()?"}
gb["print [LUCIUS-GLOBAL-BLOCKED]<br/>exit 0"]
cap{"spend caps OK?"}
capfail["Slack-post, alfred pause<br/>exit 0"]
pick["pick_issue()"]
start --> lock
lock -- no --> locked
lock -- yes --> pre
pre -- no --> prefail
pre -- yes --> doc
doc -- yes --> docok
doc -- no --> gblock
gblock -- yes --> gb
gblock -- no --> cap
cap -- no --> capfail
cap -- yes --> pick
- with_lock(AGENT) is a mkdir-atomic per-codename mutex. If a previous Lucius firing is still running (a long claude -p), this firing exits rather than running two Luciuses at once.
- preflight(spec) checks the PreflightSpec: required CLIs on PATH (gh, git, the engine binary), gh auth still valid, the watched repo checkouts present under WORKSPACE_ROOT. A gap prints [LUCIUS-PREFLIGHT-FAILED] naming each missing piece.
- doctor_mode() reads ALFRED_DOCTOR. When bin/doctor.sh sets it to 1, the agent emits [LUCIUS-DOCTOR-OK] and exits before doing real work. This is how the host gets verified without burning turns.
- is_globally_blocked() reads $ALFRED_HOME/state/global-blocked-until.json. If another Claude-backed agent tripped a Claude provider limit in the last hour, this firing exits silently.
- Spend caps read SpendState(AGENT): turns_today, consecutive_failures. Over the cap, the agent Slack-posts the reason and pauses its own scheduler unit.
Every gate is a plain function call against a file on disk or a subprocess. No network round-trip is needed to decide “should this firing even run.”
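Two of these gates are small enough to sketch. The following is a minimal illustration, not the runner's actual code: the lock directory layout, the `lock_root` parameter, and the `blocked_until` JSON key (assumed to hold an epoch timestamp) are all assumptions made for the example.

```python
import json
import os
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Optional


@contextmanager
def with_lock(codename: str, lock_root: Path):
    """Yield True if this process won the per-codename lock, else False."""
    lock_dir = lock_root / f"{codename}.lock"  # hypothetical layout
    try:
        os.mkdir(lock_dir)  # mkdir is atomic: exactly one caller succeeds
    except FileExistsError:
        yield False  # another firing holds the lock; do not remove it
        return
    try:
        yield True
    finally:
        os.rmdir(lock_dir)  # release only the lock we created


def is_globally_blocked(state_file: Path, now: Optional[float] = None) -> bool:
    """True while the recorded deadline is still in the future."""
    try:
        data = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no readable state file means no block
    now = time.time() if now is None else now
    # "blocked_until" is an assumed key name holding epoch seconds.
    return float(data.get("blocked_until", 0)) > now
```

Both checks touch only the local filesystem, which matches the point above: deciding whether to run costs no network call and no Claude turn.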
Stage 3: pick and claim the work
pick_issue() queries GitHub for the oldest open issue labelled agent:implement across the agent’s watched repos, skipping any repo in the pause list (is_repo_paused).
Then claim_issue(repo, num, codename=AGENT, firing_id=...) runs the state machine handshake: it adds the agent:in-flight label, posts a structured claim comment, and re-reads recent comments to confirm it won the race. If an earlier claim exists, this firing yields and exits. If claim_issue returns False for any reason (already claimed, repo paused, blocker label present), the firing prints [LUCIUS-DEDUP-SKIP] and exits.
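The race check at the end of the handshake can be pictured as "earliest claim wins". This sketch assumes a hypothetical `ClaimComment` shape parsed from the structured claim comments, and assumes comment IDs increase in posting order; neither detail is confirmed by the runner's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimComment:
    # Hypothetical fields parsed from a structured claim comment.
    comment_id: int  # assumed to increase in posting order
    codename: str
    firing_id: str


def won_claim(comments: list[ClaimComment], firing_id: str) -> bool:
    """Return True only if the earliest claim belongs to this firing."""
    if not comments:
        return False  # our own claim is missing, so we cannot have won
    earliest = min(comments, key=lambda c: c.comment_id)
    return earliest.firing_id == firing_id
```

With two competing claims, the firing whose comment was posted first proceeds and the other yields, which is what lets two agents label the same issue at once without both working on it.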
Stage 4: isolate and invoke
sequenceDiagram
participant runner as lucius.py
participant git as git
participant engine as configured engine
participant fs as worktree dir
runner->>git: make_worktree(repo, agent, issue)
git->>fs: git worktree add from origin/main
runner->>runner: build prompt (issue body + repo context)
runner->>engine: invoke prompt (Claude Code, Codex, or hybrid)
engine->>fs: read, edit, write files, run tests
engine-->>runner: AgentResult (success, turns, cost, session_id, text)
runner->>git: git rev-list origin/main..HEAD
git-->>runner: commit count
make_worktree creates a throwaway git worktree under $ALFRED_HOME/worktrees/eng-lucius-<repo>-<issue>-<ts>/, branched from a fresh origin/main. The claude -p subprocess runs with its cwd pinned to that worktree, so it physically cannot touch your canonical checkout or another firing’s branch.
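As a rough sketch of what creating that worktree involves, the helper below only plans the directory name and the git commands; it does not run them. The function name `plan_worktree`, the reuse of the directory name as the branch name, and the assumption that `repo` is a short name without a slash are illustrative choices, not the runner's API.

```python
import time
from pathlib import Path
from typing import Optional


def plan_worktree(alfred_home: Path, repo: str, issue: int,
                  ts: Optional[int] = None):
    """Return the worktree path and the git commands that would create it."""
    ts = int(time.time()) if ts is None else ts
    name = f"eng-lucius-{repo}-{issue}-{ts}"
    path = alfred_home / "worktrees" / name
    commands = [
        # Refresh origin/main so the branch starts from current upstream.
        ["git", "fetch", "origin", "main"],
        # Create a new branch checked out in its own directory.
        # Using the directory name as the branch name is an assumption.
        ["git", "worktree", "add", "-b", name, str(path), "origin/main"],
    ]
    return path, commands
```

A caller would run each command with `subprocess.run(cmd, cwd=canonical_checkout, check=True)` and then launch the engine with `cwd=path`.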
The runner builds the prompt from the issue body plus repo context such as the
repo’s CLAUDE.md, inlines it, and calls the configured engine with a hard
max_turns cap and a hard timeout. Drake and code-map-aware review prompts may
read $ALFRED_HOME/state/code-map.json when configured. The result comes back
in the same shape whether the engine is Claude Code, Codex, or hybrid fallback:
success, subtype, num_turns, cost_usd, session_id, result_text.
Stage 5: branch on the outcome
The runner inspects the result and the git state, then takes exactly one exit path. The exit “codes” are sentinel strings printed to stdout for the scheduler log and Slack.
| Sentinel | When | What happens |
|---|---|---|
| [OK] commit <sha> | claude -p succeeded and committed | Push, gh pr create, label agent:authored, release_issue(transition_to=agent:pr-open), Slack-post success at info |
| [ALREADY-IMPLEMENTED] | The work is already in the codebase | Comment on the issue, label done-already, close it. No PR |
| [PARTIAL] | Hit error_max_turns | Comment progress, leave the worktree, retry next firing. Not counted as a failure |
| [BLOCKED] | Claude could not resolve an error | Slack-post the reason at warn. Counted as a failure |
| [LUCIUS-NO-COMMIT] | Success returned but no commit landed | Inspect git status; salvage unstaged changes as a do-not-review draft PR, else count as failure |
| [SILENT] | No agent:implement issue matched | Exit 0, no Slack post. The non-event is the signal |
Whatever the path, release_issue runs so the issue never stays stuck in agent:in-flight, and remove_worktree cleans up the throwaway directory. Then the process exits and the host goes back to waiting for the next scheduler trigger.
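The branching in the table can be approximated by a small pure function over the engine result and the commit count. This is a sketch under assumptions: the `AgentResult` fields follow the list given in Stage 4, but the way `[ALREADY-IMPLEMENTED]` is detected (a marker at the start of the result text) is invented for illustration, and `[SILENT]` is omitted because it is decided before any engine call.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    success: bool
    subtype: str
    num_turns: int
    cost_usd: float
    session_id: str
    result_text: str


def classify(result: AgentResult, commit_count: int) -> str:
    """Map an engine result plus git state to one sentinel string."""
    if result.subtype == "error_max_turns":
        return "[PARTIAL]"  # ran out of turns; not a failure
    if not result.success:
        return "[BLOCKED]"  # counted as a failure
    # Assumption: the engine signals this case in its result text.
    if result.result_text.lstrip().startswith("[ALREADY-IMPLEMENTED]"):
        return "[ALREADY-IMPLEMENTED]"
    if commit_count == 0:
        return "[LUCIUS-NO-COMMIT]"  # success reported, nothing landed
    return "[OK]"
```

Whichever sentinel is chosen, the cleanup calls would sit in a `finally` block around this logic so that they run on every path.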
Why this shape holds up unattended
- Idempotent. Every firing reads its inputs from scratch. A crash mid-run leaves no half-state to resume; the next firing’s make_worktree even prunes orphaned worktrees first.
- Bounded. max_turns and the firing timeout cap the worst-case spend of any single firing. The schedule caps the worst-case spend of the day.
- Observable. Every exit path prints a sentinel and posts anything that needs attention to Slack. Codex writes per-firing artifacts under $ALFRED_HOME/state/codex/; Claude transcript capture is planned, not written by the current runner.
- Isolated. A bad firing trashes its own worktree and nothing else.
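The "Bounded" property reduces to simple arithmetic. As a worked example, using the 1200-second interval from Stage 1 and a hypothetical max_turns of 40 (the real cap is not stated on this page):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def worst_case_turns_per_day(interval_s: int, max_turns: int) -> int:
    """Upper bound on turns per day if every firing uses its full cap."""
    firings_per_day = SECONDS_PER_DAY // interval_s
    return firings_per_day * max_turns


# 86400 / 1200 = 72 firings per day.
firings = SECONDS_PER_DAY // 1200
# With a hypothetical cap of 40 turns per firing: 72 * 40 = 2880 turns.
ceiling = worst_case_turns_per_day(1200, 40)
```

In practice the spend caps from Stage 2 would pause the agent well before this ceiling; the calculation only shows that a ceiling exists.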
See also
- Architecture: the design rationale behind each gate.
- The agent fleet: how multiple firings hand work to each other.
- agent_runner API reference: every primitive named above, with signatures.
- Tutorial: build the smallest agent that uses this whole shape.