A real run, annotated
Case study
Most docs pages explain one mechanism with a sanitized example. This page reads one real receipt end to end: run 20260704_163427, a feature of Orcho’s own CLI, shipped through Orcho at full spend on 2026-07-04. Nothing below is synthetic; the receipt is trimmed for length and every number is unedited.
The outcome
The run took a feature task, a compressed line-by-line summary grammar for
the CLI’s --output summary view, and carried it through the full pipeline:
plan, plan validation, implementation, review, repair, final acceptance.
✓ plan=ok | validate_plan=ok | implement=ok | review_changes=ok | repair_changes=ok | final_acceptance=ok
Tasks: 3 planned · 3 completed · 0 failed · 0 incomplete
Release: approved
Open risks: none
Two details are easy to miss and worth naming:
- The plan did not pass on its first attempt. Both plan and validate_plan show attempts=2: plan validation found a P1 finding and pushed the plan back before any implementation started. That P1 was resolved on record.
- The plan’s three tasks were decomposed into six DAG subtasks (T1–T6) for implementation, each with its own cost, time, and tool attribution in the full receipt.
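The status line in the receipt is regular enough to check mechanically. The following is a minimal sketch, not part of Orcho: `parse_status_line` is a hypothetical helper, and it assumes only the `phase=status | phase=status` shape visible in the line quoted above.

```python
# Status line copied verbatim from the receipt quoted on this page.
STATUS_LINE = (
    "✓ plan=ok | validate_plan=ok | implement=ok | review_changes=ok | "
    "repair_changes=ok | final_acceptance=ok"
)


def parse_status_line(line: str) -> dict[str, str]:
    """Split 'phase=status | phase=status' fields into a dict.

    Hypothetical helper: it assumes phase names contain no spaces, so any
    leading marker (such as the check mark) is dropped by keeping only the
    last whitespace-separated token of each field.
    """
    phases: dict[str, str] = {}
    for field in line.split("|"):
        tokens = field.split()
        if not tokens:
            continue
        name, sep, status = tokens[-1].partition("=")
        if sep and name and status:
            phases[name] = status
    return phases


phases = parse_status_line(STATUS_LINE)
not_ok = [name for name, status in phases.items() if status != "ok"]

print(phases)
print("all phases ok" if not not_ok else f"phases not ok: {not_ok}")
```

A check like this only confirms the final status of each phase. It says nothing about how many attempts a phase needed, which is why the attempts counters in the receipt are worth reading separately.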
There is a pleasant recursion here: the feature this run shipped is the compressed summary grammar for the very receipt surface you are reading about.
The loop that converged
Review and repair are the run’s control flow, and this receipt shows them under real load:
review_changes attempts=7
repair_changes attempts=5
Review findings: 1 (P1=1) | resolved: 1 | active: 0
Seven review rounds and five repair rounds sound expensive until you look at
what they bought: findings fell to zero active, and the reviewer read deeply —
its session peaked at 69% of a 258k context window. The reviewer and the
implementer are also different vendors: Claude (claude-opus-4-8) implements,
Codex (gpt-5.5) reviews, across 8 sessions. The author never grades its own
work.
Mid-run, the engine’s advisor intervened once on its own:
Agent advice: calls=1 · applied_retries=1 · api-equiv $0.09
One stuck attempt was pushed to a retry without a human touching the run, at an API-equivalent cost of nine cents.
Where the cost lives
The headline number is API-equivalent, not a bill; see Cost accounting for the model. What the receipt lets you do is decompose it:
Usage: 109,878,715 tokens (in=109,414,287 out=464,428)
API-equiv: ~$116.67
implement                78.1M tok  attempts=2  $74.67  (96% cache-read)
review_changes           12.1M tok  attempts=7  $12.83  (91% cache-read)
repair_changes           18.1M tok  attempts=5  $25.95  (96% cache-read)
plan + validate + final                         ~$3.23
Read the shape, not just the total:
- Output tokens are 464k of 109.9M — about 0.4%. Almost the entire volume is input: agents re-reading their context as the run progresses.
- Around 95% of that input was served from provider cache. Fresh, full-priced token traffic is a small fraction of the headline figure.
- Cost concentrates where the work is: implementation carries 64% of the API-equivalent spend, the repair loop 22%, review 11%.
This is the reason Orcho reports cost per phase and per subtask instead of one number: a bare $116.67 headline suggests a very different run from one whose implement phase is 96% cache-read.
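The percentages in the list above can be re-derived from the figures printed in the receipt. The sketch below does only that arithmetic. The constants are copied from this page, and the token-weighted cache-read estimate rests on an assumption the receipt does not state: that each phase's cache-read percentage applies to that phase's whole token count.

```python
# Figures copied from the receipt on this page; nothing is read from Orcho.
TOTAL_TOKENS = 109_878_715
INPUT_TOKENS = 109_414_287
OUTPUT_TOKENS = 464_428
TOTAL_API_EQUIV = 116.67  # USD, API-equivalent headline

# phase -> (tokens in millions, API-equivalent USD, cache-read fraction)
PHASES = {
    "implement": (78.1, 74.67, 0.96),
    "review_changes": (12.1, 12.83, 0.91),
    "repair_changes": (18.1, 25.95, 0.96),
}
OTHER_PHASES_COST = 3.23  # plan + validate + final, approximate in the receipt


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the headline spend carried by each itemized phase.
cost_share = {
    name: round(percent(cost, TOTAL_API_EQUIV))
    for name, (_, cost, _) in PHASES.items()
}

# Share of all tokens that were output rather than input.
output_share = percent(OUTPUT_TOKENS, TOTAL_TOKENS)

# Token-weighted cache-read estimate over the three itemized phases.
# Assumption: the per-phase percentage applies to the phase's full token count.
itemized_tokens = sum(tokens for tokens, _, _ in PHASES.values())
cached_tokens = sum(tokens * cached for tokens, _, cached in PHASES.values())
cache_read_share = percent(cached_tokens, itemized_tokens)

# Sum of the per-phase costs, for comparison with the headline.
itemized_total = sum(cost for _, cost, _ in PHASES.values()) + OTHER_PHASES_COST

print(cost_share)
print(f"output share: {output_share:.2f}%")
print(f"cache-read estimate: {cache_read_share:.1f}%")
print(f"itemized total: ${itemized_total:.2f} vs headline ${TOTAL_API_EQUIV:.2f}")
```

The itemized costs sum to within a cent of the headline, which is consistent with the receipt marking both the headline and the combined small phases as approximate.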
The receipt tells on the run
The release was approved, and the receipt still carries two honest warnings.
First, scope expansion. The worker touched 14 files it never declared in the task’s ownership contract (mostly test files it added coverage to, plus one support module):
Scope expansion risk: 14 files flagged — unverified · no-explanation
The detector fired, classified the touches as non-blocking, and printed every path in the full receipt. An approved release does not silence the flags: the next reader sees exactly what the agent did beyond its declared scope.
Second, gate residue. All five verification receipts ran and passed before
final acceptance, and the receipt still marks them stale — they were
recorded before the delivery commit moved HEAD:
pre-final auto-run: 5 ran / 5 pass
blocking (require): broad-non-e2e, verification-unit, cli-sdk-unit
warning (warn): env-provenance, lint — shipping allowed by policy
note: stale = passed before a later HEAD move, not a failed check
stale is a provenance statement, not a failure; the receipt explains this
in its own footnote. Which gates block and which merely warn is policy; see
Verification receipts for the
classification model.
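The distinction between stale and failed can be made concrete with a small sketch. This is not Orcho's implementation: `GateReceipt`, `classify`, `blocks_release`, the `recorded_head` field, and both commit identifiers are hypothetical, and the rule that a stale pass does not block simply mirrors what happened in this run. Only the gate names and the require/warn split come from the receipt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateReceipt:
    """Hypothetical record of one verification gate."""

    name: str
    policy: str  # "require" or "warn", as labelled in the receipt
    passed: bool
    recorded_head: str  # hypothetical: the commit the check ran against


def classify(receipt: GateReceipt, current_head: str) -> str:
    """Return 'failed', 'stale', or 'fresh' for one receipt."""
    if not receipt.passed:
        return "failed"
    if receipt.recorded_head != current_head:
        # The check passed, but HEAD moved afterwards: provenance, not failure.
        return "stale"
    return "fresh"


def blocks_release(receipt: GateReceipt) -> bool:
    """Assumed policy: only a failed 'require' gate blocks shipping."""
    return receipt.policy == "require" and not receipt.passed


# Placeholder commit identifiers, invented for illustration only.
BEFORE_DELIVERY = "head-before-delivery-commit"
AFTER_DELIVERY = "head-after-delivery-commit"

receipts = [
    GateReceipt("broad-non-e2e", "require", True, BEFORE_DELIVERY),
    GateReceipt("verification-unit", "require", True, BEFORE_DELIVERY),
    GateReceipt("cli-sdk-unit", "require", True, BEFORE_DELIVERY),
    GateReceipt("env-provenance", "warn", True, BEFORE_DELIVERY),
    GateReceipt("lint", "warn", True, BEFORE_DELIVERY),
]

states = {r.name: classify(r, AFTER_DELIVERY) for r in receipts}
blocking = [r.name for r in receipts if blocks_release(r)]

print(states)
print("release blocked by:", blocking or "nothing")
```

Keeping the freshness check separate from the pass/fail check is the point of the sketch: a reader can see that all five gates passed and that none of them was re-run against the final commit.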
Reading a run like this yourself
Every number on this page comes from artifacts any Orcho run leaves behind:
the final summary, events.jsonl, metrics.json, findings, and verification
receipts. The Evidence bundle page maps the
artifact set; Feature run anatomy shows the same
stream live, phase by phase.
Related
- Cost accounting: the API-equivalent model and cache anatomy.
- Verification receipts: proof that checks ran, and where.
- False-ready delivery: what happens when a run does not converge.
- Handoffs and advisors: the advisor that pushed the $0.09 retry.