
Run anatomy

A real Orcho run can look dense because it carries the whole operating model: profile, verification policy, agents, phase DAG, live transcript, subtasks, receipts, review, final acceptance, and usage rollup.

Do not read it as one wall of output. Read it from the top down.

The live stream is the best view into what a run is doing, but the stream itself has levels. Start with the run envelope, then read the profile, pipeline, verification gates, phase decisions, and final rollup.

The first lines tell you what kind of run you are watching.

Orcho Run 20260628_000813 feature
Project /repo/orcho-core
Task ADR 0117 — Separate verification cost from blocking tier
Plugin orcho-core
Skills 14: orcho-core-quality-gates, orcho-core-skills-registry,
orcho-core-verification-matrix, orcho-prompt-engine, ...

This is already more than “an agent is working.” It gives you a run id, target project, task contract, and selected operating shape.
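If you want to capture the envelope from a saved log, the header can be split into fields. This is a minimal sketch that assumes the `Label value` layout shown in the sample above; `parse_envelope` and the `run_id` and `shape` keys are hypothetical names for illustration, not part of Orcho.

```python
import re

# Envelope lines copied from the sample above (Task and Skills lines omitted for brevity).
ENVELOPE = """\
Orcho Run 20260628_000813 feature
Project /repo/orcho-core
Plugin orcho-core
"""


def parse_envelope(text: str) -> dict[str, str]:
    """Turn envelope lines into a dict. The layout is assumed from the sample only."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First line: "Orcho Run <run id> <run shape>".
        first = re.fullmatch(r"Orcho Run (\S+) (\S+)", line)
        if first:
            fields["run_id"] = first.group(1)
            fields["shape"] = first.group(2)
            continue
        # Remaining lines: the first word is the label, the rest is the value.
        label, _, value = line.partition(" ")
        fields[label.lower()] = value
    return fields


envelope = parse_envelope(ENVELOPE)
print(envelope)
```

The Task line follows the same `Label value` pattern, so it would land under a `task` key without extra handling.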

The profile explains how much process the task deserves. A feature-shaped run can carry verification gates before implementation, after implementation, before final acceptance, and before delivery.

Verification
mode pro
envs core-local
policy require receipts; missing or failed checks resolve at gate time
gates
gate               timing           run     policy   kind
env-provenance     after_implement  auto    warn     cheap
lint               after_implement  auto    warn     cheap
run-state-unit     after_implement  auto    require  unknown
verification-unit  after_implement  auto    require  unknown
broad-non-e2e      after_implement  auto    require  unknown
e2e                operator         manual  suggest  unknown

This is the part a quickstart should not lead with. It is essential for a serious feature run, but too much for a first five-minute experience.
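One way to read the gates table is to separate the gates that can block from the ones that only advise. The sketch below assumes that `require` is the blocking policy and that `warn` and `suggest` are advisory; the `Gate` class and `parse_gates` helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Gate rows copied from the Verification sample above.
GATE_ROWS = """\
env-provenance after_implement auto warn cheap
lint after_implement auto warn cheap
run-state-unit after_implement auto require unknown
verification-unit after_implement auto require unknown
broad-non-e2e after_implement auto require unknown
e2e operator manual suggest unknown
"""


@dataclass(frozen=True)
class Gate:
    name: str
    timing: str
    run: str
    policy: str
    kind: str


def parse_gates(text: str) -> list[Gate]:
    """One gate per non-empty row; the five columns are whitespace separated."""
    return [Gate(*line.split()) for line in text.splitlines() if line.strip()]


gates = parse_gates(GATE_ROWS)

# Assumption: only "require" blocks; "warn" and "suggest" are treated as advisory.
blocking = [g.name for g in gates if g.policy == "require"]
advisory = [g.name for g in gates if g.policy != "require"]

print("blocking:", blocking)
print("advisory:", advisory)
```

Note that the split is made on the `policy` column alone. The `kind` column (cost) plays no part in it, which matches the task this sample run is working on.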

The run then exposes who does what.

Agents
PLAN claude-opus-* high
IMPLEMENT claude-opus-* high
REVIEW_CHANGES gpt-* medium
FINAL_ACCEPTANCE gpt-* low
Pipeline
⟳² (▶ plan [Claude] → · validate_plan [Codex]) → · implement [Claude] → ⟳² (· review_changes [Codex] → · repair_changes [Claude])
→ · final_acceptance [Codex]

This is not decorative output. The DSL is Orcho’s compact run-shape language: bounded loops, phase order, active phase, and runtime split are readable before you dive into transcripts. It shows that planning, implementation, review, repair, and acceptance are separate jobs with separate evidence.
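To see how much the pipeline line carries, you can pull the phases out of it with a single pattern. This is a rough sketch, not an Orcho parser: it assumes `▶` marks the active phase and `·` marks every other phase, and it ignores the loop markers entirely.

```python
import re
from collections import defaultdict

# Pipeline line copied from the sample above (joined into one string).
PIPELINE = (
    "⟳² (▶ plan [Claude] → · validate_plan [Codex]) → · implement [Claude] → "
    "⟳² (· review_changes [Codex] → · repair_changes [Claude]) "
    "→ · final_acceptance [Codex]"
)

# Assumed shape of one phase: "<marker> <phase_name> [<Runtime>]".
PHASE = re.compile(r"([▶·])\s+(\w+)\s+\[(\w+)\]")

phases = [
    (name, runtime, marker == "▶")
    for marker, name, runtime in PHASE.findall(PIPELINE)
]

# Phase order is preserved, so the list reads in execution order.
active = [name for name, _, is_active in phases if is_active]

# Group phases by runtime to make the runtime split visible.
by_runtime: dict[str, list[str]] = defaultdict(list)
for name, runtime, _ in phases:
    by_runtime[runtime].append(name)

print("active:", active)
for runtime, names in by_runtime.items():
    print(runtime, "->", names)
```

In this sample the split is even: one runtime plans, implements, and repairs, while the other validates, reviews, and accepts.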

Inside a phase, the live output shows what the runtime actually did.

[PLAN] PLAN -- architect creates MD artifacts (round 1)
runtime=claude · role=systems_architect · task=plan · mode=read
cwd=/repo/workspace/runspace/worktrees/wt_20260628_000813/checkout
Transcript
Read: pipeline/verification_selection.py
Read: docs/adr/0117-verification-blocking-tier-independent-of-cost.md
Read: tests/unit/pipeline/test_verification_selection.py
Bash: rg -n "derive_effective_policy|cheap|work_mode" --type py -l
Read: .orcho/multiagent/plugin.py
Plan
goal derive_effective_policy stops accepting cheap
commands ruff check .; pytest test_verification_selection.py; pytest verification slice
risks do not remove cheap/default_cheap contract metadata

This is where live CLI output is stronger than a post-run summary. You see the system gather context, produce a contract, and move through the lifecycle.

A mature run should not only say “I will fix it.” It should produce a contract.

Look for:

  • acceptance criteria;
  • owned files;
  • verification commands;
  • risks and out-of-scope boundaries;
  • review focus;
  • subtasks.

That contract becomes the reference point for implementation, review, and final acceptance.

A real plan contract should look concrete. In this excerpt the summary counts seven acceptance criteria, of which four are shown:

Contract
acceptance 7
owned files 5
commands 3
risks 5
review focus 5
tasks 1
Acceptance Criteria
- derive_effective_policy has no cheap parameter.
- require gates remain require in pro and fast.
- suggest gates remain advisory regardless of cost.
- cost metadata remains available for display.
Owned Files
- pipeline/verification_selection.py
- tests/unit/pipeline/test_verification_selection.py
- .orcho/multiagent/plugin.py
- docs/architecture/verification_contract.md
- docs/adr/0117-verification-blocking-tier-independent-of-cost.md
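Because the contract lists owned files, a reviewer can compare a change against it mechanically. The sketch below is one way to do that by hand; `out_of_scope` is a hypothetical helper, the changed-file list is made up for the example, and nothing here claims Orcho runs this exact check.

```python
# Owned files copied from the contract sample above.
OWNED_FILES = {
    "pipeline/verification_selection.py",
    "tests/unit/pipeline/test_verification_selection.py",
    ".orcho/multiagent/plugin.py",
    "docs/architecture/verification_contract.md",
    "docs/adr/0117-verification-blocking-tier-independent-of-cost.md",
}


def out_of_scope(changed_files, owned=OWNED_FILES):
    """Return changed paths that the contract does not list as owned."""
    return sorted(set(changed_files) - set(owned))


# Hypothetical change set: one owned file and one file outside the contract.
changed = ["pipeline/verification_selection.py", "README.md"]
print(out_of_scope(changed))
```

An empty result means the change stayed inside the contract. Anything else is a question for review, not necessarily an error.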

In a deeper run, implementation may be split into subtasks.

ORCHO subtask 1/1 START: T1
goal: Decouple blocking tier from verification cost
runtime: claude
model: claude-opus-*
skill: orcho-core-quality-gates
done_criteria: 7
Attestation
7/7 done-criteria met
derive_effective_policy signature updated
dead cheap plumbing removed from blocking policy
guard tests added
docs and ADR pinned to the same table

This is where Orcho becomes more than “run a prompt.” The run can say what was planned, what was completed, and what evidence backs the claim.

For a larger run, subtasks can form a dependency graph. Read Plan contract and DAG for the deeper version: rejected plan rounds, operator handoffs, compact_dag implementation, per-subtask attestations, and final acceptance receipts.

Gates turn checks into durable evidence.

Verification gates - after_phase(implement)
env-provenance PASS
lint PASS
mcp-mock-smoke PASS
receipts
env-provenance.json
lint.json
mcp-mock-smoke.json

For serious work, this is often the trust layer that matters most. It answers whether required checks actually ran, not only whether the final message sounded confident.
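The question "did the required checks actually run" can be phrased as a small check: every gate must report PASS and have a receipt. This sketch assumes receipts are named `<gate>.json`, as in the sample above; `unverified_gates` is a hypothetical helper, not an Orcho function.

```python
from pathlib import PurePath

# Gate results and receipt names copied from the sample above.
GATE_RESULTS = {
    "env-provenance": "PASS",
    "lint": "PASS",
    "mcp-mock-smoke": "PASS",
}
RECEIPTS = ["env-provenance.json", "lint.json", "mcp-mock-smoke.json"]


def unverified_gates(results, receipts):
    """Gates that did not pass or have no matching receipt file."""
    # Assumption: a receipt for gate "lint" is named "lint.json".
    receipted = {PurePath(name).stem for name in receipts}
    return sorted(
        gate
        for gate, status in results.items()
        if status != "PASS" or gate not in receipted
    )


print(unverified_gates(GATE_RESULTS, RECEIPTS))       # every gate is backed by a receipt
print(unverified_gates(GATE_RESULTS, RECEIPTS[:2]))   # one receipt is missing
```

The second call shows why receipts matter: a gate can print PASS and still be unverified if its receipt is missing.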

Review and final acceptance are separate gates.

review_changes
verdict APPROVED
summary change matches the contract
final_acceptance
verdict APPROVED
ship_ready yes

This separation is intentional. A worker can complete implementation while a reviewer still rejects the delivery. A final acceptance gate can approve, reject, or require follow-up action.

The end of a run should summarize outcome and cost.

✓ plan · 254.1s · $1.522
Orcho prompt 2.9k tokens
Provider input 791.6k tokens (90% cached, ~$1.23 saved)
Runtime overhead 788.7k tokens
Response 17.5k tokens
Activity tools=13 calls
✓ implement:subtask:T1 · 359.9s · $2.837
Orcho prompt 4.1k tokens
Provider input 2.7M tokens (97% cached, ~$2.46 saved)
Runtime overhead 2.7M tokens
Response 27.0k tokens
Activity tools=34 calls

Usage and cost are not decoration. They help decide whether the profile depth was justified for the task. For a deeper operating view, read Cost accounting.
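To judge profile depth, it helps to total the per-phase lines. The sketch below sums duration and cost from the two phase headers in the sample; the line pattern is assumed from that sample only, and `Decimal` is used so that the dollar amounts add exactly.

```python
import re
from decimal import Decimal

# Phase header lines copied from the rollup sample above.
ROLLUP = """\
✓ plan · 254.1s · $1.522
✓ implement:subtask:T1 · 359.9s · $2.837
"""

# Assumed shape: "✓ <phase> · <seconds>s · $<cost>".
PHASE_LINE = re.compile(r"✓ (\S+) · ([\d.]+)s · \$([\d.]+)")

rows = [
    (name, Decimal(seconds), Decimal(cost))
    for name, seconds, cost in PHASE_LINE.findall(ROLLUP)
]

total_seconds = sum((seconds for _, seconds, _ in rows), Decimal(0))
total_cost = sum((cost for _, _, cost in rows), Decimal(0))

print(f"total time: {total_seconds}s")
print(f"total cost: ${total_cost}")
for name, _, cost in rows:
    share = cost / total_cost * 100
    print(f"{name}: {share:.1f}% of cost")
```

For the two phases shown, this comes to 614.0 seconds and $4.359 in total.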

Use this page after the first-run pages:

  1. Profile semantics explains why profile comes first.
  2. Watch the run explains the live stream.
  3. Read the result explains post-run inspection.
  4. This page shows how the same ideas scale into a real feature run.
  5. Plan contract and DAG shows how a large blast radius becomes controlled subtasks and release gates.

The principle stays the same: start with the simple shape, then reveal the deeper machinery only when the reader is ready for it.