
Evidence bundle

Durable proof surface

Evidence is the recorded explanation of a run. Logs tell you what happened in time order. Evidence answers the delivery questions: what was planned, what changed, what was checked, what failed, and what decision the run reached.

recorded result · not just logs · inspectable later

An Orcho run writes artifacts under its run directory. The exact set can vary by profile and phase outcome, but a feature-shaped run commonly leaves this layout:

runs/20260629_1427/
  output.log      human-readable live stream transcript
  events.jsonl    durable progress facts
  meta.json       run state and phase summaries
  metrics.json    timing, tokens, and API-equivalent cost when available
  evidence.json   recorded delivery explanation
  plan.md         accepted plan or latest plan artifact
  review.json     structured review result when available
  diff.patch      durable patch for the held run diff
  receipts/       verification command receipts
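Because the set of files varies by profile and phase outcome, a script that consumes a run directory should treat every artifact as possibly absent. The sketch below is a hypothetical helper (`inventory_run_dir` is not part of Orcho); it only assumes the file names shown in the layout above and reports which ones exist.

```python
from pathlib import Path

# File names taken from the layout above. Which ones exist varies by
# profile and phase outcome, so absence is reported, not raised as an error.
EXPECTED_ARTIFACTS = [
    "output.log",
    "events.jsonl",
    "meta.json",
    "metrics.json",
    "evidence.json",
    "plan.md",
    "review.json",
    "diff.patch",
    "receipts",  # a directory of verification command receipts
]


def inventory_run_dir(run_dir):
    """Hypothetical helper: map each expected artifact name to whether it exists."""
    root = Path(run_dir)
    return {name: (root / name).exists() for name in EXPECTED_ARTIFACTS}
```

For example, `inventory_run_dir("runs/20260629_1427")` returns a dict such as `{"output.log": True, ..., "review.json": False, ...}` that a caller can check before opening any file.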

See Artifacts for the raw file contract. This page is about how the bundle answers delivery questions.

A useful evidence projection should be compact. This sanitized slice shows the kind of information an operator should be able to recover without replaying the whole terminal:

{
  "run_id": "20260629_1427",
  "profile": "feature",
  "task": "Add validation to the login endpoint",
  "plan": {
    "status": "accepted",
    "owned_files": ["api/auth.py", "tests/test_auth.py"]
  },
  "implementation": {
    "changed_files": ["api/auth.py", "tests/test_auth.py"]
  },
  "review": {
    "verdict": "REJECTED",
    "blocker": "missing negative-path test",
    "required_fix": "add regression coverage for invalid input"
  },
  "repair": {
    "summary": "added missing negative-path test"
  },
  "final_acceptance": {
    "verdict": "APPROVED",
    "ship_ready": true
  },
  "artifacts": {
    "output": "output.log",
    "events": "events.jsonl",
    "diff": "diff.patch"
  }
}

The values are sanitized, but the shape is intentionally concrete: task, profile, plan, changed files, review blocker, repair, final gate, artifacts.
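To show how a script could recover those answers without replaying the terminal, here is a minimal sketch that reads evidence.json and pulls out the task, changed files, review blocker, repair, and final gate. The key names come from the sanitized sample above; the helper names are hypothetical, and real bundles may carry more or fewer sections, so every lookup tolerates a missing key.

```python
import json
from pathlib import Path


def summarize_evidence(evidence):
    """Answer the delivery questions from an evidence dict shaped like the sample.

    Key names follow the sanitized sample; missing sections yield None or
    an empty list instead of raising KeyError.
    """
    review = evidence.get("review", {})
    final = evidence.get("final_acceptance", {})
    return {
        "task": evidence.get("task"),
        "changed": evidence.get("implementation", {}).get("changed_files", []),
        "blocker": review.get("blocker"),
        "repair": evidence.get("repair", {}).get("summary"),
        "final_verdict": final.get("verdict"),
        "ship_ready": bool(final.get("ship_ready", False)),
    }


def load_evidence(run_dir):
    """Read evidence.json from a run directory laid out as shown earlier."""
    path = Path(run_dir) / "evidence.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Applied to the sample, `summarize_evidence(load_evidence("runs/20260629_1427"))` would report the review blocker "missing negative-path test" alongside a final verdict of "APPROVED", which is the reject-repair-approve story in one dict.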

Evidence has layers.

| Layer | Use it for | Notes |
| --- | --- | --- |
| events.jsonl | Progress facts and phase history. | Best for observers and reconnecting clients. |
| output.log | Human-readable live stream replay. | Best when you need the operator narrative. |
| evidence.json / projected evidence | Delivery explanation. | Best for review, handoff, and later recall. |
| diff.patch | The actual proposed code change. | Best for code review and apply/retry flows. |
| receipts | Proof that checks ran. | Best for final acceptance and audit of verification. |
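events.jsonl suits reconnecting clients because it is append-only and line-oriented: a client can remember a byte offset and resume from it. The sketch below assumes only that each line is one JSON object; it does not interpret event fields, since their schema is not described here, and `read_events_since` is a hypothetical helper name.

```python
import json


def read_events_since(path, offset=0):
    """Read complete JSONL records appended after a byte offset.

    Returns (events, new_offset). A trailing line without a newline may
    still be mid-write, so it is left for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset  # no complete line has been appended yet
    complete = chunk[: end + 1]
    events = [json.loads(line) for line in complete.splitlines() if line.strip()]
    return events, offset + len(complete)
```

A client stores the returned offset and passes it back on reconnect, so it sees each complete event exactly once and never parses a half-written line.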

The CLI, MCP, and Web surfaces should project these artifacts, not replace them. If a projection disagrees with the raw artifacts, inspect the run directory: it is the lower-level source.
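One concrete disagreement to check is whether the files a projection claims were changed match the files the raw patch actually touches. This sketch assumes diff.patch is a git-format patch with the default `a/` and `b/` prefixes and paths without spaces; it reads `changed_files` under `implementation` as in the sample above. Both helper names are hypothetical, not Orcho APIs.

```python
def files_in_patch(patch_text):
    """Collect file paths from git-style 'diff --git a/X b/X' headers.

    Assumes default a/ and b/ prefixes and no spaces in paths; the b/ side
    is used, so a rename reports its new path.
    """
    files = set()
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            parts = line.split()
            if len(parts) >= 4 and parts[3].startswith("b/"):
                files.add(parts[3][2:])
    return files


def projection_drift(evidence, patch_text):
    """Compare evidence changed_files (sample key names) with the raw patch.

    Returns the files only the projection lists and the files only the
    patch touches; both lists are empty when the two agree.
    """
    projected = set(evidence.get("implementation", {}).get("changed_files", []))
    raw = files_in_patch(patch_text)
    return {
        "only_in_evidence": sorted(projected - raw),
        "only_in_patch": sorted(raw - projected),
    }
```

Any non-empty list is a signal to open diff.patch directly rather than trust the projection.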

Different readers start in different places.

  • Operator: starts with status and evidence (current state, final verdict, next action, artifact paths).
  • Reviewer: checks the diff, review findings, receipts, and whether final acceptance had enough proof.
  • MCP client: uses typed status, evidence, and event surfaces instead of scraping terminal text.
  • Technical lead: reads outcome, retry rate, usage, and API-equivalent cost against delivery value.

Logs are chronological. Evidence is interpretive.

The evidence bundle is shaped for questions such as:

  • What was the task?
  • What changed?
  • What was checked?
  • What failed?
  • What is the next action?

Use logs when you need raw details. Use evidence when you need operational truth.

When a run used a handoff advisor, evidence can include a handoff_advice section. It records recommended actions, applied actions, outcomes, and observe-only advisor usage. Read Handoffs and advisors for the operator model.