Evidence bundle

Durable proof surface

Evidence is the recorded explanation of a run. Logs tell you what happened in time order. Evidence answers the delivery questions: what was planned, what changed, what was checked, what failed, and what decision the run reached.

Recorded result. Not just logs. Inspectable later.

Run directory: artifact map

runs/20260629_1427/
  output.log
  events.jsonl
  meta.json
  metrics.json
  evidence.json
  plan.md
  review.json
  diff.patch
  receipts/
    lint.json
    verification-unit.json

Start with projected status/evidence views. Read raw files when debugging, reviewing, or building integrations.

Artifact map

An Orcho run writes artifacts under its run directory. The exact set can vary by profile and phase outcome, but a feature-shaped run commonly leaves this shape:

runs/20260629_1427/
  output.log        human-readable live stream transcript
  events.jsonl      durable progress facts
  meta.json         run state and phase summaries
  metrics.json      timing, tokens, and API-equivalent cost when available
  evidence.json     recorded delivery explanation
  plan.md           accepted plan or latest plan artifact
  review.json       structured review result when available
  diff.patch        durable patch for the held run diff
  receipts/         verification command receipts
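
Because the exact set of artifacts varies by profile and phase outcome, an integration may want to check which files a run actually wrote before reading them. The following is a minimal sketch, not part of Orcho: the function names `artifact_inventory` and `missing_artifacts` are hypothetical, and the list of expected names is copied from the map above.

```python
from pathlib import Path

# Artifact names taken from the map above. A given run may omit some of them,
# so absence is reported rather than treated as an error.
EXPECTED_ARTIFACTS = [
    "output.log",
    "events.jsonl",
    "meta.json",
    "metrics.json",
    "evidence.json",
    "plan.md",
    "review.json",
    "diff.patch",
    "receipts",
]


def artifact_inventory(run_dir):
    """Return {artifact name: True/False} for one run directory (hypothetical helper)."""
    root = Path(run_dir)
    return {name: (root / name).exists() for name in EXPECTED_ARTIFACTS}


def missing_artifacts(run_dir):
    """List the expected artifacts that this run did not write."""
    inventory = artifact_inventory(run_dir)
    return [name for name, present in inventory.items() if not present]
```

Calling `missing_artifacts("runs/20260629_1427")` would return the names from the map that are not present on disk. A missing `review.json`, for example, may simply mean the run did not produce a structured review.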

See the Artifacts page for the raw file contract. This page is about how the bundle answers delivery questions.

Evidence slice

A useful evidence projection should be compact. This sanitized slice shows the kind of information an operator should be able to recover without replaying the whole terminal:

{
  "run_id": "20260629_1427",
  "profile": "feature",
  "task": "Add validation to the login endpoint",
  "plan": {
    "status": "accepted",
    "owned_files": ["api/auth.py", "tests/test_auth.py"]
  },
  "implementation": {
    "changed_files": ["api/auth.py", "tests/test_auth.py"]
  },
  "review": {
    "verdict": "REJECTED",
    "blocker": "missing negative-path test",
    "required_fix": "add regression coverage for invalid input"
  },
  "repair": {
    "summary": "added missing negative-path test"
  },
  "final_acceptance": {
    "verdict": "APPROVED",
    "ship_ready": true
  },
  "artifacts": {
    "output": "output.log",
    "events": "events.jsonl",
    "diff": "diff.patch"
  }
}

The values are sanitized, but the shape is intentionally concrete: task, profile, plan, changed files, review blocker, repair, final gate, artifacts.
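
To illustrate how a slice like this can answer the delivery questions without replaying the terminal, here is a minimal sketch that turns an evidence dict into a few summary lines. It assumes the field names shown in the sanitized slice above. The function name `summarize_evidence` is hypothetical, and real evidence files may carry more or fewer sections, so every lookup tolerates missing keys.

```python
def summarize_evidence(evidence):
    """Build short summary lines from an evidence dict shaped like the slice above.

    Hypothetical helper: uses .get() throughout because a section
    (review, repair, ...) may be absent for a given run.
    """
    plan = evidence.get("plan", {})
    implementation = evidence.get("implementation", {})
    review = evidence.get("review", {})
    repair = evidence.get("repair", {})
    final = evidence.get("final_acceptance", {})

    task = evidence.get("task", "unknown")
    plan_status = plan.get("status", "unknown")
    changed = ", ".join(implementation.get("changed_files", [])) or "none recorded"

    lines = [
        f"task: {task}",
        f"plan: {plan_status}",
        f"changed: {changed}",
    ]
    # Only mention review and repair when the run recorded them.
    if review.get("blocker"):
        verdict = review.get("verdict", "unknown")
        lines.append(f"review: {verdict} ({review['blocker']})")
    if repair.get("summary"):
        lines.append(f"repair: {repair['summary']}")

    final_verdict = final.get("verdict", "unknown")
    ship_ready = final.get("ship_ready", False)
    lines.append(f"final: {final_verdict}, ship_ready={ship_ready}")
    return lines
```

Loading the file with `json.loads(Path("runs/20260629_1427/evidence.json").read_text())` and passing the result to this function would yield one line per delivery question, in the same order as the slice: task, plan, changed files, review blocker, repair, final gate.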

Source of truth

Evidence has layers.

Layer	Use it for	Notes
`events.jsonl`	Progress facts and phase history.	Best for observers and reconnecting clients.
`output.log`	Human-readable live stream replay.	Best when you need the operator narrative.
`evidence.json` / projected evidence	Delivery explanation.	Best for review, handoff, and later recall.
`diff.patch`	The actual proposed code change.	Best for code review and apply/retry flows.
receipts	Proof that checks ran.	Best for final acceptance and audit of verification.

The CLI, MCP, and Web surfaces should project these artifacts. They should not replace them. If projections disagree with raw artifacts, the run directory is the lower-level source to inspect.
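
One way to check a projection against the raw artifacts is to compare the changed files claimed in the evidence with the files actually touched by `diff.patch`. The sketch below is an illustration under assumptions, not Orcho behavior: it assumes `diff.patch` is a git-style unified diff with `+++ b/<path>` headers, and the helper names are hypothetical.

```python
def files_in_patch(patch_text):
    """Collect paths from '+++ b/<path>' headers of a git-style unified diff.

    Assumption: the patch uses git's a/ and b/ prefixes. Deleted files
    ('+++ /dev/null') are not captured by this simple heuristic.
    """
    prefix = "+++ b/"
    paths = set()
    for line in patch_text.splitlines():
        if line.startswith(prefix):
            paths.add(line[len(prefix):].strip())
    return paths


def compare_changed_files(evidence, patch_text):
    """Return (only_in_evidence, only_in_patch) as sorted lists of paths."""
    claimed = set(evidence.get("implementation", {}).get("changed_files", []))
    actual = files_in_patch(patch_text)
    return sorted(claimed - actual), sorted(actual - claimed)
```

If both returned lists are empty, the projection and the patch agree on which files changed. Anything else is a cue to open the run directory and inspect the raw files directly.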

Reader paths

Different readers start in different places.

Operator: starts with status and evidence (current state, final verdict, next action, artifact paths).
Reviewer: checks the diff, review findings, receipts, and whether final acceptance had enough proof.
MCP client: uses typed status/evidence/event surfaces instead of scraping terminal text.
Technical lead: reads outcome, retry rate, usage, and API-equivalent cost against delivery value.

Evidence is separate from logs

Logs are chronological. Evidence is interpretive.

The evidence bundle is shaped for questions such as:

What was the task?
What changed?
What was checked?
What failed?
What is the next action?

Use logs when you need raw, time-ordered detail. Use evidence when you need the run's outcome and the reasons behind it.

When a run used a handoff advisor, evidence can include a `handoff_advice` section. It records recommended actions, applied actions, outcomes, and observe-only advisor usage. See Handoffs and advisors for the operator model.

Next

Read the result: the first post-run workflow.
Artifacts: raw files.
Events: durable progress facts.
Verification receipts: check proof.
False-ready delivery: why gates matter.