Recovery and resume

Orcho recovery is state-driven. The correct next action depends on why the run stopped.

Resume is not a generic “try again” button. It is one recovery mode among several.

Common situations

Situation	Usual next action
Worker process interrupted	Resume the run.
Provider access failed	Fix runtime access, then resume with the right runtime.
Phase awaits handoff	Record a handoff decision, then resume.
Final acceptance rejected	Start a correction follow-up.
Delivery decision parked	Decide the delivery gate.
Scope or provenance blocker	Inspect evidence before overriding.

Resume modes

Orcho separates checkpoint restore from real follow-up work.

Mode	Trigger	Keeps	Changes
`CHECKPOINT`	`orcho run --resume <run-id>` with no new task	The same run directory, checkpoint store, completed phase records, and original task.	Starts a new process to continue the same run.
`FOLLOWUP`	`orcho run --resume <run-id> --task ...` or a typed follow-up action	Parent run id, parent status, base task, available session seeds, evidence, and retained context.	Creates a new child run with a new task and new run directory.
`from-run-plan`	`orcho run --from-run-plan <run-id>`	The parent `parsed_plan.json`, plus the parent's project and task metadata when those are omitted from the new command.	Starts a new run after the planning block.

All three preserve continuity. Only checkpoint resume keeps the same run as the active subject.
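The trigger column can be read as a small dispatch on the mode. The sketch below is a hypothetical helper, `recovery_command`, that only prints the command for each mode instead of running it. The helper name, its argument order, and the lowercase mode labels are illustrative assumptions; the flags are the ones listed in the table.

```shell
# Hypothetical helper (not part of Orcho): print the command for a recovery mode.
# Usage: recovery_command MODE RUN_ID [TASK]
recovery_command() {
  mode=$1
  run_id=$2
  task=${3:-}
  case "$mode" in
    checkpoint)
      # Same run, no new task.
      printf 'orcho run --resume %s\n' "$run_id"
      ;;
    followup)
      # A follow-up carries a new task; refuse to build the command without one.
      if [ -z "$task" ]; then
        echo "followup requires a task" >&2
        return 1
      fi
      printf 'orcho run --resume %s --task "%s"\n' "$run_id" "$task"
      ;;
    from-run-plan)
      # New run that inherits the parent plan.
      printf 'orcho run --from-run-plan %s\n' "$run_id"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}

recovery_command checkpoint 20260628_125026
recovery_command followup 20260628_125026 "Add the missing contract test"
recovery_command from-run-plan 20260628_125026
```

The printed line is meant for review before it is run. The only difference between checkpoint resume and a follow-up is the presence of `--task`, which is why the sketch rejects a follow-up without one.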

Checkpoint restore example

Use checkpoint resume when a run stopped before its lifecycle was finished.

orcho status --workspace ~/www/my-workspace/.orcho

orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --resume 20260628_125026 \
  --output live

The resumed process loads the existing checkpoint and skips phases that already completed. If the run pauses again on a handoff, decide the handoff first and then resume.

Real checkpoint restore

This sanitized excerpt is from a real interrupted feature run. The important signal is not just the `resumed` label in the header; it is the checkpoint line and the pipeline DSL. Completed phases are marked with ✓, the active phase is marked with ▶, pending phases are marked with ·, and Orcho continues from the first unfinished phase.

Recovery mode: checkpoint restore

orcho run \
  --resume 20260629_231549 \
  --workspace /repo/workspace-orchestrator

Run 20260629_231549 did not finish (status: interrupted).
What do you want to do?
1) Resume from checkpoint  [default]
   Continue the same run from saved checkpoints.
2) Start a follow-up using this run as context
   Start a new run with parent context.
3) Exit
Choice [1/2/3]: 1

Orcho Run  20260629_231549  feature  resumed

State
session     auto  rounds=1  plan=yes
checkpoint  5 phases completed: plan, validate_plan, plan, validate_plan, implement
output      /repo/workspace/runspace/runs/20260629_231549/output.log
events      /repo/workspace/runspace/runs/20260629_231549/events.jsonl

Pipeline
⟳² (✓ plan [Claude] → ✓ validate_plan [Codex]) → ✓ implement [Claude]
  → ⟳² (▶ review_changes [Codex] → · repair_changes [Claude])
  → · final_acceptance [Codex]

worktree: retained retry subject /repo/workspace/runspace/worktrees/wt_20260629_231549/checkout
✓ Resuming from checkpoint: 5 phases completed

[PLAN] PLAN -- architect creates MD artifacts
↳ skipped: completed earlier in this run (resumed)

[VALIDATE_PLAN] VALIDATE PLAN -- reviewer audits the plan
↳ skipped: completed earlier in this run (resumed)

[IMPLEMENT] IMPLEMENT -- developer applies the change
↳ skipped: completed earlier in this run (resumed)

[REVIEW_CHANGES] review_changes -- Round 1
→ runtime=codex · model=gpt-5.5 · mode=read · session=fresh

What survived: The same run directory, retained worktree, original task, checkpoint store, completed phase records, output log, and event stream.
What restarted: A new process continued the lifecycle; unfinished phases can use fresh provider sessions while Orcho preserves the run context.

This is why checkpoint restore is different from a follow-up. The subject is still the same run. The already completed phases are evidence, not work to repeat.

Correction follow-up example

Use a follow-up when the previous run reached a real decision and the next step is a new correction task.

orcho evidence \
  --workspace ~/www/my-workspace/.orcho \
  --format md

orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --resume 20260628_125026 \
  --task "Address final acceptance blocker R1: add the missing contract test and rerun the focused suite." \
  --output live

This is intentionally not the same as checkpoint restore. The parent stays as evidence; the child run carries the correction.

Plan follow-up example

Use a plan follow-up when planning already succeeded and the implementation run should inherit that plan.

orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --from-run-plan 20260628_125026 \
  --profile feature \
  --output live

This creates a new run that skips the parent planning block and starts from the first downstream phase.

Recovery surfaces

Use the highest-level surface available:

CLI: orcho status, orcho evidence;
MCP: orcho_run_status, orcho_run_diagnose, orcho_delivery_gate;
artifacts: meta.json, events.jsonl, receipts, and diff.patch.

The rule of thumb: do not infer run state from terminal text when a typed run-control surface already exists.
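When only the artifact level is available, a quick first check is which files a run directory actually contains. The sketch below is a hypothetical helper, `list_run_artifacts`, not an Orcho command. It assumes the named artifacts sit directly under the run directory; the excerpt on this page confirms that location only for `events.jsonl`, and receipts are left out because their file names are not given here.

```shell
# Hypothetical helper (not part of Orcho): report which artifacts exist in a run directory.
# Assumption: meta.json and diff.patch live next to events.jsonl in the run directory.
# Usage: list_run_artifacts RUN_DIR
list_run_artifacts() {
  run_dir=$1
  for name in meta.json events.jsonl diff.patch; do
    if [ -f "$run_dir/$name" ]; then
      echo "present $name"
    else
      echo "missing $name"
    fi
  done
}
```

A call such as `list_run_artifacts /repo/workspace/runspace/runs/20260629_231549` prints one line per artifact. A missing file is a reason to fall back to `orcho status` or `orcho evidence` rather than to guess at the run state.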

Decision rule

Ask one question first: did the same run stop mid-lifecycle, or did it finish with a decision that requires new work?

Mid-lifecycle interruption: checkpoint resume.
Recorded handoff decision: resume after the decision is written.
Rejected final acceptance: correction follow-up.
Persisted plan that should become implementation: from-run-plan.
Unknown or unsafe state: diagnose before launching anything.
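The rule above amounts to a lookup with a safe default. The following is a minimal sketch, assuming illustrative reason labels that are not Orcho status values; `next_recovery_step` is a hypothetical name used only for this example.

```shell
# Hypothetical helper (not part of Orcho): map a stop reason to a recovery step.
# The reason labels are illustrative assumptions, not values emitted by Orcho.
next_recovery_step() {
  case "$1" in
    interrupted)         echo "checkpoint resume" ;;
    handoff)             echo "resume after the handoff decision is written" ;;
    acceptance-rejected) echo "correction follow-up" ;;
    plan-persisted)      echo "from-run-plan" ;;
    # Anything unrecognized falls through to the cautious branch.
    *)                   echo "diagnose before launching anything" ;;
  esac
}

next_recovery_step interrupted
next_recovery_step something-else
```

Putting the diagnose step in the default branch mirrors the last line of the rule: a state that is not clearly recognized never maps to a command that launches work.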

For rejected delivery states, read Correction follow-ups. For paused decision points, read Handoffs and advisors.