Recovery and resume
Orcho recovery is state-driven. The correct next action depends on why the run stopped.
Resume is not a generic “try again” button. It is one recovery mode among several.
Common situations
| Situation | Usual next action |
|---|---|
| Worker process interrupted | Resume the run. |
| Provider access failed | Fix runtime access, then resume with the right runtime. |
| Phase awaits handoff | Record a handoff decision, then resume. |
| Final acceptance rejected | Start a correction follow-up. |
| Delivery decision parked | Decide the delivery gate. |
| Scope or provenance blocker | Inspect evidence before overriding. |
Resume modes
Orcho separates checkpoint restore from real follow-up work.
| Mode | Trigger | Keeps | Changes |
|---|---|---|---|
| `CHECKPOINT` | `orcho run --resume <run-id>` with no new task | The same run directory, checkpoint store, completed phase records, and original task. | Starts a new process to continue the same run. |
| `FOLLOWUP` | `orcho run --resume <run-id> --task ...` or a typed follow-up action | Parent run id, parent status, base task, available session seeds, evidence, and retained context. | Creates a new child run with a new task and new run directory. |
| `from-run-plan` | `orcho run --from-run-plan <run-id>` | The parent `parsed_plan.json` and inherited project/task metadata when omitted. | Starts a new run after the planning block. |
All three preserve continuity. Only checkpoint resume keeps the same run as the active subject.
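The trigger column of the table can be read as a small classification rule. The sketch below is illustrative only: `resume_mode` and its parameters are hypothetical names that mirror the CLI flags, not Orcho's implementation. It does not model typed follow-up actions, and rejecting `--resume` combined with `--from-run-plan` is an assumption, since the table does not say how that combination behaves.

```python
from typing import Optional


def resume_mode(
    resume: Optional[str] = None,
    task: Optional[str] = None,
    from_run_plan: Optional[str] = None,
) -> str:
    """Classify an invocation into one of the three modes from the table.

    Hypothetical helper: parameter names mirror the CLI flags.
    """
    if from_run_plan is not None:
        if resume is not None:
            # Assumption: the two parent-selecting flags are not combined.
            raise ValueError("choose either --resume or --from-run-plan")
        # A task may or may not be given; metadata is inherited when omitted.
        return "from-run-plan"
    if resume is None:
        raise ValueError("no parent run id given, so this is not a resume")
    # --resume with a new task is follow-up work; without one it is a
    # checkpoint restore of the same run.
    return "FOLLOWUP" if task else "CHECKPOINT"
```

The point of the split is visible in the last line: the presence of a new task, not the `--resume` flag itself, decides whether the parent run stays the active subject.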
Checkpoint restore example
Use checkpoint resume when a run stopped before its lifecycle was finished.

```sh
orcho status --workspace ~/www/my-workspace/.orcho

orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --resume 20260628_125026 \
  --output live
```

The resumed process loads the existing checkpoint and skips phases that already completed. If the run pauses again on a handoff, decide the handoff first and then resume.
Real checkpoint restore
This sanitized excerpt is from a real interrupted feature run. The important
signal is not just `resumed`; it is the checkpoint line and the pipeline DSL:
completed phases are marked with ✓, the active phase is marked with ▶, and
Orcho continues from the next unfinished phase.
```
orcho run \
--resume 20260629_231549 \
--workspace /repo/workspace-orchestrator
Run 20260629_231549 did not finish (status: interrupted).
What do you want to do?
1) Resume from checkpoint [default]
Continue the same run from saved checkpoints.
2) Start a follow-up using this run as context
Start a new run with parent context.
3) Exit
Choice [1/2/3]: 1
Orcho Run 20260629_231549 feature resumed
State
session auto rounds=1 plan=yes
checkpoint 5 phases completed: plan, validate_plan, plan, validate_plan, implement
output /repo/workspace/runspace/runs/20260629_231549/output.log
events /repo/workspace/runspace/runs/20260629_231549/events.jsonl
Pipeline
⟳² (✓ plan [Claude] → ✓ validate_plan [Codex]) → ✓ implement [Claude]
→ ⟳² (▶ review_changes [Codex] → · repair_changes [Claude])
→ · final_acceptance [Codex]
worktree: retained retry subject /repo/workspace/runspace/worktrees/wt_20260629_231549/checkout
✓ Resuming from checkpoint: 5 phases completed
[PLAN] PLAN -- architect creates MD artifacts
↳ skipped: completed earlier in this run (resumed)
[VALIDATE_PLAN] VALIDATE PLAN -- reviewer audits the plan
↳ skipped: completed earlier in this run (resumed)
[IMPLEMENT] IMPLEMENT -- developer applies the change
↳ skipped: completed earlier in this run (resumed)
[REVIEW_CHANGES] review_changes -- Round 1
→ runtime=codex · model=gpt-5.5 · mode=read · session=fresh
```

- What survived: the same run directory, retained worktree, original task, checkpoint store, completed phase records, output log, and event stream.
- What restarted: a new process continued the lifecycle; unfinished phases can use fresh provider sessions while Orcho preserves the run context.
This is why checkpoint restore is different from a follow-up. The subject is still the same run. The already completed phases are evidence, not work to repeat.
Correction follow-up example
Use a follow-up when the previous run reached a real decision and the next step is a new correction task.

```sh
orcho evidence \
  --workspace ~/www/my-workspace/.orcho \
  --format md

orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --resume 20260628_125026 \
  --task "Address final acceptance blocker R1: add the missing contract test and rerun the focused suite." \
  --output live
```

This is intentionally not the same as checkpoint restore. The parent stays as evidence; the child run carries the correction.
Plan follow-up example
Use a plan follow-up when planning already succeeded and the implementation run should inherit that plan.

```sh
orcho run \
  --workspace ~/www/my-workspace/.orcho \
  --from-run-plan 20260628_125026 \
  --profile feature \
  --output live
```

This creates a new run that skips the parent planning block and starts from the first downstream phase.
Recovery surfaces
Use the highest-level surface available:
- CLI: `orcho status`, `orcho evidence`;
- MCP: `orcho_run_status`, `orcho_run_diagnose`, `orcho_delivery_gate`;
- artifacts: `meta.json`, `events.jsonl`, receipts, and `diff.patch`.
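When only the artifact level is available, the event stream can at least be read as structured data instead of scraped terminal text. The following is a minimal sketch that assumes `events.jsonl` follows the usual JSON Lines convention (one JSON object per line), as its extension suggests; the event schema is not documented here, so the hypothetical `parse_events` helper returns raw dictionaries and does not assume any field names.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def parse_events(
    lines: Iterable[str], last: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Parse JSON Lines text into a list of event dictionaries.

    Assumption: one JSON object per line. Blank lines are skipped.
    If `last` is given, only the final `last` events are returned.
    """
    events: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        events.append(json.loads(line))
    if last is None:
        return events
    # Guard against last <= 0, because events[-0:] would return everything.
    return events[-last:] if last > 0 else []
```

An open file handle is an iterable of lines, so the handle returned by opening a run's `events.jsonl` can be passed in directly. Prefer `orcho status` or the MCP tools whenever they are available; this is a fallback for inspection, not a replacement for them.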
The rule of thumb: do not infer from terminal text when a typed run-control surface already exists.
Decision rule
Ask one question first: did the same run stop mid-lifecycle, or did it finish with a decision that requires new work?
- Mid-lifecycle interruption: checkpoint resume.
- Recorded handoff decision: resume after the decision is written.
- Rejected final acceptance: correction follow-up.
- Persisted plan that should become implementation: `from-run-plan`.
- Unknown or unsafe state: diagnose before launching anything.
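The decision rule amounts to a lookup from the reason a run stopped to the next action, with diagnosis as the fallback. Here is a rough sketch of that idea. The `Stop` enum and `next_action` function are hypothetical names chosen for illustration, and the action strings are taken from the "Common situations" table; nothing here reflects how Orcho represents run state internally.

```python
from enum import Enum
from typing import Optional


class Stop(Enum):
    """Hypothetical labels for the situations listed on this page."""
    WORKER_INTERRUPTED = "worker process interrupted"
    PROVIDER_ACCESS_FAILED = "provider access failed"
    AWAITING_HANDOFF = "phase awaits handoff"
    FINAL_ACCEPTANCE_REJECTED = "final acceptance rejected"
    DELIVERY_PARKED = "delivery decision parked"
    SCOPE_OR_PROVENANCE_BLOCKER = "scope or provenance blocker"


# Next actions copied from the "Common situations" table.
NEXT_ACTION = {
    Stop.WORKER_INTERRUPTED: "Resume the run.",
    Stop.PROVIDER_ACCESS_FAILED: "Fix runtime access, then resume with the right runtime.",
    Stop.AWAITING_HANDOFF: "Record a handoff decision, then resume.",
    Stop.FINAL_ACCEPTANCE_REJECTED: "Start a correction follow-up.",
    Stop.DELIVERY_PARKED: "Decide the delivery gate.",
    Stop.SCOPE_OR_PROVENANCE_BLOCKER: "Inspect evidence before overriding.",
}


def next_action(stop: Optional[Stop]) -> str:
    """Return the usual next action; unknown state falls back to diagnosis."""
    if stop is None:
        return "Diagnose before launching anything."
    return NEXT_ACTION[stop]
```

Passing `None` models the "unknown or unsafe state" case: nothing is launched until the run has been diagnosed.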
For rejected delivery states, read Correction follow-ups. For paused decision points, read Handoffs and advisors.