Plan contract and DAG implementation

Large Orcho runs should not collapse into one giant agent transcript.

The plan is a delivery contract. It names what must be true, which files are in scope, which commands prove the work, which risks matter, what reviewers must audit, and how implementation is split into dependent subtasks.

task → plan contract → validate → subtask DAG → acceptance

Read a large run from left to right. The task becomes a contract, the contract is reviewed before implementation, implementation runs as a dependency graph, and final acceptance checks whether the whole delivery is actually shippable.

Why this matters

One real feature run had a large blast radius: MCP schemas, status projection, diagnosis, evidence, summary, docs, tests, generated schema snapshots, and verification receipts.

What matters is not just that an agent edited files, but that Orcho made the shape of the work visible before letting implementation proceed.

sanitized real run: contract size

Orcho Run  20260629_083410  feature
Task       MCP provider-pressure projection
Plugin     orcho-mcp

Verification
mode      pro
envs      mcp-local-core
policy    declared in contract (require, warn)
effect    require receipts; missing/failed resolved at gate time

Contract
acceptance    9
owned files   16
commands      5
risks         8
review focus  7
tasks         4

This is the signal to slow down. A run with many owned files and required receipts needs a plan contract, not a free-form coding session.

Plan sections

A good Orcho plan is not a narrative promise. It is a structured agreement.

Section	What it controls
Goal	The one outcome the run must produce.
Acceptance criteria	The observable facts that must be true at the end.
Owned files	The allowed blast radius and review target.
Commands	The verification receipts the run must produce.
Risks	The assumptions and forbidden shortcuts reviewers should watch.
Review focus	The checks reviewers must prioritize over generic style review.
Tasks	The executable decomposition, including dependencies.

For a large run, the task list is the bridge from planning to implementation. It should say which subtask owns which surface and which earlier subtask must finish before it can start.
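As a minimal sketch of the plan sections as data, assuming a Python representation that Orcho does not necessarily use: the class names `PlanContract` and `Subtask` and the method `dependency_errors` are hypothetical. The point is that once `depends_on` is structured, a plan that points at a nonexistent or self-referencing subtask can be rejected mechanically.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    # Hypothetical shape: an id, the surfaces it owns, and its upstream ids.
    id: str
    owns: list[str]
    depends_on: list[str] = field(default_factory=list)


@dataclass
class PlanContract:
    # One field per plan section in the table above (names are illustrative).
    goal: str
    acceptance: list[str]
    owned_files: list[str]
    commands: list[str]
    risks: list[str]
    review_focus: list[str]
    tasks: list[Subtask]

    def dependency_errors(self) -> list[str]:
        """Return every depends_on entry that cannot be satisfied."""
        errors = []
        seen = set()
        for task in self.tasks:
            if task.id in seen:
                errors.append(f"duplicate task id {task.id}")
            seen.add(task.id)
        for task in self.tasks:
            for dep in task.depends_on:
                if dep == task.id:
                    errors.append(f"{task.id} depends on itself")
                elif dep not in seen:
                    errors.append(f"{task.id} depends on unknown task {dep}")
        return errors


plan = PlanContract(
    goal="MCP provider-pressure projection",
    acceptance=[], owned_files=[], commands=[], risks=[], review_focus=[],
    tasks=[
        Subtask("T1-projection-schema", ["shared projection source"]),
        Subtask("T2-diagnose", ["diagnose condition"], ["T1-projection-schema"]),
        Subtask("T3-status-evidence-summary", ["status", "summary"],
                ["T2-diagnose", "T9-missing"]),
    ],
)
print(plan.dependency_errors())
# ['T3-status-evidence-summary depends on unknown task T9-missing']
```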

Implementation execution policy

The approved plan does not automatically mean “run every subtask as a separate agent turn.” Orcho needs an implementation execution policy.

This policy answers one specific question:

How should the implement phase consume the approved task plan?

It is separate from runtime selection and verification policy:

runtime policy chooses which worker runtime/model handles a phase or subtask;
verification policy chooses which gates and receipts are required;
implementation execution policy chooses the shape of implementation itself.

The reader-facing mental model is:

Policy shape	What happens
`linear`	The implement phase runs as one controlled worker turn. The plan may still contain tasks, but they are guidance inside one implementation context.
`compact_dag`	The implement phase turns planned subtasks into controlled worker turns with dependencies, receipts, and attestations.

Current compact_dag execution is still sequential. The graph is not a promise of parallel work yet. Dependencies are useful today because they define order, compact context, upstream facts, and delivery blocking rules.

policy layer: linear vs compact DAG

Implementation execution policy
linear
  implement once
  one worker turn consumes the approved plan
  one implementation result goes to review

compact_dag
  parse task list into a dependency graph
  invoke each ready subtask as a focused worker turn
  require subtask receipts and done-criteria attestation
  block declared dependents when an upstream subtask is incomplete
  run sequentially today (concurrency=1)

This is the missing control plane between “the plan has subtasks” and “the implement phase actually runs those subtasks as separate controlled units.”
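The two policy shapes can be sketched as a dispatch, under stated assumptions: `implement`, `run_compact_dag`, and the status strings `done`, `incomplete`, and `blocked` are hypothetical names, and `run_turn` stands in for a controlled worker turn that returns `True` when its done criteria are met. The compact path runs one subtask at a time (concurrency=1) and never invokes a subtask whose upstream is not done, so a failure blocks the whole chain of dependents.

```python
from typing import Callable


def run_compact_dag(
    order: list[str],
    depends_on: dict[str, list[str]],
    run_subtask: Callable[[str], bool],
) -> dict[str, str]:
    """Run subtasks sequentially; `order` must already respect dependencies."""
    status: dict[str, str] = {}
    for task_id in order:
        # "blocked" is not "done", so blocking propagates to transitive dependents.
        if any(status.get(dep) != "done" for dep in depends_on.get(task_id, [])):
            status[task_id] = "blocked"
            continue
        status[task_id] = "done" if run_subtask(task_id) else "incomplete"
    return status


def implement(policy: str, order, depends_on, run_turn) -> dict[str, str]:
    if policy == "linear":
        # One worker turn consumes the whole approved plan.
        return {"implement": "done" if run_turn("implement") else "incomplete"}
    if policy == "compact_dag":
        return run_compact_dag(order, depends_on, run_turn)
    raise ValueError(f"unknown implementation execution policy: {policy}")


deps = {"T1": [], "T2": ["T1"], "T3": ["T1", "T2"], "T4": ["T2", "T3"]}
result = implement("compact_dag", ["T1", "T2", "T3", "T4"], deps, lambda t: t != "T2")
print(result)
# {'T1': 'done', 'T2': 'incomplete', 'T3': 'blocked', 'T4': 'blocked'}
```

In this sketch T3 and T4 are never handed to a worker, which is the delivery-blocking rule described above rather than a retry strategy.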

Isolation policy

Implementation execution policy answers how the plan is consumed. Isolation policy answers where the worker edits code.

This matters because Orcho has two different path concepts:

Path	Meaning
`{project}`	The canonical target repository the run is about.
`{checkout}`	The Orcho-provided checkout that commands and workers should treat as the current subject.

When per-run worktree isolation is active, {checkout} is a run-owned worktree and {project} remains the source repository. When isolation is off, the two collapse into the same checkout and the run edits the repository directly.
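A minimal sketch of how the two placeholders might resolve, assuming hypothetical names (`RunPaths`, `resolve_paths`, `render_command`) and a hypothetical worktree location; Orcho's real runspace layout is not described here. It shows the one property that matters: with `per_run` the command targets the run-owned worktree, and with `off` both placeholders point at the source repository.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class RunPaths:
    project: PurePosixPath   # canonical target repository
    checkout: PurePosixPath  # what commands and workers edit and verify


def resolve_paths(project: PurePosixPath, isolation: str,
                  run_worktree: Optional[PurePosixPath] = None) -> RunPaths:
    if isolation == "per_run":
        if run_worktree is None:
            raise ValueError("per_run isolation needs a run-owned worktree")
        return RunPaths(project=project, checkout=run_worktree)
    if isolation == "off":
        # Direct checkout: the two paths collapse into one.
        return RunPaths(project=project, checkout=project)
    raise ValueError(f"unknown isolation policy: {isolation}")


def render_command(template: str, paths: RunPaths) -> str:
    # Substitute {project} and {checkout} in a declared command string.
    return template.format(project=paths.project, checkout=paths.checkout)


repo = PurePosixPath("/src/orcho-mcp")  # hypothetical source path
isolated = resolve_paths(repo, "per_run", PurePosixPath("/workspace/runs/20260629_083410"))
direct = resolve_paths(repo, "off")
print(render_command("pytest {checkout}/tests", isolated))  # pytest /workspace/runs/20260629_083410/tests
print(render_command("pytest {checkout}/tests", direct))    # pytest /src/orcho-mcp/tests
```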

policy layer: worktree vs direct checkout

Isolation policy
per_run worktree
  create a run-owned checkout under the workspace runspace
  run agents and verification against {checkout}
  preserve the diff as the review and repair subject
  apply or deliver back to {project} only through the delivery boundary

off / direct checkout
  {checkout} == {project}
  run agents directly in the target repository
  simpler for current-diff or small intentional edits
  lower protection against accidental source-checkout mutation

This policy is part of the risk model. A large feature run wants a retained worktree subject so review, repair, receipts, and delivery all point at the same diff. A current-diff audit or small direct edit may intentionally use the project checkout.

If the source checkout is dirty and an isolated run would start from HEAD, Orcho can ask how that dirty state should feed the run:

Pre-run intake - uncommitted changes in checkout
  1) include  seed the isolated run worktree with the current diff
  2) exclude  start the run from HEAD and leave local changes untouched
  3) commit   commit the checkout first, then start from that commit
  4) halt     stop before the run starts

For expert readers, this is the policy layer that prevents confusion between “the repo this task is about” and “the checkout this run is allowed to mutate.”
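The four intake answers can be read as combinations of a few independent decisions. The sketch below is an interpretation of the menu, not Orcho's code: `Intake`, `IntakePlan`, and `plan_intake` are hypothetical, and `"new commit"` is a placeholder for whatever commit the `commit` option creates.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Intake(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    COMMIT = "commit"
    HALT = "halt"


class IntakePlan(NamedTuple):
    start_run: bool
    commit_first: bool       # commit the dirty checkout before starting
    base: Optional[str]      # what the isolated worktree starts from
    seed_with_diff: bool     # copy uncommitted changes into the run worktree


def plan_intake(choice: Intake) -> IntakePlan:
    if choice is Intake.INCLUDE:
        return IntakePlan(True, False, "HEAD", True)
    if choice is Intake.EXCLUDE:
        # Local changes stay in the source checkout, untouched.
        return IntakePlan(True, False, "HEAD", False)
    if choice is Intake.COMMIT:
        return IntakePlan(True, True, "new commit", False)
    return IntakePlan(False, False, None, False)  # HALT: stop before the run


for choice in Intake:
    print(choice.value, plan_intake(choice))
```

Only `commit` mutates the source checkout in this reading; `include` and `exclude` both start from HEAD and differ only in whether the dirty diff reaches the run.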

Budgeted plan validation

validate_plan exists to reject a weak plan before code is changed.

In the same run, the first plan versions were rejected. The important part is how the failure stayed controlled: automatic plan rounds were bounded, then a phase handoff asked the operator whether to continue, retry, halt, waive, or ask for advice.

plan gate: rejected, then repaired

Plan validation
verdict  REJECTED
finding  F1 P1  RunStatus builder is not owned by the plan

Handoff (fired): validate_plan automatic round 2/2
policy   human_feedback_on_reject
action   retry_feedback
round    validate_plan human retry 1 after REJECTED verdict

Plan validation
verdict  REJECTED
finding  F1 P2  evidence is missing from the AC7 consistency check

Handoff (fired): validate_plan human retry 1 rejected
action   retry_feedback
round    validate_plan human retry 2 after REJECTED verdict

Plan validation
verdict  APPROVED
summary  plan defines one source, four public surfaces, and a cross-surface test

The operator did not have to accept a bad plan. The run paused at the decision point, recorded the choice, fed the reviewer findings back into planning, and continued only after approval.
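The bounded loop can be sketched as follows. This is an assumption-laden illustration, not Orcho's handoff engine: `validate_plan_loop` is hypothetical, `auto_rounds=2` mirrors the "automatic round 2/2" line, and only the `retry_feedback` and `halt` actions are modeled (continue, waive, and advice are left out rather than guessed).

```python
def validate_plan_loop(plan, validate, revise, ask_operator, auto_rounds=2):
    """Validate a plan with a bounded automatic budget, then hand off.

    validate(plan) -> (verdict, findings)
    revise(plan, findings) -> revised plan
    ask_operator(findings) -> "retry_feedback" or "halt"
    """
    log = []
    rounds = 0
    while True:
        verdict, findings = validate(plan)
        rounds += 1
        log.append(verdict)
        if verdict == "APPROVED":
            return "approved", plan, log
        if rounds >= auto_rounds:
            # Automatic budget spent: each further round needs an operator decision.
            action = ask_operator(findings)
            log.append(f"handoff:{action}")
            if action == "halt":
                return "halted", plan, log
            if action != "retry_feedback":
                raise NotImplementedError(f"action not covered by this sketch: {action}")
        # Feed reviewer findings back into planning before the next round.
        plan = revise(plan, findings)


outcome = validate_plan_loop(
    plan=0,  # a version number stands in for a real plan
    validate=lambda v: ("APPROVED", []) if v >= 3 else ("REJECTED", [f"finding on v{v}"]),
    revise=lambda v, findings: v + 1,
    ask_operator=lambda findings: "retry_feedback",
)
print(outcome)
# ('approved', 3, ['REJECTED', 'REJECTED', 'handoff:retry_feedback',
#                  'REJECTED', 'handoff:retry_feedback', 'APPROVED'])
```

Note that the automatic budget is bounded but operator retries are not: past the budget, every extra round exists only because a human chose it.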

Subtask DAG

Once the plan is approved, implementation can run as a dependency graph instead of one mixed pile of edits, but only when the implementation execution policy selects the compact DAG path.

implementation: subtask DAG

T1-projection-schema
owns: shared projection source, next-actions helper, wire model

T2-diagnose
depends_on: T1-projection-schema
owns: diagnose condition and safe next actions

T3-status-evidence-summary
depends_on: T1-projection-schema, T2-diagnose
owns: status, evidence, summary, live card, schema snapshot

T4-future-shape-blocker-doc
depends_on: T2-diagnose, T3-status-evidence-summary
owns: future-state fixtures and architecture documentation

The dependencies matter. They keep the worker from implementing summary before the projection source exists, or documenting a future condition before the observable public surfaces agree.
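The ordering these dependencies imply can be checked with Python's standard `graphlib` module. Using `graphlib` is our choice for illustration, not a claim about Orcho's internals; `execution_order` is a hypothetical helper. Because T3 depends on T2 and T4 depends on T3, this particular graph is a chain and has exactly one valid order, and a cyclic plan is rejected instead of silently ordered.

```python
from graphlib import CycleError, TopologicalSorter

# The subtask DAG above, written as task -> upstream tasks.
DEPENDS_ON = {
    "T1-projection-schema": [],
    "T2-diagnose": ["T1-projection-schema"],
    "T3-status-evidence-summary": ["T1-projection-schema", "T2-diagnose"],
    "T4-future-shape-blocker-doc": ["T2-diagnose", "T3-status-evidence-summary"],
}


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes of the detected cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc


print(execution_order(DEPENDS_ON))
# ['T1-projection-schema', 'T2-diagnose', 'T3-status-evidence-summary',
#  'T4-future-shape-blocker-doc']
```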

Controlled worker runs

In compact_dag implementation, every subtask is its own controlled runtime turn. It carries a goal, dependencies, done criteria, prompt size, session policy, and upstream count.

subtask runtime: controlled turn

ORCHO subtask 3/4 START: T3-status-evidence-summary
goal: align status, evidence, diagnose, and summary from one source
runtime: claude
model: claude-opus-*
depends_on: T1-projection-schema, T2-diagnose
done_criteria: 7
prompt_chars: 17082
current_only: true
execution_context: compact_dag
prompt_turn: true
upstream_deps: 2

This is the main difference from asking one worker to “fix everything.” The subtask knows its local contract and its upstream dependencies.
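A sketch of where several header fields could come from, assuming a hypothetical `SubtaskTurn` record. It assumes `done_criteria` and `upstream_deps` are counts of the subtask's local lists and `prompt_chars` is the length of the rendered prompt; the runtime, model, and session fields are omitted because their selection belongs to other policies.

```python
from dataclasses import dataclass


@dataclass
class SubtaskTurn:
    index: int
    total: int
    task_id: str
    goal: str
    depends_on: list[str]
    done_criteria: list[str]
    prompt: str
    execution_context: str = "compact_dag"

    def start_header(self) -> str:
        # Every count is derived from the subtask's local contract.
        return "\n".join([
            f"ORCHO subtask {self.index}/{self.total} START: {self.task_id}",
            f"goal: {self.goal}",
            f"depends_on: {', '.join(self.depends_on) or '-'}",
            f"done_criteria: {len(self.done_criteria)}",
            f"prompt_chars: {len(self.prompt)}",
            f"execution_context: {self.execution_context}",
            f"upstream_deps: {len(self.depends_on)}",
        ])


turn = SubtaskTurn(
    index=3, total=4, task_id="T3-status-evidence-summary",
    goal="align status, evidence, diagnose, and summary from one source",
    depends_on=["T1-projection-schema", "T2-diagnose"],
    done_criteria=[f"criterion {n}" for n in range(1, 8)],
    prompt="x" * 17082,  # placeholder prompt of the logged size
)
print(turn.start_header())
```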

The implementation output also reports how each subtask was rendered:

Subtask renders
  T1-projection-schema: full  (continue_session=false)
  T2-diagnose: delta          (continue_session=true)
  T3-status-evidence-summary: delta  (continue_session=true)
  T4-future-shape-blocker-doc: delta  (continue_session=true)

That makes session strategy visible. The first subtask receives the full local contract; later subtasks receive compact deltas plus their dependency context.
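The full-versus-delta choice can be sketched as a small function. The name `render_subtask` and the prompt layout are hypothetical; what it encodes from the render table is that the first subtask starts a fresh session with the full local contract, while later subtasks continue the session and receive only their own goal plus facts from their declared dependencies.

```python
def render_subtask(position, task, full_contract, upstream_facts):
    """Return (render, continue_session, prompt); position 0 is the first subtask."""
    if position == 0:
        # First turn: fresh session, full local contract.
        return "full", False, f"{full_contract}\n\nSubtask: {task['goal']}"
    # Later turns: continue the session and send a compact delta.
    upstream = "\n".join(
        f"- {dep}: {upstream_facts[dep]}" for dep in task["depends_on"]
    )
    return "delta", True, f"Subtask: {task['goal']}\nUpstream facts:\n{upstream}"


tasks = [
    {"id": "T1-projection-schema", "goal": "shared projection source", "depends_on": []},
    {"id": "T2-diagnose", "goal": "diagnose condition", "depends_on": ["T1-projection-schema"]},
]
facts = {"T1-projection-schema": "projection helper exists"}  # hypothetical upstream fact
for position, task in enumerate(tasks):
    render, continue_session, _ = render_subtask(position, task, "FULL CONTRACT", facts)
    print(f"{task['id']}: {render}  (continue_session={str(continue_session).lower()})")
# T1-projection-schema: full  (continue_session=false)
# T2-diagnose: delta  (continue_session=true)
```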

Self-attestation

After a subtask completes, the worker attests against its own done criteria. This is not final approval. It is the worker’s structured claim of completion.

subtask done: attestation

ORCHO subtask 3/4 DONE: T3-status-evidence-summary
attestation: met
[EXIT code=0 duration=4288.90s]

ORCHO subtask 3/4 ATTESTATION (met): T3-status-evidence-summary
7/7 done-criteria met

1. RunStatus.provider_pressure is filled by the real status builder.
2. Evidence carries the same typed provider-pressure object.
3. Summary carries provider_pressure without breaking legacy next_actions.
4. AC7 compares all four surfaces: status, evidence, diagnose, summary.
5. Generic failures remain provider_pressure == None on all four surfaces.
6. The MCP schema snapshot was regenerated and tested.
7. Unit, architecture, ruff, and diff checks were run.

The attestation gives reviewers a checklist. It does not replace review; it makes the review target sharper.
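A minimal sketch of turning per-criterion claims into the attestation line, assuming a hypothetical `attestation_report` helper. Only the `met` label appears in the run above; `not_met` is an illustrative label for the partial case. Listing unmet criteria by text is what turns the claim into a concrete review target.

```python
def attestation_report(index, total, task_id, criteria):
    """criteria: list of (text, met) pairs claimed by the worker, not verified."""
    met = sum(1 for _, ok in criteria if ok)
    verdict = "met" if met == len(criteria) else "not_met"
    lines = [
        f"ORCHO subtask {index}/{total} ATTESTATION ({verdict}): {task_id}",
        f"{met}/{len(criteria)} done-criteria met",
    ]
    # Name each unmet criterion so reviewers see exactly which claim is missing.
    lines += [f"unmet: {text}" for text, ok in criteria if not ok]
    return verdict, "\n".join(lines)


claims = [(f"criterion {n}", True) for n in range(1, 8)]
verdict, report = attestation_report(3, 4, "T3-status-evidence-summary", claims)
print(report)
# ORCHO subtask 3/4 ATTESTATION (met): T3-status-evidence-summary
# 7/7 done-criteria met
```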

Review, repair, and final acceptance

Orcho separates implementation success from delivery readiness.

In the same run, review approved the implementation, but final acceptance still rejected the delivery because required receipts were missing or stale.

release gate: code ok, delivery blocked

Review
verdict  APPROVED
summary  no substantial defects found; surfaces share one projection/helper

Final acceptance
verdict     REJECTED
ship_ready  no
summary     required receipts are missing or stale

Correction gate
missing required receipts: mcp-mock-smoke
stale required receipts: env-provenance, lint

Contract status
task_contract  incomplete
interfaces     compatible
tests          weak

This is exactly why final acceptance is its own gate. A code reviewer can be satisfied while the delivery protocol still blocks release.
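The receipt side of final acceptance can be sketched as a gate that ignores the review verdict entirely. Everything here is an assumption for illustration: `Receipt`, `receipt_gate`, the `docs-links` and `unit` command names, and especially the staleness rule, which treats a receipt as stale when it was recorded against a different diff fingerprint than the current one. Per-command `require` and `warn` levels follow the contract policy shown earlier; anything other than `require` is treated as a warning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    exit_code: int
    diff_fingerprint: str  # hypothetical: identifies the diff the command ran against


def receipt_gate(policy: dict[str, str], receipts: dict[str, Receipt], current: str):
    blocking = {"missing": [], "failed": [], "stale": []}
    warnings = []
    for name, level in sorted(policy.items()):
        receipt = receipts.get(name)
        if receipt is None:
            kind = "missing"
        elif receipt.exit_code != 0:
            kind = "failed"
        elif receipt.diff_fingerprint != current:
            kind = "stale"
        else:
            continue  # fresh, passing receipt
        if level == "require":
            blocking[kind].append(name)
        else:
            warnings.append(f"{kind}: {name}")
    ship_ready = not any(blocking.values())
    return ship_ready, blocking, warnings


policy = {"unit": "require", "lint": "require", "env-provenance": "require",
          "mcp-mock-smoke": "require", "docs-links": "warn"}
receipts = {
    "unit": Receipt(0, "diff-v3"),
    "lint": Receipt(0, "diff-v2"),
    "env-provenance": Receipt(0, "diff-v2"),
}
ship_ready, blocking, warnings = receipt_gate(policy, receipts, current="diff-v3")
print(ship_ready, blocking, warnings)
# False {'missing': ['mcp-mock-smoke'], 'failed': [], 'stale': ['env-provenance', 'lint']}
#       ['missing: docs-links']
```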

What to copy into your own tasks

For complex work, ask for a plan that names:

acceptance criteria as observable facts;
owned files and explicit out-of-scope boundaries;
required verification commands and receipt policy;
risks and falsifiers;
review focus;
subtasks with depends_on;
done criteria per subtask;
what counts as release-ready evidence.
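The checklist can also be applied to a draft plan mechanically. This sketch assumes a plain dictionary plan with hypothetical key names; it is a pre-flight check you might run yourself, not an Orcho validator. An empty `depends_on` is allowed, but it must be stated explicitly.

```python
PLAN_SECTIONS = (
    "acceptance", "owned_files", "out_of_scope", "commands",
    "receipt_policy", "risks", "review_focus", "tasks", "release_evidence",
)


def plan_gaps(plan: dict) -> list[str]:
    gaps = [name for name in PLAN_SECTIONS if not plan.get(name)]
    for task in plan.get("tasks", []):
        # depends_on may be empty, but it must be stated.
        if "depends_on" not in task:
            gaps.append(f"{task['id']}: depends_on")
        if not task.get("done_criteria"):
            gaps.append(f"{task['id']}: done_criteria")
    return gaps


draft = {
    "acceptance": ["AC1"], "owned_files": ["src/a.py"], "commands": ["pytest"],
    "risks": ["r1"], "review_focus": ["f1"],
    "tasks": [
        {"id": "T1", "depends_on": [], "done_criteria": ["c1"]},
        {"id": "T2", "done_criteria": []},
    ],
}
print(plan_gaps(draft))
# ['out_of_scope', 'receipt_policy', 'release_evidence', 'T2: depends_on', 'T2: done_criteria']
```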

For small work, this much structure is unnecessary. Use a lighter profile and keep the live run readable. Orcho’s point is not to make every task heavy; it is to make the amount of control match the blast radius.

Run anatomy explains the whole live output shape.
Handoffs and advisors explains operator decision points.
Verification receipts explains why receipts can block delivery.
Cost accounting explains why large DAG subtasks should be measured.