False-ready delivery

Trust boundary

False-ready delivery is the state where a worker appears finished, but the delivery system cannot prove the result is ready. Orcho exists to keep that state visible instead of letting it become a confident final message.

worker says done -> review gate -> repair -> final gate -> evidence

agent done is not enough; gate decides readiness; evidence survives

sanitized real pattern: rejected -> repair

[IMPLEMENT] edited api/auth.py and tests

[REVIEW_CHANGES] verdict=REJECTED
  blocker  missing negative-path test
  fix      add regression coverage for invalid input

[REPAIR_CHANGES] added missing negative-path test

[FINAL_ACCEPTANCE] verdict=APPROVED
  ship_ready  yes

The worker produced a change. Orcho still blocked readiness until the review finding was repaired and the final gate accepted the delivery.
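
The pattern above can be read mechanically. The following is a minimal sketch, assuming the "[PHASE] text" and "verdict=..." layout shown in the sample is stable; the function names parse_phases and is_ship_ready are hypothetical and are not part of Orcho. It treats the run as ready only when the last final-acceptance entry is approved, no matter what the worker phase reported.

```python
import re

# Assumption: the "[PHASE] text" layout is inferred from the sample above,
# not from a documented Orcho log format.
PHASE_LINE = re.compile(r"^\[(?P<phase>[A-Z_]+)\]\s*(?P<rest>.*)$")
VERDICT = re.compile(r"verdict=(?P<verdict>[A-Z]+)")


def parse_phases(log_text):
    """Return (phase, verdict_or_None) pairs in log order."""
    phases = []
    for line in log_text.splitlines():
        match = PHASE_LINE.match(line.strip())
        if not match:
            continue  # detail lines such as "blocker ..." are skipped
        verdict = VERDICT.search(match.group("rest"))
        phases.append(
            (match.group("phase"), verdict.group("verdict") if verdict else None)
        )
    return phases


def is_ship_ready(phases):
    """Ready only if the last FINAL_ACCEPTANCE entry is APPROVED."""
    finals = [v for p, v in phases if p == "FINAL_ACCEPTANCE"]
    return bool(finals) and finals[-1] == "APPROVED"


SAMPLE = """\
[IMPLEMENT] edited api/auth.py and tests
[REVIEW_CHANGES] verdict=REJECTED
  blocker  missing negative-path test
[REPAIR_CHANGES] added missing negative-path test
[FINAL_ACCEPTANCE] verdict=APPROVED
  ship_ready  yes
"""
```

A transcript that stops after the [IMPLEMENT] line yields no final-acceptance entry, so is_ship_ready returns False for it.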

Worker claim

A worker runtime can finish its implementation phase and still leave the delivery unready.

[IMPLEMENT] edited api/auth.py and tests

That line is useful, but it is not a release decision. It says work happened. It does not prove that review passed, required checks ran, or final acceptance approved the result.

Review gate

The review gate is where false-ready delivery becomes visible.

[REVIEW_CHANGES] verdict=REJECTED
  blocker  missing negative-path test
  fix      add regression coverage for invalid input

A normal agent transcript can drift from implementation into a confident final message. Orcho keeps the gate separate. The run can say:

the worker made a change;
the change is not delivery-ready yet;
this exact blocker must be fixed before the run can proceed.
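
One way to picture the three statements above is as fields that are tracked separately, so a change being made never implies that the run can proceed. This is an illustrative sketch only; ReviewState and its fields are hypothetical names, not Orcho's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewState:
    # Hypothetical fields chosen to mirror the three statements above.
    change_made: bool = False
    review_verdict: Optional[str] = None  # "APPROVED", "REJECTED", or None
    blockers: List[str] = field(default_factory=list)

    def can_proceed(self) -> bool:
        """A change alone is not enough: review must approve with no blockers."""
        return (
            self.change_made
            and self.review_verdict == "APPROVED"
            and not self.blockers
        )

    def summary(self) -> List[str]:
        """Report what the run can honestly say at this point."""
        lines = []
        if self.change_made:
            lines.append("the worker made a change")
        if not self.can_proceed():
            lines.append("the change is not delivery-ready yet")
        lines.extend(f"blocker: {b}" for b in self.blockers)
        return lines
```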

Repair

Repair is not a new vague prompt. It is a continuation of the delivery contract.

[REPAIR_CHANGES] added missing negative-path test

The point is not that rejection is good. The point is that rejection becomes a controlled state with a next action instead of hidden operator anxiety.

When the correction needs a new run, use Correction follow-ups to carry the parent context, blocker, diff, and evidence forward.

Final gate

Final acceptance is the last readiness decision.

[FINAL_ACCEPTANCE] verdict=APPROVED
  ship_ready  yes

If final acceptance rejects, the run should not look ready. It should preserve the blocker and tell the operator whether to repair, follow up, resume, or halt.
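
The rule that a rejected run must keep its blocker and name a next step can be expressed as a small validated record. This is a sketch under assumptions: FinalDecision and NEXT_ACTIONS are hypothetical names, and how Orcho chooses between the four actions is not described here, so the code only checks that one of them is present.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The four operator actions named in the text above.
NEXT_ACTIONS = {"repair", "follow_up", "resume", "halt"}


@dataclass(frozen=True)
class FinalDecision:
    verdict: str                      # "APPROVED" or "REJECTED"
    blockers: Tuple[str, ...] = ()
    next_action: Optional[str] = None

    def __post_init__(self):
        if self.verdict not in {"APPROVED", "REJECTED"}:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if self.verdict == "REJECTED":
            # A rejection without a blocker or a next step would hide the problem.
            if not self.blockers:
                raise ValueError("a rejected decision must preserve its blocker")
            if self.next_action not in NEXT_ACTIONS:
                raise ValueError("a rejected decision must name a next action")

    @property
    def ship_ready(self) -> bool:
        return self.verdict == "APPROVED" and not self.blockers
```

Constructing FinalDecision("REJECTED") with no blocker raises ValueError, which is the point: a rejection cannot be recorded in a form that looks ready or loses its reason.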

Read Run anatomy for where review and final acceptance appear in the live stream.

When code review passes but delivery is still blocked

The strongest version of false-ready is not “the agent looked done.” It is “review approved the code, and delivery was still blocked.” Review and final acceptance are separate gates for exactly this reason: a reviewer can be satisfied with the diff while required verification is missing or stale.

release gatecode ok, delivery blocked

Review
verdict  APPROVED
summary  no substantial defects found

Final acceptance
verdict     REJECTED
ship_ready  no
summary     required receipts are missing or stale

Correction gate
missing required receipts: mcp-mock-smoke
stale required receipts: env-provenance, lint

Contract status
task_contract  incomplete
tests          weak

A confident final message would have shipped this. Orcho holds it: the code was fine, but the proof that it works was not there.
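
The separation of the two gates can be sketched as a function in which an approved review is necessary but not sufficient. The function name, the argument shapes, and the required list in the test data are assumptions made for illustration; only the receipt names and their statuses come from the output above.

```python
def final_acceptance(review_verdict, required, receipt_status):
    """Combine the review verdict with required receipts.

    review_verdict: "APPROVED" or "REJECTED"
    required:       names of receipts that must be present
    receipt_status: mapping of receipt name -> "passed" | "failed" | "stale"
    """
    missing = sorted(n for n in required if n not in receipt_status)
    stale = sorted(n for n in required if receipt_status.get(n) == "stale")
    failed = sorted(n for n in required if receipt_status.get(n) == "failed")
    approved = review_verdict == "APPROVED" and not (missing or stale or failed)
    return {
        "verdict": "APPROVED" if approved else "REJECTED",
        "ship_ready": approved,
        "missing": missing,
        "stale": stale,
        "failed": failed,
    }


# Assumed inputs that reproduce the blocked run shown above.
REQUIRED = ["env-provenance", "lint", "mcp-mock-smoke", "verification-unit"]
STATUS = {"env-provenance": "stale", "lint": "stale", "verification-unit": "passed"}
RESULT = final_acceptance("APPROVED", REQUIRED, STATUS)
```

With these inputs the review verdict is APPROVED, yet RESULT reports a rejected final acceptance with mcp-mock-smoke missing and env-provenance and lint stale.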

Read Plan contract and DAG for the same gate inside a full run.

Evidence

For a rejected or blocked run, inspect:

final acceptance verdict;
release blockers;
verification receipts;
diff summary;
delivery gate state;
recommended correction or recovery action.

The durable proof surface lives in artifacts such as:

output.log
events.jsonl
plan.md
review.json
diff.patch
receipts/
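
Before reading the evidence, it can help to confirm that it exists. The sketch below assumes all of the artifacts listed above sit directly inside one run directory; that layout and the function name missing_artifacts are assumptions, not a documented Orcho convention.

```python
from pathlib import Path

# Names taken from the list above; a trailing slash marks a directory.
EXPECTED_ARTIFACTS = (
    "output.log",
    "events.jsonl",
    "plan.md",
    "review.json",
    "diff.patch",
    "receipts/",
)


def missing_artifacts(run_dir):
    """Return the expected artifact names that are absent from run_dir."""
    root = Path(run_dir)
    missing = []
    for name in EXPECTED_ARTIFACTS:
        path = root / name.rstrip("/")
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty result means every listed artifact is present; it says nothing about whether the contents support a ready verdict.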

A receipt is what turns “tests passed” from a sentence into something inspectable. Each one records where and how a check actually ran, so final acceptance can tell apart passed, failed, ran in the wrong place, and never ran:

[
  {
    "name": "lint",
    "status": "stale",
    "command": "ruff check api/",
    "working_dir": "/repo/app",
    "interpreter": "python3.12",
    "source": "api/auth.py",
    "result": "passed against an earlier diff; re-run required",
    "provenance": "ran before the last edit"
  },
  {
    "name": "mcp-mock-smoke",
    "status": "missing",
    "result": "required by the profile, but no receipt was produced"
  },
  {
    "name": "verification-unit",
    "status": "passed",
    "command": "pytest tests/test_auth.py",
    "working_dir": "/repo/app",
    "interpreter": "python3.12",
    "source": "tests/test_auth.py",
    "result": "ran in the expected checkout and passed"
  }
]

The values are sanitized, but the shape is the point: missing and stale receipts are why the run above blocked delivery even though the code review approved the diff.
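
To show how that shape supports the four-way distinction, here is a minimal sketch that classifies a receipt as passed, failed, stale, ran in the wrong place, or never ran. The classify function, the "wrong_place" and "never_ran" labels, and the comparison against an expected working directory are assumptions; the field names come from the sample above, trimmed to the fields the check needs.

```python
import json

# Trimmed copy of the sample receipts above.
RECEIPTS_JSON = """
[
  {"name": "lint", "status": "stale", "working_dir": "/repo/app"},
  {"name": "mcp-mock-smoke", "status": "missing"},
  {"name": "verification-unit", "status": "passed", "working_dir": "/repo/app"}
]
"""


def classify(receipt, expected_working_dir):
    """Classify one receipt; None means no receipt was produced at all."""
    if receipt is None or receipt.get("status") == "missing":
        return "never_ran"
    status = receipt.get("status")
    # A pass only counts if the check ran in the checkout being delivered.
    if status == "passed" and receipt.get("working_dir") != expected_working_dir:
        return "wrong_place"
    return status


BY_NAME = {r["name"]: r for r in json.loads(RECEIPTS_JSON)}
```

Looking up a name with BY_NAME.get(...) returns None for a receipt that was never written, which classify treats the same way as an explicit "missing" status.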

Use Evidence bundle for the artifact model and Verification receipts when the key question is whether checks ran in the right environment.

What false-ready usually looks like

False-ready delivery can happen when:

required tests were not run;
review found a blocker;
final acceptance rejected;
verification ran against the wrong tree;
the diff touched files outside the declared scope;
a delivery decision is still parked.
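
The conditions above can be collected into a single check that returns every reason a run is not ready, rather than stopping at the first. This is a sketch under assumptions: RunSnapshot and all of its fields are hypothetical, and scope is modelled as simple path prefixes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class RunSnapshot:
    # Hypothetical fields, one group per condition in the list above.
    required_tests: Set[str] = field(default_factory=set)
    tests_run: Set[str] = field(default_factory=set)
    review_blockers: List[str] = field(default_factory=list)
    final_verdict: Optional[str] = None
    verified_tree: Optional[str] = None
    delivered_tree: Optional[str] = None
    touched_files: List[str] = field(default_factory=list)
    declared_scope: Tuple[str, ...] = ()   # allowed path prefixes
    decision_parked: bool = False


def false_ready_reasons(run):
    """Return every reason the run should not be treated as ready."""
    reasons = []
    if run.required_tests - run.tests_run:
        reasons.append("required tests were not run")
    if run.review_blockers:
        reasons.append("review found a blocker")
    if run.final_verdict == "REJECTED":
        reasons.append("final acceptance rejected")
    if run.verified_tree is not None and run.verified_tree != run.delivered_tree:
        reasons.append("verification ran against the wrong tree")
    if run.declared_scope and any(
        not f.startswith(run.declared_scope) for f in run.touched_files
    ):
        reasons.append("the diff touched files outside the declared scope")
    if run.decision_parked:
        reasons.append("a delivery decision is still parked")
    return reasons
```

An empty list here only means none of these conditions fired; the readiness decision itself still belongs to final acceptance.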

The point is not to celebrate rejection. The point is to keep the operator from shipping a result that only looked complete in the worker’s last message.

Run anatomy shows the deeper run stream.
Evidence bundle explains durable artifacts.
Verification receipts explains proof that checks ran.
Correction follow-ups explains recovery after rejection.