False-ready delivery

Trust boundary

False-ready delivery is the state where a worker appears finished, but the delivery system cannot prove the result is ready. Orcho exists to keep that state visible instead of letting it become a confident final message.

agent done is not enough · gate decides readiness · evidence survives

A worker runtime can finish its implementation phase and still leave the delivery unready.

[IMPLEMENT] edited api/auth.py and tests

That line is useful, but it is not a release decision. It says work happened. It does not prove that review passed, required checks ran, or final acceptance approved the result.

The review gate is where false-ready delivery becomes visible.

[REVIEW_CHANGES] verdict=REJECTED
blocker  missing negative-path test
fix      add regression coverage for invalid input

A normal agent transcript can drift from implementation into a confident final message. Orcho keeps the gate separate. The run can say:

  • the worker made a change;
  • the change is not delivery-ready yet;
  • this exact blocker must be fixed before the run can proceed.

Repair is not a new vague prompt. It is a continuation of the delivery contract.

[REPAIR_CHANGES] added missing negative-path test

The point is not that rejection is good. The point is that rejection becomes a controlled state with a next action instead of hidden operator anxiety.

When the correction needs a new run, use Correction follow-ups to carry the parent context, blocker, diff, and evidence forward.

Final acceptance is the last readiness decision.

[FINAL_ACCEPTANCE] verdict=APPROVED
ship_ready yes

If final acceptance rejects, the run should not look ready. It should preserve the blocker and tell the operator whether to repair, follow up, resume, or halt.
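
One way to picture the separation is a small reader that treats a run as ready only after an approving final acceptance line. This is a minimal sketch, not Orcho's implementation: it assumes phase lines shaped like the examples above (`[PHASE] verdict=...`), and `parse_phase_line` and `is_delivery_ready` are hypothetical names. The rule that any later work or review phase resets readiness is also an assumption made for illustration.

```python
import re

# Assumed line shape, taken from the examples above: "[PHASE] free text".
PHASE_LINE = re.compile(r"^\[(?P<phase>[A-Z_]+)\]\s*(?P<rest>.*)$")


def parse_phase_line(line):
    """Return phase, verdict, and detail for a phase line, or None otherwise."""
    match = PHASE_LINE.match(line.strip())
    if match is None:
        return None  # detail lines such as "ship_ready yes" are not phase lines
    rest = match.group("rest")
    verdict = None
    if rest.startswith("verdict="):
        verdict = rest.split("=", 1)[1].strip()
    return {"phase": match.group("phase"), "verdict": verdict, "detail": rest}


def is_delivery_ready(lines):
    """Ready only if the most recent phase is an approving final acceptance."""
    ready = False
    for line in lines:
        parsed = parse_phase_line(line)
        if parsed is None:
            continue
        if parsed["phase"] == "FINAL_ACCEPTANCE":
            ready = parsed["verdict"] == "APPROVED"
        else:
            # Assumption: implementation, review, or repair after an
            # acceptance means the earlier decision no longer applies.
            ready = False
    return ready
```

With this rule, a transcript that ends at `[IMPLEMENT]` or `[REPAIR_CHANGES]` never reads as ready, however confident its last line sounds.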

Read Run anatomy for where review and final acceptance appear in the live stream.

When code review passes but delivery is still blocked

The strongest version of false-ready is not “the agent looked done.” It is “review approved the code, and delivery was still blocked.” Review and final acceptance are separate gates for exactly this reason: a reviewer can be satisfied with the diff while required verification is missing or stale.

release gate · code ok, delivery blocked
Review
verdict  APPROVED
summary  no substantial defects found

Final acceptance
verdict     REJECTED
ship_ready  no
summary     required receipts are missing or stale

Correction gate
missing required receipts: mcp-mock-smoke
stale required receipts: env-provenance, lint

Contract status
task_contract  incomplete
tests          weak

A confident final message would have shipped this. Orcho holds it: the code was fine, but the proof that it works was not there.
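
The two-gate idea can be sketched as a function that never lets a review approval stand in for final acceptance. This is a hedged illustration only: `GateResult` and `delivery_state` are hypothetical names, and the returned strings are invented for the example rather than taken from Orcho's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    verdict: str  # "APPROVED" or "REJECTED", as in the example above
    summary: str


def delivery_state(review, final_acceptance):
    """Combine two separate gates; both must approve before delivery is ready."""
    if review.verdict != "APPROVED":
        return "blocked: review rejected"
    if final_acceptance.verdict != "APPROVED":
        # The case described above: the diff is fine, the proof is not.
        return "blocked: code ok, delivery blocked"
    return "ready"


review = GateResult("APPROVED", "no substantial defects found")
final = GateResult("REJECTED", "required receipts are missing or stale")
state = delivery_state(review, final)
```

Here `state` is `"blocked: code ok, delivery blocked"`, because the review verdict alone is never enough to return `"ready"`.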

Read Plan contract and DAG for the same gate inside a full run.

For a rejected or blocked run, inspect:

  1. final acceptance verdict;
  2. release blockers;
  3. verification receipts;
  4. diff summary;
  5. delivery gate state;
  6. recommended correction or recovery action.

The durable proof surface lives in artifacts such as:

output.log
events.jsonl
plan.md
review.json
diff.patch
receipts/
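
As a rough sketch, a script could check which of these artifacts are present before an operator starts inspecting a run. It assumes the artifacts sit directly inside a single run directory and that all of the names listed above are expected; neither is confirmed here, and `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Names copied from the list above; a trailing slash marks a directory.
EXPECTED_ARTIFACTS = [
    "output.log",
    "events.jsonl",
    "plan.md",
    "review.json",
    "diff.patch",
    "receipts/",
]


def missing_artifacts(run_dir):
    """Return the expected artifact names that are absent from run_dir."""
    root = Path(run_dir)
    missing = []
    for name in EXPECTED_ARTIFACTS:
        path = root / name.rstrip("/")
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```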

A receipt is what turns “tests passed” from a sentence into something inspectable. Each one records where and how a check actually ran, so final acceptance can distinguish a check that passed from one that failed, ran in the wrong place, or never ran:

[
  {
    "name": "lint",
    "status": "stale",
    "command": "ruff check api/",
    "working_dir": "/repo/app",
    "interpreter": "python3.12",
    "source": "api/auth.py",
    "result": "passed against an earlier diff; re-run required",
    "provenance": "ran before the last edit"
  },
  {
    "name": "mcp-mock-smoke",
    "status": "missing",
    "result": "required by the profile, but no receipt was produced"
  },
  {
    "name": "verification-unit",
    "status": "passed",
    "command": "pytest tests/test_auth.py",
    "working_dir": "/repo/app",
    "interpreter": "python3.12",
    "source": "tests/test_auth.py",
    "result": "ran in the expected checkout and passed"
  }
]

The values are sanitized, but the shape is the point: missing and stale receipts are why the run above blocked delivery even though the code review approved the diff.
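
A minimal sketch of how receipts shaped like the sample could be sorted into blockers. It uses only the `name` and `status` fields shown above; the `required` set, the `failed` bucket, and the function name `blocking_receipts` are assumptions for illustration, as is the choice to count a required check with no receipt at all as missing.

```python
# Trimmed to the two fields this sketch reads from the sample above.
RECEIPTS = [
    {"name": "lint", "status": "stale"},
    {"name": "mcp-mock-smoke", "status": "missing"},
    {"name": "verification-unit", "status": "passed"},
]


def blocking_receipts(receipts, required):
    """Group required checks that do not have a fresh, passing receipt."""
    by_name = {receipt["name"]: receipt for receipt in receipts}
    blockers = {"missing": [], "stale": [], "failed": []}
    for name in sorted(required):
        receipt = by_name.get(name)
        # Assumption: no receipt at all is treated the same as "missing".
        status = receipt["status"] if receipt is not None else "missing"
        if status == "missing":
            blockers["missing"].append(name)
        elif status == "stale":
            blockers["stale"].append(name)
        elif status != "passed":
            blockers["failed"].append(name)
    return blockers


required = {"lint", "mcp-mock-smoke", "verification-unit"}
blockers = blocking_receipts(RECEIPTS, required)
ship_ready = not any(blockers.values())
```

For the sample, `blockers` holds `mcp-mock-smoke` under `missing` and `lint` under `stale`, so `ship_ready` is `False` even though the unit tests passed.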

Use Evidence bundle for the artifact model and Verification receipts when the key question is whether checks ran in the right environment.

False-ready delivery can happen when:

  • required tests were not run;
  • review found a blocker;
  • final acceptance rejected;
  • verification ran against the wrong tree;
  • the diff touched files outside the declared scope;
  • a delivery decision is still parked.
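
The conditions above can be sketched as a single check that returns every reason a run is not ready, rather than a yes or no. This is an illustrative sketch under stated assumptions: `RunSnapshot` and all of its fields are hypothetical and do not describe Orcho's data model, and scope is modelled as simple path prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class RunSnapshot:
    # All fields are hypothetical stand-ins for run state.
    required_tests: set = field(default_factory=set)
    tests_run: set = field(default_factory=set)
    review_verdict: str = "APPROVED"
    final_verdict: str = "APPROVED"
    verified_tree: str = ""
    expected_tree: str = ""
    changed_files: list = field(default_factory=list)
    declared_scope: tuple = ()  # path prefixes; empty means no scope declared
    decision_parked: bool = False


def false_ready_reasons(run):
    """List each condition that makes the run false-ready; empty means none."""
    reasons = []
    not_run = sorted(run.required_tests - run.tests_run)
    if not_run:
        reasons.append("required tests not run: " + ", ".join(not_run))
    if run.review_verdict != "APPROVED":
        reasons.append("review found a blocker")
    if run.final_verdict != "APPROVED":
        reasons.append("final acceptance rejected")
    if run.verified_tree != run.expected_tree:
        reasons.append("verification ran against the wrong tree")
    if run.declared_scope:
        prefixes = tuple(run.declared_scope)
        outside = [p for p in run.changed_files if not p.startswith(prefixes)]
        if outside:
            reasons.append("diff touched files outside scope: " + ", ".join(outside))
    if run.decision_parked:
        reasons.append("delivery decision is still parked")
    return reasons
```

Returning the full list keeps each blocker visible to the operator instead of collapsing several problems into one rejected flag.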

The point is not to celebrate rejection. The point is to keep the operator from shipping a result that only looked complete in the worker’s last message.