Gates and verification

Orcho does not trust an agent’s claim that work is ready. It trusts declared checks, executed in a declared environment, recorded as durable receipts.

profile declares a gate      →  engine runs it after the phase
gate produces a typed result →  fail strategy applied as data
contract declares the proof  →  receipts written on disk
final acceptance reads receipts, not the worker's last message

This page explains that boundary: what a gate is, what counts as proof, and why a confident final message is never enough.

Gates are data, not code

A quality gate is a registered post-phase check that produces a typed QualityGateResult and applies a declarative fail policy to the run state. The profile declares which gates run after which phase; the engine executes them and persists the result into the phase log.

The fail strategy is also data. The engine reads the strategy enum and mutates state accordingly; there is no ad-hoc branching code per gate. Gates run after the phase handler returns, and a halting gate short-circuits the remaining gates once the halted phase has been persisted.
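A minimal sketch of that post-phase loop, in Python. The names `QualityGate`, `QualityGateResult`, and `FailStrategy` come from this page. Their fields, `RunState`, and `run_post_phase_gates` are hypothetical, and only two of the four strategies are modelled. The point is structural: the engine branches on the strategy value, never on which gate failed, and it stops at the first halting failure after recording it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class FailStrategy(Enum):
    # Two of the four strategies, enough to show the control flow.
    HALT = "halt"
    INFORMATIONAL = "informational"


@dataclass
class QualityGateResult:
    gate: str
    passed: bool
    detail: str = ""


@dataclass
class QualityGate:
    name: str
    check: Callable[[], QualityGateResult]  # hypothetical: a zero-arg check
    on_fail: FailStrategy


@dataclass
class RunState:
    phase_log: list = field(default_factory=list)  # (phase, result) pairs
    halted: bool = False


def run_post_phase_gates(phase: str, gates: list[QualityGate], state: RunState) -> None:
    """Run the gates a profile declared for `phase`, after its handler returned."""
    for gate in gates:
        result = gate.check()
        state.phase_log.append((phase, result))  # persist before acting on it
        if result.passed:
            continue
        # Generic dispatch on the strategy value; no per-gate branches.
        if gate.on_fail is FailStrategy.HALT:
            state.halted = True
            break  # short-circuit: later gates never run
        # INFORMATIONAL: already recorded, the run continues unchanged.


# Usage: a hypothetical profile declares three gates after "implement".
gates = [
    QualityGate("lint", lambda: QualityGateResult("lint", False, "2 warnings"),
                FailStrategy.INFORMATIONAL),
    QualityGate("tests", lambda: QualityGateResult("tests", False, "3 failed"),
                FailStrategy.HALT),
    QualityGate("format", lambda: QualityGateResult("format", True),
                FailStrategy.HALT),
]
state = RunState()
run_post_phase_gates("implement", gates, state)
```

Here `lint` is logged and ignored, `tests` halts the run, and `format` never executes.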

Two kinds of gate

Kind	Examples	Cost	Scheduling
computational	shell exit-code checks: tests, lint, type check, compile, format check	wall-clock and CPU; near-zero monetary cost	inline; blocks the phase
inferential	LLM judges: security review, spec compliance, code review by LLM	wall-clock plus tokens	batched; can be parallelised

The distinction matters for cost accounting, since cost_usd is only meaningful for inferential gates, and for scheduling: computational gates often share a project-level lock, while inferential gates can run in waves.
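The scheduling split can be sketched as follows, assuming a thread pool for inferential waves and a single process-wide lock standing in for the project-level lock. `GateOutcome`, `PROJECT_LOCK`, and both runner functions are hypothetical. The judges are stubs returning `(passed, cost_usd)` rather than real LLM calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical project-level lock shared by computational gates that touch
# the same working tree (e.g. two test runs writing the same build dir).
PROJECT_LOCK = threading.Lock()


@dataclass
class GateOutcome:
    name: str
    passed: bool
    cost_usd: Optional[float]  # None: cost is not meaningful for this gate


def run_computational(name: str, check: Callable[[], bool]) -> GateOutcome:
    """Inline and blocking: the phase waits for the exit-code check."""
    with PROJECT_LOCK:
        passed = check()
    return GateOutcome(name, passed, cost_usd=None)


def run_inferential_wave(
    judges: dict[str, Callable[[], tuple[bool, float]]],
) -> list[GateOutcome]:
    """Batched: independent LLM judges run concurrently as one wave."""
    with ThreadPoolExecutor(max_workers=max(1, len(judges))) as pool:
        futures = {name: pool.submit(judge) for name, judge in judges.items()}
        return [
            GateOutcome(name, *fut.result())  # (passed, cost_usd) per judge
            for name, fut in futures.items()
        ]


results = [run_computational("tests", lambda: True)]
results += run_inferential_wave({
    "security_review": lambda: (True, 0.04),   # stand-ins for LLM calls
    "spec_compliance": lambda: (False, 0.02),
})
token_spend = sum(r.cost_usd for r in results if r.cost_usd is not None)
```

Recording `cost_usd` as `None` rather than `0.0` for computational gates keeps "not applicable" distinct from "free" when spend is summed.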

The built-in registry ships one computational gate, tests. Inferential gates are an extension surface for third-party plugins.

Four fail strategies

QualityGate.on_fail is a FailStrategy enum:

Strategy	Effect	When it fits
`HALT`	The run stops immediately.	Failures that make continuing pointless.
`FEED_INTO_NEXT`	The next phase consumes the failure as input.	Test failures that should become the fix prompt.
`TRIGGER_REPLAN`	The failure is treated as critique; the round counter advances and a replan prompt fires on the next iteration.	Failures that mean the plan, not the patch, is wrong.
`INFORMATIONAL`	Logged and persisted for audit; the run continues unchanged.	Signals worth recording that should never block.
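The strategy table translates naturally into a lookup from enum member to state mutation, which is what "fail strategy as data" implies. This is a sketch: the four member names come from this page, but `RunState`, its fields, the handler functions, and `apply_fail_strategy` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailStrategy(Enum):
    HALT = "halt"
    FEED_INTO_NEXT = "feed_into_next"
    TRIGGER_REPLAN = "trigger_replan"
    INFORMATIONAL = "informational"


@dataclass
class RunState:
    halted: bool = False
    next_phase_input: list[str] = field(default_factory=list)
    critique: list[str] = field(default_factory=list)
    round_counter: int = 0
    replan_pending: bool = False  # read on the next iteration to fire a replan prompt
    audit_log: list[str] = field(default_factory=list)


def _halt(state: RunState, failure: str) -> None:
    state.halted = True


def _feed_into_next(state: RunState, failure: str) -> None:
    state.next_phase_input.append(failure)  # e.g. becomes the fix prompt


def _trigger_replan(state: RunState, failure: str) -> None:
    state.critique.append(failure)
    state.round_counter += 1
    state.replan_pending = True


def _informational(state: RunState, failure: str) -> None:
    pass  # the audit entry written by apply_fail_strategy is the whole effect


# Strategy -> state mutation; the engine never needs to know which gate failed.
APPLY = {
    FailStrategy.HALT: _halt,
    FailStrategy.FEED_INTO_NEXT: _feed_into_next,
    FailStrategy.TRIGGER_REPLAN: _trigger_replan,
    FailStrategy.INFORMATIONAL: _informational,
}


def apply_fail_strategy(strategy: FailStrategy, state: RunState, failure: str) -> None:
    state.audit_log.append(f"{strategy.value}: {failure}")  # always persisted
    APPLY[strategy](state, failure)
```

In this sketch, adding a strategy means one new enum member and one handler; no gate definition changes.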

A HALT gate ends the run. It is distinct from a phase handoff, which pauses the run at a declared decision point and resumes it on an operator decision. The two mechanisms are independent.
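The independence of the two mechanisms can be pictured as two separate transitions on a run's status. `RunStatus`, `Run`, and its methods are hypothetical names chosen for this sketch; the only behaviour taken from this page is that a halted run is finished, while a paused run resumes on an operator decision.

```python
from enum import Enum


class RunStatus(Enum):
    RUNNING = "running"
    PAUSED_FOR_HANDOFF = "paused_for_handoff"
    HALTED = "halted"


class Run:
    def __init__(self) -> None:
        self.status = RunStatus.RUNNING
        self.decisions: list[str] = []

    def halt(self) -> None:
        # HALT gate: terminal. Nothing resumes a halted run.
        self.status = RunStatus.HALTED

    def pause_for_handoff(self) -> None:
        # Declared decision point: the run waits for an operator.
        if self.status is RunStatus.RUNNING:
            self.status = RunStatus.PAUSED_FOR_HANDOFF

    def resume(self, decision: str) -> None:
        # Only a paused run can resume; a halted one raises.
        if self.status is not RunStatus.PAUSED_FOR_HANDOFF:
            raise RuntimeError(f"cannot resume a run that is {self.status.value}")
        self.decisions.append(decision)
        self.status = RunStatus.RUNNING
```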

The verification contract

A gate says what must be true. That is not enough on its own: a test can pass against the wrong checkout, the wrong interpreter, or a stale tree, and still print green. The verification contract is project-level configuration that says what counts as proof and in which environment that proof is valid.

The core rule:

Agents may run any native tools while debugging.
Readiness is proven only by declared verification commands executed in the
declared verification environment.

The contract keeps three concepts deliberately distinct:

Quality gate             = what must be true.
Verification environment = where and against what it is valid.
Receipt                  = proof that the native command ran in that environment.

A verification environment names the subject under test — which interpreter, working directory, paths, and dependency checkouts make a result meaningful. Its declared assertions (import path checks, file and command existence, version checks) can be executed on demand, producing an env-assertion receipt. This ends the dispute where an implementer and a reviewer are “both right against different subjects” because each ran a host command against a different checkout.
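A minimal sketch of executing an environment's declared assertions, assuming the subject under test is a Python interpreter. `VerificationEnvironment`, `EnvAssertionReceipt`, and `check_environment` are hypothetical names; the version and import-path checks mirror two of the assertion kinds listed above. Every probe runs through the declared interpreter, which a host command resolved via `PATH` would not guarantee.

```python
import os
import subprocess
from dataclasses import dataclass, field


@dataclass
class VerificationEnvironment:
    name: str
    interpreter: str                     # absolute path of the interpreter under test
    workdir: str                         # directory every probe runs from
    import_paths: dict[str, str] = field(default_factory=dict)  # module -> expected root
    min_version: tuple[int, int] = (3, 9)


@dataclass
class EnvAssertionReceipt:
    env: str
    passed: bool
    failures: list[str]


def _probe(env: VerificationEnvironment, code: str) -> tuple[int, str]:
    """Run a snippet with the declared interpreter, not whatever is on PATH."""
    proc = subprocess.run([env.interpreter, "-c", code], cwd=env.workdir,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def check_environment(env: VerificationEnvironment) -> EnvAssertionReceipt:
    if not (os.path.isfile(env.interpreter) and os.access(env.interpreter, os.X_OK)):
        return EnvAssertionReceipt(env.name, False,
                                   [f"interpreter not executable: {env.interpreter}"])

    failures: list[str] = []
    rc, out = _probe(env, "import sys; print('%d.%d' % sys.version_info[:2])")
    if rc != 0 or tuple(int(p) for p in out.split(".")) < env.min_version:
        failures.append(f"version check failed: {out or 'no output'}")

    for module, expected_root in env.import_paths.items():
        rc, out = _probe(env, f"import {module}; print({module}.__file__)")
        if rc != 0:
            failures.append(f"cannot import {module}")
            continue
        # Where the module actually resolves must sit under the declared root.
        resolved, root = os.path.realpath(out), os.path.realpath(expected_root)
        if os.path.commonpath([resolved, root]) != root:
            failures.append(f"{module} resolves to {resolved}, not under {root}")

    return EnvAssertionReceipt(env.name, not failures, failures)
```

The import-path check is what settles the "different checkout" dispute: it reports where a module actually resolves, not merely that it imports.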

Declared verification commands are not a new test framework. They are native commands — argv, environment, assertions — whose execution is recorded as a durable command receipt. A verification.required list names the commands that form the required gate.
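The shape below is a sketch of a declared command and its receipt, assuming receipts are JSON files keyed by command name. The `VERIFICATION` layout, the `CommandReceipt` fields, and `run_declared` are hypothetical, and the argv entries are stand-ins for real tools such as a test runner or linter. The on-disk format Orcho actually uses is described in Verification receipts.

```python
import json
import os
import subprocess
import sys
import time
from dataclasses import asdict, dataclass

# Hypothetical config shape: native commands plus the required list.
VERIFICATION = {
    "commands": {
        "unit": {"argv": [sys.executable, "-c", "print('3 passed')"],
                 "env": {"PYTHONHASHSEED": "0"}},
        "lint": {"argv": [sys.executable, "-c", "import sys; sys.exit(1)"]},
    },
    "required": ["unit", "lint"],
}


@dataclass
class CommandReceipt:
    name: str
    argv: list[str]
    exit_code: int
    checkout: str        # fingerprint of the tree it ran against, e.g. a commit SHA
    finished_at: float


def run_declared(name: str, spec: dict, checkout: str, receipts_dir: str) -> CommandReceipt:
    """Run one declared native command and write its receipt to disk."""
    env = {**os.environ, **spec.get("env", {})}
    proc = subprocess.run(spec["argv"], env=env, cwd=spec.get("cwd"),
                          capture_output=True, text=True)
    receipt = CommandReceipt(name, list(spec["argv"]), proc.returncode,
                             checkout, time.time())
    os.makedirs(receipts_dir, exist_ok=True)
    with open(os.path.join(receipts_dir, f"{name}.json"), "w", encoding="utf-8") as fh:
        json.dump(asdict(receipt), fh, indent=2)
    return receipt
```

A receipt records the exit code whether or not the command passed; deciding what that exit code means for readiness is left to classification.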

Missing, failed, and stale are different facts

Receipt classification distinguishes states that a plain pass/fail check collapses:

Status	Meaning
present	The receipt exists, passed, and matches the current checkout.
missing	No receipt on disk. The check never ran; nothing failed.
failed	The receipt records a non-zero exit or a failed declared assertion.
stale	The receipt passed, but the checkout has changed since it was recorded, or one of its dependency checkouts has moved.

This is the difference between “the tests failed” and “nobody can prove the tests ran against this diff”. Both block a required gate, but they call for different next actions: a failed check needs a fix, a missing or stale receipt needs a re-run.
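Classification can be sketched as a pure function over a receipt (or its absence), the current checkout fingerprint, and the current dependency fingerprints. The status names come from the table above; the receipt fields (`exit_code`, `assertions_passed`, `checkout`, `deps`) and the names `classify` and `NEXT_ACTION` are assumptions.

```python
from enum import Enum
from typing import Optional


class ReceiptStatus(Enum):
    PRESENT = "present"
    MISSING = "missing"
    FAILED = "failed"
    STALE = "stale"


def classify(receipt: Optional[dict], checkout: str, deps: dict[str, str]) -> ReceiptStatus:
    if receipt is None:
        return ReceiptStatus.MISSING  # never ran; nothing failed
    if receipt["exit_code"] != 0 or not receipt.get("assertions_passed", True):
        return ReceiptStatus.FAILED
    if receipt["checkout"] != checkout:
        return ReceiptStatus.STALE    # passed, but against an older tree
    for dep, recorded in receipt.get("deps", {}).items():
        if deps.get(dep) != recorded:
            return ReceiptStatus.STALE  # a depended-on checkout moved
    return ReceiptStatus.PRESENT


# Every non-present status blocks a required gate, but the remedy differs.
NEXT_ACTION = {
    ReceiptStatus.PRESENT: None,
    ReceiptStatus.MISSING: "re-run",
    ReceiptStatus.STALE: "re-run",
    ReceiptStatus.FAILED: "fix",
}
```

Failure is checked before staleness because, per the table, stale applies only to a receipt that passed.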

Final acceptance reads receipts

The final_acceptance reviewer receives a readiness summary built from the receipts on disk: environment status, the delivery-relevant gates, and the required receipts classified present, missing, failed, or stale. Its policy is explicit:

Readiness blockers should be based on missing/failed/stale/invalid declared
receipts, not only an ad-hoc host command mismatch.

Exploratory commands the agent ran while debugging are counted and labelled as not authoritative. Reviewers read receipts, not re-runs; a narrative “tests pass” is not proof.

The summary is advisory, but the classification behind it is load-bearing. After parsing the reviewer's release verdict, the engine merges in its own computed gaps, one per required delivery command classified missing, failed, or stale, and if any such gap exists it forces the acceptance to a rejection. A reviewer that omits an unproven required gate cannot produce a green acceptance.
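The merge step can be sketched as below, assuming receipt statuses have already been classified per required delivery command. `Acceptance`, `finalize`, and `readiness_summary` are hypothetical names. The reviewer's verdict can only be downgraded: any computed gap forces a rejection, and exploratory commands appear in the summary but never affect the outcome.

```python
from dataclasses import dataclass, field

BLOCKING = {"missing", "failed", "stale"}


@dataclass
class Acceptance:
    accepted: bool
    gaps: list[str] = field(default_factory=list)


def readiness_summary(required: dict[str, str], exploratory: list[str]) -> dict:
    """What the reviewer sees: classified receipts plus non-authoritative runs."""
    return {
        "required": dict(required),
        "exploratory": {"count": len(exploratory), "authoritative": False},
    }


def finalize(verdict: Acceptance, required: dict[str, str]) -> Acceptance:
    """Merge engine-computed gaps into the parsed reviewer verdict."""
    computed = [f"{name}: {status}" for name, status in required.items()
                if status in BLOCKING]
    gaps = verdict.gaps + [g for g in computed if g not in verdict.gaps]
    # Any computed gap forces rejection, whatever the reviewer concluded.
    return Acceptance(accepted=verdict.accepted and not computed, gaps=gaps)


required = {"unit": "present", "lint": "stale"}
summary = readiness_summary(required, ["pytest -k flaky", "python -c 'import pkg'"])
result = finalize(Acceptance(accepted=True), required)
```

In this example the reviewer approved, but the stale `lint` receipt turns the outcome into a rejection with one recorded gap.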

For the on-disk receipt shape and worked examples, read Verification receipts.

Deep reference

The canonical engineering docs live with the code: