
Gates and verification

Orcho does not trust an agent’s claim that work is ready. It trusts declared checks, executed in a declared environment, recorded as durable receipts.

profile declares a gate → engine runs it after the phase
gate produces a typed result → fail strategy applied as data
contract declares the proof → receipts written on disk
final acceptance reads receipts, not the worker's last message

This page explains that boundary: what a gate is, what counts as proof, and why a confident final message is never enough.

A quality gate is a registered post-phase check that produces a typed QualityGateResult and applies a declarative fail policy to the run state. The profile declares which gates run after which phase; the engine executes them and persists the result into the phase log.

The fail strategy is also data. The engine reads the strategy enum and mutates state accordingly — there is no ad-hoc branching code per gate. Gates run after the phase handler, and a halting gate short-circuits the remaining gates after the halted phase is persisted.

| Kind | Examples | Cost | Scheduling |
| --- | --- | --- | --- |
| computational | shell exit-code checks: tests, lint, type check, compile, format check | wall-clock and CPU, cost near zero | inline; blocks the phase |
| inferential | LLM judges: security review, spec compliance, code review by LLM | wall-clock plus tokens | batched; can be parallelised |

The distinction matters for cost accounting — cost_usd is only meaningful for inferential gates — and for scheduling. Computational gates often share a project-level lock; inferential gates can run in waves.

The built-in registry ships one computational gate, tests. Inferential gates are an extension surface for third-party plugins.
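As a rough illustration of a computational gate, the sketch below runs a native command and maps its exit code onto a typed result. The fields on `QualityGateResult` and the `run_computational_gate` helper are assumptions made for this example, not Orcho's actual API; only the class name and `cost_usd` appear on this page.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGateResult:
    # Hypothetical fields; the real QualityGateResult may look different.
    gate: str
    passed: bool
    exit_code: int
    duration_s: float
    cost_usd: float = 0.0  # only meaningful for inferential gates


def run_computational_gate(name: str, argv: list[str]) -> QualityGateResult:
    """Run a native command and treat exit code 0 as a pass."""
    started = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True)
    return QualityGateResult(
        gate=name,
        passed=proc.returncode == 0,
        exit_code=proc.returncode,
        duration_s=time.monotonic() - started,
    )


if __name__ == "__main__":
    # Stand-in for a real test command; it simply exits with status 3.
    failing = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_computational_gate("tests", failing))
```

The result carries the exit code rather than a free-text verdict, so whatever consumes it can apply a fail policy without parsing output.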

QualityGate.on_fail is a FailStrategy enum:

| Strategy | Effect | When it fits |
| --- | --- | --- |
| HALT | The run stops immediately. | Failures that make continuing pointless. |
| FEED_INTO_NEXT | The next phase consumes the failure as input. | Test failures that should become the fix prompt. |
| TRIGGER_REPLAN | The failure is treated as critique; the round counter advances and a replan prompt fires next iteration. | Failures that mean the plan, not the patch, is wrong. |
| INFORMATIONAL | Logged and persisted for audit; the run continues unchanged. | Signals worth recording that should never block. |
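One way to picture "fail strategy as data" is a lookup table from strategy to state mutation, with no per-gate branching. The sketch below is a minimal illustration under assumptions: `RunState`, its fields, the `EFFECTS` table, and `run_gates` are hypothetical names, and only the four `FailStrategy` members come from the table above.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailStrategy(Enum):
    HALT = "halt"
    FEED_INTO_NEXT = "feed_into_next"
    TRIGGER_REPLAN = "trigger_replan"
    INFORMATIONAL = "informational"


@dataclass
class RunState:
    # Hypothetical run state; field names are illustrative only.
    halted: bool = False
    next_phase_input: list = field(default_factory=list)
    replan_requested: bool = False
    round_counter: int = 0
    phase_log: list = field(default_factory=list)


def _halt(state, message):
    state.halted = True


def _feed(state, message):
    state.next_phase_input.append(message)


def _replan(state, message):
    state.round_counter += 1
    state.replan_requested = True


def _inform(state, message):
    pass  # the result is already in the phase log; nothing else changes


# The whole fail policy lives in this table, keyed by the strategy enum.
EFFECTS = {
    FailStrategy.HALT: _halt,
    FailStrategy.FEED_INTO_NEXT: _feed,
    FailStrategy.TRIGGER_REPLAN: _replan,
    FailStrategy.INFORMATIONAL: _inform,
}


def run_gates(state, gates):
    """gates: iterable of (name, check, on_fail); check() returns (passed, message)."""
    for name, check, on_fail in gates:
        passed, message = check()
        state.phase_log.append((name, passed))  # persist before applying policy
        if not passed:
            EFFECTS[on_fail](state, message)
        if state.halted:
            break  # a halting gate short-circuits the remaining gates
    return state
```

Adding a gate in this shape means adding a row of data, not a new branch in the engine.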

A HALT gate ends the run. It is distinct from a phase handoff, which pauses the run at a declared decision point and resumes from an operator decision. The two mechanisms are independent and do not overlap.

A gate says what must be true. That is not enough on its own: a test can pass against the wrong checkout, the wrong interpreter, or a stale tree, and still print green. The verification contract is project-level configuration that says what counts as proof and in which environment that proof is valid.

The core rule:

Agents may run any native tools while debugging.
Readiness is proven only by declared verification commands executed in the
declared verification environment.

The contract keeps three concepts deliberately distinct:

Quality gate = what must be true.
Verification environment = where and against what it is valid.
Receipt = proof that the native command ran in that environment.

A verification environment names the subject under test — which interpreter, working directory, paths, and dependency checkouts make a result meaningful. Its declared assertions (import path checks, file and command existence, version checks) can be executed on demand, producing an env-assertion receipt. This ends the dispute where an implementer and a reviewer are “both right against different subjects” because each ran a host command against a different checkout.
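To make the assertion kinds above concrete, here is a small sketch of environment checks built from the Python standard library. The helper names and the shape of the returned dictionary are assumptions for illustration; they are not Orcho's configuration format or receipt schema.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def check_import_path(module: str, expected_root: Path) -> bool:
    """True if `module` resolves to a file located under `expected_root`."""
    spec = importlib.util.find_spec(module)
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    return expected_root.resolve() in Path(spec.origin).resolve().parents


def check_command_exists(command: str) -> bool:
    """True if `command` is found on PATH."""
    return shutil.which(command) is not None


def check_min_python(minimum: tuple) -> bool:
    """True if the running interpreter is at least `minimum`, e.g. (3, 9)."""
    return sys.version_info >= minimum


def run_env_assertions(assertions: dict) -> dict:
    """Run named zero-argument checks and summarise them (hypothetical shape)."""
    results = {name: bool(check()) for name, check in assertions.items()}
    return {
        "kind": "env-assertion",
        "passed": all(results.values()),
        "results": results,
    }
```

An import-path check is what catches the "wrong checkout" case: the module imports fine, but from a directory other than the one under test.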

Declared verification commands are not a new test framework. They are native commands — argv, environment, assertions — whose execution is recorded as a durable command receipt. A verification.required list names the commands that form the required gate.
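The idea of a command receipt can be sketched as "run the native argv, then write down what happened". The JSON fields, file naming, and the `run_with_receipt` helper below are hypothetical and chosen only for this example; the actual on-disk receipt shape is documented separately and may differ.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_receipt(name, argv, checkout, receipts_dir, env=None):
    """Run a native command and persist a receipt describing the run.

    `checkout` is whatever identifies the tree under test (for example a
    commit hash); env=None inherits the current environment.
    """
    proc = subprocess.run(argv, env=env, capture_output=True, text=True)
    receipt = {
        "name": name,
        "argv": list(argv),
        "exit_code": proc.returncode,
        "checkout": checkout,
    }
    receipts_dir.mkdir(parents=True, exist_ok=True)
    path = receipts_dir / f"{name}.json"
    path.write_text(json.dumps(receipt, indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = run_with_receipt(
            "tests", [sys.executable, "-c", "pass"], "abc123", Path(tmp) / "receipts"
        )
        print(written.read_text())
```

The command itself stays native; the receipt is the only thing the sketch adds on top of it.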

Missing, failed, and stale are different facts


Receipt classification distinguishes states that a plain pass/fail check collapses:

| Status | Meaning |
| --- | --- |
| present | The receipt exists, passed, and matches the current checkout. |
| missing | No receipt on disk. The check never ran; nothing failed. |
| failed | The receipt records a non-zero exit or a failed declared assertion. |
| stale | The receipt passed, but the checkout has changed since, or a dependency checkout it relies on has moved. |

This is the difference between “the tests failed” and “nobody can prove the tests ran against this diff”. Both block a required gate, but they call for different next actions: a failed check needs a fix, a missing or stale receipt needs a re-run.
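The four statuses can be expressed as a short decision procedure. The sketch below assumes a JSON receipt with `exit_code`, `checkout`, optional `assertions`, and optional `dependency_checkouts` fields; those names, `classify_receipt`, and `NEXT_ACTION` are illustrative assumptions rather than Orcho's real schema or API.

```python
import json
from pathlib import Path


def classify_receipt(path: Path, current_checkout: str, current_deps=None) -> str:
    """Return 'present', 'missing', 'failed', or 'stale' for one receipt file."""
    if not path.exists():
        return "missing"  # the check never ran; nothing failed

    receipt = json.loads(path.read_text())
    assertions_ok = all(receipt.get("assertions", {}).values())
    if receipt["exit_code"] != 0 or not assertions_ok:
        return "failed"

    # The receipt passed; it only counts if it still describes this tree.
    deps_then = receipt.get("dependency_checkouts", {})
    deps_now = current_deps or {}
    if receipt["checkout"] != current_checkout or deps_then != deps_now:
        return "stale"
    return "present"


# Different statuses call for different next actions.
NEXT_ACTION = {
    "present": "none",
    "missing": "re-run",
    "stale": "re-run",
    "failed": "fix",
}
```

Checking for failure before staleness mirrors the table: only a receipt that passed can be stale.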

The final_acceptance reviewer receives a readiness summary built from the receipts on disk: environment status, the delivery-relevant gates, and the required receipts classified present, missing, failed, or stale. Its policy is explicit:

Readiness blockers should be based on missing/failed/stale/invalid declared
receipts, not only an ad-hoc host command mismatch.

Exploratory commands the agent ran while debugging are counted and labelled as not authoritative. Reviewers read receipts, not re-runs; a narrative “tests pass” is not proof.

The summary is advisory, but the classification behind it is load-bearing: after parsing the release verdict, the engine merges its own computed gaps — one per required delivery command classified missing, failed, or stale — and forces the acceptance to a rejection. A reviewer that omits an unproven required gate cannot produce a green acceptance.
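The merge-and-force step described above might look roughly like this. `Acceptance`, `enforce_receipts`, and the gap message format are hypothetical names invented for the sketch; the only behaviour taken from this page is that every required command classified missing, failed, or stale becomes a gap, and that any such gap turns the acceptance into a rejection.

```python
from dataclasses import dataclass, field

BLOCKING = {"missing", "failed", "stale"}


@dataclass
class Acceptance:
    # Hypothetical parsed reviewer verdict.
    accepted: bool
    gaps: list = field(default_factory=list)


def enforce_receipts(verdict: Acceptance, required: dict) -> Acceptance:
    """Merge engine-computed gaps into the reviewer's verdict.

    `required` maps each required delivery command to its receipt status.
    """
    computed = [
        f"{command}: receipt {status}"
        for command, status in required.items()
        if status in BLOCKING
    ]
    merged = list(verdict.gaps) + [g for g in computed if g not in verdict.gaps]
    # Any unproven required command overrides a green verdict.
    return Acceptance(accepted=verdict.accepted and not computed, gaps=merged)
```

For example, a reviewer verdict of `Acceptance(accepted=True)` combined with `{"tests": "stale"}` comes back rejected with one gap, while the same verdict with `{"tests": "present"}` stays accepted.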

For the on-disk receipt shape and worked examples, read Verification receipts.

The canonical engineering docs live with the code: