Cost accounting

Cost accounting is optional. Orcho can run without dollar estimates and still record tokens, duration, phases, artifacts, and evidence.

Enable accounting when cost becomes an operating question: which runs were expensive, which phases consumed the most, whether a profile is too heavy for the task, and how your workload compares with subscription-style access.

Run economics

Orcho reports API-equivalent cost: what a metered API would likely have charged for the same token workload. Treat it as operational accounting, not as a provider bill.

tokens · token planes · cache · duration · phase cost · provider mix · what-if

First useful report:

ORCHO_ACCOUNTING=1 orcho run \
  --project ~/www/my-workspace/my-project \
  --task "Add input validation to the login endpoint." \
  --profile feature

orcho cost --window 30d
orcho pricing show

Use cost reports after several real runs. One run tells you little; a window shows workload shape.

What Orcho records

The durable run artifacts include metrics.json. Depending on provider output and local accounting settings, it can include:

tokens in, tokens out, and total tokens;
duration and phase attempts;
per-phase and per-agent breakdown;
API-equivalent cost when available;
whether a cost was estimated from local pricing.

The key phrase is API-equivalent. On a subscription-style provider account, your marginal user-facing bill for a run may be zero. Orcho’s dollar field is still useful because it answers a different question:

What would this workload look like under metered API economics?

Token accounting planes

Token accounting starts before dollars. The first question is not “what did it cost?” but “which part of the delivery lifecycle consumed context?”

Read tokens across these planes:

Plane	What it explains	Where it shows up
Window	How much agent work happened over `7d`, `30d`, or all time.	`orcho cost --window 30d`, `orcho metrics --last 5`
Run	One run’s total input, output, duration, and result.	`metrics.json`, `orcho_run_metrics`
Phase	Whether plan, implementation, review, repair, or final acceptance dominates.	`metrics.json.phases`
Attempt / retry	Whether repeated validation or repair rounds consumed the budget.	`metrics.json.phase_attempts`
Runtime / model	Which worker runtime or model family is expensive.	cost report agent breakdown
Project / subtask	Which participant or implement subtask carried the load when available.	cross-run rollups, `metrics.json.subtasks`
Cache	How much provider input was reused, created, or fresh.	`tokens_in_cache_read`, `tokens_in_cache_create`

This matters because the same total can mean very different things:

a large plan phase can mean the task was too broad;
a large review_changes phase can mean the evidence contract is expensive;
repeated repair_changes can mean the gate is catching real misses;
a large subtask can reveal the actual hotspot inside a feature;
a low cache hit on a warm run can mean the prompt prefix changed.
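The phase-level reading above can be sketched as a share calculation. This is a minimal illustration, not Orcho’s implementation: it assumes that `metrics.json.phases` can be loaded as a mapping from phase name to an object with `tokens_in` and `tokens_out`, which is a hypothetical shape chosen for the example. The helper `phase_token_shares` is likewise made up.

```python
from typing import Dict


def phase_token_shares(phases: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Return each phase's fraction of total tokens (input + output).

    ASSUMPTION: `phases` maps phase name -> {"tokens_in": int, "tokens_out": int}.
    The real metrics.json layout may differ.
    """
    totals = {
        name: p.get("tokens_in", 0) + p.get("tokens_out", 0)
        for name, p in phases.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Nothing recorded: report zero shares instead of dividing by zero.
        return {name: 0.0 for name in totals}
    return {name: t / grand_total for name, t in totals.items()}


# Illustrative numbers only, not taken from a real run.
example_phases = {
    "plan": {"tokens_in": 600, "tokens_out": 200},
    "implement": {"tokens_in": 1000, "tokens_out": 200},
}
shares = phase_token_shares(example_phases)
```

A dominant share is a prompt to read that phase’s result, not a verdict on its own.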

Real accounting excerpt

The live CLI output can show cost at the same moment it shows phase progress. This sanitized excerpt is from a real feature run:

Real feature run: phase accounting

✓ plan · 254.1s · $1.522
  Orcho prompt      2.9k tokens
  Provider input    791.6k tokens (90% cached, ~$1.23 saved)
  Runtime overhead  788.7k tokens
  Response          17.5k tokens
  Activity          tools=13 calls
  Live context      81.2k / 1.0M (8% full)

✓ implement:subtask:T1 · 359.9s · $2.837
  Orcho prompt      4.1k tokens
  Provider input    2.7M tokens (97% cached, ~$2.46 saved)
  Runtime overhead  2.7M tokens
  Response          27.0k tokens
  Activity          tools=34 calls
  Live context      95.2k / 1.0M (10% full)

What the operator learns: The implementation phase carried more token load than planning, but most provider input was cache-readable in both phases.
Why overhead is shown separately: The worker CLI can add system prompt and tool schema context. Orcho separates its assembled prompt from runtime-injected overhead when it can estimate the gap.
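The excerpt’s numbers are consistent with reading runtime overhead as provider input minus Orcho’s assembled prompt. The sketch below reproduces that arithmetic for the plan phase. The function name is hypothetical, and the subtraction is inferred from the excerpt rather than confirmed as Orcho’s exact method.

```python
def runtime_overhead(provider_input_tokens: int, orcho_prompt_tokens: int) -> int:
    """Estimate tokens the worker runtime added on top of Orcho's prompt.

    ASSUMPTION: overhead = provider input - Orcho prompt, inferred from the excerpt.
    """
    # Clamp at zero: a prompt estimate can exceed the reported provider input.
    return max(0, provider_input_tokens - orcho_prompt_tokens)


# Plan phase from the excerpt: 791.6k provider input, 2.9k Orcho prompt.
plan_overhead = runtime_overhead(791_600, 2_900)
print(plan_overhead)  # 788700, matching the 788.7k shown in the excerpt
```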

Who spent the tokens

There are three different consumers to keep separate:

Consumer	Counted by Orcho run metrics?	Notes
Orcho-controlled worker calls	Yes, when the runtime/provider reports usage or Orcho can estimate it.	These are the phase calls Orcho launches: plan, implement, review, repair, gates, advice.
Runtime-injected overhead	Partly visible when provider input is larger than Orcho’s assembled prompt estimate.	Agent CLIs can add their own system prompt and tool schemas. Orcho surfaces the gap when it can compute it reliably.
Surrounding MCP/LLM client	No, unless that client’s usage is separately reported outside the run.	The client may spend tokens reading status, evidence, and diagnosis. That is real overhead, but it is not automatically part of the run’s `metrics.json`.

This is the reason MCP captain mode should be evaluated carefully. The run metrics show Orcho’s controlled delivery work. The client may spend additional tokens to manage the lifecycle. The payoff is only real when that extra context prevents blind retries, wrong resumes, or human reconstruction of run state.

Cached tokens

Provider input can contain cached and fresh parts. Cached tokens are already counted inside tokens_in, so do not add them on top as if they were a separate bucket.

tokens_in = fresh input + cache-read input + cache-create input

Orcho uses these fields when the runtime exposes them:

Field	Meaning
`tokens_in`	Total provider input for the call. Cached input is already inside this number.
`tokens_out`	Provider output for the call.
`tokens_in_cache_read`	Input tokens served from provider cache. Usually cheaper than fresh input.
`tokens_in_cache_create`	Input tokens written into cache during this call. Often a cold/priming call.
`tokens_exact`	Whether the count came from provider/runtime usage rather than estimation.

Two caveats matter:

Low cache_read alone does not always mean the cache failed. A cold call can write a large stable prefix into cache with high cache_create and low cache_read; the next call may read it back.
The stronger “prefix changed” signal is low coverage: cache_read + cache_create is small compared with tokens_in, so most of the prompt was genuinely fresh.

When a provider exposes only cached-read tokens and not cache-creation tokens, Orcho falls back to a weaker read-ratio interpretation. It should be treated as an operating signal, not an exact cache diagnosis.
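The two caveats and the fallback can be sketched as a small classifier. This is an illustration under stated assumptions, not Orcho’s diagnosis logic: the 0.2 threshold, the label strings, and both function names are hypothetical. Only the field meanings come from the table above.

```python
from typing import Optional


def fresh_input(tokens_in: int, cache_read: int, cache_create: int) -> int:
    """Fresh input is what remains after the cached parts of tokens_in."""
    return tokens_in - cache_read - cache_create


def cache_signal(
    tokens_in: int,
    cache_read: int,
    cache_create: Optional[int],
    low: float = 0.2,  # ASSUMPTION: illustrative threshold, not an Orcho default
) -> str:
    """Classify one call's cache behaviour into a coarse operating signal."""
    if tokens_in <= 0:
        return "no-input"
    if cache_create is None:
        # Provider reports only cached reads: weaker read-ratio interpretation.
        return "weak-low-read" if cache_read / tokens_in < low else "weak-read-ok"
    coverage = (cache_read + cache_create) / tokens_in
    if coverage < low:
        # Most of the prompt was genuinely fresh: the prefix likely changed.
        return "prefix-changed"
    if cache_read < cache_create:
        # Large write, small read: a cold call priming the cache.
        return "cold-priming"
    return "warm"
```

For example, `cache_signal(1000, 50, 900)` is classified as a cold priming call rather than a cache failure, because coverage is high even though `cache_read` is low.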

What token accounting cannot prove

Token accounting is observability, not a moral score.

It cannot prove:

the provider’s exact invoice;
hidden client-side reasoning cost outside Orcho;
whether a subscription provider will keep the same policy next month;
whether a large prompt was wasteful or necessary without reading the phase result;
whether cached-token pricing matches your actual terms unless the local pricing table is verified.

It can prove something more useful for day-to-day operation: where the workload went and which layer deserves tuning.

Enable accounting

Use either config or environment:

ORCHO_ACCOUNTING=1

or set:

{
  "accounting": {
    "enabled": true
  }
}

Then inspect reports:

orcho metrics --last 5
orcho cost --window 30d
orcho pricing show

Refresh local pricing when estimates matter:

orcho pricing refresh --provider openai

The local pricing table is a user-controlled estimate. Verify important rates against your actual provider terms before using the number for decisions.
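To make the estimate concrete, here is a minimal sketch of how an API-equivalent cost could be derived from token fields and a local rate table. The `Rates` class, the function name, and every rate value are hypothetical placeholders, not real provider prices and not Orcho’s pricing format. The split of `tokens_in` into fresh, cache-read, and cache-create parts follows the field definitions in the Cached tokens section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. ASSUMPTION: placeholder structure for illustration."""
    fresh_input: float
    cache_read: float
    cache_create: float
    output: float


def api_equivalent_cost(
    tokens_in: int,
    tokens_out: int,
    cache_read: int,
    cache_create: int,
    rates: Rates,
) -> float:
    """Price one call; cached tokens are already inside tokens_in."""
    fresh = tokens_in - cache_read - cache_create
    if fresh < 0:
        raise ValueError("cached tokens exceed tokens_in")
    total = (
        fresh * rates.fresh_input
        + cache_read * rates.cache_read
        + cache_create * rates.cache_create
        + tokens_out * rates.output
    )
    return total / 1_000_000


# Made-up rates; verify real ones against your provider terms.
example_rates = Rates(fresh_input=3.00, cache_read=0.30, cache_create=3.75, output=15.00)
example_cost = api_equivalent_cost(1_000_000, 100_000, 800_000, 100_000, example_rates)
```

With these placeholder rates the example call prices at about $2.42, most of it output, which shows why a high cache-read share alone does not make a call cheap.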

Subscription value

Accounting helps answer whether a subscription is carrying real work or mostly sitting idle.

The cautious way to read it:

Question	Read
How much work did I route through agents this month?	total tokens, runs, phase breakdown
Which runs consumed most of the workload?	top expensive runs
Which phases dominate spend?	phase breakdown
Which runtime/provider dominates?	agent breakdown
Would metered API usage have exceeded my monthly subscription price?	API-equivalent total versus subscription cost

Do not treat this as exact billing. Subscription terms can include quotas, fair-use limits, model access rules, queueing, changing policies, and workload constraints that Orcho cannot infer from a run artifact.

The useful question is more practical:

If this month's Orcho workload had been billed as metered API usage,
would the subscription still look rational for how I actually work?
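That question reduces to a single ratio. The sketch below assumes you already have an API-equivalent total for the month (for example, read from a cost report) and know your subscription price. The function name and the example figures are hypothetical.

```python
def subscription_coverage(api_equivalent_usd: float, subscription_usd: float) -> float:
    """Ratio of metered-equivalent workload to subscription price.

    Above 1.0: the workload would have cost more under metered pricing.
    Below 1.0: the subscription cost more than the metered estimate.
    """
    if subscription_usd <= 0:
        raise ValueError("subscription_usd must be positive")
    return api_equivalent_usd / subscription_usd


# Illustrative figures only.
ratio = subscription_coverage(api_equivalent_usd=150.0, subscription_usd=100.0)
print(ratio)  # 1.5
```

The ratio is an operating signal. It says nothing about quotas, fair-use limits, or other terms the run artifacts cannot see.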

What-if scenarios

Cost accounting also lets you reason about pricing changes without altering the recorded workload: the runs stay as they were, and only the prices applied to them change.

Use cases:

provider prices change;
a subscription becomes less generous;
a model is moved to a different access plan;
a project starts needing heavier feature or complex_feature profiles;
MCP captain mode adds context calls but reduces blind retries;
a team wants to know whether review depth is worth the token budget.

The what-if frame is deliberately conservative:

Keep the workload record: runs, phases, models, tokens, duration.
Update the local pricing table.
Re-run the cost report over the same window.
Compare profile depth, runtime mix, and avoidable retries.
Decide whether to change profiles, models, gates, or subscription strategy.
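The steps above can be sketched as repricing a fixed workload under two rate tables. This is a simplified illustration: the record shape, the model name, and the rates are hypothetical, and cache-specific rates are deliberately ignored to keep the example short. In practice, re-running the cost report after updating local pricing performs this comparison for you.

```python
from typing import Dict, List, Tuple

# model -> (input USD per 1M tokens, output USD per 1M tokens)
RateTable = Dict[str, Tuple[float, float]]


def window_cost(runs: List[dict], rates: RateTable) -> float:
    """Price a fixed list of run records under one rate table."""
    total = 0.0
    for run in runs:
        in_rate, out_rate = rates[run["model"]]
        total += run["tokens_in"] * in_rate / 1_000_000
        total += run["tokens_out"] * out_rate / 1_000_000
    return total


def what_if(
    runs: List[dict], current: RateTable, proposed: RateTable
) -> Tuple[float, float, float]:
    """Return (cost now, cost under proposed rates, difference)."""
    before = window_cost(runs, current)
    after = window_cost(runs, proposed)
    return before, after, after - before


# ASSUMPTION: made-up workload record and rates for illustration.
runs = [{"model": "model-a", "tokens_in": 2_000_000, "tokens_out": 100_000}]
before, after, delta = what_if(
    runs,
    current={"model-a": (2.0, 10.0)},
    proposed={"model-a": (3.0, 10.0)},
)
```

Keeping the workload records untouched and swapping only the rate table is what makes the comparison conservative: the difference reflects pricing, not a changed account of the work.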

Profile ROI

Cost is not only about providers. It also helps decide whether Orcho’s own workflow shape is proportionate.

Signal	Possible interpretation
Review and repair dominate cost.	The task may be underspecified, or the gates may be too broad.
Planning dominates cost.	The task may be too broad for the profile; use a smaller profile or split the task.
Final acceptance repeatedly rejects.	Verification evidence may be unclear; add clearer verification receipts or stronger project tuning.
MCP adds overhead but prevents reruns.	Captain mode may be paying for itself.
One runtime dominates cost.	Consider runtime assignment or model/effort overrides.

The goal is not to minimize tokens at all costs. The goal is to avoid spending tokens where they do not buy delivery confidence.

Feature run anatomy shows where usage appears in the live run.
LLM captain mode explains when MCP overhead is rational.
Artifact reference lists metrics.json.