
Cost accounting

Cost accounting is optional. Orcho can run without dollar estimates and still record tokens, duration, phases, artifacts, and evidence.

Enable accounting when cost becomes an operating question: which runs were expensive, which phases consumed the most, whether a profile is too heavy for the task, and how your workload compares with subscription-style access.

Run economics

Orcho reports API-equivalent cost: what a metered API would likely have charged for the same token workload. Treat it as operational accounting, not as a provider bill.

tokens · token planes · cache · duration · phase cost · provider mix · what-if

The durable run artifacts include metrics.json. Depending on provider output and local accounting settings, it can include:

  • tokens in, tokens out, and total tokens;
  • duration and phase attempts;
  • per-phase and per-agent breakdown;
  • API-equivalent cost when available;
  • whether a cost was estimated from local pricing.
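As a rough illustration of reading those fields, the sketch below summarizes one run from a parsed metrics.json. The `tokens_in` and `tokens_out` names appear elsewhere on this page; `cost_usd` and `cost_estimated` are hypothetical key names chosen for the example, and `summarize_run` is not an Orcho API. Check a real artifact before relying on any key.

```python
import json
from pathlib import Path


def summarize_run(metrics: dict) -> dict:
    """Collect run-level fields that are present; absent ones stay None."""
    tokens_in = metrics.get("tokens_in")
    tokens_out = metrics.get("tokens_out")
    total = None
    if tokens_in is not None and tokens_out is not None:
        # Cached input is already inside tokens_in, so it is not added again.
        total = tokens_in + tokens_out
    return {
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "tokens_total": total,
        # Hypothetical key names: the real dollar fields may be named differently.
        "cost_usd": metrics.get("cost_usd"),
        "cost_estimated": metrics.get("cost_estimated"),
    }


def load_run(path: str) -> dict:
    """Read a metrics.json file from disk and summarize it."""
    return summarize_run(json.loads(Path(path).read_text(encoding="utf-8")))
```

Because every lookup uses `.get`, a run recorded without accounting still produces a summary, with the cost fields left as `None`.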

The key phrase is API-equivalent. On a subscription-style provider account, your marginal user-facing bill for a run may be zero. Orcho’s dollar field is still useful because it answers a different question:

What would this workload look like under metered API economics?

Token accounting starts before dollars. The first question is not “what did it cost?” but “which part of the delivery lifecycle consumed context?”

Read tokens across these planes:

| Plane | What it explains | Where it shows up |
| --- | --- | --- |
| Window | How much agent work happened over 7d, 30d, or all time. | orcho cost --window 30d, orcho metrics --last 5 |
| Run | One run’s total input, output, duration, and result. | metrics.json, orcho_run_metrics |
| Phase | Whether plan, implementation, review, repair, or final acceptance dominates. | metrics.json.phases |
| Attempt / retry | Whether repeated validation or repair rounds consumed the budget. | metrics.json.phase_attempts |
| Runtime / model | Which worker runtime or model family is expensive. | cost report agent breakdown |
| Project / subtask | Which participant or implement subtask carried the load when available. | cross-run rollups, metrics.json.subtasks |
| Cache | How much provider input was reused, created, or fresh. | tokens_in_cache_read, tokens_in_cache_create |

This matters because the same total can mean very different things:

  • a large plan phase can mean the task was too broad;
  • a large review_changes phase can mean the evidence contract is expensive;
  • repeated repair_changes can mean the gate is catching real misses;
  • a large subtask can reveal the actual hotspot inside a feature;
  • a low cache hit on a warm run can mean the prompt prefix changed.
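To make the phase reading concrete, here is a minimal sketch that ranks phases by their share of tokens. It assumes the phase data is a mapping of phase name to an object with `tokens_in` and `tokens_out`; that shape is an assumption for illustration, not a documented schema, and `phase_shares` is a hypothetical helper.

```python
def phase_shares(phases: dict) -> list:
    """Return (phase, tokens, share) tuples sorted by descending token load."""
    totals = {
        name: entry.get("tokens_in", 0) + entry.get("tokens_out", 0)
        for name, entry in phases.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, tokens, tokens / grand_total) for name, tokens in ranked]


# Rounded figures in the style of a two-phase run (assumed shape, see above).
example = {
    "plan": {"tokens_in": 791_600, "tokens_out": 17_500},
    "implement:subtask:T1": {"tokens_in": 2_700_000, "tokens_out": 27_000},
}
for name, tokens, share in phase_shares(example):
    print(f"{name}: {tokens} tokens ({share:.0%})")
```

The ranking only says where the load went. Whether that load was wasteful still requires reading the phase result.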

The live CLI output can show cost at the same moment it shows phase progress. This sanitized excerpt is from a real feature run:

real feature run · phase accounting
✓ plan · 254.1s · $1.522
  Orcho prompt      2.9k tokens
  Provider input    791.6k tokens (90% cached, ~$1.23 saved)
  Runtime overhead  788.7k tokens
  Response          17.5k tokens
  Activity          tools=13 calls
  Live context      81.2k / 1.0M (8% full)

✓ implement:subtask:T1 · 359.9s · $2.837
  Orcho prompt      4.1k tokens
  Provider input    2.7M tokens (97% cached, ~$2.46 saved)
  Runtime overhead  2.7M tokens
  Response          27.0k tokens
  Activity          tools=34 calls
  Live context      95.2k / 1.0M (10% full)

What the operator learns
The implementation subtask carried more token load than planning (2.7M versus 791.6k provider input tokens), but most provider input was served from cache in both phases (97% and 90%).

Why overhead is shown separately
The worker CLI can add system prompt and tool schema context. Orcho separates its assembled prompt from runtime-injected overhead when it can estimate the gap.
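The gap can be sketched as a subtraction, assuming overhead is simply provider input minus Orcho's assembled prompt. That assumption matches the plan-phase figures in the excerpt, but Orcho's actual estimator may differ; `runtime_overhead` is a hypothetical helper.

```python
def runtime_overhead(provider_input: int, orcho_prompt: int) -> int:
    """Tokens the worker runtime added on top of Orcho's assembled prompt.

    Clamped at zero because a prompt estimate can overshoot the provider count.
    """
    return max(provider_input - orcho_prompt, 0)


# Plan phase from the excerpt: 791.6k provider input, 2.9k Orcho prompt.
print(runtime_overhead(791_600, 2_900))  # 788700
```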

There are three different consumers to keep separate:

| Consumer | Counted by Orcho run metrics? | Notes |
| --- | --- | --- |
| Orcho-controlled worker calls | Yes, when the runtime/provider reports usage or Orcho can estimate it. | These are the phase calls Orcho launches: plan, implement, review, repair, gates, advice. |
| Runtime-injected overhead | Partly visible when provider input is larger than Orcho’s assembled prompt estimate. | Agent CLIs can add their own system prompt and tool schemas. Orcho surfaces the gap when it can compute it honestly. |
| Surrounding MCP/LLM client | No, unless that client’s usage is separately reported outside the run. | The client may spend tokens reading status, evidence, and diagnosis. That is real overhead, but it is not automatically part of the run’s metrics.json. |

This is the reason MCP captain mode should be evaluated carefully. The run metrics show Orcho’s controlled delivery work. The client may spend additional tokens to manage the lifecycle. The payoff is only real when that extra context prevents blind retries, wrong resumes, or human reconstruction of run state.

Provider input can contain cached and fresh parts. Cached tokens are already counted inside tokens_in, so do not add them to input as if they were a separate bucket.

tokens_in = fresh input + cache-read input + cache-create input
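Rearranging that identity gives the fresh portion of a call. The sketch below is a minimal illustration with made-up numbers; `fresh_input` is a hypothetical helper, not an Orcho function.

```python
def fresh_input(tokens_in: int, cache_read: int = 0, cache_create: int = 0) -> int:
    """Derive fresh input from the identity: fresh = total - read - create."""
    fresh = tokens_in - cache_read - cache_create
    if fresh < 0:
        # Cached parts can never exceed the total they are counted inside.
        raise ValueError("cache fields exceed tokens_in; check the usage record")
    return fresh


# Made-up call: 1,000 input tokens, 900 read from cache, 50 written to cache.
print(fresh_input(1_000, cache_read=900, cache_create=50))  # 50
```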

Orcho uses these fields when the runtime exposes them:

| Field | Meaning |
| --- | --- |
| tokens_in | Total provider input for the call. Cached input is already inside this number. |
| tokens_out | Provider output for the call. |
| tokens_in_cache_read | Input tokens served from provider cache. Usually cheaper than fresh input. |
| tokens_in_cache_create | Input tokens written into cache during this call. Often a cold/priming call. |
| tokens_exact | Whether the count came from provider/runtime usage rather than estimation. |

Two caveats matter:

  • Low cache_read alone does not always mean the cache failed. A cold call can write a large stable prefix into cache with high cache_create and low cache_read; the next call may read it back.
  • The stronger “prefix changed” signal is low coverage: cache_read + cache_create is small compared with tokens_in, so most of the prompt was genuinely fresh.

When a provider exposes only cached-read tokens and not cache-creation tokens, Orcho falls back to a weaker read-ratio interpretation. Treat that ratio as an operating signal, not an exact cache diagnosis.
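The two caveats and the fallback can be expressed as one small classifier. This is a sketch of the reasoning, not Orcho's implementation: the 0.5 threshold, the label strings, and the `cache_signal` name are all assumptions chosen for illustration.

```python
from typing import Optional


def cache_signal(
    tokens_in: int,
    cache_read: int,
    cache_create: Optional[int] = None,
    low: float = 0.5,  # hypothetical threshold, tune for your workload
) -> str:
    """Classify one call's cache behavior from its usage fields."""
    if tokens_in <= 0:
        return "no-input"
    if cache_create is None:
        # Provider reports reads only: weaker read-ratio interpretation.
        ratio = cache_read / tokens_in
        return "weak:low-read-ratio" if ratio < low else "weak:ok"
    coverage = (cache_read + cache_create) / tokens_in
    if coverage < low:
        # Most of the prompt was genuinely fresh.
        return "prefix-likely-changed"
    if cache_create > cache_read:
        # Large write, small read: a cold call priming the cache.
        return "cold-priming"
    return "warm"


print(cache_signal(1_000, cache_read=50, cache_create=900))  # cold-priming
print(cache_signal(1_000, cache_read=50, cache_create=20))   # prefix-likely-changed
```

Note that the first call has a low read count yet is not flagged as a failure, which is the point of the first caveat.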

Token accounting is observability, not a moral score.

It cannot prove:

  • the provider’s exact invoice;
  • hidden client-side reasoning cost outside Orcho;
  • whether a subscription provider will keep the same policy next month;
  • whether a large prompt was wasteful or necessary without reading the phase result;
  • whether cached-token pricing matches your actual terms unless the local pricing table is verified.

It can prove something more useful for day-to-day operation: where the workload went and which layer deserves tuning.

Enable accounting with either an environment variable or a config entry:

ORCHO_ACCOUNTING=1

or set it in config:

{
  "accounting": {
    "enabled": true
  }
}

Then inspect reports:

orcho metrics --last 5
orcho cost --window 30d
orcho pricing show

Refresh local pricing when estimates matter:

orcho pricing refresh --provider openai

The local pricing table is a user-controlled estimate. Verify important rates against your actual provider terms before using the number for decisions.

Accounting helps answer whether a subscription is carrying real work or mostly sitting idle.

The cautious way to read it:

| Question | Read |
| --- | --- |
| How much work did I route through agents this month? | total tokens, runs, phase breakdown |
| Which runs consumed most of the workload? | top expensive runs |
| Which phases dominate spend? | phase breakdown |
| Which runtime/provider dominates? | agent breakdown |
| Would metered API usage have exceeded my monthly subscription price? | API-equivalent total versus subscription cost |

Do not treat this as exact billing. Subscription terms can include quotas, fair-use limits, model access rules, queueing, changing policies, and workload constraints that Orcho cannot infer from a run artifact.

The useful question is more practical:

If this month's Orcho workload had been billed as metered API usage,
would the subscription still look rational for how I actually work?
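That comparison is simple arithmetic once both numbers are in hand. The sketch below uses made-up dollar amounts and a hypothetical `subscription_check` helper; it compares totals only and says nothing about quotas or fair-use limits.

```python
from typing import Optional


def subscription_check(api_equivalent_usd: float, subscription_usd: float) -> dict:
    """Compare a window's API-equivalent total with the subscription price."""
    ratio: Optional[float] = None
    if subscription_usd > 0:
        ratio = api_equivalent_usd / subscription_usd
    return {
        "ratio": ratio,  # above 1.0 means metered usage would have cost more
        "metered_would_exceed": api_equivalent_usd > subscription_usd,
    }


# Made-up month: 340 USD API-equivalent against a 200 USD subscription.
print(subscription_check(340.0, 200.0))
```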

Cost accounting also gives you a way to reason about pricing changes without altering the record of the work that was done.

Use cases:

  • provider prices change;
  • a subscription becomes less generous;
  • a model is moved to a different access plan;
  • a project starts needing heavier feature or complex_feature profiles;
  • MCP captain mode adds context calls but reduces blind retries;
  • a team wants to know whether review depth is worth the token budget.

The what-if frame is deliberately conservative:

  1. Keep the workload record: runs, phases, models, tokens, duration.
  2. Update the local pricing table.
  3. Re-run the cost report over the same window.
  4. Compare profile depth, runtime mix, and avoidable retries.
  5. Decide whether to change profiles, models, gates, or subscription strategy.
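Steps 1 to 3 amount to holding the token record fixed and swapping the rates. The sketch below shows that idea outside the CLI. The `Rates` type, the per-million rates, and the `reprice` helper are hypothetical; the numbers are invented and are not any provider's prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Values used below are invented for illustration."""
    fresh_in: float
    cache_read: float
    cache_create: float
    out: float


def call_cost(call: dict, rates: Rates) -> float:
    """Price one call, splitting tokens_in into fresh, cache-read, cache-create."""
    read = call.get("tokens_in_cache_read", 0)
    create = call.get("tokens_in_cache_create", 0)
    fresh = call["tokens_in"] - read - create
    total = (
        fresh * rates.fresh_in
        + read * rates.cache_read
        + create * rates.cache_create
        + call["tokens_out"] * rates.out
    )
    return total / 1_000_000


def reprice(calls: list, rates: Rates) -> float:
    """Re-run the same workload record under a different pricing table."""
    return sum(call_cost(c, rates) for c in calls)


workload = [{"tokens_in": 1_000_000, "tokens_in_cache_read": 900_000,
             "tokens_in_cache_create": 0, "tokens_out": 100_000}]
before = Rates(fresh_in=3.0, cache_read=0.3, cache_create=3.75, out=15.0)
after = Rates(fresh_in=6.0, cache_read=0.6, cache_create=7.5, out=30.0)
print(round(reprice(workload, before), 2), round(reprice(workload, after), 2))
```

The workload list never changes between the two calls, which is what keeps the what-if comparison conservative.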

Cost accounting is not only about providers. It also helps you judge whether Orcho’s own workflow shape is proportionate to the task.

| Signal | Possible interpretation |
| --- | --- |
| Review and repair dominate cost. | The task may be underspecified or gates are too broad. |
| Planning dominates cost. | Use a smaller profile or split the task. |
| Final acceptance repeatedly rejects. | Add clearer verification receipts or stronger project tuning. |
| MCP adds overhead but prevents reruns. | Captain mode may be paying for itself. |
| One runtime dominates cost. | Consider runtime assignment or model/effort overrides. |

The goal is not to minimize tokens at all costs. The goal is to avoid spending tokens where they do not buy delivery confidence.