Cost accounting
Cost accounting is optional. Orcho can run without dollar estimates and still record tokens, duration, phases, artifacts, and evidence.
Enable accounting when cost becomes an operating question: which runs were expensive, which phases consumed the most, whether a profile is too heavy for the task, and how your workload compares with subscription-style access.
Run economics
Orcho reports API-equivalent cost: what a metered API would likely have charged for the same token workload. Treat it as operational accounting, not as a provider bill.
What Orcho records
The durable run artifacts include metrics.json. Depending on provider output
and local accounting settings, it can include:
- tokens in, tokens out, and total tokens;
- duration and phase attempts;
- per-phase and per-agent breakdown;
- API-equivalent cost when available;
- whether a cost was estimated from local pricing.
The key phrase is API-equivalent. On a subscription-style provider account, your marginal user-facing bill for a run may be zero. Orcho’s dollar field is still useful because it answers a different question:
What would this workload look like under metered API economics?
Token accounting planes
Token accounting starts before dollars. The first question is not “what did it cost?” but “which part of the delivery lifecycle consumed context?”
Read tokens across these planes:
| Plane | What it explains | Where it shows up |
|---|---|---|
| Window | How much agent work happened over 7d, 30d, or all time. | orcho cost --window 30d, orcho metrics --last 5 |
| Run | One run’s total input, output, duration, and result. | metrics.json, orcho_run_metrics |
| Phase | Whether plan, implementation, review, repair, or final acceptance dominates. | metrics.json.phases |
| Attempt / retry | Whether repeated validation or repair rounds consumed the budget. | metrics.json.phase_attempts |
| Runtime / model | Which worker runtime or model family is expensive. | cost report agent breakdown |
| Project / subtask | Which participant or implement subtask carried the load when available. | cross-run rollups, metrics.json.subtasks |
| Cache | How much provider input was reused, created, or fresh. | tokens_in_cache_read, tokens_in_cache_create |
This matters because the same total can mean very different things:
- a large plan phase can mean the task was too broad;
- a large review_changes phase can mean the evidence contract is expensive;
- repeated repair_changes can mean the gate is catching real misses;
- a large subtask can reveal the actual hotspot inside a feature;
- a low cache hit on a warm run can mean the prompt prefix changed.
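One way to see which phase dominates is to turn per-phase token counts into shares of the run total. The sketch below is illustrative only: it assumes metrics.json.phases maps each phase name to an object with tokens_in and tokens_out, which may not match the real schema, and the sample numbers are invented.

```python
def phase_shares(phases):
    """Return each phase's share of the run's total tokens (input + output)."""
    totals = {
        name: entry.get("tokens_in", 0) + entry.get("tokens_out", 0)
        for name, entry in phases.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: count / grand_total for name, count in totals.items()}


def dominant_phase(phases):
    """Name of the phase with the largest token share, or None if there are no phases."""
    shares = phase_shares(phases)
    return max(shares, key=shares.get) if shares else None


# Hypothetical data; the key layout is an assumption, not the documented schema.
sample_phases = {
    "plan": {"tokens_in": 800_000, "tokens_out": 17_000},
    "implement": {"tokens_in": 2_700_000, "tokens_out": 27_000},
    "review_changes": {"tokens_in": 400_000, "tokens_out": 10_000},
}

for name, share in sorted(phase_shares(sample_phases).items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {share:6.1%}")
print("dominant:", dominant_phase(sample_phases))
```

A share is only a starting point. Whether a dominant phase was wasteful or necessary still depends on reading the phase result.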
Real accounting excerpt
The live CLI output can show cost at the same moment it shows phase progress. This sanitized excerpt is from a real feature run:
✓ plan · 254.1s · $1.522
Orcho prompt 2.9k tokens
Provider input 791.6k tokens (90% cached, ~$1.23 saved)
Runtime overhead 788.7k tokens
Response 17.5k tokens
Activity tools=13 calls
Live context 81.2k / 1.0M (8% full)
✓ implement:subtask:T1 · 359.9s · $2.837
Orcho prompt 4.1k tokens
Provider input 2.7M tokens (97% cached, ~$2.46 saved)
Runtime overhead 2.7M tokens
Response 27.0k tokens
Activity tools=34 calls
Live context 95.2k / 1.0M (10% full)
- What the operator learns: The implementation phase carried more token load than planning, but most provider input was cache-readable in both phases.
- Why overhead is shown separately: The worker CLI can add system prompt and tool schema context. Orcho separates its assembled prompt from runtime-injected overhead when it can estimate the gap.
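The overhead and context figures in the excerpt are consistent with simple arithmetic on the other displayed values. The sketch below reproduces them under two assumptions that the excerpt suggests but does not state: overhead is provider input minus the Orcho prompt, and the context percentage is rounded to a whole number. The function names are hypothetical, and clamping at zero is a defensive choice made here.

```python
def runtime_overhead(provider_input_tokens, orcho_prompt_tokens):
    """Tokens in provider input that Orcho's assembled prompt does not account for."""
    # Clamp at zero so an over-estimated prompt never yields negative overhead.
    return max(provider_input_tokens - orcho_prompt_tokens, 0)


def context_fill_percent(live_context_tokens, context_limit_tokens):
    """Live context as a whole-number percentage of the context limit."""
    return round(100 * live_context_tokens / context_limit_tokens)


# Plan phase figures from the excerpt (already rounded for display).
print(runtime_overhead(791_600, 2_900))         # 788700, shown as 788.7k
print(context_fill_percent(81_200, 1_000_000))  # 8
```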
Who spent the tokens
There are three different consumers to keep separate:
| Consumer | Counted by Orcho run metrics? | Notes |
|---|---|---|
| Orcho-controlled worker calls | Yes, when the runtime/provider reports usage or Orcho can estimate it. | These are the phase calls Orcho launches: plan, implement, review, repair, gates, advice. |
| Runtime-injected overhead | Partly visible when provider input is larger than Orcho’s assembled prompt estimate. | Agent CLIs can add their own system prompt and tool schemas. Orcho surfaces the gap when it can compute it honestly. |
| Surrounding MCP/LLM client | No, unless that client’s usage is separately reported outside the run. | The client may spend tokens reading status, evidence, and diagnosis. That is real overhead, but it is not automatically part of the run’s metrics.json. |
This is the reason MCP captain mode should be evaluated carefully. The run metrics show Orcho’s controlled delivery work. The client may spend additional tokens to manage the lifecycle. The payoff is only real when that extra context prevents blind retries, wrong resumes, or human reconstruction of run state.
Cached tokens
Provider input can contain cached and fresh parts. Do not add cached tokens to input as if they were a separate bucket.
tokens_in = fresh input + cache-read input + cache-create input
Orcho uses these fields when the runtime exposes them:
| Field | Meaning |
|---|---|
| tokens_in | Total provider input for the call. Cached input is already inside this number. |
| tokens_out | Provider output for the call. |
| tokens_in_cache_read | Input tokens served from provider cache. Usually cheaper than fresh input. |
| tokens_in_cache_create | Input tokens written into cache during this call. Often a cold/priming call. |
| tokens_exact | Whether the count came from provider/runtime usage rather than estimation. |
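Because cached input is already inside tokens_in, the fresh portion is found by subtraction. A minimal sketch with invented numbers follows; the helper name is hypothetical and the field names are the ones from the table.

```python
def fresh_input_tokens(tokens_in, cache_read=0, cache_create=0):
    """Fresh input implied by tokens_in = fresh + cache-read + cache-create."""
    fresh = tokens_in - cache_read - cache_create
    if fresh < 0:
        raise ValueError("cache fields exceed tokens_in; check where the counts came from")
    return fresh


# Hypothetical call record using the field names from the table.
call = {
    "tokens_in": 100_000,
    "tokens_in_cache_read": 90_000,
    "tokens_in_cache_create": 4_000,
}

fresh = fresh_input_tokens(
    call["tokens_in"],
    call["tokens_in_cache_read"],
    call["tokens_in_cache_create"],
)
print(fresh)  # 6000

# The mistake to avoid: treating cached tokens as an extra bucket on top of tokens_in.
double_counted = call["tokens_in"] + call["tokens_in_cache_read"]
print(double_counted)  # 190000, which overstates the real input of 100000
```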
Two caveats matter:
- Low cache_read alone does not always mean the cache failed. A cold call can write a large stable prefix into cache with high cache_create and low cache_read; the next call may read it back.
- The stronger “prefix changed” signal is low coverage: cache_read + cache_create is small compared with tokens_in, so most of the prompt was genuinely fresh.
When a provider exposes only cached-read tokens and not cache-creation tokens, Orcho falls back to a weaker read-ratio interpretation. It should be treated as an operating signal, not an exact cache diagnosis.
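The two caveats and the read-ratio fallback can be sketched as a small classifier. This is not Orcho's implementation: the 0.5 threshold, the label names, and the function itself are assumptions chosen only to make the reasoning concrete.

```python
def cache_signal(tokens_in, cache_read, cache_create=None, threshold=0.5):
    """Classify one call's cache behaviour from its token counts.

    Pass cache_create=None when the provider does not report cache-creation tokens.
    """
    if tokens_in <= 0:
        return "no_input"
    read_ratio = cache_read / tokens_in
    if cache_create is None:
        # Fallback: only cached-read tokens are known, so this is a weaker signal.
        return "warm" if read_ratio >= threshold else "weak_low_read"
    coverage = (cache_read + cache_create) / tokens_in
    if coverage < threshold:
        # Most of the prompt was neither read from cache nor written to it.
        return "prefix_changed"
    # Coverage is high: the call either read the prefix back or was priming it.
    return "warm" if read_ratio >= threshold else "priming"


print(cache_signal(1_000, 50, 900))  # priming
print(cache_signal(1_000, 900, 50))  # warm
print(cache_signal(1_000, 50, 50))   # prefix_changed
print(cache_signal(1_000, 50))       # weak_low_read
```

The point of the separate priming label is that a cold call with low cache_read is not reported as a cache failure.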
What token accounting cannot prove
Token accounting is observability, not a moral score.
It cannot prove:
- the provider’s exact invoice;
- hidden client-side reasoning cost outside Orcho;
- whether a subscription provider will keep the same policy next month;
- whether a large prompt was wasteful or necessary without reading the phase result;
- whether cached-token pricing matches your actual terms unless the local pricing table is verified.
It can prove something more useful for day-to-day operation: where the workload went and which layer deserves tuning.
Enable accounting
Use either config or environment:
ORCHO_ACCOUNTING=1
or set:
{
  "accounting": { "enabled": true }
}
Then inspect reports:
orcho metrics --last 5
orcho cost --window 30d
orcho pricing show
Refresh local pricing when estimates matter:
orcho pricing refresh --provider openai
The local pricing table is a user-controlled estimate. Verify important rates against your actual provider terms before using the number for decisions.
Subscription value
Accounting helps answer whether a subscription is carrying real work or mostly sitting idle.
The cautious way to read it:
| Question | Read |
|---|---|
| How much work did I route through agents this month? | total tokens, runs, phase breakdown |
| Which runs consumed most of the workload? | top expensive runs |
| Which phases dominate spend? | phase breakdown |
| Which runtime/provider dominates? | agent breakdown |
| Would metered API usage have exceeded my monthly subscription price? | API-equivalent total versus subscription cost |
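The last row of the table reduces to a comparison between two numbers. The sketch below uses invented per-run costs and an invented subscription price; the function name is hypothetical and nothing here reads real Orcho output.

```python
def subscription_comparison(api_equivalent_usd, subscription_usd):
    """Compare a month's API-equivalent total with a flat subscription price."""
    if subscription_usd <= 0:
        raise ValueError("subscription price must be positive")
    return {
        "difference_usd": api_equivalent_usd - subscription_usd,
        "ratio": api_equivalent_usd / subscription_usd,
    }


# Hypothetical API-equivalent costs for one month's runs, in USD.
run_costs = [4.36, 12.10, 0.85, 31.40]
monthly_total = sum(run_costs)

result = subscription_comparison(monthly_total, subscription_usd=20.00)
print(f"API-equivalent total: ${monthly_total:.2f}")
print(f"Difference vs subscription: ${result['difference_usd']:.2f}")
print(f"Ratio: {result['ratio']:.2f}x")
```

A ratio above 1 means metered usage would have cost more than the subscription for that month's workload. It says nothing about quotas or other terms.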
Do not treat this as exact billing. Subscription terms can include quotas, fair-use limits, model access rules, queueing, changing policies, and workload constraints that Orcho cannot infer from a run artifact.
The useful question is more practical:
If this month's Orcho workload had been billed as metered API usage, would the subscription still look rational for how I actually work?
What-if scenarios
Cost accounting also gives you a way to reason about pricing changes without rewriting your memory of the work.
Use cases:
- provider prices change;
- a subscription becomes less generous;
- a model is moved to a different access plan;
- a project starts needing heavier feature or complex_feature profiles;
- MCP captain mode adds context calls but reduces blind retries;
- a team wants to know whether review depth is worth the token budget.
The what-if frame is deliberately conservative:
- Keep the workload record: runs, phases, models, tokens, duration.
- Update the local pricing table.
- Re-run the cost report over the same window.
- Compare profile depth, runtime mix, and avoidable retries.
- Decide whether to change profiles, models, gates, or subscription strategy.
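The first three steps amount to pricing one fixed workload record against two pricing tables. The sketch below shows that idea in isolation. The rates, the model name, and the pricing-table layout are all hypothetical, and it does not reflect how orcho cost computes its report.

```python
def price_call(call, rates):
    """API-equivalent USD for one call; rates are USD per million tokens."""
    read = call.get("tokens_in_cache_read", 0)
    create = call.get("tokens_in_cache_create", 0)
    fresh = call["tokens_in"] - read - create  # cached input is already inside tokens_in
    micro_usd = (
        fresh * rates["input"]
        + read * rates["cache_read"]
        + create * rates["cache_create"]
        + call["tokens_out"] * rates["output"]
    )
    return micro_usd / 1_000_000


def price_workload(calls, pricing_table):
    """Total API-equivalent USD for a workload under one pricing table."""
    return sum(price_call(call, pricing_table[call["model"]]) for call in calls)


# The workload record stays fixed; only the pricing table changes.
workload = [
    {
        "model": "model-a",  # hypothetical model name
        "tokens_in": 1_000_000,
        "tokens_in_cache_read": 900_000,
        "tokens_in_cache_create": 50_000,
        "tokens_out": 20_000,
    },
]

# Invented rates, for illustration only.
current_pricing = {
    "model-a": {"input": 3.00, "cache_read": 0.30, "cache_create": 3.75, "output": 15.00},
}
scenario_pricing = {
    "model-a": {"input": 4.00, "cache_read": 0.40, "cache_create": 5.00, "output": 20.00},
}

current_cost = price_workload(workload, current_pricing)
scenario_cost = price_workload(workload, scenario_pricing)
print(f"current:  ${current_cost:.4f}")
print(f"scenario: ${scenario_cost:.4f}")
print(f"change:   ${scenario_cost - current_cost:+.4f}")
```

Keeping the workload record separate from the rates is what makes the comparison fair: the same tokens are priced twice, so any change in the total comes from pricing alone.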
Profile ROI
Cost is not only about providers. It also helps decide whether Orcho’s own workflow shape is proportionate.
| Signal | Possible interpretation |
|---|---|
| Review and repair dominate cost. | The task may be underspecified or gates are too broad. |
| Planning dominates cost. | Use a smaller profile or split the task. |
| Final acceptance repeatedly rejects. | Add clearer verification receipts or stronger project tuning. |
| MCP adds overhead but prevents reruns. | Captain mode may be paying for itself. |
| One runtime dominates cost. | Consider runtime assignment or model/effort overrides. |
The goal is not to minimize tokens at all costs. The goal is to avoid spending tokens where they do not buy delivery confidence.
Related
- Feature run anatomy shows where usage appears in the live run.
- LLM captain mode explains when MCP overhead is rational.
- Artifact reference lists metrics.json.