The estimator (src/core/estimator.ts) provides pre-run cost and time projections. It counts tokens locally using tiktoken, models the expected LLM call volume, and computes wall-clock time estimates factoring in concurrency efficiency. The anatoly estimate command uses this module to let users preview costs before committing to a full run.
## Token Counting
Token counting uses the tiktoken library with the cl100k_base encoding. This is an OpenAI encoding rather than Claude's own tokenizer, so the counts for Claude models are approximate.

```typescript
import { get_encoding } from 'tiktoken';

const enc = get_encoding('cl100k_base');
const tokens = enc.encode(text).length; // number of tokens in `text`
enc.free(); // release the encoder's memory
```

The encoder is allocated and freed per call in countTokens(). For bulk estimation in estimateTasksTokens(), a single encoder instance is reused across all files to avoid repeated allocation overhead.
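The reuse pattern described above can be sketched as follows. This is a minimal illustration, not the code in estimateTasksTokens(): the `Encoder` interface, `countTokensBulk`, and the whitespace-based `fakeEncoder` are hypothetical names introduced here so the sketch runs without tiktoken installed.

```typescript
// Minimal encoder shape (assumption): anything with encode() and free().
interface Encoder {
  encode(text: string): ArrayLike<number>;
  free(): void;
}

// Count tokens for many texts with one encoder, freeing it exactly once.
function countTokensBulk(createEncoder: () => Encoder, texts: string[]): number[] {
  const enc = createEncoder();
  try {
    return texts.map((text) => enc.encode(text).length);
  } finally {
    enc.free(); // runs even if encode() throws
  }
}

// Hypothetical stand-in encoder: one "token" per whitespace-separated word.
let freed = 0;
const fakeEncoder = (): Encoder => ({
  encode: (text) => text.split(/\s+/).filter(Boolean).map((_, i) => i),
  free: () => { freed += 1; },
});

const counts = countTokensBulk(fakeEncoder, ['const a = 1;', 'return a + b;']);
```

With a real encoder, the factory would be something like `() => get_encoding('cl100k_base')`. The `try`/`finally` keeps the single allocation from leaking if one file fails to encode.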
## Per-file Token Model
For each .task.json file, the estimator reads the actual source file from disk and counts its tokens. The input token estimate for a single file is:
```
inputTokens = SYSTEM_PROMPT_TOKENS + fileTokens + PER_FILE_OVERHEAD_TOKENS
```

| Constant | Value | Purpose |
|---|---|---|
| `SYSTEM_PROMPT_TOKENS` | 600 | Estimated tokens for the axis system prompt |
| `PER_FILE_OVERHEAD_TOKENS` | 50 | JSON framing, symbol list, section headers |
If the source file has been deleted since the scan, the estimator falls back to a heuristic: (line_end - line_start + 1) * 8 tokens per symbol.
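A minimal sketch of the input model and its fallback, under stated assumptions: `SymbolRange`, `fallbackFileTokens`, and `estimateInputTokens` are illustrative names, and summing the heuristic across a file's symbols is one reading of "per symbol" rather than confirmed behavior.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;
const FALLBACK_TOKENS_PER_LINE = 8; // the "* 8" in the fallback heuristic

// Hypothetical symbol shape; only the line range matters here.
interface SymbolRange {
  line_start: number;
  line_end: number;
}

// Fallback when the source file is gone: 8 tokens per line of each symbol.
function fallbackFileTokens(symbols: SymbolRange[]): number {
  return symbols.reduce(
    (sum, s) => sum + (s.line_end - s.line_start + 1) * FALLBACK_TOKENS_PER_LINE,
    0,
  );
}

// fileTokens is the counted value when the file exists, otherwise undefined.
function estimateInputTokens(fileTokens: number | undefined, symbols: SymbolRange[]): number {
  const contentTokens = fileTokens ?? fallbackFileTokens(symbols);
  return SYSTEM_PROMPT_TOKENS + contentTokens + PER_FILE_OVERHEAD_TOKENS;
}

// File present with 1,200 counted tokens: 600 + 1200 + 50
const withFile = estimateInputTokens(1200, []);
// File deleted, two symbols spanning 10 and 5 lines: 600 + (80 + 40) + 50
const withoutFile = estimateInputTokens(undefined, [
  { line_start: 1, line_end: 10 },
  { line_start: 20, line_end: 24 },
]);
```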
Output tokens are estimated as:
```
outputTokens = OUTPUT_BASE_PER_FILE + symbolCount * OUTPUT_TOKENS_PER_SYMBOL
```

| Constant | Value | Purpose |
|---|---|---|
| `OUTPUT_BASE_PER_FILE` | 300 | Base output tokens regardless of symbol count |
| `OUTPUT_TOKENS_PER_SYMBOL` | 150 | Additional output tokens per symbol reviewed |
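As a worked example of the output formula, the sketch below evaluates it for a few symbol counts. The function name `estimateOutputTokens` is illustrative and may not match the module's internals.

```typescript
const OUTPUT_BASE_PER_FILE = 300;
const OUTPUT_TOKENS_PER_SYMBOL = 150;

// Linear output model: fixed base plus a per-symbol increment.
function estimateOutputTokens(symbolCount: number): number {
  return OUTPUT_BASE_PER_FILE + symbolCount * OUTPUT_TOKENS_PER_SYMBOL;
}

// 0, 5 and 20 symbols give 300, 1050 and 3300 output tokens.
const outputEstimates = [0, 5, 20].map((n) => estimateOutputTokens(n));
```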
## Time Estimation
Time estimation uses a linear model based on symbol count:
```
fileSeconds = BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL
```

| Constant | Value | Notes |
|---|---|---|
| `BASE_SECONDS` | 4 | Fixed overhead per file (file read, prompt assembly, RAG) |
| `SECONDS_PER_SYMBOL` | 0.8 | LLM output scales with symbol count |
Example estimates:
- 5 symbols: ~8 seconds
- 20 symbols: ~20 seconds
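The two examples above follow directly from the formula. A minimal sketch, assuming the sequential total is the plain sum of per-file estimates (the helper names are illustrative):

```typescript
const BASE_SECONDS = 4;
const SECONDS_PER_SYMBOL = 0.8;

// Per-file estimate: fixed overhead plus time proportional to symbol count.
function estimateFileSeconds(symbolCount: number): number {
  return BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL;
}

// Sequential total (assumption): sum of the per-file estimates.
function estimateSequentialSeconds(symbolCounts: number[]): number {
  return symbolCounts.reduce((total, n) => total + estimateFileSeconds(n), 0);
}

const fiveSymbols = estimateFileSeconds(5);    // 4 + 5 * 0.8, about 8 seconds
const twentySymbols = estimateFileSeconds(20); // 4 + 20 * 0.8, about 20 seconds
const bothFiles = estimateSequentialSeconds([5, 20]); // about 28 seconds
```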
## Concurrency Adjustment
The sequential time total is adjusted for parallel execution:
```
effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY)
```

CONCURRENCY_EFFICIENCY is 0.75, meaning a quarter of the nominal parallel throughput is assumed lost to:
- API rate limits and contention
- Tail effects (last workers finish alone while others idle)
- Network latency variance
The result is rounded up to the nearest minute.
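A minimal sketch of the adjustment and the final rounding, assuming the rounding is a ceiling on minutes; `estimateMinutes` is an illustrative name.

```typescript
const CONCURRENCY_EFFICIENCY = 0.75;

// Scale the sequential total by effective parallelism, then round up to minutes.
function estimateMinutes(sequentialSeconds: number, concurrency: number): number {
  const effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY);
  return Math.ceil(effectiveSeconds / 60);
}

const oneHourAtFour = estimateMinutes(3600, 4); // 3600 / 3 = 1200 s, i.e. 20 minutes
const roundedUp = estimateMinutes(1000, 4);     // about 333 s, rounded up to 6 minutes
const singleWorker = estimateMinutes(1000, 1);  // about 1333 s, rounded up to 23 minutes
```

Note that with the formula as written, a concurrency of 1 yields an estimate longer than the sequential total, because the efficiency factor still applies.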
## LLM Call Count
Each file is evaluated by 7 independent axis evaluators, so:
```
estimatedCalls = files * AXIS_COUNT
```

The 7 axes are split between two model tiers:
| Tier | Axes | Count |
|---|---|---|
| Haiku (fast/cheap) | utility, duplication, overengineering, tests, documentation | 5 |
| Sonnet (deep/costly) | correction, best_practices | 2 |
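The call count and its per-tier breakdown can be sketched as below. The `AXIS_TIERS` map and `estimateCalls` helper are illustrative; only the axis names, the tier assignment, and the `files * AXIS_COUNT` formula come from the text above.

```typescript
// Axis-to-tier assignment as listed in the table above.
const AXIS_TIERS = {
  haiku: ['utility', 'duplication', 'overengineering', 'tests', 'documentation'],
  sonnet: ['correction', 'best_practices'],
} as const;

const AXIS_COUNT = AXIS_TIERS.haiku.length + AXIS_TIERS.sonnet.length; // 7

// One LLM call per file per axis, broken down by model tier.
function estimateCalls(files: number) {
  return {
    estimatedCalls: files * AXIS_COUNT,
    haikuCalls: files * AXIS_TIERS.haiku.length,
    sonnetCalls: files * AXIS_TIERS.sonnet.length,
  };
}

// 100 files: 700 calls in total, 500 on Haiku and 200 on Sonnet.
const callsFor100 = estimateCalls(100);
```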
## Full Project Estimate
estimateProject() loads all .task.json files from .anatoly/tasks/ and returns a complete EstimateResult:
```typescript
{
  files: number;            // total files to review
  symbols: number;          // total symbols across all files
  inputTokens: number;      // estimated input tokens (all axes combined)
  outputTokens: number;     // estimated output tokens
  estimatedMinutes: number; // sequential wall-clock estimate
  estimatedCalls: number;   // total LLM API calls (files * AXIS_COUNT, i.e. files * 7)
}
```

## Per-Step Forecast (forecastRun)
forecastRun() is the richer projection consumed by anatoly estimate. It returns a RunForecast with a steps[] array — one entry per pipeline phase that hits an LLM or embedding API — alongside aggregate llm / embed totals and a calibrated wall-clock estimate.
### Step categories
Steps are grouped under a fixed set of categories (FORECAST_STEP_CATEGORY_ORDER), in the canonical execution order:
| Category | Generated by | Notes |
|---|---|---|
| `axis` | One per active axis × eval-tier files | Each axis is a full pass; cache modeling applies |
| `deliberation` | When `deliberationModel` is set + deliberation enabled | Heuristic: `DELIBERATION_COVERAGE` × axis token volume against the deliberation model |
| `summary` | When RAG + `summaryModel` set + at least one function | One entry, name = `''`. Input ≈ `embed.codeTokens`, output ≈ `codeUnits × NLP_TOKENS_PER_FUNCTION` |
| `embed` | When RAG enabled | Two entries (`code` and `text`); model id is the resolved embedding model |
| `internal-doc` | When `docContext` provided with `pageCount > 0` | Heuristic `DOC_BOOTSTRAP_PER_PAGE` or `DOC_UPDATE_PER_PAGE`; `approximate: true` |
Each ForecastStep carries:
```typescript
{
  category: ForecastStepCategory;
  name: string;                 // empty for single-canonical-entry categories
  model: string;                // resolved model id (or 'local' for local embeds)
  billingMode: 'subscription' | 'api' | 'local';
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;     // Anthropic prompt cache, when modeled
  cacheCreationTokens?: number;
  costUsd: number;              // pay-per-token equivalent
  approximate?: boolean;        // true for heuristic-based steps (doc, deliberation)
}
```

### Cache modeling for axis steps
For each axis, the SYSTEM prompt is cached after the first call (Anthropic prompt cache). For E eval files, the per-axis split becomes:
- fresh input: `sumFileTokens + E × PER_FILE_OVERHEAD_TOKENS` (file content + per-call instructions)
- cache_creation: `SYSTEM_PROMPT_TOKENS` (one write, on the first call)
- cache_read: `SYSTEM_PROMPT_TOKENS × (E - 1)` (subsequent calls re-read the cache)
- output: `sumOutput`
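The split above can be sketched for a single axis as follows. The `AxisTokenSplit` shape and `splitAxisTokens` helper are illustrative names, and the guard for an empty file list is an addition of this sketch, not something the source specifies.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;

interface AxisTokenSplit {
  freshInput: number;
  cacheCreation: number;
  cacheRead: number;
  output: number;
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Split one axis pass over E files into fresh, cache-write, cache-read and output tokens.
function splitAxisTokens(fileTokens: number[], outputTokens: number[]): AxisTokenSplit {
  const E = fileTokens.length;
  return {
    freshInput: sum(fileTokens) + E * PER_FILE_OVERHEAD_TOKENS,
    cacheCreation: E > 0 ? SYSTEM_PROMPT_TOKENS : 0,       // one write on the first call
    cacheRead: SYSTEM_PROMPT_TOKENS * Math.max(E - 1, 0),  // every later call re-reads it
    output: sum(outputTokens),
  };
}

// Three files: fresh 6150, cache_creation 600, cache_read 1200, output 2100.
const split = splitAxisTokens([1000, 2000, 3000], [450, 600, 1050]);

// Sanity check: the three input buckets add up to the uncached input total.
const naiveInput = 3 * (SYSTEM_PROMPT_TOKENS + PER_FILE_OVERHEAD_TOKENS) + 6000;
```

The split only relabels tokens: the three input buckets sum to the same total as the uncached per-file model, so any saving comes from the cheaper cache-read rate.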
Models without cache rates in the pricing cache (most non-Anthropic providers) fall back to the input rate via pricing.cacheReadInput ?? pricing.input — this yields the naive cost, which is the right behavior when caching isn't available.
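A minimal sketch of how the fallback affects cost. Everything beyond `pricing.cacheReadInput ?? pricing.input` is an assumption: the `ModelPricing` shape, the `cacheCreationInput` field, rates expressed in USD per million tokens, and the placeholder rate values, which are not real prices.

```typescript
// Hypothetical pricing entry; rates are USD per million tokens (assumption).
interface ModelPricing {
  input: number;
  output: number;
  cacheReadInput?: number;
  cacheCreationInput?: number; // hypothetical field name
}

interface TokenSplit {
  freshInput: number;
  cacheCreation: number;
  cacheRead: number;
  output: number;
}

function stepCostUsd(tokens: TokenSplit, pricing: ModelPricing): number {
  // Missing cache rates fall back to the plain input rate.
  const cacheReadRate = pricing.cacheReadInput ?? pricing.input;
  const cacheCreationRate = pricing.cacheCreationInput ?? pricing.input;
  const total =
    tokens.freshInput * pricing.input +
    tokens.cacheCreation * cacheCreationRate +
    tokens.cacheRead * cacheReadRate +
    tokens.output * pricing.output;
  return total / 1_000_000;
}

const tokens: TokenSplit = { freshInput: 6150, cacheCreation: 600, cacheRead: 1200, output: 2100 };

// No cache rates: all input is billed at the input rate (about 0.01845 USD here).
const naiveCost = stepCostUsd(tokens, { input: 1, output: 5 });
// With placeholder cache rates the same tokens cost less (about 0.01752 USD here).
const cachedCost = stepCostUsd(tokens, {
  input: 1, output: 5, cacheReadInput: 0.1, cacheCreationInput: 1.25,
});
```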
### Doc-generation heuristic
Internal-doc steps model the multi-turn agentic Read-tool conversation: per page, ~3-7 turns where the accumulated context gets cached after the first turn. The two exported constants encode the per-page token budget:
```typescript
DOC_BOOTSTRAP_PER_PAGE = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
DOC_UPDATE_PER_PAGE = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };
```

Bootstrap is heavier because the agent reads more files for context. These constants are tunable from observed runs (analogous to recalibrateFromRuns for axis durations).
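One way to read these budgets is as per-page values multiplied by the page count. The sketch below assumes that linear scaling; the `PerPageTokens` type, the `mode` parameter, and `estimateDocTokens` are hypothetical names for illustration.

```typescript
interface PerPageTokens {
  fresh: number;
  cacheRead: number;
  cacheCreation: number;
  output: number;
}

const DOC_BOOTSTRAP_PER_PAGE: PerPageTokens = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
const DOC_UPDATE_PER_PAGE: PerPageTokens = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };

// Scale the per-page budget linearly by the number of pages (assumption).
function estimateDocTokens(pageCount: number, mode: 'bootstrap' | 'update'): PerPageTokens {
  const perPage = mode === 'bootstrap' ? DOC_BOOTSTRAP_PER_PAGE : DOC_UPDATE_PER_PAGE;
  return {
    fresh: perPage.fresh * pageCount,
    cacheRead: perPage.cacheRead * pageCount,
    cacheCreation: perPage.cacheCreation * pageCount,
    output: perPage.output * pageCount,
  };
}

// Ten pages, bootstrap: fresh 30000, cacheRead 70000, cacheCreation 6000, output 35000.
const bootstrapTen = estimateDocTokens(10, 'bootstrap');
// Ten pages, update: fresh 20000, cacheRead 40000, cacheCreation 6000, output 20000.
const updateTen = estimateDocTokens(10, 'update');
```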
### Billing-mode resolution
forecastRun takes a resolveBillingMode(modelId) callback so it can tag each step with subscription (OAuth Claude Code), api (per-token), or local (no-API local runtime). The callback is supplied by the caller (commands/estimate.ts or commands/run.ts) and consults config.providers[provider].mode.
The resulting step.billingMode lets the rendered table show a mode column and lets the caller compute two distinct totals:
- `consumptionUsd`: sum of all step costs (pay-per-token equivalent)
- `billedUsd`: sum of `api`-mode steps only (what the user actually pays)
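The two totals can be sketched as a single pass over the steps. `StepCost` is a trimmed-down stand-in for ForecastStep, and `computeTotals` is an illustrative helper name, not necessarily the caller's actual function.

```typescript
type BillingMode = 'subscription' | 'api' | 'local';

// Only the two fields needed for the totals.
interface StepCost {
  billingMode: BillingMode;
  costUsd: number;
}

function computeTotals(steps: StepCost[]): { consumptionUsd: number; billedUsd: number } {
  let consumptionUsd = 0;
  let billedUsd = 0;
  for (const step of steps) {
    consumptionUsd += step.costUsd;                       // every step counts here
    if (step.billingMode === 'api') billedUsd += step.costUsd; // only per-token billing
  }
  return { consumptionUsd, billedUsd };
}

// consumptionUsd is 1.75 and billedUsd is 0.25 for these example steps.
const totals = computeTotals([
  { billingMode: 'subscription', costUsd: 1.5 },
  { billingMode: 'api', costUsd: 0.25 },
  { billingMode: 'local', costUsd: 0 },
]);
```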
## Display Formatting
formatTokenCount() converts raw token numbers into human-readable strings:
| Input | Output |
|---|---|
| 1,200,000 | ~1.2M |
| 340,000 | ~340K |
| 500 | ~500 |
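A minimal sketch that reproduces the three rows above. The thresholds and rounding rules are assumptions inferred from those examples, so the real formatTokenCount() may behave differently near the boundaries; the sketch uses a distinct name to make that clear.

```typescript
// Illustrative re-implementation; thresholds and rounding are inferred, not confirmed.
function formatTokenCountSketch(tokens: number): string {
  if (tokens >= 1000000) return `~${(tokens / 1000000).toFixed(1)}M`;
  if (tokens >= 1000) return `~${Math.round(tokens / 1000)}K`;
  return `~${tokens}`;
}

const millions = formatTokenCountSketch(1200000);
const thousands = formatTokenCountSketch(340000);
const small = formatTokenCountSketch(500);
```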
## Key Source Paths
- Estimator: `src/core/estimator.ts`
- Task loading: `src/core/estimator.ts` (`loadTasks()`)
- Forecast helper: `src/core/estimator.ts` (`forecastRun()`)
- Task schema: `src/schemas/task.ts`
- Estimate command (rendered table + JSON output): `src/commands/estimate.ts`