The estimator (src/core/estimator.ts) provides pre-run cost and time projections. It counts tokens locally using tiktoken, models the expected LLM call volume, and computes wall-clock time estimates factoring in concurrency efficiency. The anatoly estimate command uses this module to let users preview costs before committing to a full run.

Token Counting

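Token counting uses the tiktoken library with the cl100k_base encoding. cl100k_base is an OpenAI encoding rather than Claude's own tokenizer, so the counts are a local approximation for Claude models, not exact billing figures.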

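import { get_encoding } from 'tiktoken';

const enc = get_encoding('cl100k_base'); // load the encoding
const tokens = enc.encode(text).length;  // number of tokens in `text`
enc.free();                              // release the encoder's memory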

The encoder is allocated and freed per call in countTokens(). For bulk estimation in estimateTasksTokens(), a single encoder instance is reused across all files to avoid repeated allocation overhead.
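The reuse pattern described above can be sketched roughly as follows. This is an illustration, not the code in estimator.ts: the `Encoder` interface, `countTokensBulk`, and the toy whitespace encoder are all hypothetical stand-ins, used so the example runs without the tiktoken dependency.

```typescript
// Minimal encoder shape. tiktoken's encoder offers encode() and free();
// this interface is a stand-in so the sketch has no dependencies.
interface Encoder {
  encode(text: string): ArrayLike<number>;
  free(): void;
}

// Hypothetical bulk helper: one encoder for all texts, freed exactly once.
function countTokensBulk(texts: string[], createEncoder: () => Encoder): number[] {
  const enc = createEncoder();
  try {
    return texts.map((text) => enc.encode(text).length);
  } finally {
    enc.free(); // runs even if encode() throws
  }
}

// Toy encoder for demonstration only: one "token" per whitespace-separated word.
let created = 0;
let freed = 0;
const toyEncoder = (): Encoder => {
  created += 1;
  return {
    encode: (text: string) =>
      text.split(/\s+/).filter((w) => w.length > 0).map((_, i) => i),
    free: () => {
      freed += 1;
    },
  };
};

const counts = countTokensBulk(['const a = 1;', 'return a + b'], toyEncoder);
console.log(counts, { created, freed });
```

The `try`/`finally` keeps the single allocation paired with a single `free()` no matter how many files are counted.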

Per-file Token Model

For each .task.json file, the estimator reads the actual source file from disk and counts its tokens. The input token estimate for a single file is:

inputTokens = SYSTEM_PROMPT_TOKENS + fileTokens + PER_FILE_OVERHEAD_TOKENS

Constant	Value	Purpose
`SYSTEM_PROMPT_TOKENS`	600	Estimated tokens for the axis system prompt
`PER_FILE_OVERHEAD_TOKENS`	50	JSON framing, symbol list, section headers

If the source file has been deleted since the scan, the estimator falls back to a heuristic: each symbol contributes (line_end - line_start + 1) * 8 tokens, that is, roughly 8 tokens per source line the symbol spans.
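A minimal sketch of the input-token model, including the fallback. The constants come from the table above; the `SymbolSpan` shape, the function names, and the choice to sum the fallback over all symbols in a file are assumptions made for illustration.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;
const FALLBACK_TOKENS_PER_LINE = 8; // the "* 8" factor from the fallback heuristic

// Hypothetical shape: only the fields the formulas need.
interface SymbolSpan {
  line_start: number;
  line_end: number;
}

// Fallback when the source file is gone: (line_end - line_start + 1) * 8 per symbol,
// summed over the file's symbols (the summation is an assumption).
function fallbackFileTokens(symbols: SymbolSpan[]): number {
  return symbols.reduce(
    (sum, s) => sum + (s.line_end - s.line_start + 1) * FALLBACK_TOKENS_PER_LINE,
    0,
  );
}

// fileTokens is the real token count, or null if the file could not be read.
function estimateInputTokens(fileTokens: number | null, symbols: SymbolSpan[]): number {
  const contentTokens = fileTokens ?? fallbackFileTokens(symbols);
  return SYSTEM_PROMPT_TOKENS + contentTokens + PER_FILE_OVERHEAD_TOKENS;
}

// A readable file with 1,200 tokens: 600 + 1200 + 50
console.log(estimateInputTokens(1200, []));
// A deleted file with two symbols spanning 10 and 5 lines: 600 + (80 + 40) + 50
console.log(estimateInputTokens(null, [
  { line_start: 1, line_end: 10 },
  { line_start: 20, line_end: 24 },
]));
```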

Output tokens are estimated as:

outputTokens = OUTPUT_BASE_PER_FILE + symbolCount * OUTPUT_TOKENS_PER_SYMBOL

Constant	Value	Purpose
`OUTPUT_BASE_PER_FILE`	300	Base output tokens regardless of symbol count
`OUTPUT_TOKENS_PER_SYMBOL`	150	Additional output tokens per symbol reviewed
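As a worked instance of the output formula, the following sketch uses the two constants above; the function name is hypothetical.

```typescript
const OUTPUT_BASE_PER_FILE = 300;
const OUTPUT_TOKENS_PER_SYMBOL = 150;

// Output estimate for one file: fixed base plus a per-symbol increment.
function estimateOutputTokens(symbolCount: number): number {
  return OUTPUT_BASE_PER_FILE + symbolCount * OUTPUT_TOKENS_PER_SYMBOL;
}

console.log(estimateOutputTokens(0)); // 300: the base alone
console.log(estimateOutputTokens(4)); // 300 + 4 * 150 = 900
```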

Time Estimation

Time estimation uses a linear model based on symbol count:

fileSeconds = BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL

Constant	Value	Notes
`BASE_SECONDS`	4	Fixed overhead per file (file read, prompt assembly, RAG)
`SECONDS_PER_SYMBOL`	0.8	LLM output scales with symbol count

Example estimates:

5 symbols: ~8 seconds
20 symbols: ~20 seconds
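The linear model and the two example figures can be reproduced with a short sketch. The function names are hypothetical, and summing per-file estimates into a sequential total is an assumption about how the estimator aggregates.

```typescript
const BASE_SECONDS = 4;
const SECONDS_PER_SYMBOL = 0.8;

// Per-file time: fixed overhead plus a per-symbol increment.
function estimateFileSeconds(symbolCount: number): number {
  return BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL;
}

// Sequential total: the per-file estimates added up.
function estimateSequentialSeconds(symbolCounts: number[]): number {
  return symbolCounts.reduce((sum, n) => sum + estimateFileSeconds(n), 0);
}

console.log(estimateFileSeconds(5));  // 4 + 5 * 0.8, about 8 seconds
console.log(estimateFileSeconds(20)); // 4 + 20 * 0.8, about 20 seconds
console.log(estimateSequentialSeconds([5, 20])); // about 28 seconds
```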

Concurrency Adjustment

The sequential time total is adjusted for parallel execution:

effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY)

CONCURRENCY_EFFICIENCY is 0.75 (25% overhead), accounting for:

API rate limits and contention
Tail effects (last workers finish alone while others idle)
Network latency variance

The result is rounded up to the nearest minute.
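A minimal sketch of the adjustment, assuming the input is the sequential total in seconds and the output is whole minutes. The function name `toEffectiveMinutes` and the guard against a concurrency below 1 are hypothetical additions.

```typescript
const CONCURRENCY_EFFICIENCY = 0.75;

// Divide sequential time by the effective parallelism, then round up to minutes.
function toEffectiveMinutes(sequentialSeconds: number, concurrency: number): number {
  if (concurrency < 1) {
    throw new RangeError('concurrency must be at least 1');
  }
  const effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY);
  return Math.ceil(effectiveSeconds / 60);
}

// 600 s of sequential work at concurrency 4: 600 / (4 * 0.75) = 200 s
console.log(toEffectiveMinutes(600, 4)); // 4
```

Taken literally, the formula also applies the 0.75 factor at a concurrency of 1, which yields a figure larger than the sequential total. Whether the estimator special-cases that setting is not stated here.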

LLM Call Count

Each file is evaluated by 7 independent axis evaluators, so:

estimatedCalls = files * AXIS_COUNT

The 7 axes are split between two model tiers:

Tier	Axes	Count
Haiku (fast/cheap)	utility, duplication, overengineering, tests, documentation	5
Sonnet (deep/costly)	correction, best_practices	2
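The call count and its per-tier split follow directly from the table. In this sketch the axis names are taken from the table above, while the array constants and the `estimateCalls` return shape are hypothetical.

```typescript
// Axis names as listed in the tier table.
const HAIKU_AXES = ['utility', 'duplication', 'overengineering', 'tests', 'documentation'] as const;
const SONNET_AXES = ['correction', 'best_practices'] as const;
const AXIS_COUNT = HAIKU_AXES.length + SONNET_AXES.length; // 5 + 2 = 7

// One LLM call per file per axis, broken down by model tier.
function estimateCalls(files: number): { haiku: number; sonnet: number; total: number } {
  return {
    haiku: files * HAIKU_AXES.length,
    sonnet: files * SONNET_AXES.length,
    total: files * AXIS_COUNT,
  };
}

console.log(estimateCalls(10)); // { haiku: 50, sonnet: 20, total: 70 }
```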

Full Project Estimate

estimateProject() loads all .task.json files from .anatoly/tasks/ and returns a complete EstimateResult:

{
  files: number;           // total files to review
  symbols: number;         // total symbols across all files
  inputTokens: number;     // estimated input tokens (all axes combined)
  outputTokens: number;    // estimated output tokens
  estimatedMinutes: number; // sequential wall-clock estimate
  estimatedCalls: number;  // total LLM API calls (files * AXIS_COUNT, i.e. files * 7)
}

Per-Step Forecast (`forecastRun`)

forecastRun() is the richer projection consumed by anatoly estimate. It returns a RunForecast with a steps[] array — one entry per pipeline phase that hits an LLM or embedding API — alongside aggregate llm / embed totals and a calibrated wall-clock estimate.

Step categories

Steps are grouped under a fixed set of categories (FORECAST_STEP_CATEGORY_ORDER), in the canonical execution order:

Category	Generated by	Notes
`axis`	One per active axis × eval-tier files	Each axis is a full pass; cache modeling applies
`deliberation`	When `deliberationModel` is set + deliberation enabled	Heuristic: `DELIBERATION_COVERAGE × axis token volume` against the deliberation model
`summary`	When RAG + `summaryModel` set + at least one function	One entry, `name = ''`. Input ≈ `embed.codeTokens`, output ≈ `codeUnits × NLP_TOKENS_PER_FUNCTION`
`embed`	When RAG enabled	Two entries (`code` and `text`); model id is the resolved embedding model
`internal-doc`	When `docContext` provided with `pageCount > 0`	Heuristic `DOC_BOOTSTRAP_PER_PAGE` or `DOC_UPDATE_PER_PAGE`; `approximate: true`

Each ForecastStep carries:

{
  category: ForecastStepCategory;
  name: string;                 // empty for single-canonical-entry categories
  model: string;                // resolved model id (or 'local' for local embeds)
  billingMode: 'subscription' | 'api' | 'local';
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;     // Anthropic prompt cache, when modeled
  cacheCreationTokens?: number;
  costUsd: number;              // pay-per-token equivalent
  approximate?: boolean;        // true for heuristic-based steps (doc, deliberation)
}
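One way a caller might put steps into the canonical category order is sketched below. The order array mirrors the table above, but its exact contents, the trimmed `StepLike` type, and the `sortByCategory` helper are assumptions for illustration, not the module's actual exports.

```typescript
// Assumed to match the category table, top to bottom.
const FORECAST_STEP_CATEGORY_ORDER = [
  'axis',
  'deliberation',
  'summary',
  'embed',
  'internal-doc',
] as const;
type ForecastStepCategory = (typeof FORECAST_STEP_CATEGORY_ORDER)[number];

// Trimmed-down step: only the fields this sketch touches.
interface StepLike {
  category: ForecastStepCategory;
  name: string;
}

// Returns a sorted copy; the input array is left untouched.
function sortByCategory<T extends StepLike>(steps: T[]): T[] {
  const rank = (c: ForecastStepCategory): number => FORECAST_STEP_CATEGORY_ORDER.indexOf(c);
  // Array.prototype.sort is stable, so steps keep their relative order within a category.
  return [...steps].sort((a, b) => rank(a.category) - rank(b.category));
}

const unsorted: StepLike[] = [
  { category: 'embed', name: 'code' },
  { category: 'axis', name: 'utility' },
  { category: 'internal-doc', name: '' },
  { category: 'axis', name: 'correction' },
  { category: 'embed', name: 'text' },
];
const sorted = sortByCategory(unsorted);
console.log(sorted.map((s) => `${s.category}:${s.name}`));
```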

Cache modeling for axis steps

For each axis, the system prompt is cached after the first call (Anthropic prompt caching). For E eval-tier files, the per-axis token split becomes:

fresh input: sumFileTokens + E × PER_FILE_OVERHEAD_TOKENS (file content + per-call instructions)
cache_creation: SYSTEM_PROMPT_TOKENS (one write, on the first call)
cache_read: SYSTEM_PROMPT_TOKENS × (E - 1) (subsequent calls re-read the cache)
output: sumOutput

Models without cache rates in the pricing cache (most non-Anthropic providers) fall back to the input rate via pricing.cacheReadInput ?? pricing.input — this yields the naive cost, which is the right behavior when caching isn't available.
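The split and the fallback can be sketched together. Only `pricing.input` and `pricing.cacheReadInput` are named in the text; the `cacheCreationInput` field, the per-million-token rate unit, and both function names are assumptions made for this example.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;

// Rates in USD per million tokens (assumed unit).
// cacheCreationInput is a hypothetical field name.
interface Pricing {
  input: number;
  output: number;
  cacheReadInput?: number;
  cacheCreationInput?: number;
}

interface AxisSplit {
  freshInput: number;
  cacheCreation: number;
  cacheRead: number;
  output: number;
}

// Token split for one axis over `evalFiles` files.
function splitAxisTokens(evalFiles: number, sumFileTokens: number, sumOutput: number): AxisSplit {
  if (evalFiles <= 0) {
    return { freshInput: 0, cacheCreation: 0, cacheRead: 0, output: 0 };
  }
  return {
    freshInput: sumFileTokens + evalFiles * PER_FILE_OVERHEAD_TOKENS,
    cacheCreation: SYSTEM_PROMPT_TOKENS, // one write, on the first call
    cacheRead: SYSTEM_PROMPT_TOKENS * (evalFiles - 1), // every later call
    output: sumOutput,
  };
}

// Missing cache rates fall back to the plain input rate.
function axisCostUsd(split: AxisSplit, pricing: Pricing): number {
  const readRate = pricing.cacheReadInput ?? pricing.input;
  const writeRate = pricing.cacheCreationInput ?? pricing.input;
  const scaled =
    split.freshInput * pricing.input +
    split.cacheCreation * writeRate +
    split.cacheRead * readRate +
    split.output * pricing.output;
  return scaled / 1_000_000;
}

const split = splitAxisTokens(10, 20_000, 5_000);
const naiveCost = axisCostUsd(split, { input: 1, output: 2 });
const cachedCost = axisCostUsd(split, { input: 1, output: 2, cacheReadInput: 0.1, cacheCreationInput: 1.25 });
console.log(split, { naiveCost, cachedCost });
```

With no cache rates, the three input buckets add up to 20,000 + 10 × 650 = 26,500 tokens at the input rate, which is exactly the uncached cost. The rates used here are placeholders, not real prices.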

Doc-generation heuristic

Internal-doc steps model the multi-turn agentic Read-tool conversation: per page, ~3-7 turns where the accumulated context gets cached after the first turn. The two exported constants encode the per-page token budget:

DOC_BOOTSTRAP_PER_PAGE = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
DOC_UPDATE_PER_PAGE    = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };

Bootstrap is heavier because the agent reads more files for context. These constants are tunable from observed runs (analogous to recalibrateFromRuns for axis durations).
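A rough sketch of how a per-page budget might scale to a whole documentation run. The two constants are copied from above; the `PerPageBudget` type, the `forecastDocTokens` helper, and the assumption that the budget scales linearly with `pageCount` are illustrative.

```typescript
interface PerPageBudget {
  fresh: number;
  cacheRead: number;
  cacheCreation: number;
  output: number;
}

const DOC_BOOTSTRAP_PER_PAGE: PerPageBudget = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
const DOC_UPDATE_PER_PAGE: PerPageBudget = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };

// Hypothetical helper: multiply each per-page bucket by the number of pages.
function forecastDocTokens(pageCount: number, mode: 'bootstrap' | 'update'): PerPageBudget {
  const perPage = mode === 'bootstrap' ? DOC_BOOTSTRAP_PER_PAGE : DOC_UPDATE_PER_PAGE;
  return {
    fresh: perPage.fresh * pageCount,
    cacheRead: perPage.cacheRead * pageCount,
    cacheCreation: perPage.cacheCreation * pageCount,
    output: perPage.output * pageCount,
  };
}

console.log(forecastDocTokens(4, 'bootstrap'));
console.log(forecastDocTokens(3, 'update'));
```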

Billing-mode resolution

forecastRun takes a resolveBillingMode(modelId) callback so it can tag each step with subscription (OAuth Claude Code), api (per-token), or local (no-API local runtime). The callback is supplied by the caller (commands/estimate.ts or commands/run.ts) and consults config.providers[provider].mode.

The resulting step.billingMode lets the rendered table show a mode column and lets the caller compute two distinct totals:

consumptionUsd — sum of all step costs (pay-per-token equivalent)
billedUsd — sum of api-mode steps only (what the user actually pays)
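The two totals can be computed in a single pass over the steps. This sketch assumes only the `billingMode` and `costUsd` fields shown earlier; the `CostedStep` type and `summarizeCosts` helper are hypothetical names.

```typescript
type BillingMode = 'subscription' | 'api' | 'local';

// Only the fields needed for the totals.
interface CostedStep {
  billingMode: BillingMode;
  costUsd: number;
}

function summarizeCosts(steps: CostedStep[]): { consumptionUsd: number; billedUsd: number } {
  let consumptionUsd = 0;
  let billedUsd = 0;
  for (const step of steps) {
    consumptionUsd += step.costUsd; // every step counts toward consumption
    if (step.billingMode === 'api') {
      billedUsd += step.costUsd; // only per-token steps are actually billed
    }
  }
  return { consumptionUsd, billedUsd };
}

const totals = summarizeCosts([
  { billingMode: 'subscription', costUsd: 1.5 },
  { billingMode: 'api', costUsd: 0.25 },
  { billingMode: 'local', costUsd: 0 },
]);
console.log(totals); // { consumptionUsd: 1.75, billedUsd: 0.25 }
```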

Display Formatting

formatTokenCount() converts raw token numbers into human-readable strings:

Input	Output
1,200,000	`~1.2M`
340,000	`~340K`
500	`~500`
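A formatter that reproduces the three rows above might look like the following. The thresholds and rounding rules are assumptions chosen to match those examples; the real formatTokenCount() may round differently near the boundaries, so the sketch uses a different name.

```typescript
// Illustrative formatter, not the actual implementation.
function formatTokenCountSketch(tokens: number): string {
  if (tokens >= 1_000_000) {
    return `~${(tokens / 1_000_000).toFixed(1)}M`; // one decimal for millions
  }
  if (tokens >= 1_000) {
    return `~${Math.round(tokens / 1_000)}K`; // whole thousands
  }
  return `~${tokens}`;
}

console.log(formatTokenCountSketch(1_200_000)); // ~1.2M
console.log(formatTokenCountSketch(340_000));   // ~340K
console.log(formatTokenCountSketch(500));       // ~500
```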

Key Source Paths

Estimator: src/core/estimator.ts
Task loading: src/core/estimator.ts (loadTasks())
Forecast helper: src/core/estimator.ts (forecastRun())
Task schema: src/schemas/task.ts
Estimate command (rendered table + JSON output): src/commands/estimate.ts
