anatoly

4. Core Modules

Estimator

Complexity estimation and prioritization scoring

The estimator (src/core/estimator.ts) provides pre-run cost and time projections. It counts tokens locally using tiktoken, models the expected LLM call volume, and computes wall-clock time estimates factoring in concurrency efficiency. The anatoly estimate command uses this module to let users preview costs before committing to a full run.

Token Counting

Token counting uses the tiktoken library with the cl100k_base encoding. This is an OpenAI encoding rather than Claude's own tokenizer, so the counts are an approximation for Claude models.

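import { get_encoding } from 'tiktoken';

const enc = get_encoding('cl100k_base');
const tokens = enc.encode(text).length; // number of token ids produced for `text`
enc.free(); // release the encoder's memory once counting is done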

The encoder is allocated and freed per call in countTokens(). For bulk estimation in estimateTasksTokens(), a single encoder instance is reused across all files to avoid repeated allocation overhead.
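The reuse pattern described above can be sketched as follows. This is a minimal illustration, not the estimator's actual code: `TokenEncoder`, `countTokensBulk`, and the word-splitting `fakeEncoder` are hypothetical names introduced so the example runs without the tiktoken dependency.

```typescript
// Minimal shape assumed for an encoder: encode() returns token ids, free() releases it.
interface TokenEncoder {
  encode(text: string): ArrayLike<number>;
  free(): void;
}

// Count tokens for many texts with one encoder instance instead of one per text.
function countTokensBulk(texts: string[], makeEncoder: () => TokenEncoder): number[] {
  const enc = makeEncoder();
  try {
    return texts.map((t) => enc.encode(t).length);
  } finally {
    enc.free(); // always released, even if encode() throws
  }
}

// Hypothetical stand-in encoder: one "token" per whitespace-separated word.
let allocations = 0;
const fakeEncoder = (): TokenEncoder => {
  allocations += 1;
  return {
    encode: (text) => text.split(/\s+/).filter(Boolean).map((_, i) => i),
    free: () => {},
  };
};

const counts = countTokensBulk(['const a = 1;', 'export default a;'], fakeEncoder);
// counts is [4, 3]; allocations is 1
```

In real use the factory would presumably be `() => get_encoding('cl100k_base')`, so the allocation cost is paid once per batch rather than once per file.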

Per-file Token Model

For each .task.json file, the estimator reads the actual source file from disk and counts its tokens. The input token estimate for a single file is:

inputTokens = SYSTEM_PROMPT_TOKENS + fileTokens + PER_FILE_OVERHEAD_TOKENS

| Constant | Value | Purpose |
|---|---|---|
| SYSTEM_PROMPT_TOKENS | 600 | Estimated tokens for the axis system prompt |
| PER_FILE_OVERHEAD_TOKENS | 50 | JSON framing, symbol list, section headers |

If the source file has been deleted since the scan, the estimator falls back to a heuristic: each symbol contributes (line_end - line_start + 1) * 8 tokens, i.e. about 8 tokens per source line.

Output tokens are estimated as:

outputTokens = OUTPUT_BASE_PER_FILE + symbolCount * OUTPUT_TOKENS_PER_SYMBOL

| Constant | Value | Purpose |
|---|---|---|
| OUTPUT_BASE_PER_FILE | 300 | Base output tokens regardless of symbol count |
| OUTPUT_TOKENS_PER_SYMBOL | 150 | Additional output tokens per symbol reviewed |
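Putting the input formula, the deleted-file fallback, and the output formula together, a per-file estimate might look like the sketch below. The constant values come from the tables above; `estimateFileTokens`, `SymbolSpan`, and `FALLBACK_TOKENS_PER_LINE` are hypothetical names, and passing `null` to signal a missing file is an assumption made for the example.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;
const OUTPUT_BASE_PER_FILE = 300;
const OUTPUT_TOKENS_PER_SYMBOL = 150;
const FALLBACK_TOKENS_PER_LINE = 8; // hypothetical name for the "* 8" in the fallback heuristic

interface SymbolSpan {
  line_start: number;
  line_end: number;
}

// fileTokens: the counted tokens of the source file, or null if the file is gone.
function estimateFileTokens(fileTokens: number | null, symbols: SymbolSpan[]) {
  const contentTokens =
    fileTokens ??
    symbols.reduce(
      (sum, s) => sum + (s.line_end - s.line_start + 1) * FALLBACK_TOKENS_PER_LINE,
      0,
    );
  return {
    inputTokens: SYSTEM_PROMPT_TOKENS + contentTokens + PER_FILE_OVERHEAD_TOKENS,
    outputTokens: OUTPUT_BASE_PER_FILE + symbols.length * OUTPUT_TOKENS_PER_SYMBOL,
  };
}

const symbols: SymbolSpan[] = [
  { line_start: 1, line_end: 10 },  // 10 lines
  { line_start: 12, line_end: 31 }, // 20 lines
];
const present = estimateFileTokens(1200, symbols); // input 600 + 1200 + 50 = 1850, output 300 + 2 * 150 = 600
const deleted = estimateFileTokens(null, symbols); // fallback (10 + 20) * 8 = 240, input 600 + 240 + 50 = 890
```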

Time Estimation

Time estimation uses a linear model based on symbol count:

fileSeconds = BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL

| Constant | Value | Notes |
|---|---|---|
| BASE_SECONDS | 4 | Fixed overhead per file (file read, prompt assembly, RAG) |
| SECONDS_PER_SYMBOL | 0.8 | LLM output scales with symbol count |

Example estimates:

  • 5 symbols: ~8 seconds
  • 20 symbols: ~20 seconds
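The linear model and the two example estimates can be reproduced with a few lines of TypeScript. The function names here are illustrative only; the constants are the ones listed above.

```typescript
const BASE_SECONDS = 4;
const SECONDS_PER_SYMBOL = 0.8;

// Estimated review time for one file, in seconds.
function estimateFileSeconds(symbolCount: number): number {
  return BASE_SECONDS + symbolCount * SECONDS_PER_SYMBOL;
}

// Sequential total: the sum of the per-file estimates.
function estimateSequentialSeconds(symbolCounts: number[]): number {
  return symbolCounts.reduce((sum, n) => sum + estimateFileSeconds(n), 0);
}

const small = estimateFileSeconds(5);  // 4 + 5 * 0.8  = 8 seconds
const large = estimateFileSeconds(20); // 4 + 20 * 0.8 = 20 seconds
const sequential = estimateSequentialSeconds([5, 20]); // 28 seconds
```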

Concurrency Adjustment

The sequential time total is adjusted for parallel execution:

effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY)

CONCURRENCY_EFFICIENCY is 0.75 (25% overhead), accounting for:

  • API rate limits and contention
  • Tail effects (last workers finish alone while others idle)
  • Network latency variance

The result is rounded up to the nearest minute.
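As a worked example of the adjustment and the final rounding, consider the sketch below. `estimateMinutes` is a hypothetical helper name; only the formula and the 0.75 constant come from the text above.

```typescript
const CONCURRENCY_EFFICIENCY = 0.75;

// Convert a sequential total (seconds) into a wall-clock estimate (whole minutes).
function estimateMinutes(sequentialSeconds: number, concurrency: number): number {
  const effectiveSeconds = sequentialSeconds / (concurrency * CONCURRENCY_EFFICIENCY);
  return Math.ceil(effectiveSeconds / 60); // round up to the nearest minute
}

const fourWorkers = estimateMinutes(600, 4); // 600 / (4 * 0.75) = 200 s, rounded up to 4 min
const oneWorker = estimateMinutes(600, 1);   // 600 / 0.75 = 800 s, rounded up to 14 min
```

Note that, applied literally, the formula makes a run with concurrency 1 slower than the sequential total. Whether the real implementation special-cases that value is not stated here.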

LLM Call Count

Each file is evaluated by 7 independent axis evaluators, so:

estimatedCalls = files * AXIS_COUNT

The 7 axes are split between two model tiers:

| Tier | Axes | Count |
|---|---|---|
| Haiku (fast/cheap) | utility, duplication, overengineering, tests, documentation | 5 |
| Sonnet (deep/costly) | correction, best_practices | 2 |
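The call count and its split across tiers follow directly from the axis lists. The sketch below uses a hypothetical `AXIS_TIERS` map and `estimateCalls` helper to show the arithmetic; the real module may organize the axes differently.

```typescript
// Axis names per tier, as listed in the table above.
const AXIS_TIERS = {
  haiku: ['utility', 'duplication', 'overengineering', 'tests', 'documentation'],
  sonnet: ['correction', 'best_practices'],
} as const;

const AXIS_COUNT = AXIS_TIERS.haiku.length + AXIS_TIERS.sonnet.length; // 5 + 2 = 7

// One LLM call per file per axis.
function estimateCalls(files: number) {
  return {
    haikuCalls: files * AXIS_TIERS.haiku.length,
    sonnetCalls: files * AXIS_TIERS.sonnet.length,
    estimatedCalls: files * AXIS_COUNT,
  };
}

const calls = estimateCalls(40); // 200 Haiku calls + 80 Sonnet calls = 280 calls
```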

Full Project Estimate

estimateProject() loads all .task.json files from .anatoly/tasks/ and returns a complete EstimateResult:

{
  files: number;           // total files to review
  symbols: number;         // total symbols across all files
  inputTokens: number;     // estimated input tokens (all axes combined)
  outputTokens: number;    // estimated output tokens
  estimatedMinutes: number; // sequential wall-clock estimate
  estimatedCalls: number;  // total LLM API calls (files * 7, one per axis)
}

Per-Step Forecast (forecastRun)

forecastRun() is the richer projection consumed by anatoly estimate. It returns a RunForecast with a steps[] array — one entry per pipeline phase that hits an LLM or embedding API — alongside aggregate llm / embed totals and a calibrated wall-clock estimate.

Step categories

Steps are grouped under a fixed set of categories (FORECAST_STEP_CATEGORY_ORDER), in the canonical execution order:

| Category | Generated by | Notes |
|---|---|---|
| axis | One per active axis × eval-tier files | Each axis is a full pass; cache modeling applies |
| deliberation | When deliberationModel is set + deliberation enabled | Heuristic: DELIBERATION_COVERAGE × axis token volume against the deliberation model |
| summary | When RAG + summaryModel set + at least one function | One entry, name = ''. Input ≈ embed.codeTokens, output ≈ codeUnits × NLP_TOKENS_PER_FUNCTION |
| embed | When RAG enabled | Two entries (code and text); model id is the resolved embedding model |
| internal-doc | When docContext provided with pageCount > 0 | Heuristic DOC_BOOTSTRAP_PER_PAGE or DOC_UPDATE_PER_PAGE; approximate: true |
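One way a fixed category order can be used is to sort steps for display. The sketch below assumes that FORECAST_STEP_CATEGORY_ORDER lists the categories in the same order as the table rows above; that ordering, the `sortByCategory` helper, and the sample steps are assumptions made for illustration.

```typescript
// Assumed contents: the table's row order, top to bottom.
const FORECAST_STEP_CATEGORY_ORDER = [
  'axis',
  'deliberation',
  'summary',
  'embed',
  'internal-doc',
] as const;
type ForecastStepCategory = (typeof FORECAST_STEP_CATEGORY_ORDER)[number];

// Return a copy of the steps sorted by canonical category order.
function sortByCategory<T extends { category: ForecastStepCategory }>(steps: T[]): T[] {
  const rank = (c: ForecastStepCategory) => FORECAST_STEP_CATEGORY_ORDER.indexOf(c);
  return steps.slice().sort((a, b) => rank(a.category) - rank(b.category));
}

const unsorted: { category: ForecastStepCategory; name: string }[] = [
  { category: 'embed', name: 'code' },
  { category: 'axis', name: 'utility' },
  { category: 'summary', name: '' },
];
const ordered = sortByCategory(unsorted).map((s) => s.category);
// ordered is ['axis', 'summary', 'embed']
```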

Each ForecastStep carries:

{
  category: ForecastStepCategory;
  name: string;                 // empty for single-canonical-entry categories
  model: string;                // resolved model id (or 'local' for local embeds)
  billingMode: 'subscription' | 'api' | 'local';
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;     // Anthropic prompt cache, when modeled
  cacheCreationTokens?: number;
  costUsd: number;              // pay-per-token equivalent
  approximate?: boolean;        // true for heuristic-based steps (doc, deliberation)
}

Cache modeling for axis steps

For each axis, the SYSTEM prompt is cached after the first call (Anthropic prompt cache). For E eval files, the per-axis split becomes:

  • fresh input: sumFileTokens + E × PER_FILE_OVERHEAD_TOKENS (file content + per-call instructions)
  • cache_creation: SYSTEM_PROMPT_TOKENS (one write, on the first call)
  • cache_read: SYSTEM_PROMPT_TOKENS × (E - 1) (subsequent calls re-read the cache)
  • output: sumOutput

Models without cache rates in the pricing cache (most non-Anthropic providers) fall back to the input rate via pricing.cacheReadInput ?? pricing.input — this yields the naive cost, which is the right behavior when caching isn't available.
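The split and the pricing fallback can be sketched as follows. The `Pricing` shape (rates in USD per million tokens), the `cacheCreationInput` field, the `axisStep` helper, and the rates used in the example are all assumptions for illustration; only `pricing.cacheReadInput ?? pricing.input` is taken from the text above.

```typescript
const SYSTEM_PROMPT_TOKENS = 600;
const PER_FILE_OVERHEAD_TOKENS = 50;

// Hypothetical pricing record: USD per million tokens.
interface Pricing {
  input: number;
  output: number;
  cacheReadInput?: number;
  cacheCreationInput?: number; // assumed field name
}

// Token split and cost for one axis over `evalFiles` files.
function axisStep(evalFiles: number, sumFileTokens: number, sumOutput: number, pricing: Pricing) {
  const inputTokens = sumFileTokens + evalFiles * PER_FILE_OVERHEAD_TOKENS;      // fresh input
  const cacheCreationTokens = evalFiles > 0 ? SYSTEM_PROMPT_TOKENS : 0;           // one write
  const cacheReadTokens = SYSTEM_PROMPT_TOKENS * Math.max(evalFiles - 1, 0);      // later calls
  const costUsd =
    (inputTokens * pricing.input +
      cacheCreationTokens * (pricing.cacheCreationInput ?? pricing.input) +
      cacheReadTokens * (pricing.cacheReadInput ?? pricing.input) + // naive cost without cache rates
      sumOutput * pricing.output) / 1e6;
  return { inputTokens, outputTokens: sumOutput, cacheReadTokens, cacheCreationTokens, costUsd };
}

// Illustrative rates only, not real prices.
const noCacheRates = axisStep(10, 20000, 9000, { input: 1, output: 5 });
const withCacheRates = axisStep(10, 20000, 9000, {
  input: 1, output: 5, cacheReadInput: 0.1, cacheCreationInput: 1.25,
});
// Both: inputTokens 20500, cacheCreationTokens 600, cacheReadTokens 5400
```

With these illustrative rates the cached variant comes out cheaper because the 5,400 re-read tokens are billed at the lower cache-read rate, while the model without cache rates is billed as if every system-prompt token were fresh input.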

Doc-generation heuristic

Internal-doc steps model the multi-turn agentic Read-tool conversation: per page, ~3-7 turns where the accumulated context gets cached after the first turn. The two exported constants encode the per-page token budget:

DOC_BOOTSTRAP_PER_PAGE = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
DOC_UPDATE_PER_PAGE    = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };

Bootstrap is heavier because the agent reads more files for context. These constants are tunable from observed runs (analogous to recalibrateFromRuns for axis durations).
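A plausible way to turn the per-page budgets into step totals is to scale them by the page count, as in the sketch below. Linear scaling, the `DocMode` type, and the `docStepTokens` helper are assumptions; the two constants are copied from above.

```typescript
const DOC_BOOTSTRAP_PER_PAGE = { fresh: 3000, cacheRead: 7000, cacheCreation: 600, output: 3500 };
const DOC_UPDATE_PER_PAGE = { fresh: 2000, cacheRead: 4000, cacheCreation: 600, output: 2000 };

type DocMode = 'bootstrap' | 'update'; // hypothetical discriminator

// Scale the per-page budget by the number of pages (assumed to be linear).
function docStepTokens(pageCount: number, mode: DocMode) {
  const perPage = mode === 'bootstrap' ? DOC_BOOTSTRAP_PER_PAGE : DOC_UPDATE_PER_PAGE;
  return {
    inputTokens: perPage.fresh * pageCount,
    cacheReadTokens: perPage.cacheRead * pageCount,
    cacheCreationTokens: perPage.cacheCreation * pageCount,
    outputTokens: perPage.output * pageCount,
    approximate: true, // heuristic-based step
  };
}

const bootstrap = docStepTokens(12, 'bootstrap'); // 36000 fresh, 84000 cache read, 7200 cache creation, 42000 output
const update = docStepTokens(12, 'update');       // 24000 fresh, 48000 cache read, 7200 cache creation, 24000 output
```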

Billing-mode resolution

forecastRun takes a resolveBillingMode(modelId) callback so it can tag each step with subscription (OAuth Claude Code), api (per-token), or local (no-API local runtime). The callback is supplied by the caller (commands/estimate.ts or commands/run.ts) and consults config.providers[provider].mode.

The resulting step.billingMode lets the rendered table show a mode column and lets the caller compute two distinct totals:

  • consumptionUsd — sum of all step costs (pay-per-token equivalent)
  • billedUsd — sum of api-mode steps only (what the user actually pays)
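The two totals can be derived from the steps in a single pass. The sketch below uses a reduced `StepCost` shape and a hypothetical `billingTotals` helper; the sample costs are made up for the example.

```typescript
type BillingMode = 'subscription' | 'api' | 'local';

// Only the two ForecastStep fields needed for the totals.
interface StepCost {
  billingMode: BillingMode;
  costUsd: number;
}

function billingTotals(steps: StepCost[]) {
  let consumptionUsd = 0; // pay-per-token equivalent of every step
  let billedUsd = 0;      // only what is charged per token
  for (const step of steps) {
    consumptionUsd += step.costUsd;
    if (step.billingMode === 'api') billedUsd += step.costUsd;
  }
  return { consumptionUsd, billedUsd };
}

const totals = billingTotals([
  { billingMode: 'subscription', costUsd: 1.5 },
  { billingMode: 'api', costUsd: 0.25 },
  { billingMode: 'local', costUsd: 0 },
]);
// totals.consumptionUsd is 1.75, totals.billedUsd is 0.25
```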

Display Formatting

formatTokenCount() converts raw token numbers into human-readable strings:

| Input | Output |
|---|---|
| 1,200,000 | ~1.2M |
| 340,000 | ~340K |
| 500 | ~500 |
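A formatter that reproduces the three rows above could look like the sketch below. It is not the actual formatTokenCount() implementation: the thresholds and rounding rules are assumptions chosen to match the table, and behavior at other values (for example near the K/M boundary) may differ from the real function.

```typescript
// Sketch only: thresholds and rounding are assumed, not taken from the source.
function formatTokenCountSketch(tokens: number): string {
  if (tokens >= 1_000_000) return `~${(tokens / 1_000_000).toFixed(1)}M`; // one decimal for millions
  if (tokens >= 1_000) return `~${Math.round(tokens / 1_000)}K`;          // whole thousands
  return `~${tokens}`;
}

const millions = formatTokenCountSketch(1_200_000); // "~1.2M"
const thousands = formatTokenCountSketch(340_000);  // "~340K"
const units = formatTokenCountSketch(500);          // "~500"
```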

Key Source Paths

  • Estimator: src/core/estimator.ts
  • Task loading: src/core/estimator.ts (loadTasks())
  • Forecast helper: src/core/estimator.ts (forecastRun())
  • Task schema: src/schemas/task.ts
  • Estimate command (rendered table + JSON output): src/commands/estimate.ts