Status: Accepted Date: 2025-01-15 Deciders: Core team

Context#

Anatoly runs 7 LLM calls per file (one per axis). For a 200-file project, that is 1,400 API calls. Each call includes the full file content plus contextual data in the prompt. Without optimization, auditing a medium-sized codebase would cost tens of dollars and take over an hour.

The fundamental tension is between thoroughness and cost. A cheaper tool that misses issues is useless; an accurate tool that costs $50 per run will not be used regularly. Anatoly needs to be thorough where it matters and frugal where it does not.

Decision#

Minimize API costs through three complementary pre-processing strategies, all of which run locally with zero API cost:

Triage -- skip files that cannot produce meaningful findings.
Usage graph -- pre-compute import relationships to eliminate redundant tool calls.
Local token estimation -- let users predict costs before committing to a run.

Strategy 1: Triage (skip trivial files)#

The triage system (src/core/triage.ts) classifies every scanned file into one of two tiers before any API call is made:

Tier	Criteria	API calls
skip	Barrel exports (re-export only, no own symbols), trivial files (<10 lines, 0-1 symbols), type-only files (all symbols are types/enums), constants-only files	0
evaluate	Everything else	7 (one per axis)

Skipped files receive a synthetic CLEAN review (generateSkipReview) with is_generated: true and a skip_reason field. They appear in the final report as reviewed (so there are no gaps) but cost nothing.

Typical savings: In a standard TypeScript project, 15-30% of files are barrel exports, type definitions, or constant declarations. Triage eliminates these at zero cost.

Strategy 2: Usage graph (pre-compute imports)#

The usage graph (src/core/usage-graph.ts) performs a single local pass over all project files, parsing import/export statements with regex to build a complete map of which symbols are imported by which files.

The graph tracks:

Runtime imports (import { X } from './path') -- the primary signal for dead code detection.
Type-only imports (import type { X } from './path') -- tracked separately because a symbol used only as a type may still be "dead" from a runtime perspective.
Re-exports (export { X } from './path', export * from './path') -- counted as usage to avoid false positives on barrel-exported symbols.
Namespace imports (import * as X from './path') -- resolved against the export map to credit all exported symbols.

This pre-computed data is injected directly into the utility axis prompt. Instead of the LLM needing to grep across the project to determine if a function is used (which would require multiple tool calls, each consuming tokens), it receives a definitive answer:

- buildUsageGraph (exported): runtime-imported by 2 files: src/core/runner.ts, src/commands/run.ts
- formatTokenCount (exported): imported by 0 files -- LIKELY DEAD
- resolveImportPath (not exported): internal only -- check local usage in file

Quantified savings: Without the usage graph, the utility axis would need to make tool calls (grep, read) for each exported symbol to check if it is imported elsewhere. For a file with 10 exported symbols, that is approximately 10-20 tool calls, each adding ~500-1,000 tokens of round-trip overhead. The usage graph replaces all of these with a single pre-computed section in the prompt (~50 tokens per symbol). For a 200-file project with an average of 5 exported symbols per file, this eliminates roughly 1,000 redundant tool calls -- approximately a 90% reduction in utility-axis token usage.

The graph also detects orphan symbols (exported but never imported by any file) during construction, logging them for diagnostics:

usage graph built { files: 45, runtimeImports: 312, typeImports: 87, totalExports: 198, orphanCount: 23 }

Strategy 3: Local token estimation#

The estimator (src/core/estimator.ts) uses tiktoken (cl100k_base encoding) to count actual tokens in each file's source code, then projects total input and output token usage for the entire run:

Input tokens per file: system prompt overhead (600 tokens) + actual file content tokens + per-file overhead (50 tokens)
Output tokens per file: base per file (300 tokens) + per-symbol output (150 tokens per symbol)
Time estimation: base seconds per file (4s) + per-symbol time (0.8s per symbol), adjusted for concurrency efficiency (75%)

This runs via anatoly estimate with zero API calls, giving users a clear picture before they commit. The rendered view splits the projection into a per-step Cost breakdown (one row per axis, deliberation, summary, embedding, internal-doc) and a Forecast headline:

  Cost breakdown
  ──────────────
  category       step              cost   mode           model
  axis           correction       $0.15   subscription   anthropic/claude-sonnet-4-6
                 utility          $0.05   subscription   anthropic/claude-haiku-4-5
                 ...
  deliberation                   ~$0.53   subscription   anthropic/claude-opus-4-6
  summary                         $0.02   subscription   anthropic/claude-haiku-4-5
  embed          code             $0.00   local          jina-v2 768d (local)
                 text             $0.00   local          MiniLM-L6 384d (local)
  internal-doc   bootstrap       ~$0.53   subscription   anthropic/claude-sonnet-4-6
 
  total billed                    $0.00
  consumption                    ~$1.93
 
  Forecast
  ──────────────
  files    12 of 15  (3 skipped by triage)
  tokens   ~64K in / ~113K out + ~6K embed
  cost     $0 in subscription mode  (ensure quota for ~$1.93)
  time     ~10m  (default)

The mode column distinguishes work covered by an OAuth subscription (subscription — you pay nothing, the cost is informative quota magnitude) from real per-token API spending (api) and free local runtimes (local). Two totals close the breakdown: total billed (what comes out of your pocket) versus consumption (the API equivalent — useful for sizing your subscription quota). For programmatic consumers, anatoly estimate --json emits the same data as a versioned JSON payload (schemaVersion: 1).

Model tiering#

In addition to the three pre-processing strategies, Anatoly uses model tiering to reduce per-call cost:

5 axes use Haiku (faster, cheaper): utility, duplication, overengineering, tests, documentation
2 axes use Sonnet (deeper, more expensive): correction, best_practices

This is configurable per-axis via .anatoly.yml, but the defaults reflect the observation that utility/duplication judgments are more mechanical (pattern matching against pre-computed evidence) while correction/best-practices require deeper reasoning.

Consequences#

Positive#

Predictable costs. Users can run anatoly estimate before every audit and know within 20% what the run will cost.
90% fewer redundant tool calls. The usage graph eliminates the most wasteful pattern in LLM-based code review: asking the model to grep for usages that could be statically determined.
15-30% fewer API calls from triage. Barrel exports, type files, and trivial files are handled locally at zero cost.
No accuracy loss. Triage only skips files that genuinely cannot produce findings (pure re-exports, type definitions). The usage graph provides more accurate data than LLM-driven grep (it covers the entire project in one pass, with no risk of missed files).
Sub-second overhead. The usage graph builds in <100ms for a 200-file project. Token estimation takes <1 second. Triage is instant (pure in-memory classification).

Negative#

Triage may skip edge cases. A "constants-only" file that exports a misconfigured object could have a real bug, but triage skips it. This is an acceptable trade-off because constant files rarely contain logic errors.
Usage graph is regex-based. It does not use a full TypeScript resolver, so dynamic imports (import()) and computed property exports are not tracked. These are rare in practice and do not significantly affect dead code detection accuracy.
Token estimation is approximate. The cl100k_base tokenizer is Claude-compatible but not identical. Actual costs may vary by 10-20% from estimates, depending on prompt caching behavior and retry rates.

Alternatives Considered#

Alternative	Why rejected
No triage (review everything)	Wastes 15-30% of the budget on files that cannot produce findings. Barrel exports and type-only files generate noise, not signal.
LLM-driven import analysis	Using tool calls (grep/read) during evaluation to check symbol usage is 10-20x more expensive in tokens and non-deterministic. The model might grep for the wrong string or miss re-exports through barrel files.
TypeScript compiler API for usage graph	More accurate than regex, but adds a hard dependency on `typescript` (50+ MB) and requires a valid `tsconfig.json`. The regex approach handles 95%+ of real-world import patterns and works on any TypeScript-like project regardless of configuration.
Cost caps / budget limits	Useful as a safety net (and could be added later), but does not reduce the actual cost. Pre-processing reduces the work itself, not just the spending limit.

Cost Optimization