anatoly

Public log

Changelog

Every release of Anatoly, in reverse chronological order.

Pulled live from GitHub Releases.

  1. v0.9.7 — npm homepage → anatoly.cloud

    ## Maintenance release
    
    A metadata-only release on top of v0.9.6.
    
    - npm `homepage` now points to the official site, [anatoly.cloud](https://anatoly.cloud), instead of the GitHub README.
    - Version bumped to 0.9.7 (package + lockfile).
    
    No functional or behavioral changes to the audit engine.
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.9.6...v0.9.7
  2. v0.9.6 — Estimate/Scan unification, RAG hardening & OSS hygiene

    ## Highlights
    
    A consolidation release on top of v0.9.5: the `scan` command is now folded into `estimate`, the `estimate` forecast was recalibrated against real runs, RAG / local-embeddings handling was hardened, LLM cost bookkeeping was centralised, and the project gained a full open-source hygiene layer (SECURITY, CONTRIBUTING, CoC, CI).
    
    ---
    
    ## 🔭 Estimate / Scan unification
    
    The standalone `anatoly scan` command is gone — its capabilities (new / modified / cached file accounting) now live inside `estimate`, which is the single entry point for forecasting.
    
    - `anatoly scan` removed; new/modified/cached are surfaced directly in the `estimate` view.
    - New `--no-cache` flag for a from-scratch forecast that ignores prior cache state.
    - Cached files now excluded from the token + cost forecast (no more double-counting).
    - Stale tasks pointing to deleted/missing source files are pruned from the estimator.
    - Deliberation shard count derived from distinct directories rather than a flat `files / 20` heuristic.
    - Bootstrap and coherence per-page token constants recalibrated from real runs.
    - Estimate user-guide added under `docs/`.
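
    To make the shard-count change above concrete, here is a minimal sketch contrasting the two approaches. The function names, the rounding of `files / 20`, and the one-shard-per-directory rule are assumptions for illustration; the notes only state that the count is now derived from distinct directories.

    ```typescript
    // Hypothetical helpers; not Anatoly's actual estimator code.

    // Old heuristic as described in the notes: a flat `files / 20` (rounded up here).
    function flatShardCount(files: string[]): number {
      return Math.ceil(files.length / 20);
    }

    // Assumed new rule: one shard per distinct parent directory.
    function directoryShardCount(files: string[]): number {
      const dirs = new Set<string>();
      for (const file of files) {
        const slash = file.lastIndexOf("/");
        dirs.add(slash === -1 ? "." : file.slice(0, slash));
      }
      return dirs.size;
    }

    const sampleFiles = ["src/a.ts", "src/b.ts", "src/rag/index.ts", "README.md"];
    console.log(flatShardCount(sampleFiles), directoryShardCount(sampleFiles));
    ```

    For a small repository spread over several directories, the flat heuristic reports a single shard while the directory-based count reports one per directory, which changes the forecast accordingly.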
    
    ## 🧭 Scan / config schema cleanup
    
    - Glob include/exclude detection runs at wizard time, with a new `respect_gitignore` knob.
    - `auto_detect` removed — `include` / `exclude` are now strictly authoritative.
    - TypeScript-specific defaults dropped from the scan config schema (the project is multi-language).
    - Language detection delegated to [`linguist-js`](https://www.npmjs.com/package/linguist-js) for accurate per-language stats.
    
    ## 🧠 RAG & local embeddings
    
    - LanceDB tables auto-rebuild on embedding-dimension drift, so swapping providers no longer corrupts the index.
    - System-local providers are now identified by name (`local-advanced`), not by URL heuristics — fixes routing when users override `base_url`.
    - The local sidecar is unified under the canonical `local-advanced` name, with per-axis URLs routed through the slot map.
    - `ANATOLY_LOCAL_DUMMY_KEY` footgun removed: local providers declare `auth: none` and skip API-key plumbing entirely.
    - `local-embeddings` gained `init`, `cleanup`, and `downgrade` subcommands; YAML formatting is preserved across config edits.
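
    The dimension-drift rebuild can be pictured as a simple comparison between the dimension stored with the index and the one probed from the active provider. This is a sketch under assumptions: `IndexMeta`, `needsRebuild`, and the dimension values are hypothetical, and the actual LanceDB integration is not shown in these notes.

    ```typescript
    // Hypothetical metadata kept next to the vector table.
    interface IndexMeta {
      provider: string;
      dimension: number;
    }

    // True when the stored index cannot accept vectors from the active
    // provider (or does not exist yet) and should be rebuilt.
    function needsRebuild(storedMeta: IndexMeta | undefined, probedDimension: number): boolean {
      if (storedMeta === undefined) return true;
      return storedMeta.dimension !== probedDimension;
    }

    const storedMeta: IndexMeta = { provider: "local-advanced", dimension: 768 };
    console.log(needsRebuild(storedMeta, 768));  // same dimension: keep the table
    console.log(needsRebuild(storedMeta, 1024)); // dimension drift: rebuild
    ```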
    
    ## 📊 Metrics & telemetry
    
    - Centralised LLM cost bookkeeping behind a single `recordLlmCost` helper — every code path (tier3, phase, refinement) now goes through one accounting hook.
    - Refinement tokens propagate correctly through tier3 → phase → `recordLlmCost` (previously dropped on the floor).
    - `llm_call` events are now emitted for the coherence-review and content-review passes too, so the cost ndjson covers the full pipeline.
    - `runDocContentReview` is wrapped in `runWithContext` so its events carry the right ndjson phase tag.
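
    As an illustration of the single accounting hook, the sketch below routes every cost event through one function. `recordLlmCost` is named in the notes, but the signature, the `LlmCostEvent` shape, and the in-memory ledger are assumptions made for this example.

    ```typescript
    // Illustrative event shape; the real payload is not given in the notes.
    interface LlmCostEvent {
      phase: string; // e.g. "tier3", "refinement"
      inputTokens: number;
      outputTokens: number;
      costUsd: number;
    }

    const ledger: LlmCostEvent[] = [];

    // Single accounting hook: every code path reports here instead of
    // keeping its own running totals.
    function recordLlmCost(event: LlmCostEvent): void {
      ledger.push(event);
    }

    function totalCostUsd(): number {
      return ledger.reduce((sum, e) => sum + e.costUsd, 0);
    }

    recordLlmCost({ phase: "tier3", inputTokens: 1200, outputTokens: 300, costUsd: 0.25 });
    recordLlmCost({ phase: "refinement", inputTokens: 400, outputTokens: 100, costUsd: 0.5 });
    console.log(totalCostUsd());
    ```

    With one entry point, a path that forgets to report tokens shows up as a missing ledger entry rather than a silently wrong total.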
    
    ## ✂️ Output concision discipline
    
    - New "output concision discipline" prompt block added to every axis and to non-RAG services.
    - Empirical study (`docs/concision-discipline-study.md`) documents the calibration methodology behind the new prompts.
    
    ## 📚 Docs pipeline
    
    - The `content-review` pass is now merged into the updater loop, removing one full pipeline phase and the round-trip it cost.
    
    ## 🛠️ CLI, build & Makefile
    
    - `anatoly --version` now prints the git commit SHA alongside the package version.
    - `make update` supports a `BRANCH=` variable to refresh from a non-`main` branch.
    - `make update` supports a `COMMIT=` variable to pin a specific revision.
    - `sync-motd` build step is idempotent (no spurious diffs on repeated builds).
    
    ## 🔒 Open-source hygiene
    
    - New `SECURITY.md` with endpoint inventory and threat model.
    - New `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, plus a CI workflow.
    - `README.md` got a TL;DR section.
    
    ## 🛡️ Security & dependencies
    
    - `protobufjs` and `ip-address` pinned via `overrides` to clear critical CVEs.
    - Stale `hasInstallScript` flag dropped from the lockfile.
    
    ## 🔧 Refactoring
    
    - `RunConfig` type and builder extracted into a dedicated core module (`run.ts` is no longer the source of truth for config assembly).
    
    ---
    
    ## Full changelog
    
    `git log v0.9.5..v0.9.6` — 34 commits.
    
    **Compare:** https://github.com/r-via/anatoly/compare/v0.9.5...v0.9.6
  3. v0.9.5 — External Embeddings, Config v3 & Estimate Forecast

    ## Highlights
    
    This release lands two major epics — **Epic 50 (External Embedding Providers)** and **Epic 48/49 (First-Run Onboarding)** — alongside a complete overhaul of the cost forecasting / `estimate` view, a new **declarative v3 config schema**, and a runtime pricing registry sourced from upstream catalogs.
    
    ---
    
    ## 🌐 Epic 50 — External Embedding Providers
    
    Anatoly is no longer tied to local GGUF / ONNX runtimes. You can now bring your own embedding provider — OpenAI, Voyage, Cohere, Mistral, Qwen (via OpenRouter), Azure OpenAI, HuggingFace Inference Endpoints, or any custom OpenAI-compatible base URL.
    
    - **New `external` tier** in the first-run wizard with provider sub-prompts and best-of-breed defaults (`voyage-code-3` for code, Qwen3 for NLP).
    - **Per-axis providers**: pick distinct providers/models for `code` vs `nlp`, or reuse the same with one click.
    - **Vercel AI SDK embedding factory** with dim-probe and signature cache — replaces the legacy native GGUF runtime end-to-end.
    - **Provider registry** (`KNOWN_EMBEDDING_PROVIDERS`) covering OpenAI, Voyage, Cohere, Mistral, Qwen, plus a `Custom (manual)` path with `base_url` + `env_key` validation.
    - **Pre-flight connectivity check** for the configured embedding backend before audits start.
    - **Config write**: fully self-explanatory `.anatoly.yml` emitted on first run, no implicit defaults.
    - **Documentation**: new `docs/embedding-providers.md` covering Azure OpenAI internal, self-hosted GGUF clusters, and HF Inference Endpoints.
    - **Cost projection**: `estimate` now projects embedding token counts and prices them per axis for SDK-backed providers.
    
    ## 🪄 Epic 48/49 — First-Run Onboarding
    
    A fully reworked onboarding flow that takes a brand-new user from `npx anatoly` to a working audit without manual config.
    
    - **First-run wizard** with tier (`lite` / `advanced` / `external`) and mode prompts.
    - **Inline ONNX prefetch** for lite tier and **inline GGUF download** with streaming SHA-256 verification, partial-file cleanup, and post-download verify.
    - **Subprocess `setup-embeddings`** (now renamed `local-embeddings <upgrade|status>`) runs automatically after GGUF download.
    - **Always-write `.anatoly.yml`** with sane defaults so config is discoverable.
    - **Cross-project preferences** stored in `~/.anatoly/preferences.yml`.
    - **Plain-mode parity**: tier comparison table and transparency notice render correctly without TTY/colors.
    - **Quick-win runtime filter** + summary suggestion to surface the cheapest first audit.
    - **Recovery prompts** for download failures (retry / fallback to lite / abort).
    - **End-of-setup 3-choice prompt** and post-audit progressive education hint.
    - **`--defaults-settings` flag** for fully non-interactive CI runs.
    - **Visual setup-to-audit transition** so users know when the wizard hands off to the audit.
    
    ## 📊 Estimate / Forecast — Major Refresh
    
    The `estimate` command went from a flat dump to an actionable, table-driven forecast.
    
    - **Unified `Cost breakdown` table** powered by `cli-table3`, with full model IDs, embedding labels, total footer, and a ` based on latest public provider price` caption.
    - **Per-step billing mode** (billed vs consumption) and per-step cost breakdown, including **Anthropic prompt-cache modeling** and a doc-generation heuristic (Pass A).
    - **Forecast block reordered** for CLI-friendly reading: Forecast last, merged Configuration / Pipeline Plan first.
    - **Scenario flags**: `--files`, `--axes`, `--no-deliberation`, `--no-internal-docs`.
    - **Step IDs split into category + name**, plus `--json` output for programmatic use.
    - **Deliberation cost** now modeled as a fixed cost per shard (not per axis × file), with per-shard tokens calibrated to opus-4-6 pricing.
    - **Per-axis output multipliers** calibrated from R3 actuals and rebalanced to average 1.0.
    - **Real bootstrap page count**, RAG-unfiltered scope, and the previously-missing coherence step are now reflected.
    - **Pipeline Summary overhauled**: opaque rows like 'usage graph N edges' were dropped.
    - **NLP summarizer cost** included in LLM forecast totals (Pass 1) and `summaryModel` wired through.
    
    ## ⚙️ Config v3 Schema
    
    - **Declarative providers + routing** in a single v3 schema — replaces the v0/v1/v2 migration chain (legacy migrations dropped).
    - **Annotated YAML template** emitted on first run.
    - **External tier wiring**: runtime correctly resolves the external embedding tier through the RAG pipeline.
    
    ## 💰 Pricing Registry
    
    - **Runtime pricing registry** sourced from **litellm** + **OpenRouter** catalogs — replaces the hardcoded `MODEL_PRICING` map.
    - **Fail-loud strict mode**: runs are blocked when any model has no pricing, instead of silently estimating zero.
    - **Pricing gate moved** to fire after the first-run wizard, never before.
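
    The fail-loud behaviour can be sketched as a guard that refuses to continue when any configured model lacks a price. The registry shape, field names, model IDs, and prices below are hypothetical; only the "block instead of estimating zero" rule comes from the notes.

    ```typescript
    // Hypothetical registry shape, for illustration only.
    type PricingRegistry = Record<string, { inputPerMTok: number; outputPerMTok: number }>;

    // Strict mode: throw when a model has no price rather than
    // treating its cost as zero.
    function assertAllPriced(models: string[], registry: PricingRegistry): void {
      const missing = models.filter(
        (m) => !Object.prototype.hasOwnProperty.call(registry, m),
      );
      if (missing.length > 0) {
        throw new Error(`No pricing for: ${missing.join(", ")}`);
      }
    }

    const demoRegistry: PricingRegistry = {
      "example/model-a": { inputPerMTok: 1, outputPerMTok: 2 },
    };

    assertAllPriced(["example/model-a"], demoRegistry); // passes silently
    ```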
    
    ## 🔌 Provider & Transport Improvements
    
    - **OpenRouter** integrated as an aggregator for Qwen3-Embedding-8B, with app-attribution headers.
    - **Cache-token capture** fixed for Vercel AI SDK v6 + Gemini and Claude Agent SDK (snake_case `cache_creation_input_tokens` / `cache_read_input_tokens`).
    - **Per-call LLM telemetry** persisted on disk; Anthropic token capture hardened.
    - **Auth column** in providers list now derived from the v3 declaration and covers every provider.
    - **Unified provider-auth notice** for Anthropic + Google with inline A./B. labels.
    - **Anthropic pre-flight probe** before starting the run.
    
    ## 🩹 Wizard / UX Fixes
    
    - Stop saying 'Embeddings (lite) ready' when advanced is the active tier.
    - Unified embeddings tier notice + comparison into one block; renamed `default` → `lite`.
    - OpenRouter (Qwen3-8B) promoted as the recommended NLP provider.
    - External setup exits cleanly instead of failing the run.
    - `--file` glob with zero matches now fails fast with a clear error.
    - Run-summary shows input/output tokens in the cost line; hardcoded subscription hint removed.
    - Pre-summary hint detector for missing init / lite RAG upgrade.
    - `local-embeddings` patches config and skips the wizard prompt after upgrade.
    - Doc-bootstrap: `scaffold-status` tag now certifies docs validity (no more file-presence guessing).
    
    ## 🛠 Install / Build / DevEx
    
    - **Lazy-load model download** on first run — drops the postinstall download.
    - **`make update`** recipe to refresh from `origin/main` and reinstall in one step; merged into a single shell so 'up to date' skips install.
    - **WSL guard** for Windows-installed Node, with install doc note.
    - **`prepare` script** runs `tsup` so `npm install <git-url>` works; self-heals devDeps when npm git-install skips them.
    - Migrated `@xenova/transformers` → `@huggingface/transformers`.
    - Bumped `@google/gemini-cli-core` 0.35.2 → 0.40.1.
    
    ## 🧹 Misc
    
    - Centralised default model identifiers in `core/default-models.ts`.
    - Code-review hardening pass.
    - `removeRunIfEmpty` + `readLatestPointer` exports restored on `run.ts`.
    - README links to `anatoly.cloud` + free/star CTA.
    - README + CLI / module / cost-optimization docs updated to match the new `estimate` view.
    
    ---
    
    **Full changelog**: https://github.com/r-via/anatoly/compare/v0.9.4...v0.9.5
  4. v0.9.4 — Background Worktree Review & Internal Docs Injection

    ## Highlights
    
    ### Background worktree review
    Run audits in isolated git worktree snapshots without locking your main checkout. The new background mode forks the audit into its own process, persists per-run status, and notifies you on completion — letting you keep working while a long review runs.
    
    - Isolated worktree per run (no conflicts with WIP changes on `HEAD`)
    - Parallel runs supported via per-run lock policy (global lock skipped in background mode)
    - `anatoly status` enriched with tracked background runs (PID, phase, elapsed)
    - Desktop notifications on completion (`notify-send` / equivalent)
    - New `anatoly cleanup` command — prune stale worktrees, lock files, and orphaned run dirs
    
    ### Internal docs as ground truth in business-logic axes
    The internal docs scaffolder (`.anatoly/docs/`, agent-curated business overview / architecture / invariants) is now injected as authoritative project context in the `correction`, `best_practices`, and `overengineering` axes — not just in `documentation`. Findings can cite the source page path, making the chain of reasoning auditable. Zero additional LLM cost: the docs are already generated by the existing scaffolder phase.
    
    ### Industry-domain prompting
    When the model can confidently infer your project's domain (gambling/casino, finance/payments, healthcare/PII, cryptography, gaming RNG, real-time systems) from filenames, imports, package metadata, README, or internal docs, it now applies well-known industry rules from its pretrained knowledge. Examples: `Math.random()` flagged as non-certifiable for regulated gaming, floating-point arithmetic flagged on monetary code, deprecated cryptographic primitives (MD5/SHA-1/ECB) flagged as critical. Each such finding cites both the inferred domain and the rule, keeping the speculative chain auditable.
    
    ### Per-defect correction findings
    Symbols carrying multiple independent defects now split into one row per defect in the report instead of collapsing into a single prose paragraph. Each finding has its own `line_start` / `line_end` / `detail` for clearer signals and reproducible verdicts.
    
    ### Transport-level resilience
    Per-provider semaphores and circuit breakers are now centralized in `TransportRouter` with a unified `acquireSlot` / `release` API. Replaces the previous mix of manual semaphores and the legacy `GeminiCircuitBreaker`. All agentic call sites migrated; agentic and single-turn calls share concurrency policy by provider.
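
    A rough sketch of how a concurrency limit and a circuit breaker can share one per-provider object. This is not the `TransportRouter` code: `ProviderGate` and the non-blocking `tryAcquireSlot` are hypothetical stand-ins for the `acquireSlot` / `release` API named above, the limits are made up, and a real breaker would also need a cooldown before closing again.

    ```typescript
    // Hypothetical per-provider gate: a slot counter plus a failure counter.
    class ProviderGate {
      private inFlight = 0;
      private consecutiveFailures = 0;
      private maxConcurrent: number;
      private failureThreshold: number;

      constructor(maxConcurrent: number, failureThreshold: number) {
        this.maxConcurrent = maxConcurrent;
        this.failureThreshold = failureThreshold;
      }

      // The breaker is open once too many calls in a row have failed.
      isOpen(): boolean {
        return this.consecutiveFailures >= this.failureThreshold;
      }

      // Non-blocking for the sketch: returns false instead of queueing.
      tryAcquireSlot(): boolean {
        if (this.isOpen() || this.inFlight >= this.maxConcurrent) return false;
        this.inFlight += 1;
        return true;
      }

      // Releasing a slot also records the call outcome for the breaker.
      release(success: boolean): void {
        this.inFlight -= 1;
        this.consecutiveFailures = success ? 0 : this.consecutiveFailures + 1;
      }
    }

    // One gate per provider, so a failing provider does not block the others.
    const gates = new Map<string, ProviderGate>([
      ["anthropic", new ProviderGate(2, 3)],
      ["google", new ProviderGate(2, 3)],
    ]);
    ```

    Because agentic and single-turn calls would go through the same gate for a given provider, they share both the slot budget and the failure history.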
    
    ## Bench progression (anatoly-bench / slot-engine fixture)
    
    Each run is a full audit of the [`slot-engine`](https://github.com/r-via/anatoly-bench/tree/main/catalog/slot-engine) fixture, scored against a curated ground-truth catalog. Global F1 is the unweighted mean of per-axis F1s.
    
    | Run | Date | Global F1 | correction | utility | duplication | overengineering | best-practices |
    |-----|------|----------:|-----------:|--------:|------------:|----------------:|---------------:|
    | v6  | 2026-04-24 | 56.8%     | 54.5%      | 60.0%   | **66.7%**   | 66.7%           | 36.4%          |
    | v7  | 2026-04-26 | 65.5%     | 61.5%      | 60.0%   | 66.7%       | 66.7%           | **72.7%**      |
    | v8  | 2026-04-27 | 62.7%     | 36.4%      | **85.7%** | 66.7%     | 75.0%           | 50.0%          |
    | v9  | 2026-04-27 | 61.0%     | 46.2%      | 85.7%   | 66.7%       | 66.7%           | 40.0%          |
    | v10 | 2026-04-28 | 65.0%     | 53.3%      | 85.7%   | 66.7%       | 75.0%           | 44.4%          |
    | v11 | 2026-04-28 | 57.8%     | 44.4%      | 85.7%   | 66.7%       | 33.3%           | 58.8%          |
    | **v12** | 2026-04-28 | **67.8%** | 53.3% | 85.7% | 66.7% | **66.7%** | **66.7%** |
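
    As a worked check of the "unweighted mean" definition, the snippet below recomputes the v12 global F1 from the per-axis values in the table. The helper name is hypothetical. Because the table's inputs are already rounded, recomputing other rows this way can differ from the published figure in the last digit.

    ```typescript
    // Per-axis F1 scores for run v12, copied from the table above (percent).
    const v12PerAxisF1: Record<string, number> = {
      correction: 53.3,
      utility: 85.7,
      duplication: 66.7,
      overengineering: 66.7,
      "best-practices": 66.7,
    };

    // Unweighted mean: every axis counts equally, regardless of how many
    // findings it contains.
    function globalF1(perAxis: Record<string, number>): number {
      const scores = Object.values(perAxis);
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
    }

    console.log(globalF1(v12PerAxisF1).toFixed(1)); // "67.8"
    ```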
    
    Six fixes landed during this release window, each measured against the previous baseline:
    
    - **v6 — duplication tier-1 invariant** ([44f0617](https://github.com/r-via/anatoly/commit/44f0617)). Tier-1 refinement was downgrading `DUPLICATE` verdicts when RAG similarity stayed below 0.68, even with a concrete `duplicate_target`. duplication: 0% → 66.7%.
    - **v8 — per-axis triage policy** ([b784caf](https://github.com/r-via/anatoly/commit/b784caf)). Triage skip-tier was binary: type-only / trivial / barrel files bypassed every axis with safe defaults — utility lost real DEAD signals. Now skip decisions are per-axis, with usage graph consulted for utility on skipped files. utility: 66.7% → 85.7%.
    - **v9 — multi-defect findings per symbol** ([75cdf08](https://github.com/r-via/anatoly/commit/75cdf08)). Correction now returns an optional `findings[]` array per symbol; the shard renderer emits one row per defect.
    - **v10 — internal-docs injection into business-logic axes** ([a584b80](https://github.com/r-via/anatoly/commit/a584b80)). Anatoly's existing `.anatoly/docs/` already produced high-quality business context, used only by the documentation axis. Now also fed into `correction` / `best_practices` / `overengineering` with a ground-truth framing in the system prompt. correction: 46.2% → 53.3%; INV-ROUND detected.
    - **v11 — industry-knowledge prompting** ([d0068a2](https://github.com/r-via/anatoly/commit/d0068a2)). Prompt rule inviting the model to apply well-known industry-specific rules (gaming RNG / monetary arithmetic / deprecated cryptographic primitives) when domain inference is confident, with mandatory citation of both inferred domain and rule. best-practices recall hit 100% (5/5) for the first time; BP-RNG (`Math.random()` in gaming) detected.
    - **v12 — anti-collapse rules + temperature pin** ([d8fd931](https://github.com/r-via/anatoly/commit/d8fd931), [ebb8505](https://github.com/r-via/anatoly/commit/ebb8505)). Two changes: (1) a "flag the source of a defect, not its consumer" rule on the correction / overengineering / best-practices prompts, which fixes verdicts that flapped from run to run as the LLM oscillated between one consumer-side finding and N source-side findings; (2) `temperature: 0` pinned in the Vercel SDK transport for evaluator reproducibility (the Anthropic Claude Agent SDK and Gemini CLI transports keep their SDK defaults because they do not expose temperature). overengineering: 33.3% → 66.7% with 100% precision; global F1: 57.8% → 67.8%.
    
    **Net result for v0.9.4**: global F1 climbed from **56.8%** (start of cycle) to **67.8%** (+11.0 percentage points), with structural improvements on every axis. Full per-run baselines: [`anatoly-bench/baselines/`](https://github.com/r-via/anatoly-bench/tree/main/baselines).
    
    ## Fixes
    
    - **Review progress counter** — display could show impossible values like `13/12` when triage skip-tier files had partial-axes policies (trivial files keeping correction/duplication/utility). `evaluateTotal` now mirrors the handler's actual evaluator-runs decision.
    - **Refinement / DUPLICATE preservation** — never downgrade a `DUPLICATE` verdict in tier 2 when `duplicate_target` is populated; preserve `DUPLICATE` on dead code instead of collapsing to `DEAD`.
    - **Triage** — per-axis skip policy keeps real signal on trivial / barrel-export / type-only / constants-only files (correction + duplication + utility still run on trivial files); usage-graph utility evaluation correctly resolves DEAD on skipped type/constant exports.
    - **Anti-collapse rule** — correction and overengineering prompts now instruct: flag the source of a defect, not its consumer (defects have one canonical home — where they're defined).
    - **Telegram notifications** — disabled axes excluded from the scorecard so cosmetic placeholder rows don't pollute the message.
    - **Vercel SDK transport** — pinned `temperature=0` for evaluator reproducibility (deterministic verdicts on identical input).
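
    The DUPLICATE-preservation rule can be sketched as a guard in the refinement step. The types and function below are hypothetical; only the verdict names, the `duplicate_target` field, and the 0.68 similarity threshold appear in these notes.

    ```typescript
    type DupVerdict = "DUPLICATE" | "UNIQUE";

    interface DuplicationFinding {
      verdict: DupVerdict;
      duplicate_target?: string;
      ragSimilarity: number;
    }

    // Assumed refinement step: low similarity may downgrade a DUPLICATE,
    // except when a concrete target was already identified.
    function refineDuplication(f: DuplicationFinding, threshold: number): DupVerdict {
      if (f.verdict !== "DUPLICATE") return f.verdict;
      if (f.duplicate_target !== undefined && f.duplicate_target !== "") return "DUPLICATE";
      return f.ragSimilarity >= threshold ? "DUPLICATE" : "UNIQUE";
    }

    const withTarget: DuplicationFinding = {
      verdict: "DUPLICATE",
      duplicate_target: "src/util.ts:formatAmount", // made-up target
      ragSimilarity: 0.5,
    };
    console.log(refineDuplication(withTarget, 0.68)); // stays DUPLICATE
    ```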
    
    ## Internals
    
    - BMAD/Ralph workflow integration improvements: parser tolerates story / epic heading variants, sprint-status sync points clarified.
    - `domain-digest` feature explored, implemented, then reverted in favor of internal docs injection (same goal, no parallel extraction pipeline, zero additional LLM cost). Original spec preserved as deprecated history in `anatoly-bench/docs/02-domain-digest-spec.md`.
    
    ## Migration
    
    Drop-in upgrade from 0.9.3. No config changes required; new flags are opt-in:
    
    ```bash
    anatoly run                    # default — same as before
    anatoly status                 # now shows background runs
    anatoly cleanup                # new — prune stale worktrees / locks
    ```
    
    The internal-docs injection only activates when `.anatoly/docs/` already exists for a project — generated automatically on first run by the existing scaffolder phase.
    
    **Full changelog**: https://github.com/r-via/anatoly/compare/v0.9.3...v0.9.4
  5. v0.9.3

    ## What's Changed since v0.9.2
    
    ### Features
    
    - **Overengineering axis — usage-graph signal + duplication invariant** ([b129cd6](https://github.com/r-via/anatoly/commit/b129cd6)): the overengineering evaluator now factors in the usage graph (symbols with few runtime importers are weighted differently from hot code) and enforces a duplication invariant preventing contradictory verdicts.
    - **Global refinement cache with freshFiles invalidation** ([3071dfc](https://github.com/r-via/anatoly/commit/3071dfc)): per-finding cache in `.anatoly/cache/` now survives across runs with freshly-reviewed files auto-evicted, preventing redundant tier-3 investigations.
    - **Zod validation retry in agenticQuery** ([f617d1a](https://github.com/r-via/anatoly/commit/f617d1a)): transport layer retries once on schema validation failures before bubbling the error, reducing flaky runs caused by transient LLM output drift.
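
    The freshFiles invalidation can be pictured as a filter over cached findings. This is an in-memory sketch under assumptions: `CachedFinding` and `evictFresh` are hypothetical, and the on-disk format under `.anatoly/cache/` is not described in these notes.

    ```typescript
    interface CachedFinding {
      file: string;
      symbol: string;
      verdict: string;
    }

    // Drop every cached finding that belongs to a file reviewed in this run,
    // so stale verdicts never shadow fresh ones.
    function evictFresh(cache: CachedFinding[], freshFiles: Set<string>): CachedFinding[] {
      return cache.filter((entry) => !freshFiles.has(entry.file));
    }

    const previousCache: CachedFinding[] = [
      { file: "src/a.ts", symbol: "parse", verdict: "CLEAN" },
      { file: "src/b.ts", symbol: "render", verdict: "NEEDS_FIX" },
    ];
    const kept = evictFresh(previousCache, new Set(["src/b.ts"]));
    console.log(kept.map((entry) => entry.file));
    ```

    Findings for untouched files stay cached and skip tier-3 investigation; findings for re-reviewed files are recomputed.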
    
    ### Fixes
    
    - **Review coherence and cache invariants** ([3c27a0a](https://github.com/r-via/anatoly/commit/3c27a0a)): multiple inter-axis coherence bugs and stale-cache edge cases resolved.
    - **extract-json fence matching** ([fdd9c77](https://github.com/r-via/anatoly/commit/fdd9c77)): only matches ` ```json ` fences, no longer swallows ` ```rust ` or other-language blocks that happen to look JSON-like.
    - **Tier1/tier2 progress output** ([e6a14a8](https://github.com/r-via/anatoly/commit/e6a14a8)): finding totals now surface in refinement progress so you can see how many findings each tier is processing.
    - **Transport hardening — adversarial review #1-#10** ([b10e5c0](https://github.com/r-via/anatoly/commit/b10e5c0)): ten findings from the adversarial transport review addressed.
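
    For the fence-matching fix, a minimal sketch of a matcher that accepts only `json`-tagged fences. The function name and regular expression are assumptions, not the actual `extract-json` implementation.

    ```typescript
    // Build the fence marker at runtime so this example can live inside a fenced block.
    const FENCE = "`".repeat(3);

    // Assumed behaviour: return the body of the first fence tagged `json`, else null.
    function extractJsonFence(text: string): string | null {
      const pattern = new RegExp(FENCE + "json\\s*\\n([\\s\\S]*?)\\n?" + FENCE);
      const match = pattern.exec(text);
      return match ? match[1] : null;
    }

    const rustBlock = FENCE + "rust\nlet x = { a: 1 };\n" + FENCE;
    const jsonBlock = FENCE + 'json\n{"ok": true}\n' + FENCE;

    console.log(extractJsonFence(rustBlock));                    // null: not a json fence
    console.log(extractJsonFence(rustBlock + "\n" + jsonBlock)); // body of the json fence
    ```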
    
    ### Docs
    
    - **Advanced Configuration rewrite for v2 schema** ([d6f0143](https://github.com/r-via/anatoly/commit/d6f0143)): [docs/03-Guides/02-Advanced-Configuration.md](https://github.com/r-via/anatoly/blob/main/docs/03-Guides/02-Advanced-Configuration.md) rewritten to match the current v2 config schema.
    
    ### Chore
    
    - **Lint cleanup** ([06441cc](https://github.com/r-via/anatoly/commit/06441cc)): removed unused imports/vars, routed Telegram warnings through the central logger.
    
    **Full changelog**: https://github.com/r-via/anatoly/compare/v0.9.2...v0.9.3
  6. v0.9.2

    ## What's Changed since v0.9.1
    
    ### 3-Tier Refinement Pipeline (replaces per-file Deliberation)
    
    The per-file Opus deliberation pass is replaced by a post-review refinement pipeline that processes all findings in batch:
    
    - **Tier 1 — Deterministic auto-resolve** (0 tokens): usage graph confirms DEAD exports are truly unreferenced, AST validates line ranges, RAG confirms duplication candidates. Resolves ~40% of findings instantly
    - **Tier 2 — Inter-axis coherence** (0 tokens): detects contradictions (DEAD + NEEDS_FIX is moot, type-only importers can't be OVER, LOW_VALUE coherence checks). Deterministic rules, no LLM
    - **Tier 3 — Agentic investigation** (Opus): launches an agent with full tool access (Read, Grep, Bash, WebFetch) to investigate ambiguous findings with empirical evidence. Conversation transcripts dumped per finding
    - **Global refinement cache** — per-finding persistence in `.anatoly/cache/` survives across runs; freshly-reviewed files are auto-evicted; `--no-cache` clears it
    - **[CACHED] shard display** — cached shards show `[CACHED]` tag in deliberation output, matching review phase style
    - **Finding totals in progress** — tier 1/tier 2 now show resolved/total and confirmed counts for full visibility
    - **Results**: 22% faster, 20% cheaper, and 150% more CLEAN files than legacy deliberation
    - Per-shard progress display with finding-level granularity
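
    One of the tier-2 rules above ("DEAD + NEEDS_FIX is moot") can be written as a pure function with no LLM call. The `AxisVerdicts` shape and the `USED` / `OK` values are hypothetical placeholders; resetting the correction verdict is just one way to express "moot" in this sketch.

    ```typescript
    interface AxisVerdicts {
      symbol: string;
      utility: "USED" | "DEAD" | "LOW_VALUE";
      correction: "OK" | "NEEDS_FIX";
    }

    // Fixing a symbol that is dead code is moot, so the correction finding
    // is neutralised and the DEAD verdict is kept.
    function resolveDeadVsFix(v: AxisVerdicts): AxisVerdicts {
      if (v.utility === "DEAD" && v.correction === "NEEDS_FIX") {
        return { ...v, correction: "OK" };
      }
      return v;
    }

    const contradictory: AxisVerdicts = { symbol: "legacyParse", utility: "DEAD", correction: "NEEDS_FIX" };
    console.log(resolveDeadVsFix(contradictory));
    ```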
    
    ### Multi-Provider Transport Architecture (Epic 43)
    
    Complete rewrite of the LLM transport layer:
    
    - **Mode-aware TransportRouter** — routes models to native transports (subscription) or Vercel AI SDK (API billing) based on provider config
    - **Vercel AI SDK transport** — unified API billing for any provider (Anthropic, Google, OpenAI) with cost calculation
    - **Config v2 format** — `providers:`, `models:`, `agents:`, `runtime:` sections replace flat `llm.*` paths. Automatic v1→v2 migration
    - **`anatoly init` wizard** — interactive multi-provider setup with model selection
    - **Per-provider semaphores and circuit breakers** (Epic 46) — `acquireSlot()`/`release()` pattern with automatic success/failure tracking
    - **`extractProvider()`/`stripPrefix()`** — model prefix inference (`google/gemini-2.5-flash` or bare `gemini-2.5-flash`)
    - **Agentic query** — `agenticQuery()` on TransportRouter for tier 3 dispatch with Bash tool + web search
    - **Zod validation retry** in agentic queries
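
    A sketch of what prefix inference might look like for the two model spellings mentioned above. The function names come from the notes, but the bodies and the bare-name lookup table are assumptions for illustration.

    ```typescript
    // Assumed mapping from bare model-name prefixes to providers.
    const BARE_PREFIXES: Array<[string, string]> = [
      ["gemini-", "google"],
      ["claude-", "anthropic"],
      ["gpt-", "openai"],
    ];

    // "google/gemini-2.5-flash" -> "google"; bare names fall back to the table.
    function extractProvider(model: string): string | undefined {
      const slash = model.indexOf("/");
      if (slash > 0) return model.slice(0, slash);
      const hit = BARE_PREFIXES.find(([prefix]) => model.startsWith(prefix));
      return hit ? hit[1] : undefined;
    }

    // "google/gemini-2.5-flash" -> "gemini-2.5-flash"; bare names pass through.
    function stripPrefix(model: string): string {
      const slash = model.indexOf("/");
      return slash > 0 ? model.slice(slash + 1) : model;
    }

    console.log(extractProvider("google/gemini-2.5-flash"), stripPrefix("google/gemini-2.5-flash"));
    console.log(extractProvider("gemini-2.5-flash"), stripPrefix("gemini-2.5-flash"));
    ```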
    
    ### Telegram Notifications (Epic 45)
    
    - **`anatoly notifications create-bot`** — interactive setup wizard for Telegram bot
    - **`anatoly notifications test`** — send a test notification
    - **`anatoly report --notify`** — send notification after report generation
    - **Auto-notify** after each `anatoly run` — single photo+caption with compressed banner, health bars, severity breakdown, token stats
    - Budget-aware findings truncation to fit Telegram's 1024 caption limit
    - Fire-and-forget: delivery failures never break the pipeline
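
    Budget-aware truncation can be sketched as "add whole finding lines while they fit, always leaving room for a remainder marker". The helper below is hypothetical; it assumes the header alone is well under the limit and measures length in JavaScript string units.

    ```typescript
    const TELEGRAM_CAPTION_LIMIT = 1024;

    // Keep whole finding lines while they fit, then add a "+N more" marker
    // so the caption stays within the limit.
    function fitCaption(
      header: string,
      findings: string[],
      limit: number = TELEGRAM_CAPTION_LIMIT,
    ): string {
      let caption = header;
      for (let i = 0; i < findings.length; i++) {
        const remaining = findings.length - i - 1;
        // Space reserved for the marker if later findings end up dropped.
        const suffix = remaining > 0 ? `\n+${remaining} more` : "";
        const candidate = `${caption}\n${findings[i]}`;
        if (candidate.length + suffix.length > limit) {
          return `${caption}\n+${findings.length - i} more`;
        }
        caption = candidate;
      }
      return caption;
    }

    const manyFindings = Array.from({ length: 200 }, (_, n) => `finding ${n}: example detail`);
    console.log(fitCaption("Audit complete", manyFindings).length <= TELEGRAM_CAPTION_LIMIT);
    ```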
    
    ### User Instructions — ANATOLY.md
    
    - **`ANATOLY.md`** project-level instructions file — custom rules injected into axis system prompts
    - Loader with frontmatter parsing and section extraction
    - Per-axis skip patterns in config (`axes.*.skip`)
    - Show custom rules in configuration table during setup
    
    ### Review Engine
    
    - **Duplication auto-UNIQUE** — skip LLM when no RAG similarity candidates exist
    - **Utility retry** — retry when LLM omits symbols instead of crashing
    - **Correction refinement** — 4 new deterministic rules reduce false positives
    - Remove projectTree injection from overengineering and tests axes (token savings)
    - Pass `userInstructions` to all evaluators
    
    ### RAG
    
    - Detect and log NLP name mismatches that cause infinite re-indexing
    - Garbage-collect orphaned cache entries after parser line shifts
    - Truncate NLP summaries instead of rejecting on >400 chars
    - Route NLP summarization through TransportRouter
    - Remove duplicate `rag:` prefix from log messages
    
    ### CLI & UX
    
    - Dynamic provider display in pipeline header
    - Per-provider concurrency slots display (Claude + Google)
    - Move run-only options from global scope to run command
    - Unify deliberation step as single "Deliberation" task in UI
    - Include file paths and messages in error summary
    - Doc scaffold conversation transcripts
    - `anatoly runs` command + latest pointer helpers
    - Fix `--no-color` flag
    
    ### Report
    
    - Add `--debug` flag for report generation
    - Fix `--notify` to use real health percentages
    
    ### Clean Loop
    
    - Kill spawned claude process on SIGINT/SIGTERM
    - Restore original branch on interrupt (stash + checkout + pop)
    
    ### Config
    
    - Gold-set: 6 fixtures covering all 7 axes
    - Config v1.0 schema with validation tests
    - Per-axis skip patterns
    
    ### Bug Fixes
    
    - Fix `extract-json` to only match ` ```json ` fences, not ` ```rust ` or other langs
    - Fix `isGeminiModel` crash, use `stripPrefix` everywhere
    - Suppress `console.error` from gemini-cli-core rate limit retries
    - Fix Zod v4 refine compatibility in utility axis
    - Fix cached file metrics in triage
    - Fix triage to respect enabled axes in skip reviews
    - Adversarial reviews: Epic 41 (10 fixes), Epic 42 (5 fixes), Epic 43 (7 fixes), Epic 46 (10 fixes)
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.9.1...v0.9.2
  7. v0.9.1

    ## What's Changed since v0.9.0
    
    ### Gemini Transport
    
    - **GenAI SDK transport** — added `@google/genai` SDK as alternative Gemini transport alongside `gemini-cli-core`, with concurrency stress test and token optimizations
    
    ### RAG
    
    - **Scoped `--rebuild-rag`** — when used with `--file`, only purges vector store entries and caches for matching files instead of dropping the entire table
    - **Gemini semaphore** — pass `geminiSemaphore` through the full RAG pipeline (orchestrator → nlp-summarizer → runSingleTurnQuery)
    
    ### Report
    
    - **Health bar severity scaling** — degrade health bar color based on high-severity finding count, scale thresholds by codebase size
    - Remove `buildReportsBaseUrl` from report command
    
    ### Scripts
    
    - Portable `awk` in `free_port`, bounded timeout in `wait_for_gguf`
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.9.0...v0.9.1
  8. v0.9.0

    ## What's Changed since v0.8.2
    
    ### Multi-Provider LLM — Gemini 2.5 Flash (Experimental)
    
    **LlmTransport abstraction** — pluggable provider layer with `AnthropicTransport` and `GeminiTransport`:
    - `LlmTransport` interface and `TransportRouter` for model-to-provider routing (Story 37.1)
    - `AnthropicTransport` wraps existing Claude SDK calls (Story 37.2)
    - `GeminiTransport` wraps `@google/gemini-cli-core` with Google OAuth (Story 37.3)
    - Auth check with graceful fallback to Claude when Gemini is unavailable (Story 37.5)
    - Circuit breaker: auto-falls back to Claude on repeated Gemini failures
    
    **Axis routing** — utility, duplication, overengineering now routed to Gemini 2.5 Flash (Story 38.1):
    - 100% accuracy on gold-set benchmarks, 2-5s latency, implicit caching (96% hit rate on 2nd call)
    - Correction, tests, best practices, documentation remain on Claude (quality-critical)
    - Deliberation stays on Claude Opus (non-negotiable safety net)
    
    **NLP summarization** routed to Gemini Flash (Story 39.1) — 100% schema validity, $0/token
    
    **Impact:** ~69% reduction in Claude API calls, ~74% cost reduction, ~35-40% faster runs
    
    **New commands:**
    - `anatoly providers` — verify LLM connectivity (Claude + Gemini status)
    
    **Infrastructure:**
    - Dual semaphores for Claude and Gemini concurrency management
    - Provider field in logs and run metrics (Story 39.2)
    
    ### Review Engine
    
    - **AST-based import extraction** — replaced regex with tree-sitter AST traversal for `require()`, Python `from-import`, and Bash recursion
    - **False positive reduction** — filter private symbols, deduplicate actions, calibrate severity, conservative test coherence
    - **Test discovery** — expanded test file discovery, inject tests into deliberation context, deduplication
    - **Deliberation memory overhaul** — group by symbol instead of per-axis entries, merge stale entries, escape regex, rebuild on corrupted JSON, truncate `original_detail` to reclassified axes only
    - **`--flush-memory` flag** — reset deliberation memory before a run
    - **Rate limit handling** — sleep until rate limit reset instead of degrading reviews
    - Raise default review concurrency from 4 to 8
    
    ### Report
    
    - **`public_report.md`** — new polished public-facing report layout
    - **Report upstream extracted** — migrated to standalone script in `anatoly-reports` repo
    - **Redesigned report sections** — merged Findings Summary into Axes table, emoji health bars, verdict breakdown for all-clear axes, doc coverage section, total findings count in hero block
    - Absolute links to anatoly-reports + breadcrumb navigation
    - Executive summary with all 7 axes, fix token metrics
    
    ### Clean Loop
    
    - **Subcommand rename** — `clean-run` → `clean run`, `clean-sync` → `clean sync`, etc.
    - `clean generate`, `clean run`, `clean sync` as proper subcommands with tests
    - Bump default iterations from 10 to 50
    - Rename Ralph → clean loop in source code
    
    ### Documentation Pipeline
    
    - **Smart chunking** — programmatic H2+H3+paragraph splitting replaces Haiku LLM chunking ($0)
    - **Coherence optimization** — single-pass Sonnet, auto-fix, content injection (was multi-pass Opus)
    - **Incremental updates** — scope doc updates to touched modules only
    - **Doc deduplication** — auto-sync project docs from internal when trees are identical, `docs identity` and `docs reset-project` commands
    - **Batch doc embeddings** — one mega-batch instead of per-file
    - Auto-fix missing h1 heading, strip preamble from multi-turn agent output
    - Raise scaffold maxTurns from 5 to 15, coherence injection budget to 50%
    - Track RAG costs, expose cached status for doc section indexing
    - Modular type-context injection + remove dead strategy2
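
    To illustrate the "Smart chunking" item above, here is a minimal sketch of programmatic H2 + H3 + paragraph splitting with no LLM call. The `DocChunk` shape and the `chunkMarkdown` name are hypothetical, and the sketch ignores edge cases such as headings inside fenced code blocks.

    ```typescript
    interface DocChunk {
      heading: string; // nearest H2/H3 heading, or "" for text before the first one
      body: string;    // one paragraph of text
    }

    function chunkMarkdown(markdown: string): DocChunk[] {
      const chunks: DocChunk[] = [];
      let heading = "";
      let paragraph: string[] = [];

      // Emit the paragraph collected so far, if any, under the current heading.
      const flush = (): void => {
        const body = paragraph.join("\n").trim();
        if (body.length > 0) chunks.push({ heading, body });
        paragraph = [];
      };

      for (const line of markdown.split("\n")) {
        const match = /^(#{2,3})\s+(.*)$/.exec(line); // H2 or H3 only
        if (match) {
          flush();
          heading = (match[2] ?? "").trim();
        } else if (line.trim() === "") {
          flush(); // a blank line ends the current paragraph
        } else {
          paragraph.push(line);
        }
      }
      flush();
      return chunks;
    }
    ```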
    
    ### CLI & UX
    
    - Show "Done !" instead of "0/N checks left" on file completion
    - Clear screen on terminal resize to prevent rendering glitches
    - Show "done" on completed pipeline tasks instead of stale counters
    - Two-column models table, doc cache refresh, estimate doc stats
    - Cache breakdown in pipeline output, error dumps, `[review]` prefix
    - Reduce path truncation in progress display
    
    ### Rust Support
    
    - Chained `super::` resolution, workspace glob expansion, cross-crate usage tracking
    - Cargo workspace detection as Monorepo
    - Language-aware output formatting
    
    ### Security
    
    - Shell injection fixes — `execFileSync`, validate NWO format
    
    ### JSDoc Coverage (clean loop auto-fix)
    
    - Added JSDoc for ~200 symbols across 70 files (auto-generated via clean loop)
    
    ### Bug Fixes
    
    - RAG: block fallback to lite when index was built with advanced-gguf
    - Symlink guard on sync/reset, conditional sync task
    - Rate limiter max standby cycles, doc-gen rate limit detection
    - Workspace module extraction, gap detection, Cargo detection
    - Adversarial review — 12 fixes across CLI, RAG, core, commands
    - Breadcrumb links, fork detection, doc coverage on cached runs
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.8.2...v0.9.0
  9. v0.8.3

    · v0.8.3

    ## What's Changed since v0.8.2
    
    ### Review Engine
    
    - **AST-based import extraction** — replaced regex with tree-sitter AST traversal for `require()`, Python `from-import`, and Bash recursion
    - **False positive reduction** — filter private symbols, deduplicate actions, calibrate severity, conservative test coherence
    - **Test discovery** — expanded test file discovery, inject tests into deliberation context, deduplication
    - **Deliberation memory overhaul** — group by symbol instead of per-axis entries, merge stale entries, escape regex, rebuild on corrupted JSON, truncate `original_detail` to reclassified axes only
    - **`--flush-memory` flag** — reset deliberation memory before a run
    - **Rate limit handling** — sleep until rate limit reset instead of degrading reviews
    - Raise default review concurrency from 4 to 8
    
    ### Report
    
    - **`public_report.md`** — new polished public-facing report layout
    - **Report upstream extracted** — `report upstream` migrated to standalone script in `anatoly-reports` repo
    - **Redesigned report sections** — merged Findings Summary into Axes table, emoji health bars, verdict breakdown for all-clear axes, doc coverage section, total findings count in hero block
    - Absolute links to anatoly-reports + breadcrumb navigation
    - Executive summary with all 7 axes, fix token metrics
    
    ### Clean (Ralph)
    
    - **Subcommand rename** — `clean-run` → `clean run`, `clean-sync` → `clean sync`, etc.
    - `clean generate`, `clean run`, `clean sync` as proper subcommands with tests
    - Bump default Ralph iterations from 10 to 50
    
    ### Documentation Pipeline
    
    - **Coherence optimization** — single-pass Sonnet, auto-fix, content injection (was multi-pass Opus)
    - **Incremental updates** — scope doc updates to touched modules only
    - **Doc deduplication** — auto-sync project docs from internal when trees are identical, `docs identity` and `docs reset-project` commands
    - **Batch doc chunking** — N Haiku calls → 1 per file, maximize concurrency
    - **Batch doc embeddings** — one mega-batch instead of per-file
    - Auto-fix missing h1 heading by deriving title from filename
    - Strip preamble from multi-turn agent output before writing
    - Raise scaffold maxTurns from 5 to 15, coherence review injection budget to 50%
    - Track RAG costs, expose cached status for doc section indexing
    - Modular type-context injection + remove dead strategy2
    
    ### CLI & UX
    
    - Show "Done !" instead of "0/N checks left" on file completion
    - Clear screen on terminal resize to prevent rendering glitches
    - Show "done" on completed pipeline tasks instead of stale counters
    - Reduce path truncation in progress display
    
    ### Rust Support
    
    - Chained `super::` resolution, workspace glob expansion, cross-crate usage tracking
    - Cargo workspace detection as Monorepo
    - Language-aware output formatting
    
    ### Security
    
    - Shell injection fixes — `execFileSync`, validate NWO format
    
    ### JSDoc Coverage (Ralph auto-clean)
    
    - Added JSDoc for ~200 symbols across 70 files (auto-generated via Ralph loop)
    
    ### Bug Fixes
    
    - RAG: block fallback to lite when index was built with advanced-gguf
    - Symlink guard on sync/reset, conditional sync task
    - Rate limiter max standby cycles, doc-gen rate limit detection
    - Workspace module extraction, gap detection, Cargo detection
    - Adversarial review — 12 fixes across CLI, RAG, core, commands
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.8.2...v0.8.3
  10. v0.8.2

    · v0.8.2

    ## What's Changed since v0.8.1
    
    ### Internal Documentation Pipeline (Major)
    
    **Full `docs scaffold` pipeline** — 6-step pipeline replacing `docs rebuild`:
    - Scaffold → Coherence review → RAG index → Gap detection → Update → Lint
    - Opus coherence review agent for cross-page structural + content consistency
    - Deterministic heading linter replaces LLM-based structure review
    - Neighbor page injection for cross-referencing during doc generation
    - Site map context injected into doc-writer prompts
    - `docs scaffold project` copies internal docs to `docs/` for publishing
    - README.md injected as context for scaffold page generation
    
    **Gap detection v2** — three-strategy architecture:
    - Pre-computed `doc_vector` for gap detection (no runtime embedding, $0)
    - Domain splitting for large codebases (sub-domains by file)
    - Internal and project scope support
    
    **New commands:**
    - `anatoly docs index` — standalone RAG indexing (code + NLP + doc chunks)
    - `anatoly docs update` — incremental doc updates with shared logic
    - `anatoly docs lint` — deterministic structure lint
    - `anatoly docs coherence` — lint + Opus coherence review
    - `anatoly docs gap-detection internal|project` — scope-aware gap analysis
    
    ### RAG Engine
    
    - Always dual embedding (code + NLP) — removed `dualEmbedding` flag
    - Doc chunk cache to avoid re-chunking on rebuild
    - `--drop-cache` flag for full rebuild (purges NLP summary + doc caches)
    - `docSummary` added to function cards for doc gap detection
    - Cumulative progress tracking for doc chunking (project + internal)
    - Auto-purge doc caches when store has no doc sections
    
    ### CLI & UX
    
    - Bump default SDK concurrency to 24
    - Show "deliberating" state in UI
    - Show "no project/internal docs" instead of "done" when none exist
    - All pipeline steps show file activity in "In progress"
    - Track current page during Sonnet update
    - Semaphore passed to Opus agents for accurate UI counter
    - Rate limit retry with exponential backoff for doc executor
    - Plain mode improvements: log task transitions, error details, pipeline crashes
    - Fix deadlock from double semaphore acquire
    
    ### Setup & Configuration
    
    - Shared models directory across projects (skip redundant pulls)
    - Remove deprecated `--ab-test` flag from `setup-embeddings`
    
    ### Bug Fixes
    
    - Fix doc_vector migration warning and scaffold task gap
    - Fix stale review-internal refs and coherence prompt contradictions
    - Anti-hallucination + overwrite guard in content review prompt
    - Linter detects unnumbered files in numbered directories
    - Generic reference matching + deduplicate `pagesToUpdate`
    - `--plain` no longer implies `--yes` (require explicit `-y` for destructive ops)
    - Increase Opus agent maxTurns from 50 to 200
    - Fix `onFileDone` for files with 0 sections (stuck "In progress")
    - Block `docs index` if scaffold-only pages detected
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.8.1...v0.8.2
  11. v0.8.1

    · v0.8.1

    ## What's Changed since v0.8.0
    
    ### Prompt System Overhaul (Epic 33 + 34)
    
    **Epic 33 — Universal Prompt Registry**
    - Migrated all 28 system prompts from inline strings to `.system.md` files (`src/prompts/axes/`)
    - Extracted 8 additional inline prompts to dedicated files
    - Extended `prompt-resolver.ts` to universal registry (37 entries)
    - Added bidirectional coherence tests for the registry
    - Adversarial review & auto-fixes
    
    **Epic 34 — Prompt Reinforcement (14 edge cases fixed)**
    - **Story 34.1**: Removed contradictory JSON fences from 6 axis prompts, dynamic axis count
    - **Story 34.2**: Added `guard-rails.system.md` anti-hallucination layer (confidence floor, symbol validation, line-range enforcement)
    - **Story 34.3**: Score calibration anchoring for all 12 best-practices prompts (TypeScript + 11 languages)
    - **Story 34.4**: Edge case handling for generated code, reinforced doc-writer and nlp-summarizer
    - **Story 34.5**: Dynamic Zod schema → JSON example injection into axis system prompts
    - **Story 34.6**: Gold-set integration test suite (8 fixtures, real LLM validation)
    - **Story 34.7**: Adversarial validation pass across all stories
    
    ### Doc Generation
    - Enriched LLM context with real project prerequisites
    - LLM detects system dependencies from code (Docker, Redis, etc.)
    - Prevented hallucinated API key instructions
    - Separated bootstrap from update phase
    - Added `anatoly docs` command with rebuild and status subcommands
    - Switched default model from Haiku to Sonnet for better quality
    
    ### CLI & Infrastructure
    - Added `--keep-docs` flag to reset command
    - Extracted setup table renderer, enriched estimate command
    - Added plain mode output for doc generation phases
    
    ### Scanner
    - Download tree-sitter grammars on-the-fly with local caching
    
    ### RAG
    - Parallelized doc chunking
    - Skip scaffolded-only doc pages from RAG indexing
    - Fixed stale "Saving index" display
    
    ### Report
    - Replaced page ratio with symbol-based coverage metric
    
    ### Review Engine
    - Wired prompt cascade into evaluators
    - Dynamic code fence tags (no more hardcoded TypeScript)
    - Derived axis count dynamically from evaluator registry
    
    ### Bug Fixes & Quality
    - Adversarial reviews & auto-fixes for Epics 28, 29, 31, 33, 34
    - Resolved pre-existing TypeScript compilation errors
    - Code generation marker rule shared via guard-rails (all axes)
    - Fixed gold-set test regex, symbol line numbers, and test assertions
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.8.0...v0.8.1
  12. v0.8.0 — Multi-Language Support

    · v0.8.0

    ## Highlights
    
    ### Multi-Language Support (Epic 31)
    - **10 languages**: TypeScript, TSX, Python, Rust, Go, Java, C#, Bash/Shell, SQL, YAML, JSON
    - **Auto-detection**: language distribution by file extension + framework detection (React, Next.js) by project markers
    - **AST parsing**: tree-sitter grammar manager with dynamic WASM loading per language
    - **Language-specific best-practices**: dedicated prompts for each language (PyGuard, RustGuard, GoGuard, ShellGuard, JavaGuard, CSharpGuard, SqlGuard, YamlGuard, JsonGuard)
    - **Framework-aware evaluation**: React and Next.js specific prompts for best-practices and documentation axes
    - **Auto-detect scanning**: `scan.auto_detect` discovers project files across all supported languages automatically
    - **Usage graph**: multi-language import tracking (Python, Rust, Go, Bash source/dot imports)
    
    ### Documentation Pipeline (Stories 29.16-29.21)
    - **LLM doc generation**: semaphore-bounded concurrent doc page generation via Haiku
    - **Dynamic module pages**: scaffolder injects module-specific pages based on codebase analysis
    - **Dual doc context**: documentation axis pulls from both `docs/` (user) and `.anatoly/docs/` (internal reference)
    - **Configurable docs_path**: `documentation.docs_path` in `.anatoly.yml`
    - **Distinct coverage tracking**: separate project vs internal export documentation coverage
    - **Pipeline decoupling**: internal docs, post-review update, and bootstrap run as independent stages
    
    ### Fixes
    - Review cache is now per-axis (prevents stale results when switching axes)
    - Pipeline CLI display: proper left/right alignment in task list
    - Ctrl+C interrupt no longer dumps noisy error traces
    - Pre-existing TypeScript type errors resolved across tests and adapters
  13. v0.7.0

    · v0.7.0

    ## Highlights
    
    - **Documentation scaffold pipeline** — full doc-scaffold system: project type detection, structure scaffolding, source code analysis, LLM page generation, scoring, recommendations, and Ralph sync mode (Epic 29)
    - **SDK concurrency control** — global semaphore bounds parallel API calls to prevent rate-limit storms (Epic 30)
    - **Docker GGUF embedding backend** — replace Python sidecar with Docker-based GGUF containers for GPU-accelerated embeddings (Epic 28)
    - **RAG documentation indexing** — doc section extraction, NLP embeddings, and type-based filtering for review context
    - **Structured metrics & events** — timeline, conversationStats, per-file/per-axis structured events in run-metrics.json
    - **Report restructuring** — reports organized by axis with independent indexes
    
    ## Features
    
    - **Doc scaffold pipeline** — project type detection, module granularity resolution, contextual scaffolding hints, code-to-doc mapping, LLM page content generation, incremental SHA-256 cache, 5-dimension scoring, dual-output recommendations, Ralph sync mode, and full CLI integration via `anatoly run` (Stories 29.1–29.15)
    - **Concurrency** — global SDK semaphore wired through evaluation pipeline with configurable concurrency limits (Story 30.1)
    - **Flat render** — code review fixes for Story 31.1
    - **Docker GGUF containers** — VRAM detection, tier selection, A/B test for bf16 vs GGUF, container lifecycle management, setup wizard with Docker + NVIDIA Container Toolkit install (Epic 28)
    - **RAG enhancements** — doc section indexing with NLP embeddings, sidecar model swap, per-function NLP summary cache, documentation axis in deliberation system, `--docs` flag for RAG status
    - **Structured events** — per-file and per-axis structured events, unified run context for all commands, conversation dump infrastructure per LLM call
    - **Timeline & metrics** — timeline phases, conversationStats with byModel breakdown in run-metrics.json
    - **RAG UX** — animated spinners, phase checkmarks, concurrent file display, 3-phase progress (code/NLP/doc)
    - **Haiku semantic chunking** — doc sections refined via Haiku before embedding
    - **Batch embeddings** — batched NLP and code embedding requests for performance
    
    ## Fixes
    
    - Triage phase added to timeline, byModel fix in conversationStats
    - Normalize averaged NLP vector, restore lite-mode doc fallback
    - GGUF container lifecycle: kill zombies on startup, stop on force exit, verify alive before reuse
    - RAG progress display fixes (output stacking, concurrency display, Listr corruption)
    - Setup-embeddings routes correctly per backend tier
    - SHA-based cache for doc section indexing
    - Guard test ensuring Anatoly never writes to `docs/`
  14. v0.6.0

    · v0.6.0

    ## Highlights
    
    - **7th evaluation axis: Documentation** — evaluates JSDoc coverage on exports and `/docs/` synchronization
    - **Deliberation memory** — generalized across all axes with learning loop feedback
    - **AGPL-3.0 + dual licensing** — migrate from Apache-2.0, add commercial license option
    - **Dry-run mode** — simulate the full pipeline without API calls (`--dry-run`)
    - **Calibrated ETA** — per-axis timing based on historical runs
    - **Sentence-transformers sidecar** — GPU-accelerated embeddings via nomic-embed-code 7B (3584d)
    
    ## Features
    
    - **Documentation axis** — `DOCUMENTED` / `PARTIAL` / `UNDOCUMENTED` verdicts, docs_coverage in review JSON (Epic 26)
    - **Deliberation memory** — persistent false-positive registry covers all axes, feeds back into learning loop
    - **Holistic deliberation** — covers all axes per symbol with transitive usage-graph refs, NIH detection
    - **Calibrated ETA** — estimate and pipeline summary display calibrated per-axis timing
    - **Branch isolation** — `clean-run` enforces branch isolation before Ralph loop
    - **Run lock** — block concurrent commands while a run is in progress
    - **`--dry-run`** — phase-based estimate, calibrated timing, no runDir creation
    - **`--axes` CLI option** — run specific axes (e.g. `--axes correction,tests`)
    - **Tests axis enrichment** — test file content, callers, and project tree in context
    - **Run Statistics & Axis Summary** — new report sections
    - **`init` & `setup-embeddings` commands** — one-command project setup
    - **RAG observability** — `rag-status` shows lite+advanced indexes
    - **Dual code+NLP embedding** — hybrid similarity search with configurable weights
    - **Hardware detection** — auto-select embedding models based on available hardware
    - **Colored MOTD banner**, sidecar lifecycle overhaul, Ralph circuit breaker
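
    The "hybrid similarity search with configurable weights" item above can be read as a weighted blend of two cosine similarities, one from the code embedding and one from the NLP embedding. The sketch below works under that assumption; the function names and the default weight of 0.6 are illustrative and not taken from Anatoly's configuration.

    ```typescript
    // Cosine similarity between two equal-length vectors.
    function cosine(a: number[], b: number[]): number {
      if (a.length !== b.length || a.length === 0) {
        throw new Error("vectors must have equal, non-zero length");
      }
      let dot = 0;
      let normA = 0;
      let normB = 0;
      for (let i = 0; i < a.length; i++) {
        const x = a[i] ?? 0;
        const y = b[i] ?? 0;
        dot += x * y;
        normA += x * x;
        normB += y * y;
      }
      if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
      return dot / Math.sqrt(normA * normB);
    }

    // Blend the two similarities; codeWeight is clamped to [0, 1].
    // Assumption: 0.6 is a placeholder default, not a documented value.
    function hybridScore(codeSim: number, nlpSim: number, codeWeight = 0.6): number {
      const w = Math.min(1, Math.max(0, codeWeight));
      return w * codeSim + (1 - w) * nlpSim;
    }
    ```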
    
    ## Fixes
    
    - Documentation axis calibration, merge pipeline, prefix matching
    - Adversarial review: wire docsTree, fix 6→7-axis refs
    - Accumulative cache for complete reports across runs
    - Best practices & tests findings trigger NEEDS_REFACTOR
    - Calibration: max(axis) for parallel model, remove 3s sleep
    - Deliberation: only reviews symbols with findings, always active by default
    - Report: hide non-executed axes, full deliberation reasoning
    - RAG: Arrow FloatVector crash, ONNX fallback
    - Severity labels: French → English (CRITIQUE→CRITICAL, HAUTE→HIGH, MOYENNE→MEDIUM)
    - Sidecar: correct dimensions, venv isolation, loading progress
    
    ## Breaking Changes
    
    - **License**: Apache-2.0 → AGPL-3.0 (commercial license available, see COMMERCIAL.md)
    - `fix`/`fix-sync` commands renamed to `clean`/`clean-sync`
    - `--dual-embedding` replaced by `--rag-lite` / `--rag-advanced`
    - Hook subcommands renamed to `on-edit`/`on-stop`
  15. v0.5.1 — MOTD Banner, RAG Modes & Sidecar Overhaul

    · v0.5.1

    ## Features
    
    - Colored MOTD ASCII banner at startup with auto-generated sync script
    - Sidecar lifecycle overhaul — cleanup, scoped spawn, idle timeout
    - Show RAG file count in setup summary table
    - List available axes in `--axes` help text
    - Add `--lite` / `--advanced` RAG mode selection with separate indexes
    - Show '-' for unevaluated axes when using `--axes` filter
    - Auto-disable dual embedding when nomic-7B sidecar is active
    - Show sidecar loading progress in CLI spinner with elapsed time
    - Replace Ollama with sentence-transformers sidecar for GPU embeddings
    - Add Ollama runtime for GPU-accelerated embeddings and fix RAG schema mismatch
    - Add --axes CLI option for runtime axis selection (Epic 26)
    - Add Ollama backend for GPU-accelerated code embeddings via nomic-embed-code
    - Harden Ralph clean loop with circuit breaker, anti-placeholder guards, and adaptive PRD
    - Add RAG pipeline evaluation framework with ground-truth benchmarks
    - Add hardware detection and configurable dual embedding models
    - Add dual code+NLP embedding for improved semantic duplication detection
    - Add final clean-sync after clean-run completes
    - Rename fix → clean commands to match Anatoly branding
    - Native TypeScript Ralph loop via Claude Agent SDK
    
    ## Fixes
    
    - Resolve pre-existing TypeScript compilation errors
    - Review progress counter matches triage evaluate count
    - Exclude NLP-failed cards from cache so they get retried
    - Purge RAG cache when vector store is empty
    - Remove Arrow table migration — drop legacy table instead
    - Align ConfigSchema default dual_embedding to true
    - ONNX fallback always uses Jina, not the sidecar model
    - Start sidecar BEFORE model resolution to avoid chicken-and-egg
    - Correct nomic-embed-code dimension to 3584d (7B hidden state)
    - Add model load timer to --check and increase sidecar timeout to 180s
    - Correct model download size to ~14 GB (7B model in FP16)
    - Use correct model ID nomic-ai/nomic-embed-code (public, no auth)
    - Move embedding venv to .anatoly/.venv to avoid project collision
    - Fallback to ONNX when Ollama embed fails instead of truncating
    - Truncate input to Ollama embed to avoid GGML_ASSERT overflow
    
    ## Refactoring
    
    - Replace --dual-embedding with --rag-lite / --rag-advanced
    - Simplify estimate and triage CLI output
    - Use Claude Code CLI instead of SDK for Ralph loop
    
    ---
    
    > *"Can I clean here?"* — Anatoly looked at the codebase, mop in hand, and sighed. Fifty-one files. Eighteen thousand lines of dead imports. A `TODO` from 2019. He wrung the mop. *"Da. I clean everywhere."*
  16. v0.5.0 — Fix Command & Documentation Restructure

    · v0.5.0

    ## Highlights
    
    **`anatoly fix`** — The cleaning man now fixes what he finds. Parse an audit report shard, generate Ralph artifacts, and launch an autonomous correction loop. Every finding gets a deterministic checkbox ID for traceable remediation.
    
    ### Features
    
    - **`anatoly fix <report-file>`** — Generate prd.json + CLAUDE.md + ralph.sh from a report shard
    - **`anatoly fix-sync <report-file>`** — Sync completed fixes back to the report (shard + index checklist)
    - **Checkbox rendering** — Every action gets `- [ ] <!-- ACT-{hash}-{id} -->` for deterministic matching
    - **Aggregated Checklist** — Report index now includes a severity-sorted checklist of all actions
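
    As an illustration of the checkbox format above, the sketch below renders and parses an action line. The notes give only the template `ACT-{hash}-{id}`, so treating `{hash}` as lowercase hex and `{id}` as a number is an assumption, as are the `renderAction` and `parseAction` names.

    ```typescript
    interface ActionRef {
      hash: string;
      id: string;
      done: boolean;
    }

    // Assumption: {hash} is lowercase hex and {id} is numeric; neither format is documented.
    const ACTION_LINE = /^- \[( |x)\] <!-- ACT-([0-9a-f]+)-(\d+) -->/;

    // Build a checklist line with a hidden, stable identifier.
    function renderAction(hash: string, id: number, text: string, done = false): string {
      return `- [${done ? "x" : " "}] <!-- ACT-${hash}-${id} --> ${text}`;
    }

    // Recover the identifier and checked state from a checklist line.
    function parseAction(line: string): ActionRef | null {
      const m = ACTION_LINE.exec(line);
      if (!m) return null;
      return { done: m[1] === "x", hash: m[2] ?? "", id: m[3] ?? "" };
    }
    ```

    Keeping the identifier in an HTML comment means it survives markdown rendering without being shown, which is what makes matching between a shard and the index checklist deterministic.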
    
    ### Fixes
    
    - `--file` filter now correctly scopes estimate and triage phases (was estimating full project)
    - Removed dead `summary`/`keyConcepts`/`behavioralProfile` fields from `rag-status` display
    
    ### Documentation
    
    - **Complete restructure**: 27 files across 7 sections, all cross-referenced against source code
    - Getting Started, Architecture, CLI Reference, Core Modules, Integration, Development, Design Decisions
    - 32 factual corrections applied after adversarial review
    - README updated with new doc links and [Anatoly Shmondenko](https://www.youtube.com/@vladimirfitness) origin story
    
    See [CHANGELOG.md](CHANGELOG.md) for full details.
  17. v0.4.2

    · v0.4.2

    ## What's Changed
    
    ### Enriched Review Reports (.rev.md)
    
    - **Best Practices section** — Score, rules table (WARN/FAIL only), and suggestions with before/after code blocks now rendered in individual file reviews
    - **Structured symbol details** — Pipe-delimited axis output parsed into per-axis bullets (Utility, Duplication, Correction, Overengineering, Tests)
    - **Categorized actions** — Quick Wins / Refactors / Hygiene sections with effort estimates
    - **Exported column** added to symbols table
    - **Defaulted axes flagged** — When an evaluator didn't produce a result, the review now shows *(default — evaluator did not produce a result)*
    
    ### Transcript Persistence
    
    - Axis evaluation transcripts are now persisted to `runDir/logs/<file>.log` for both `run` and `review` commands, enabling diagnosis of failed evaluators
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.4.1...v0.4.2
  18. v0.4.1

    · v0.4.1

    ## What's Changed
    
    ### Bug Fixes & Improvements
    
    - **Dependency-aware evaluations** — Correction and best-practices axes now receive project dependency versions, reducing false positives (e.g., no longer flags missing try/catch when Commander v14+ handles async rejections natively)
    - **Implicit no-cache for explicit commands** — `anatoly review` and `anatoly run --file` always re-evaluate files instead of skipping cached results
    - **Renamed "dead code" → "utility"** in CLI display for clarity
    
    ### New in the Axis Pipeline (Epic 19)
    
    - 6 axis evaluators: utility, duplication, overengineering, tests, correction, best_practices
    - Axis merger, file evaluator, and simplified review pipeline
    - Worker pool with compact axis progress display
    - Updated estimator and reporter for the axis pipeline
    
    ### Internal
    
    - New `dependency-meta` module with 14 unit tests
    - Schemas v2 for multi-axis review output
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.4.0...v0.4.1
  19. v0.4.0 — Triage, Fast Review & Sharded Reports

    · v0.4.0

    ## What's New
    
    ### Triage Pipeline (Epic 16)
    Files are now automatically classified into three tiers before review:
    - **Skip** — barrels, type-only, constants → synthetic CLEAN review, zero API calls
    - **Fast** — simple files (< 50 lines, < 3 symbols) → single-turn review (~5s)
    - **Deep** — complex files → full agentic investigation (~45s)
    
    Use `--no-triage` to disable and review all files with the full agent.
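
    The three tiers can be sketched as a small classifier. The thresholds (fewer than 50 lines, fewer than 3 symbols) come from the list above; the `FileFacts` shape and its field names are hypothetical, and how Anatoly actually detects barrels or type-only files is not shown here.

    ```typescript
    type Tier = "skip" | "fast" | "deep";

    // Hypothetical summary of a scanned file.
    interface FileFacts {
      lineCount: number;
      symbolCount: number;
      isBarrel: boolean;        // file only re-exports other modules
      isTypeOnly: boolean;      // file only declares types
      isConstantsOnly: boolean; // file only declares constants
    }

    function triage(f: FileFacts): Tier {
      // Skip tier: nothing worth an API call.
      if (f.isBarrel || f.isTypeOnly || f.isConstantsOnly) return "skip";
      // Fast tier: small and simple enough for a single-turn review.
      if (f.lineCount < 50 && f.symbolCount < 3) return "fast";
      // Everything else gets the full agentic investigation.
      return "deep";
    }
    ```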
    
    ### Pre-computed Usage Graph (Epic 16)
    A full import graph is built across all project files in a single local pass (< 1s). The agent receives pre-computed usage data in its prompt, eliminating ~90 redundant Grep tool calls per review for dead code verification.
    
    ### Fast Reviewer (Epic 17)
    Simple files (fast tier) are reviewed in a single `query()` call with no tools — all context (file content, symbols, usage graph, RAG results) is included inline. If Zod validation fails after 2 attempts, the file is automatically promoted to deep review.
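
    The retry-then-promote behaviour described above can be sketched as follows. The real reviewer is asynchronous and validates with Zod; to stay self-contained, this sketch is synchronous and takes a generic `validate` callback instead, and all names are hypothetical.

    ```typescript
    // Returns the parsed review, or null when the raw output fails validation.
    type Validate<T> = (raw: unknown) => T | null;

    type Outcome<T> = { tier: "fast"; review: T } | { tier: "deep" };

    function fastReviewOrPromote<T>(
      runFast: () => unknown, // stands in for the single-turn LLM call
      validate: Validate<T>,  // stands in for schema validation
      maxAttempts = 2,        // the release notes mention 2 attempts
    ): Outcome<T> {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const review = validate(runFast());
        if (review !== null) return { tier: "fast", review };
      }
      // Every attempt failed validation: hand the file to the deep reviewer.
      return { tier: "deep" };
    }
    ```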
    
    Optional `fast_model` config field lets you use a cheaper model (e.g., `claude-haiku-4-5-20251001`) for fast-tier reviews.
    
    ### Sharded Reports (Epic 18)
    Reports are now split into:
    - `report.md` — compact index (~100 lines) with executive summary, severity table, and checkbox links to shards
    - `report.N.md` — per-shard detail files (max 10 files each), sorted by severity
    
    When triage is active, the index includes a **Performance & Triage** section showing skip/fast/deep distribution and estimated time saved.
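
    A minimal sketch of the sharding step, assuming reviews are sorted by severity across the whole run and then cut into groups of at most 10. The release notes do not say whether sorting happens before or after splitting, and the `Severity` levels and `FileReview` shape used here are illustrative.

    ```typescript
    // Assumption: four severity levels; "low" is not named in the release notes.
    type Severity = "critical" | "high" | "medium" | "low";

    interface FileReview {
      file: string;
      severity: Severity;
    }

    const RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

    function shardReviews(reviews: FileReview[], maxPerShard = 10): FileReview[][] {
      // Copy before sorting so the caller's array is left untouched.
      const sorted = [...reviews].sort((a, b) => RANK[a.severity] - RANK[b.severity]);
      const shards: FileReview[][] = [];
      for (let i = 0; i < sorted.length; i += maxPerShard) {
        shards.push(sorted.slice(i, i + maxPerShard));
      }
      return shards;
    }
    ```

    Under this scheme, shard `N` would back `report.N.md`, and the index in `report.md` would link to each shard.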
    
    ### Code Review Fixes
    - Removed 180-line legacy `renderReport()` (dead code, used raw LLM verdict instead of `computeFileVerdict`)
    - Deduplicated `loadTasks()` calls in run pipeline (3x → 1x disk I/O)
    - Fast transcripts use `.fast.transcript.md` suffix to avoid overwrite on deep promotion
    - Added `export * from` handling in usage graph (prevents false DEAD on star re-exports)
    
    ## Commits
    
    - feat(triage): add file triage module for skip/fast/deep classification
    - feat(usage-graph): add pre-computed import usage graph
    - feat(prompt): inject pre-computed usage graph into agent prompt
    - feat(pipeline): integrate triage and usage graph into run pipeline
    - feat(fast-reviewer): add simplified single-turn reviewer for fast-tier files
    - feat(pipeline): dispatch fast-tier files to fast-reviewer with deep promotion
    - feat(reporter): shard report into index + per-shard files (max 10 files each)
    - feat(reporter): add Performance & Triage section to index when triage active
    - fix: code review fixes for v0.4.0 (epics 16-18)
    - chore: bump version to 0.4.0 and update README
    
    **Full Changelog**: https://github.com/r-via/anatoly/compare/v0.3.0...v0.4.0
  20. v0.3.0 — Conformity Audit & Fixes

    · v0.3.0

    ## Changelog (v0.2.0 → v0.3.0)
    
    ### Features
    
    - **Parallel reviews** — Worker pool with `--concurrency N` (default 4), thread-safe ProgressManager with serialized writes (`c18b8d8`)
    - **Rate limiting** — Exponential backoff for Sonnet reviews (base 5s, max 120s, jitter ±20%, 5 retries) (`00dbd72`)
    - **Multi-file renderer** — Worker slots showing `[1] reviewing...` per concurrent review, completion order in flow zone (`fe03e10`)
    - **Claude Code hooks** — `anatoly hook post-edit` (async background review), `hook stop` (quality gate + feedback injection), `hook init` (template generator) (`ed8424f`, `f501d44`)
    - **`min_confidence` config** — Filter hook findings below threshold (default 70) (`f501d44`)
    - **`max_stop_iterations` config** — Anti-loop protection for hook stop cycle (default 3) (`c1276a6`)
    - **RAG pre-resolved** — Similarity results injected statically in prompt, MCP tool removed (`96fe7d2`)
    - **Parallel RAG indexation** — Haiku calls distributed via worker pool with rate limiting (base 2s, max 30s) (`87f7794`, `d7c2ca4`)
    - **`index_model` config** — Configure RAG indexing model separately (default `claude-haiku-4-5-20251001`) (`c5df16d`)
    - **RAG on by default** — `--no-rag` to disable, launch banner shows model info (`29c9169`)
    - **RAG garbage collection** — Stale index entries for deleted/renamed files are automatically purged on re-index (`0538f73`)
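
    The backoff parameters listed under "Rate limiting" (base 5s, max 120s, jitter ±20%) can be sketched as a delay function. Doubling per attempt and applying jitter after the cap are assumptions, since the notes only give the parameters; the names are hypothetical.

    ```typescript
    interface BackoffOptions {
      baseMs: number; // delay before the first retry
      maxMs: number;  // cap on the un-jittered delay
      jitter: number; // 0.2 means plus or minus 20%
    }

    // Values quoted in the release notes for Sonnet reviews.
    const REVIEW_BACKOFF: BackoffOptions = { baseMs: 5000, maxMs: 120000, jitter: 0.2 };

    // attempt is 0-based; random is injectable so the function can be tested.
    function backoffDelayMs(
      attempt: number,
      opts: BackoffOptions = REVIEW_BACKOFF,
      random: () => number = Math.random,
    ): number {
      // Assumption: the delay doubles on each attempt until it hits the cap.
      const exponential = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
      // Scale by a factor drawn from [1 - jitter, 1 + jitter).
      const factor = 1 + opts.jitter * (random() * 2 - 1);
      return Math.round(exponential * factor);
    }
    ```

    With the 5-retry limit from the notes, a caller would loop `attempt` from 0 to 4 and give up after the last delay.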
    
    ### Fixes
    
    - **`$NO_COLOR` env var** — Respects [no-color.org](https://no-color.org) standard (`c1276a6`)
    - **`review` command renderer** — Now uses `renderer.ts` with progress bar and counters instead of raw `console.log` (`c1276a6`)
    - **Hook stop anti-loop** — `stop_count` tracked in HookState, exits silently at max iterations (`c1276a6`)
    - **Hook spec alignment** — `decision: "block"` output format per Claude Code Stop hook protocol (`2f557a7`)
    - **RAG code review fixes** — 4 issues from Epic 12 (orchestrator, cache, error handling) (`c7d514a`)
    - **Progress bar stuck** — Fixed renderer scrolling up during review (`9ffa954`)
    - **Ctrl+C reliability** — Interrupt now works in all commands (`e447ef7`)
    - **Error messages** — Improved hints, deduplicated `verdictColor` logic, updated README (`25edcce`)
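    
    For the `$NO_COLOR` fix above, the no-color.org convention is that color is disabled whenever the variable is present and not an empty string, whatever its value. A minimal sketch of that check; `colorEnabled`, `paint` and the `Env` type are hypothetical names, and the environment is passed in explicitly so the logic stays testable:
    
    ```typescript
    // Hypothetical shape for an environment map such as Node's process.env.
    type Env = Record<string, string | undefined>;
    
    // Color stays on only when NO_COLOR is unset or set to the empty string.
    function colorEnabled(env: Env): boolean {
      const noColor = env["NO_COLOR"];
      return noColor === undefined || noColor === "";
    }
    
    // Wraps text in an ANSI SGR sequence unless color is disabled.
    function paint(text: string, ansiCode: number, env: Env): string {
      return colorEnabled(env) ? `\u001b[${ansiCode}m${text}\u001b[0m` : text;
    }
    ```
    
    In a Node CLI the caller would pass `process.env`. Note that `NO_COLOR=0` still disables color, because only presence and non-emptiness count.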
    
    ### Self-Audit Fixes (`daeb3d9`)
    
    Anatoly audited its own codebase — 14 files reviewed, 29 findings. Applied in a single pass:
    
    - **scanner** — `abstract_class_declaration` now recognized as `class` symbol kind
    - **run** — AbortController renewal race fixed, dynamic `index_model` label, retry count no longer hardcoded
    - **reviewer** — dead `retries` counter removed, `tools`/`allowedTools` deduplication
    - **cache** — `readProgress` validates with `ProgressSchema.safeParse`, `atomicWriteJson` cleanup on failure
    - **vector-store** — `distanceToCosineSimilarity` computed once per row, `safeParseJsonArray` type-filters elements
    - **format** — `formatResultLine` uses `verdictColor()` instead of inlined switch
    - **hook-state** — null-review guard fixed (`typeof null === 'object'`)
    - **process** — `isProcessRunning` returns `true` on `EPERM` (cross-user process)
    - **lock** — explicit `unlinkSync` for stale lock cleanup
    - **config** — 6 dead sub-schema type exports removed
    - **task** — `TaskSchema`, `SymbolKindSchema`, `SymbolInfoSchema`, `CoverageDataSchema` exported
    - **report** — shared `progressPath`/`errorFiles` extraction hoisted out of branches
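    
    Two of the fixes above rest on platform quirks that are easy to miss: `typeof null` evaluates to `'object'`, and a signal-0 probe fails with `EPERM` when the target process exists but belongs to another user. The sketch below illustrates both under stated assumptions. `isObjectRecord` and `KillFn` are hypothetical names, and the injected `kill` parameter is an assumption made for testability; the actual `isProcessRunning` signature is not shown in these notes.
    
    ```typescript
    // typeof null is "object", so an object check needs an explicit null test.
    function isObjectRecord(value: unknown): value is Record<string, unknown> {
      return typeof value === "object" && value !== null;
    }
    
    // Hypothetical injection point; in Node this would be process.kill.
    type KillFn = (pid: number, signal: number) => unknown;
    
    // Signal 0 performs the existence and permission check without
    // delivering a signal to the target process.
    function isProcessRunning(pid: number, kill: KillFn): boolean {
      try {
        kill(pid, 0);
        return true;
      } catch (err) {
        // EPERM: the process exists but is owned by another user.
        // Anything else (for example ESRCH) is treated as "not running".
        return isObjectRecord(err) && err["code"] === "EPERM";
      }
    }
    ```
    
    In Node, `isProcessRunning(pid, process.kill)` would use the real probe. Treating `EPERM` as "running" matters for lock files: a lock held by another user's live process should not be reclaimed as stale.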
    
    ### Refactoring
    
    - **RAG orchestrator decoupled** — `processFileForIndex()` as pure function, `needsReindex()` extracted, batch upsert post-pool (`eb88562`)
    
    ### Docs
    
    - **README rewrite** — Stronger positioning, hook documentation, Mermaid architecture diagram (`42b068e`)
    
    ### Stats
    
    - **280 tests passing**
    - Build: 134.85 KB (ESM)
    - 19 commits