Use case

Audit your code on local LLMs. No network egress.

Run the full multi-axis audit on a model you host yourself. With local RAG embeddings, no source byte ever crosses your firewall. Works with Ollama, LM Studio, vLLM, or any OpenAI-compatible server.

Why local LLMs for code review

Most AI code review tools assume your code can leave your network. Often it cannot. In banking, healthcare, defense, the public sector, and tech due diligence under NDA, shipping source to a hosted SaaS endpoint is a non-starter.

Anatoly was designed from day one to run as a CLI on your machine with no required outbound calls beyond the model provider you choose. Choose a local provider and the audit pipeline - multi-LLM orchestration, semantic indexing, evidence gathering, and the three-tier deliberation pass - runs entirely inside your perimeter.

[Diagram: your network, no outbound traffic. Inside your perimeter, the anatoly CLI reads your code, calls an LLM served by Ollama at localhost:11434, and builds the RAG index with on-device GGUF / Jina embeddings. No third-party API, no SaaS, no telemetry.]

Wire it up in two minutes

Drop a .anatoly.yml at your project root, point Anatoly at your local server, and run.

.anatoly.yml · local-only mode

version: 3

providers:
  ollama:
    transport: openai_compatible
    base_url: http://localhost:11434/v1
    models:
      - qwen2.5-coder:14b
  local-embeddings:
    transport: onnxruntime_node    # in-process, no API key
    models:
      - jinaai/jina-embeddings-v2-base-code

routing:
  generation:
    quality: ollama/qwen2.5-coder:14b
  embeddings:
    code: local-embeddings/jinaai/jina-embeddings-v2-base-code

~/your-project

$ ollama serve # in another shell
$ npx anatoly run --plain
→ scanning · 142 files · TS/TSX
→ embedding · 1842 chunks · jina-v3-cpu (4m12s)
→ axes (parallel) · qwen2.5-coder:14b · 7/7 OK
→ deliberation · 18 findings → 9 confirmed
✓ .anatoly/runs/2026-05-05_103245/public_report.md

Supported local backends

Ollama - point Anatoly at http://localhost:11434/v1 via the OpenAI-compatible transport. Works with any GGUF model Ollama can serve.
LM Studio - same OpenAI-compatible transport, different default port. The CLI auto-detects the local server.
vLLM - production-grade local inference with continuous batching. Recommended when you have a GPU and run audits on large repos.
llama.cpp / TGI / any OpenAI-compatible - if it speaks the OpenAI chat-completion API, Anatoly can drive it.
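
As a sketch of how the other backends could be wired, the fragment below reuses the provider schema from the local-only example above (transport, base_url, models). Ports 1234 and 8000 are the usual defaults for LM Studio and vLLM. The provider names and model identifiers are assumptions, so substitute whatever your server actually reports.

```yaml
# Sketch only: provider names, ports, and model IDs are assumptions.
providers:
  lmstudio:
    transport: openai_compatible
    base_url: http://localhost:1234/v1    # LM Studio's default server port
    models:
      - qwen2.5-coder-14b-instruct        # hypothetical: use the ID LM Studio lists
  vllm:
    transport: openai_compatible
    base_url: http://localhost:8000/v1    # vLLM's default server port
    models:
      - Qwen/Qwen2.5-Coder-14B-Instruct   # the model name vLLM was started with

routing:
  generation:
    quality: vllm/Qwen/Qwen2.5-Coder-14B-Instruct
```

Only one generation route is needed. The second provider is listed for comparison and can be dropped.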

Local RAG embeddings

The semantic index is the part that often forces teams back to a hosted embeddings API. Anatoly ships two local options:

GGUF embeddings on GPU via any OpenAI-compatible server (e.g. nomic-embed-code-gguf) - fast, accurate, fully local.
ONNX embeddings in-process via the onnxruntime_node transport (e.g. Jina v2 code, MiniLM) - portable, runs anywhere, no API key.

Wire either as a provider in .anatoly.yml, then point routing.embeddings.code at it.
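
A minimal sketch of the GGUF option, assuming an OpenAI-compatible embeddings server is already running on your machine. The provider name gguf-embeddings and port 8080 are placeholders, not Anatoly defaults. The model name is the example mentioned above.

```yaml
# Sketch only: provider name and base_url are assumptions.
providers:
  gguf-embeddings:
    transport: openai_compatible
    base_url: http://localhost:8080/v1    # wherever your local embeddings server listens
    models:
      - nomic-embed-code-gguf

routing:
  embeddings:
    code: gguf-embeddings/nomic-embed-code-gguf
```

The ONNX option follows the same pattern with transport: onnxruntime_node and no base_url, as in the local-only example.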

What "no network egress" actually means

With a local provider configured for both generation and embeddings, Anatoly makes zero outbound calls. Even the dependency-README pass that grounds the Correction axis reads READMEs from your local node_modules/ - never from npm, GitHub, or any registry.

Air-gapped boxes, defense procurement, NDA-bound client perimeters: no firewall exception to request, no egress proxy to whitelist.

Who this fits

Sovereign & defense procurement - EU, France, and public-sector buyers, where prompts and source must stay under your jurisdiction.
Regulated industries - banking, healthcare, and other sectors where source code can't touch a SaaS endpoint.
Tech due diligence & expert witness - audits that run inside the client's perimeter, with no data residency questions to answer in the report.

What you get

The same Markdown report as the cloud-backed runs: a headline verdict, a 7-axis scorecard, and findings that name file + line + evidence - every one proven by grep, file reads, and RAG queries against the local index. The reports are reproducible and live in your repo under .anatoly/.
