Research
Articles & deep-dives
Technical research from the Anatoly team: benchmarks, comparisons, and findings reports on AI-assisted code auditing.
Featured
AI Code Audit vs AI Code Review in 2026: The 14 Tools That Matter, Sorted by What They Actually Do
A taxonomy and curated comparison of 14 AI code audit and AI code review tools available in 2026. Tools are sorted by whether they run at PR time (review) or scan existing codebases (audit), with pricing, self-hosting, local-model support, and honest tradeoffs for each.
Concision discipline: a Pareto-improving prompt strategy for code-audit agents
An empirical study showing that adding a 12-line anti-filler instruction to a code-audit agent's system prompt simultaneously cut output tokens by 24.7%, cost by 20.7%, and wall-clock duration by 27.9%, while improving F1 recall by 9.1 points on the slot-engine fixture.
All articles
Detecting Semantic Conflicts Between Documents: A Pragmatic Pipeline
A four-stage pipeline for finding where two documents contradict each other, not just where they overlap: chunking and embedding, cosine pre-filtering, section deduplication and neighbor expansion, then NLI or LLM inversion detection. Includes a CPU-only deployment path and a sub-ten-cent cost model on a realistic workload.
Should we swap Anatoly's RAG for PageIndex? An open question with a measurable answer
Framing a tooling question instead of decreeing it. The two systems are shaped differently: a function-level vector lookup on one side, an LLM-driven TOC walk on the other. Whether one should replace the other depends on workload economics and test conditions we have not yet measured. This note states the question honestly, lays out our prior, and describes the bounded experiment that would settle it.