Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
- No LLM SDK at install time. All provider deps are
optional extras. Two required runtime deps:
structlog,jinja2. - Pluggable corpus. Use attune-help (the default), any
markdown directory, or your own
CorpusProtocol. - Returns a prompt string + citation records by default
—
pipeline.run()never opens a network connection. You call your own LLM however you like. Optional provider adapters ship convenience wrappers. - Optional hybrid retrieval.
QueryExpanderandLLMRerankerlayer Claude Haiku on top of keyword retrieval to improve recall and precision — both opt-in, both fail-safe.
Most RAG libraries ship features. attune-rag ships measured
quality numbers and gates merges against them. The CI badge
isn't "tests pass" — it's P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686 (locked at
docs/specs/release-quality-baseline/baseline-1.md)
plus per-axis CPU + wall-clock perf thresholds (locked at
docs/specs/downstream-validation/perf-baseline.md).
A PR that drops mean_faithfulness below 0.9686 fails CI
automatically. Same for any latency hot-path regressing past
mean + 2σ. That's the differentiator.
| attune-rag | LangChain | LlamaIndex | |
|---|---|---|---|
| Required runtime deps | 2 | many (transitively, ~30+) | many (~25+) |
| LLM SDK at install | none | bundled | bundled |
| Published quality regression thresholds | yes (P@1, R@3, faithfulness) | no | no |
| Published perf thresholds (wall + CPU) | yes | no | no |
| Citation primitives built-in | yes | add-on | add-on |
| "Get a string back, call your own LLM" | default | possible w/ effort | possible w/ effort |
LangChain and LlamaIndex are fantastic frameworks if you want batteries-included orchestration. attune-rag is the alternative when you want a RAG component you can drop into an existing app without buying into a framework — and want the quality bar quantified, not implied.
Beyond drop-in retrieval, attune-rag is the grounding foundation
for the attune-* family's content-quality discipline. The
attune-author polish/fact-check pipeline uses attune-rag's
retrieval + faithfulness primitives to verify generated help
content is grounded in source material before it's marked
authoritative — the same mean_faithfulness ≥ 0.9686 discipline
that gates this library's own benchmarks, extended to the
authoring loop.
Honest exclusions, so you can self-disqualify if you need any of these:
- Not an agent framework. No multi-step chains, no tool-use orchestration, no agent loops.
- Not a document-parsing toolkit. Bring your markdown
already-parsed; use
unstructured.ioor similar upstream. - Not a vector DB integration. Keyword retrieval is the
default; you wire your own vector store if you need one (an
EmbeddingRetrieveris on the post-freeze roadmap — see below). - Not a one-line-install batteries-included framework. That's LangChain / LlamaIndex. attune-rag is for the case where that's too much.
pip install attune-rag # core only
pip install 'attune-rag[attune-help]' # + bundled help corpus
pip install 'attune-rag[claude]' # + Claude adapter
pip install 'attune-rag[gemini]' # + Gemini adapter
pip install 'attune-rag[all]' # everythingpip install 'attune-rag[attune-help,claude]'import asyncio
from attune_rag import RagPipeline
async def main():
pipeline = RagPipeline() # defaults to AttuneHelpCorpus
response, result = await pipeline.run_and_generate(
"How do I run a security audit with attune?",
provider="claude",
)
print(response)
print("\nSources:", [h.entry.path for h in result.citation.hits])
asyncio.run(main())pip install 'attune-rag[attune-help,gemini]'response, result = await pipeline.run_and_generate(
"...", provider="gemini", model="gemini-1.5-pro",
)from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus
pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")
# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.📖 Building a quality corpus. See
docs/USER_CORPUS_GUIDE.mdfor the corpus-authoring discipline that produced the bundled attune-help corpus's 100% / 100% baseline + 100% paraphrased R@3: frontmatter aliases, multi-token intent, theMIN_ALIAS_OVERLAPknob, stemmer traps, the override file pattern, and the strict-dominance measurement loop. The guide is the v0 forerunner of the v1.0.0 framework framing (user-corpus-onboardingspec).
QueryExpander and LLMReranker require the [claude] extra and an
ANTHROPIC_API_KEY. Both are opt-in and fail-safe — any API error
falls back to keyword-only order automatically.
from attune_rag import RagPipeline, LLMReranker, QueryExpander
# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())
# Expander + reranker (max coverage):
pipeline = RagPipeline(
expander=QueryExpander(),
reranker=LLMReranker(),
)Headless toolkit for tools that need to validate, lint, and refactor a
template corpus — used by the attune-gui
template editor and the attune-author
edit CLI, but works standalone with any
CorpusProtocol.
| API | What it does |
|---|---|
load_schema() |
Loads template_schema.json (the v1 frontmatter contract: required type enum + name; optional tags, aliases, summary, source, hash; additionalProperties: true). |
parse_frontmatter(text) / validate_frontmatter(data) |
Split a template into frontmatter + body and report typed FrontmatterIssues — used by linters and editors. |
lint_template(text, rel_path, corpus) |
Returns Diagnostic[] for schema violations, broken [[alias]] references, and depth-marker sequence errors. 1-indexed line/col ranges. |
autocomplete_tags(corpus, prefix, limit) / autocomplete_aliases(corpus, prefix, limit) |
Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates. |
find_references(corpus, name, kind) |
Locate every alias/tag/path occurrence across body, frontmatter, and cross_links.json. |
plan_rename(corpus, old, new, kind) |
Build a RenamePlan (one FileEdit per affected file with unified-diff hunks) for kind="alias" or "tag". Raises RenameCollisionError on existing alias targets. |
apply_rename(corpus, plan) |
Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths. |
Schema, lint, and rename are pure functions over CorpusProtocol — no I/O,
no global state. All three pieces are tested as a unit and used live by the
attune-gui editor's /api/corpus/<id>/lint, /autocomplete, and
/refactor/rename/{preview,apply} routes.
from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename
corpus = DirectoryCorpus(Path("./templates")).load()
# Validate a template before saving
diagnostics = lint_template(
text=Path("./templates/concepts/foo.md").read_text(),
rel_path="concepts/foo.md",
corpus=corpus,
)
# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)attune-rag dashboard show # live terminal dashboard
attune-rag dashboard render --out report.html # HTML snapshotattune-rag locks two baselines, both gated by CI. Thresholds
are empirically derived (mean ± 2σ) from back-to-back
benchmark runs on an unchanged HEAD — grounded, not guessed.
| Metric | Threshold (current) | Source |
|---|---|---|
precision_at_1 |
≥ 0.95 | retrieval, deterministic |
recall_at_3 |
= 1.00 | retrieval, deterministic |
mean_faithfulness |
≥ 0.9686 | Claude judge, σ ≈ 0.005 |
Gated by .github/workflows/benchmark.yml.
Faithfulness gating engages when the PR touches retrieval,
reranker, expander, pipeline, prompts, or eval paths, or when
the PR title contains [full-bench]. Methodology + raw numbers
in docs/specs/release-quality-baseline/.
Locked dual-axis (wall-clock + CPU-time) thresholds on the four benchmarks. CPU-time is the gating axis (deterministic); wall-clock is advisory.
Numbers measured under the V2 multi-run methodology (5
invocations × 20 runs = 100 measurements per metric) on the
locked-baseline runner (Linux ubuntu-latest, CPython 3.11.15).
Inter-run and intra-run variance are tracked separately;
thresholds are mean + 2σ × inter_run_stdev. Full 8-row
dual-axis table + hardware fingerprint + per-metric noise
profile:
docs/specs/downstream-validation/perf-baseline.md.
Why two threshold styles in the locked table:
keyword_retriever_retrievehas a wider CPU band because measured intra-run variance reflects cold-cache effects on the first few iterations — empirically derived, not tuned for tightness.llm_reranker_rerankis wall-clock-only because Anthropic network variance dominates the CPU axis; the gate is set generously.
Gated by .github/workflows/perf.yml
per-PR (blocking on the CPU axis as of W3.1).
Most RAG libraries A/B-test internally and ship the result. attune-rag publishes the thresholds, gates merges against them, and re-measures whenever the corpus, judge prompt, or hardware changes. The receipts are checked in.
The repo ships a polished .help/ corpus that documents
attune-rag's own surface — 143 templates across 13 features ×
11 kinds (concept, task, reference, quickstart, faq,
error, warning, tip, note, comparison,
troubleshooting). Generated by
attune-author with
strict fact-check; queryable via AttuneHelpCorpus or as the
bundled default for RagPipeline(). See
.help/features.yaml for the feature
map and .help/templates/ for the content.
The 13 features: pipeline, retrieval, corpus, prompts,
provenance, providers, eval, benchmark, cli, editor,
dashboard, expander, reranker.
Faithfulness scores how well an answer is grounded in the retrieved
passages — 1.0 means every claim in the answer is supported by a
cited source; lower scores mean some claims have no support in the
context. It catches hallucination in a way that precision_at_k and
recall_at_k can't: those only measure whether the right documents
were retrieved, not whether the generated answer actually used them.
attune-rag uses Claude as the judge via Anthropic's tool-use API
to produce a structured score in [0.0, 1.0] for each
(query, answer, retrieved_context) triple. The reported metric is
the mean over the golden query set. Aggregate σ ≈ 0.005 over 40
queries even though per-query judge non-determinism can swing 40+
percentage points on individual queries — averaging absorbs the noise.
The same discipline powers attune-author's polish/fact-check
pipeline — generated help content is scored against retrieved
source passages before being marked authoritative. attune-rag's
faithfulness primitives aren't just instrumentation; they're the
contract the family's content-quality story is built on.
pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...
# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json
# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json
# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.jsonThe judge implementation lives at
attune_rag.eval.faithfulness.FaithfulnessJudge. Note: attune_rag.eval.*
is currently INTERNAL and may move — the attune-rag-benchmark --with-faithfulness CLI is the stable contract.
For the methodology behind the 0.9686 threshold, the v1/v2 ground-truth
calibration runs, and the extended-thinking-vs-default decision record, see
docs/rag/faithfulness-thinking-calibration.md.
Keyword retrieval + optional Claude reranker currently meet
the locked P@1 ≥ 0.95, R@3 = 1.00 thresholds against the
attune-help golden set. The remaining hard queries
(3 of 28, currently xpass-gated under [no-embeddings])
have zero token overlap against their target doc (e.g.
"vulnerability scan" → tool-security-audit.md). Closing
that gap needs vector search.
The plan is to ship attune-rag[embeddings] using
fastembed for local,
CPU-only embeddings — no new network dependency, no API key
required at retrieval time. Keyword retrieval stays the default;
embeddings layer in opt-in, same shape as QueryExpander and
LLMReranker. With 0.2.0 cut, embeddings are a Phase 5
candidate — see
docs/specs/ROADMAP-v1.md.
See CHANGELOG.md for the decision record and remaining-gap analysis.
When using the Claude provider, run_and_generate automatically enables
Anthropic prompt caching
on the stable RAG context prefix (≥ 1 024 chars). This eliminates
repeated token costs on the corpus portion of the prompt when the same
context block is reused across calls.
No configuration needed — the provider handles the cache_control
header automatically.
attune-rag's public surface is documented below and snapshot-tested in tests/unit/test_api_surface.py. Formal SemVer commitments are in effect as of 0.2.0 — see docs/POLICY.md for the deprecation policy. Symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z; the snapshot test catches drift.
Top-level (from attune_rag import ...):
- Pipeline —
RagPipeline,RagResult - Corpus —
CorpusProtocol,RetrievalEntry,DirectoryCorpus,AttuneHelpCorpus - Retrieval —
KeywordRetriever,RetrievalHit,RetrieverProtocol - Provenance —
CitationRecord,CitedSource,ClaimCitation,format_citations_markdown,format_claim_citations_markdown - Prompting —
build_augmented_prompt,PROMPT_VARIANTS - Hybrid retrieval —
QueryExpander,LLMReranker
PUBLIC submodules (importable by qualified path):
attune_rag.corpus— exposesAliasInfo,DuplicateAliasError,load_aliases_from_filein addition to the top-level corpus namesattune_rag.corpus.attune_help—AttuneHelpCorpusattune_rag.corpus.help_adapter—HelpCorpusAdapterProtocolattune_rag.providers—LLMProvider,get_provider,list_availableattune_rag.measure_corpus—measure(...)function +MeasureResultdataclass for scoring a corpus against a query set. CLI viapython -m attune_rag.measure_corpus ...or theattune-rag-measureconsole script. Seedocs/USER_CORPUS_GUIDE.md§6 for the worked example.attune_rag.editor— template-editor primitives (lint, schema, rename, autocomplete, references); see "Template editor primitives" above for the symbol listattune_rag.editor.{rename,schema,lint,autocomplete,references}— the individual editor submodules
Console scripts:
attune-rag— CLI entry point (attune_rag.cli:main)attune-rag-measure— quality measurement (attune_rag.measure_corpus:main); CI-suitable via--watermark-r3(non-zero exit on fail)
Anything not listed above is INTERNAL and may change in any release.
The underscore-prefixed editor modules (attune_rag.editor._rename
etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they
re-export the new non-underscore names and emit DeprecationWarning.
They are removed in 0.3.0.
0.2.0 — first SemVer-binding cut. Phase 4 of the v1.0 roadmap
landed cleanly: quality baselines (P@1 ≥ 0.95, R@3 = 1.00, mean
faithfulness ≥ 0.9686) hold; per-hot-path perf thresholds re-locked
under the V2 multi-run methodology (5 × 20 measurements); attune-gui
downstream blocking gate stayed green throughout. From 0.2.0 forward,
docs/POLICY.md §2 binds — symbols PUBLIC in 0.2.x
stay PUBLIC through every 0.2.z.
We hit our Phase 4 goals ~3 weeks ahead of the nominal calendar and
opted to ship early via the freeze-override mechanism rather than let
the cadence clock run out — getting the user-facing additions
(attune-rag-measure console script + attune_rag.measure_corpus
module for benchmarking your own corpus quality;
load_aliases_from_file() for file-based alias customization) into
your hands sooner. Override rationale + per-PR receipts at
docs/specs/api-v0.2.0-cut/.
Classifier stays at 3 - Alpha — the Production/Stable flip is a
Phase 5 deliverable.
Part of the attune ecosystem (attune-ai, attune-help, attune-author, attune-gui).
Apache 2.0. See LICENSE.