attune-rag

Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.

No LLM SDK at install time. All provider deps are optional extras. Two required runtime deps: structlog, jinja2.
Pluggable corpus. Use attune-help (the default), any markdown directory, or your own CorpusProtocol.
Returns a prompt string + citation records by default — pipeline.run() never opens a network connection. You call your own LLM however you like. Optional provider adapters ship convenience wrappers.
Optional hybrid retrieval. QueryExpander and LLMReranker layer Claude Haiku on top of keyword retrieval to improve recall and precision — both opt-in, both fail-safe.

Why attune-rag

Most RAG libraries ship features. attune-rag ships measured quality numbers and gates merges against them. The CI badge isn't "tests pass" — it's P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686 (locked at docs/specs/release-quality-baseline/baseline-1.md) plus per-axis CPU + wall-clock perf thresholds (locked at docs/specs/downstream-validation/perf-baseline.md).

A PR that drops mean_faithfulness below 0.9686 fails CI automatically. Same for any latency hot-path regressing past mean + 2σ. That's the differentiator.

vs LangChain / LlamaIndex

	attune-rag	LangChain	LlamaIndex
Required runtime deps	2	many (transitively, ~30+)	many (~25+)
LLM SDK at install	none	bundled	bundled
Published quality regression thresholds	yes (P@1, R@3, faithfulness)	no	no
Published perf thresholds (wall + CPU)	yes	no	no
Citation primitives built-in	yes	add-on	add-on
"Get a string back, call your own LLM"	default	possible w/ effort	possible w/ effort

LangChain and LlamaIndex are fantastic frameworks if you want batteries-included orchestration. attune-rag is the alternative when you want a RAG component you can drop into an existing app without buying into a framework — and want the quality bar quantified, not implied.

Beyond drop-in retrieval, attune-rag is the grounding foundation for the attune-* family's content-quality discipline. The attune-author polish/fact-check pipeline uses attune-rag's retrieval + faithfulness primitives to verify generated help content is grounded in source material before it's marked authoritative — the same mean_faithfulness ≥ 0.9686 discipline that gates this library's own benchmarks, extended to the authoring loop.

What attune-rag is not

Honest exclusions, so you can self-disqualify if you need any of these:

Not an agent framework. No multi-step chains, no tool-use orchestration, no agent loops.
Not a document-parsing toolkit. Bring your markdown already-parsed; use unstructured.io or similar upstream.
Not a vector DB integration. Keyword retrieval is the default; you wire your own vector store if you need one (an EmbeddingRetriever is on the post-freeze roadmap — see below).
Not a one-line-install batteries-included framework. That's LangChain / LlamaIndex. attune-rag is for the case where that's too much.

Install

pip install attune-rag                     # core only
pip install 'attune-rag[attune-help]'      # + bundled help corpus
pip install 'attune-rag[claude]'           # + Claude adapter
pip install 'attune-rag[gemini]'           # + Gemini adapter
pip install 'attune-rag[all]'              # everything

Quick start — Claude

pip install 'attune-rag[attune-help,claude]'

import asyncio
from attune_rag import RagPipeline

async def main():
    pipeline = RagPipeline()  # defaults to AttuneHelpCorpus
    response, result = await pipeline.run_and_generate(
        "How do I run a security audit with attune?",
        provider="claude",
    )
    print(response)
    print("\nSources:", [h.entry.path for h in result.citation.hits])

asyncio.run(main())

Quick start — Gemini

pip install 'attune-rag[attune-help,gemini]'

response, result = await pipeline.run_and_generate(
    "...", provider="gemini", model="gemini-1.5-pro",
)

Quick start — custom corpus, any LLM

from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus

pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")

# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.

📖 Building a quality corpus. See docs/USER_CORPUS_GUIDE.md for the corpus-authoring discipline that produced the bundled attune-help corpus's 100% / 100% baseline + 100% paraphrased R@3: frontmatter aliases, multi-token intent, the MIN_ALIAS_OVERLAP knob, stemmer traps, the override file pattern, and the strict-dominance measurement loop. The guide is the v0 forerunner of the v1.0.0 framework framing (user-corpus-onboarding spec).

Hybrid retrieval (optional)

QueryExpander and LLMReranker require the [claude] extra and an ANTHROPIC_API_KEY. Both are opt-in and fail-safe — any API error falls back to keyword-only order automatically.

from attune_rag import RagPipeline, LLMReranker, QueryExpander

# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())

# Expander + reranker (max coverage):
pipeline = RagPipeline(
    expander=QueryExpander(),
    reranker=LLMReranker(),
)

Template editor primitives (`attune_rag.editor`)

Headless toolkit for tools that need to validate, lint, and refactor a template corpus — used by the attune-gui template editor and the attune-author edit CLI, but works standalone with any CorpusProtocol.

API	What it does
`load_schema()`	Loads `template_schema.json` (the v1 frontmatter contract: required `type` enum + `name`; optional `tags`, `aliases`, `summary`, `source`, `hash`; `additionalProperties: true`).
`parse_frontmatter(text)` / `validate_frontmatter(data)`	Split a template into frontmatter + body and report typed `FrontmatterIssue`s — used by linters and editors.
`lint_template(text, rel_path, corpus)`	Returns `Diagnostic[]` for schema violations, broken `[[alias]]` references, and depth-marker sequence errors. 1-indexed line/col ranges.
`autocomplete_tags(corpus, prefix, limit)` / `autocomplete_aliases(corpus, prefix, limit)`	Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates.
`find_references(corpus, name, kind)`	Locate every alias/tag/path occurrence across body, frontmatter, and `cross_links.json`.
`plan_rename(corpus, old, new, kind)`	Build a `RenamePlan` (one `FileEdit` per affected file with unified-diff hunks) for `kind="alias"` or `"tag"`. Raises `RenameCollisionError` on existing alias targets.
`apply_rename(corpus, plan)`	Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths.

Schema, lint, and rename are pure functions over CorpusProtocol — no I/O, no global state. All three pieces are tested as a unit and used live by the attune-gui editor's /api/corpus/<id>/lint, /autocomplete, and /refactor/rename/{preview,apply} routes.

from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename

corpus = DirectoryCorpus(Path("./templates")).load()

# Validate a template before saving
diagnostics = lint_template(
    text=Path("./templates/concepts/foo.md").read_text(),
    rel_path="concepts/foo.md",
    corpus=corpus,
)

# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)

Dashboard

attune-rag dashboard show    # live terminal dashboard
attune-rag dashboard render --out report.html  # HTML snapshot

Quality baselines

attune-rag locks two baselines, both gated by CI. Thresholds are empirically derived (mean ± 2σ) from back-to-back benchmark runs on an unchanged HEAD — grounded, not guessed.

Retrieval + faithfulness

Metric	Threshold (current)	Source
`precision_at_1`	≥ 0.95	retrieval, deterministic
`recall_at_3`	= 1.00	retrieval, deterministic
`mean_faithfulness`	≥ 0.9686	Claude judge, σ ≈ 0.005

Gated by .github/workflows/benchmark.yml. Faithfulness gating engages when the PR touches retrieval, reranker, expander, pipeline, prompts, or eval paths, or when the PR title contains [full-bench]. Methodology + raw numbers in docs/specs/release-quality-baseline/.

Per-hot-path latency

Locked dual-axis (wall-clock + CPU-time) thresholds on the four benchmarks. CPU-time is the gating axis (deterministic); wall-clock is advisory.

Numbers measured under the V2 multi-run methodology (5 invocations × 20 runs = 100 measurements per metric) on the locked-baseline runner (Linux ubuntu-latest, CPython 3.11.15). Inter-run and intra-run variance are tracked separately; thresholds are mean + 2σ × inter_run_stdev. Full 8-row dual-axis table + hardware fingerprint + per-metric noise profile: docs/specs/downstream-validation/perf-baseline.md.

Why two threshold styles in the locked table:

keyword_retriever_retrieve has a wider CPU band because measured intra-run variance reflects cold-cache effects on the first few iterations — empirically derived, not tuned for tightness.
llm_reranker_rerank is wall-clock-only because Anthropic network variance dominates the CPU axis; the gate is set generously.

Gated by .github/workflows/perf.yml per-PR (blocking on the CPU axis as of W3.1).

Why this is the differentiator

Most RAG libraries A/B-test internally and ship the result. attune-rag publishes the thresholds, gates merges against them, and re-measures whenever the corpus, judge prompt, or hardware changes. The receipts are checked in.

Bundled `.help/` corpus

The repo ships a polished .help/ corpus that documents attune-rag's own surface — 143 templates across 13 features × 11 kinds (concept, task, reference, quickstart, faq, error, warning, tip, note, comparison, troubleshooting). Generated by attune-author with strict fact-check; queryable via AttuneHelpCorpus or as the bundled default for RagPipeline(). See .help/features.yaml for the feature map and .help/templates/ for the content.

The 13 features: pipeline, retrieval, corpus, prompts, provenance, providers, eval, benchmark, cli, editor, dashboard, expander, reranker.

What faithfulness measures

Faithfulness scores how well an answer is grounded in the retrieved passages — 1.0 means every claim in the answer is supported by a cited source; lower scores mean some claims have no support in the context. It catches hallucination in a way that precision_at_k and recall_at_k can't: those only measure whether the right documents were retrieved, not whether the generated answer actually used them.

attune-rag uses Claude as the judge via Anthropic's tool-use API to produce a structured score in [0.0, 1.0] for each (query, answer, retrieved_context) triple. The reported metric is the mean over the golden query set. Aggregate σ ≈ 0.005 over 40 queries even though per-query judge non-determinism can swing 40+ percentage points on individual queries — averaging absorbs the noise.

The same discipline powers attune-author's polish/fact-check pipeline — generated help content is scored against retrieved source passages before being marked authoritative. attune-rag's faithfulness primitives aren't just instrumentation; they're the contract the family's content-quality story is built on.

Run faithfulness manually

pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...

# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json

# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json

# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json

The judge implementation lives at attune_rag.eval.faithfulness.FaithfulnessJudge. Note: attune_rag.eval.* is currently INTERNAL and may move — the attune-rag-benchmark --with-faithfulness CLI is the stable contract.

For the methodology behind the 0.9686 threshold, the v1/v2 ground-truth calibration runs, and the extended-thinking-vs-default decision record, see docs/rag/faithfulness-thinking-calibration.md.

Roadmap — embeddings (post-freeze 0.2.0+)

Keyword retrieval + optional Claude reranker currently meet the locked P@1 ≥ 0.95, R@3 = 1.00 thresholds against the attune-help golden set. The remaining hard queries (3 of 28, currently xpass-gated under [no-embeddings]) have zero token overlap against their target doc (e.g. "vulnerability scan" → tool-security-audit.md). Closing that gap needs vector search.

The plan is to ship attune-rag[embeddings] using fastembed for local, CPU-only embeddings — no new network dependency, no API key required at retrieval time. Keyword retrieval stays the default; embeddings layer in opt-in, same shape as QueryExpander and LLMReranker. With 0.2.0 cut, embeddings are a Phase 5 candidate — see docs/specs/ROADMAP-v1.md.

See CHANGELOG.md for the decision record and remaining-gap analysis.

Prompt caching (Claude only)

When using the Claude provider, run_and_generate automatically enables Anthropic prompt caching on the stable RAG context prefix (≥ 1 024 chars). This eliminates repeated token costs on the corpus portion of the prompt when the same context block is reused across calls.

No configuration needed — the provider handles the cache_control header automatically.

Public API

attune-rag's public surface is documented below and snapshot-tested in tests/unit/test_api_surface.py. Formal SemVer commitments are in effect as of 0.2.0 — see docs/POLICY.md for the deprecation policy. Symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z; the snapshot test catches drift.

Top-level (from attune_rag import ...):

Pipeline — RagPipeline, RagResult
Corpus — CorpusProtocol, RetrievalEntry, DirectoryCorpus, AttuneHelpCorpus
Retrieval — KeywordRetriever, RetrievalHit, RetrieverProtocol
Provenance — CitationRecord, CitedSource, ClaimCitation, format_citations_markdown, format_claim_citations_markdown
Prompting — build_augmented_prompt, PROMPT_VARIANTS
Hybrid retrieval — QueryExpander, LLMReranker

PUBLIC submodules (importable by qualified path):

attune_rag.corpus — exposes AliasInfo, DuplicateAliasError, load_aliases_from_file in addition to the top-level corpus names
attune_rag.corpus.attune_help — AttuneHelpCorpus
attune_rag.corpus.help_adapter — HelpCorpusAdapter Protocol
attune_rag.providers — LLMProvider, get_provider, list_available
attune_rag.measure_corpus — measure(...) function + MeasureResult dataclass for scoring a corpus against a query set. CLI via python -m attune_rag.measure_corpus ... or the attune-rag-measure console script. See docs/USER_CORPUS_GUIDE.md §6 for the worked example.
attune_rag.editor — template-editor primitives (lint, schema, rename, autocomplete, references); see "Template editor primitives" above for the symbol list
attune_rag.editor.{rename,schema,lint,autocomplete,references} — the individual editor submodules

Console scripts:

attune-rag — CLI entry point (attune_rag.cli:main)
attune-rag-measure — quality measurement (attune_rag.measure_corpus:main); CI-suitable via --watermark-r3 (non-zero exit on fail)

Anything not listed above is INTERNAL and may change in any release. The underscore-prefixed editor modules (attune_rag.editor._rename etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they re-export the new non-underscore names and emit DeprecationWarning. They are removed in 0.3.0.

Status

0.2.0 — first SemVer-binding cut. Phase 4 of the v1.0 roadmap landed cleanly: quality baselines (P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686) hold; per-hot-path perf thresholds re-locked under the V2 multi-run methodology (5 × 20 measurements); attune-gui downstream blocking gate stayed green throughout. From 0.2.0 forward, docs/POLICY.md §2 binds — symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z.

We hit our Phase 4 goals ~3 weeks ahead of the nominal calendar and opted to ship early via the freeze-override mechanism rather than let the cadence clock run out — getting the user-facing additions (attune-rag-measure console script + attune_rag.measure_corpus module for benchmarking your own corpus quality; load_aliases_from_file() for file-based alias customization) into your hands sooner. Override rationale + per-PR receipts at docs/specs/api-v0.2.0-cut/.

Classifier stays at 3 - Alpha — the Production/Stable flip is a Phase 5 deliverable.

Part of the attune ecosystem (attune-ai, attune-help, attune-author, attune-gui).

License

Apache 2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 181 Commits
.claude		.claude
.github/workflows		.github/workflows
.help		.help
artifacts		artifacts
docs		docs
scripts		scripts
site		site
src/attune_rag		src/attune_rag
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

attune-rag

Why attune-rag

vs LangChain / LlamaIndex

What attune-rag is not

Install

Quick start — Claude

Quick start — Gemini

Quick start — custom corpus, any LLM

Hybrid retrieval (optional)

Template editor primitives (`attune_rag.editor`)

Dashboard

Quality baselines

Retrieval + faithfulness

Per-hot-path latency

Why this is the differentiator

Bundled `.help/` corpus

What faithfulness measures

Run faithfulness manually

Roadmap — embeddings (post-freeze 0.2.0+)

Prompt caching (Claude only)

Public API

Status

License

About

Uh oh!

Releases 21

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

attune-rag

Why attune-rag

vs LangChain / LlamaIndex

What attune-rag is not

Install

Quick start — Claude

Quick start — Gemini

Quick start — custom corpus, any LLM

Hybrid retrieval (optional)

Template editor primitives (attune_rag.editor)

Dashboard

Quality baselines

Retrieval + faithfulness

Per-hot-path latency

Why this is the differentiator

Bundled .help/ corpus

What faithfulness measures

Run faithfulness manually

Roadmap — embeddings (post-freeze 0.2.0+)

Prompt caching (Claude only)

Public API

Status

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 21

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Template editor primitives (`attune_rag.editor`)

Bundled `.help/` corpus

Packages