Query-type-aware adaptive fusion weighting (validated symbol-level lever) #39
Merged
Implements the symbol-level finding that fixed 1:1 dense/BM25 fusion is a compromise: on natural-language queries weak BM25 drags a strong dense ranking down via RRF. A cheap local heuristic (looks_like_identifier) routes the fusion weights by query type.

- coderag/retrieval/query_type.py: looks_like_identifier() + fusion_weights().
- config: adaptive_fusion (off by default) + nl_/code_ weight pairs (+ env).
- HybridSearcher uses fusion_weights(); compare_modes gains an `adaptive` row (and isolates fixed modes from adaptive); `coderag eval --adaptive`, bench `--adaptive`; status() reports adaptive_fusion.
- coderag/eval/datasets/coderag_self_identifiers.jsonl: 22 identifier-query cases (derived from the symbol set) to validate the code-side routing.

Validated (bge-small, symbol level) vs fixed 1:1 hybrid, a Pareto improvement:
- NL queries: hybrid MRR 0.604 -> adaptive 0.674 (+0.070, R@1 +20%)
- identifier queries: hybrid MRR 0.685 -> adaptive 0.685 (unchanged)

Honest correction baked into the defaults: the "BM25-up for identifiers" half of the hypothesis was refuted by the data (up-weighting BM25 hurt, MRR 0.685->0.613, because short/common identifiers are lexically ambiguous and the embedder already matches them), so the code-side default is neutral, not BM25-leaning. Documented in docs/eval.md and the strategy doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LhTCPRjNmSitYxgSDfttT7
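To make the "weak BM25 drags a strong dense ranking down via RRF" point concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion on a toy query. It is an illustration, not coderag's implementation: the function name `weighted_rrf`, the smoothing constant `k=60`, the toy rankings, and the 2:1 weight pair are all assumptions chosen for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of doc ids with weighted Reciprocal Rank Fusion.

    score(d) = sum_i weights[i] / (k + rank_i(d)), using 1-based ranks.
    k=60 is a commonly used constant; the project's value may differ.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy NL query: dense puts the relevant doc "a" first, BM25 buries it last.
dense = ["a", "b", "c", "d"]
bm25 = ["c", "d", "b", "a"]

equal = weighted_rrf([dense, bm25], [1.0, 1.0])          # fixed 1:1 fusion
dense_leaning = weighted_rrf([dense, bm25], [2.0, 1.0])  # illustrative NL weights

print(equal)          # ['c', 'a', 'b', 'd']: "a" loses the top spot
print(dense_leaning)  # "a" is back at rank 1
```

With equal weights the BM25 list pulls the dense retriever's top hit down to second place; leaning the weights toward dense restores it, which is the effect the adaptive mode targets for natural-language queries.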
a8fd84d to ceaa4a4
This was referenced Jun 17, 2026
Why
PR #38's symbol-level eval surfaced the single biggest lever: fixed 1:1 dense/BM25 fusion is a compromise, not an optimum. On natural-language queries the dense retriever is much stronger, and equal-weight RRF drags it down with weak BM25 (bge-small: dense MRR 0.675 vs hybrid 0.573). This PR implements query-type-aware fusion weighting to fix that.
What
- `coderag/retrieval/query_type.py`: `looks_like_identifier()` (a cheap, conservative local heuristic) + `fusion_weights()`. NL queries lean dense; identifier-like queries stay neutral.
- `config.adaptive_fusion` (off by default) + `nl_`/`code_` weight pairs, all env-configurable (`CODERAG_ADAPTIVE_FUSION`, …).
- `HybridSearcher` routes weights per query; `compare_modes` gains an `adaptive` row and isolates the fixed modes from it; `coderag eval --adaptive`, `bench_embedders.py --adaptive`; `status()` reports it.
- `coderag/eval/datasets/coderag_self_identifiers.jsonl`: 22 identifier-query cases (derived from the symbol set) to validate the code-side routing, not just the NL side.
Validated (bge-small, symbol level): a Pareto improvement over fixed 1:1 hybrid
- NL queries: hybrid MRR 0.604 → adaptive 0.674 (+0.070, R@1 +20%)
- Identifier queries: hybrid MRR 0.685 → adaptive 0.685 (unchanged)
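The routing described in the What list could look roughly like the sketch below. Only the names `looks_like_identifier`, `fusion_weights`, and `adaptive_fusion` come from this PR; the regular expression, the `FusionConfig` dataclass and its field names, the function signatures, and the dense-leaning NL weight pair `(2.0, 1.0)` are hypothetical stand-ins, since the actual heuristic and default values are not shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: a single token shaped like a name, a dotted path, or a
# bare call such as foo(). The project's real heuristic may differ.
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*(?:\(\))?")


def looks_like_identifier(query: str) -> bool:
    """Conservative check: anything with whitespace is treated as NL."""
    return _IDENTIFIER.fullmatch(query.strip()) is not None


@dataclass(frozen=True)
class FusionConfig:
    """Assumed config shape; weights are (dense, lexical) pairs."""
    adaptive_fusion: bool = False                    # off by default, as in the PR
    nl_weights: tuple[float, float] = (2.0, 1.0)     # illustrative dense-leaning pair
    code_weights: tuple[float, float] = (1.0, 1.0)   # neutral, per the PR


def fusion_weights(query: str, cfg: FusionConfig) -> tuple[float, float]:
    """Pick (dense, lexical) fusion weights for one query."""
    if not cfg.adaptive_fusion:
        return (1.0, 1.0)  # fixed 1:1 hybrid
    if looks_like_identifier(query):
        return cfg.code_weights
    return cfg.nl_weights


cfg = FusionConfig(adaptive_fusion=True)
print(fusion_weights("HybridSearcher", cfg))                 # (1.0, 1.0)
print(fusion_weights("how are fusion weights chosen", cfg))  # (2.0, 1.0)
```

Keeping the classifier conservative means an ambiguous query falls through to the NL weights, and leaving `adaptive_fusion` false reproduces the fixed 1:1 behaviour exactly.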
Honest correction baked into the defaults
The literature's "BM25-up for code" intuition was refuted by the data: up-weighting BM25 for identifier queries hurt (MRR 0.685 → 0.613), because short/common identifiers (`search`, `index`) are lexically ambiguous and the embedder already matches them well. So the code-side default is neutral (1:1), not BM25-leaning, and BM25-leaning is left configurable (`CODERAG_CODE_LEXICAL_WEIGHT`) for larger repos where exact-string recall matters more. This is the harness preventing a plausible-but-wrong default.
Off by default pending larger-repo validation; enable with `CODERAG_ADAPTIVE_FUSION=1`.
Testing
New offline tests for the classifier (identifier vs NL), the weight routing (incl. validated defaults), and end-to-end adaptive search. Full `pytest -m "not integration"` green; `ruff` + `mypy` clean on new code.
Follow-ups
🤖 Generated with Claude Code