Added hybrid search: BM25 + semantic + RRF fusion#28

Merged
markgewhite merged 4 commits into
mainfrom
feature/7-hybrid-search-bm25-rrf
Apr 11, 2026

@markgewhite

Summary

  • Added BM25Index module using rank_bm25 for keyword search
  • Added custom reciprocal_rank_fusion() (~15 lines) to merge ranked result lists with deduplication
  • Added HybridRetriever orchestrating BM25 (top 20) + semantic (top 20) → RRF → top 30
  • BM25 index rebuilds from ChromaDB on startup with logged timing
  • Replaced semantic-only search in on_message with hybrid retrieval
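For readers unfamiliar with RRF, the fusion step described above can be sketched roughly as follows. This is an illustrative sketch, not the code merged in this PR: the smoothing constant `k=60` is a commonly used default and is an assumption here, the `hybrid_search` helper and its `keyword_search` / `semantic_search` callables are hypothetical stand-ins for the BM25 and semantic retrievers, and documents are represented only by their IDs.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Optional, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Iterable[Sequence[Hashable]],
    k: int = 60,  # assumption: common default, not confirmed by the PR
    top_n: Optional[int] = None,
) -> List[Hashable]:
    """Merge ranked lists of document IDs into one deduplicated ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        rank = 0
        for doc_id in ranking:
            if doc_id in seen:
                continue  # ignore repeats within a single list
            seen.add(doc_id)
            rank += 1
            # Each list contributes 1 / (k + rank) for the documents it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Stable sort: documents with equal scores keep first-seen order.
    fused = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return fused if top_n is None else fused[:top_n]


def hybrid_search(
    query: str,
    keyword_search: Callable[[str, int], Sequence[Hashable]],   # hypothetical
    semantic_search: Callable[[str, int], Sequence[Hashable]],  # hypothetical
    per_retriever: int = 20,
    top_n: int = 30,
) -> List[Hashable]:
    """Take the top results from each retriever, then fuse them with RRF."""
    return reciprocal_rank_fusion(
        [
            keyword_search(query, per_retriever),
            semantic_search(query, per_retriever),
        ],
        top_n=top_n,
    )
```

A document returned by both retrievers accumulates two contributions, so it tends to outrank one that appears high in only a single list. Because the score depends only on rank positions, BM25 scores and embedding distances never need to be put on a common scale.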

Closes #7

Test plan

  • BM25Index: build, search, relevance, empty index, metadata (5 tests)
  • RRF fusion: merge, deduplicate, top_n, single list, empty (5 tests)
  • HybridRetriever: combines both retrievers, result limits (2 tests)
  • Manual: ask questions, verify improved retrieval with keyword matches

🤖 Generated with Claude Code

markgewhite and others added 4 commits April 11, 2026 22:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…usion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@markgewhite markgewhite merged commit 8afdefd into main Apr 11, 2026
1 check passed
@markgewhite markgewhite deleted the feature/7-hybrid-search-bm25-rrf branch April 11, 2026 21:46