This project contains implementations of two key document ranking models used in Information Retrieval (IR):
- BM25/Okapi: A probabilistic ranking function based on term frequency, document length normalization, and inverse document frequency.
- Binary Independence Model (BIM): A probabilistic model that assumes query terms are independent of one another and scores and ranks documents based on binary term presence.

The implementation provides:
- Tokenization of documents
- BM25 scoring and ranking (configurable parameters: k=1.5, b=0.75)
- BIM scoring with support for a set of query terms
- Ranked list of documents with corresponding relevance scores as output
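
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`tokenize`, `rank`, `bm25_rank`, `bim_rank`), the regex-based tokenization, the smoothed IDF variant, and the BIM term weights (estimated without relevance feedback, with each term assumed equally likely to be present or absent in a relevant document) are all assumptions. The documents `P1` to `P5` are placeholders, not the project's D1 to D8.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed tokenization scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(scores):
    """Sort (doc_id, score) pairs by descending score."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def bm25_rank(query, docs, k=1.5, b=0.75):
    """Rank docs (a non-empty dict of id -> text) against a query string with BM25."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    query_terms = list(dict.fromkeys(tokenize(query)))  # de-duplicate, keep order

    scores = {}
    for doc_id, doc_tokens in tokens.items():
        tf = Counter(doc_tokens)
        length_norm = k * (1 - b + b * len(doc_tokens) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return rank(scores)


def bim_rank(query_terms, docs):
    """Rank docs against a set of query terms with BIM (no relevance feedback)."""
    term_sets = {doc_id: set(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(term_sets)
    # Term weight estimated from document frequency alone.
    weights = {}
    for term in sorted(query_terms):
        df = sum(term in terms for terms in term_sets.values())
        weights[term] = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Only binary presence matters: sum the weights of query terms in the doc.
    scores = {
        doc_id: sum(w for term, w in weights.items() if term in terms)
        for doc_id, terms in term_sets.items()
    }
    return rank(scores)


# Placeholder documents for illustration only, not the project's D1 to D8.
docs = {
    "P1": "information retrieval models rank documents",
    "P2": "probabilistic models for ranking",
    "P3": "cooking recipes for pasta",
    "P4": "gardening tips for spring",
    "P5": "weather forecast for the weekend",
}
query = "information retrieval models"
bm25_ranking = bm25_rank(query, docs)
bim_ranking = bim_rank(set(tokenize(query)), docs)

for name, ranking in (("BM25", bm25_ranking), ("BIM", bim_ranking)):
    print(name)
    for doc_id, score in ranking:
        print(f"  {doc_id}: {score:.3f}")
```

In this sketch both models place `P1` first, since it contains all three query terms, and give a score of zero to documents that share no terms with the query. Note that the BIM weight used here becomes negative for a term that appears in more than half of the documents, which is a known property of this estimate rather than a bug.
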

A sample set of documents and a query are used to test the models:
- Query: "information retrieval models"
- Documents: D1 to D8 (provided in the code)