anonymousgirl123/intent-first-search-engine


Intent-First Search Engine

Production-grade intent-first search using BM25 + query-time boosts.

ML suggests. Rules protect. Users decide.

A production-oriented reference implementation of a modern search system that enforces user intent first using BM25 + query-time lexical boosts, before introducing semantic or ML-based ranking.

This repository demonstrates how real search systems are built in production:
deterministic first, intelligent later.


🎯 Core Philosophy

Search failures rarely come from bad algorithms — they come from broken boundaries.

This system enforces the following invariant:

Intent is enforced deterministically before any ML is allowed to influence ranking.
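One way to picture this invariant is a reranker that may only reorder documents inside the same lexical score band. The sketch below is illustrative and not taken from the repository: the function name `rerank_within_bands`, the `band_width` parameter, and the `(doc_id, bm25_score, ml_score)` tuple shape are all assumptions.

```python
import math

def rerank_within_bands(hits, band_width=1.0):
    """Order hits by BM25 band first, then by ML score inside each band.

    hits: list of (doc_id, bm25_score, ml_score) tuples (hypothetical shape).
    The ML score can break ties within a band but can never lift a
    document above one that sits in a higher BM25 band.
    """
    def band(hit):
        # Bucket the lexical score so near-equal BM25 scores share a band.
        return math.floor(hit[1] / band_width)

    ordered = sorted(hits, key=lambda h: (-band(h), -h[2]))
    return [doc_id for doc_id, _, _ in ordered]


hits = [
    ("a", 5.2, 0.1),  # band 5, weak ML score
    ("b", 5.8, 0.9),  # band 5, strong ML score
    ("c", 9.0, 0.0),  # band 9, ML dislikes it
    ("d", 1.0, 1.0),  # band 1, ML loves it
]
print(rerank_within_bands(hits))
```

Here `c` stays first despite its ML score of 0.0, and `d` stays last despite its ML score of 1.0. The ML signal only decides the order of `a` and `b`, which share a band.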


🧠 Architectural Overview

Search is a funnel, not a brain.


User Query
↓
Intent Control (BM25 + Query-Time Boosts)
↓
Candidate Retrieval (Elasticsearch)
---

🏗 High-Level Design (HLD)

Responsibilities by Layer

  • API Layer → input validation, orchestration
  • Query Understanding → intent extraction & boosts
  • Search Orchestrator → controls flow & guarantees
  • Elasticsearch → retrieval + BM25 scoring

[Image: high-level design diagram]

🔁 Runtime Flow (Sequence Diagram)

[Image: runtime sequence diagram]

🔍 Low-Level Design (LLD)

1️⃣ Query Understanding Service

Responsible for:

  • Tokenization
  • Stop-word removal
  • Intent-based boost assignment (category > material > attribute)
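The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the vocabularies, the boost values, and the names `BoostedTerm` and `understand` are hypothetical, chosen only so that category outranks material, which outranks a plain attribute.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabularies and weights; the repository's real ones are not shown.
STOP_WORDS = {"a", "an", "the", "for", "with", "of", "in"}
CATEGORY_TERMS = {"sofa", "table", "chair"}
MATERIAL_TERMS = {"leather", "wood", "metal"}
BOOSTS = {"category": 3.0, "material": 2.0, "attribute": 1.0}


@dataclass(frozen=True)
class BoostedTerm:
    text: str
    kind: str
    boost: float


def understand(query: str) -> list[BoostedTerm]:
    """Tokenize, drop stop words, and assign a boost per intent type."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        if token in CATEGORY_TERMS:
            kind = "category"
        elif token in MATERIAL_TERMS:
            kind = "material"
        else:
            # Anything unrecognized is treated as a generic attribute.
            kind = "attribute"
        terms.append(BoostedTerm(token, kind, BOOSTS[kind]))
    return terms


for term in understand("A red leather sofa"):
    print(term)
```

Because the mapping is a plain lookup, the same query always produces the same boosts, which keeps the ranking explainable.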

[Image: Query Understanding Service diagram]

2️⃣ Search Orchestrator

Responsible for:

  • Enforcing architecture rules
  • Building Elasticsearch queries
  • Preventing ML/semantic override
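As a rough sketch of the query-building step, the function below turns boosted terms into an Elasticsearch `bool` query body using standard `match` clauses with a `boost`. The field names in `FIELD_FOR_KIND`, the `(text, kind, boost)` tuple shape, and the choice to put category terms in `must` are assumptions made for illustration; the repository may structure its queries differently.

```python
# Hypothetical mapping from intent type to index field.
FIELD_FOR_KIND = {
    "category": "category",
    "material": "material",
    "attribute": "description",
}


def build_query(terms, size=10):
    """Build an Elasticsearch query body from (text, kind, boost) tuples.

    Category terms go into `must` so the category intent is always
    enforced; the remaining terms go into `should` and only affect
    the BM25-based score.
    """
    must, should = [], []
    for text, kind, boost in terms:
        clause = {"match": {FIELD_FOR_KIND[kind]: {"query": text, "boost": boost}}}
        if kind == "category":
            must.append(clause)
        else:
            should.append(clause)

    bool_query = {"should": should}
    if must:
        bool_query["must"] = must
    return {"size": size, "query": {"bool": bool_query}}


query = build_query([
    ("sofa", "category", 3.0),
    ("leather", "material", 2.0),
    ("red", "attribute", 1.0),
])
print(query)
```

The resulting dictionary can be passed as the request body of a search call. Nothing in it depends on an ML model, so the ranking stays reproducible from the query alone.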

[Image: Search Orchestrator diagram]

3️⃣ Elasticsearch (Logical View)

Elasticsearch is not the brain.

It is a retrieval engine only.

[Image: Elasticsearch logical view]

✅ What This Repo Demonstrates

  • Intent-first search design
  • BM25 as the authority
  • Query-time lexical boosting
  • Clean architectural boundaries
  • Explainable, deterministic ranking
  • Realistic production patterns (not toy ML)

🧩 Extension Path (Future Work)

This architecture safely supports:

  • Hybrid BM25 + vector search
  • Offline Learning-to-Rank
  • Feature flags & fallbacks
  • Multi-region deployment
  • Caching & performance tuning

All without breaking correctness.

About

A reference implementation of modern search architecture that prioritizes deterministic intent enforcement (BM25 + boosts), bounded semantic expansion, and explainable ranking. Built to reflect real production search systems rather than end-to-end black-box ML.
