The query term matching component analyzes how well each node matches specific terms and entities mentioned in the query. This goes beyond semantic similarity to check for explicit term matches across multiple node properties.
First, the system analyzes the query and extracts the year, the important terms, and the proper nouns:
query = "Who was the California Attorney General in 2020?"
# Extracted information:
{
'query_year': 2020,
'important_terms': ['california', 'attorney', 'general'],
'proper_nouns': ['California', 'Attorney', 'General'],
'original_query': 'who was the california attorney general in 2020?'
}
Process:
- Extract year using regex \b(19|20)\d{2}\b
- Remove stop words (who, what, when, the, a, is, was, etc.)
- Keep important terms (length > 2, not stop words)
- Extract proper nouns (capitalized words)
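The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function name `analyze_query` and the `STOP_WORDS` set are assumptions, and the real stop-word list is not shown in this document. The sketch also assumes that capitalized stop words such as "Who" are excluded from the proper nouns, which is what the extracted example suggests.

```python
import re

# Assumed stop-word list; the real list is not given in this document.
STOP_WORDS = {
    "who", "what", "when", "where", "why", "how", "the", "a", "an",
    "is", "was", "were", "are", "did", "do", "does", "in", "of", "to",
}

# Same pattern as in the text, with a non-capturing group so that
# match.group() and re.findall() both return the full four-digit year.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def analyze_query(query: str) -> dict:
    """Extract the year, important terms, and proper nouns from a query."""
    year_match = YEAR_PATTERN.search(query)
    words = WORD_PATTERN.findall(query)
    # Keep words longer than two characters that are not stop words.
    content_words = [
        w for w in words if len(w) > 2 and w.lower() not in STOP_WORDS
    ]
    return {
        "query_year": int(year_match.group()) if year_match else None,
        "important_terms": [w.lower() for w in content_words],
        # Proper nouns: capitalized content words (a simple heuristic).
        "proper_nouns": [w for w in content_words if w[0].isupper()],
        "original_query": query.lower(),
    }


info = analyze_query("Who was the California Attorney General in 2020?")
```

Under these assumptions, `info` reproduces the extracted dictionary shown earlier for the same query.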
Then, for each node, the system checks for matches across six properties, each with its own weight:
| Property | Weight | What It Checks |
|---|---|---|
| Node Name | 3.0 | Exact or partial match of query terms in the node's name |
| Proper Nouns | 2.0 | Capitalized terms from query found in node name |
| Summary | 1.5 | Query terms found in node's content summary |
| Connected Entities | 1.0 | Query terms in names of connected nodes |
| Attributes | 1.0 | Query terms in node's attribute values |
| Labels | 0.5 | Query terms in node's type labels |
Maximum total score: 9.0 (the sum of the six weights). The raw score is divided by 9.0 to normalize it to the 0-1 range.
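One way to express the weighting and normalization in code is sketched below. The weights come from the table; the helper names (`term_fraction`, `query_match_score`), the property keys, and the choice of case-insensitive substring matching are assumptions made for illustration.

```python
# Weights copied from the table above; the dictionary keys are assumed names.
PROPERTY_WEIGHTS = {
    "name": 3.0,
    "proper_nouns": 2.0,
    "summary": 1.5,
    "connected": 1.0,
    "attributes": 1.0,
    "labels": 0.5,
}
MAX_SCORE = sum(PROPERTY_WEIGHTS.values())  # 9.0


def term_fraction(terms: list[str], text: str) -> float:
    """Fraction of query terms found in text (case-insensitive substring match)."""
    if not terms:
        return 0.0
    haystack = text.lower()
    return sum(term.lower() in haystack for term in terms) / len(terms)


def query_match_score(fractions: dict[str, float]) -> float:
    """Combine per-property match fractions (each 0-1) into a 0-1 score."""
    raw = sum(
        weight * fractions.get(prop, 0.0)
        for prop, weight in PROPERTY_WEIGHTS.items()
    )
    return raw / MAX_SCORE
```

A property that is missing from `fractions` contributes nothing, so a node with no attributes or labels is simply scored on the properties it has.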
Query: "Who was the California Attorney General in 2020?"
Node: "Kamala Harris"
Scoring:
- Node name: "Kamala Harris" not in query → 0/3.0
- Proper nouns: No match → 0/2.0
- Summary: "Attorney General of California..." → 3/3 terms → 1.5/1.5 ✓
- Connected: ["California", "Attorney General", "San Francisco"] → 2/3 terms → 0.67/1.0
- Attributes: {position: "Attorney General", state: "California"} → 2/3 terms → 0.67/1.0
- Labels: ["Person", "Official"] → 0/3 terms → 0/0.5
Raw Score: 2.84 / 9.0 = 0.316
Query: "What position did Gavin Newsom hold?"
Node: "Gavin Newsom"
Scoring:
- Node name: "Gavin Newsom" in query → 3.0/3.0 ✓✓✓
- Proper nouns: "Gavin Newsom" matches → 2.0/2.0 ✓✓
- Summary: "Governor of California, position..." → 1/2 terms → 0.75/1.5
- Connected: ["Governor", "California", "Lieutenant"] → 1/2 terms → 0.5/1.0
- Attributes: {position: "Governor"} → 1/2 terms → 0.5/1.0
- Labels: ["Person"] → 0/2 terms → 0/0.5
Raw Score: 6.75 / 9.0 = 0.750
The query match score is combined with other factors:
final_score = 0.35 × semantic_score     # RRF hybrid search
            + 0.15 × connection_score   # Graph connectivity
            + 0.25 × temporal_score     # Time period match
            + 0.25 × query_match_score  # Term matching (this component)
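The weighted combination could be implemented along these lines. The default weights are the ones in the formula; the function name `final_score`, the dictionary keys, and the check that the weights sum to 1.0 are assumptions added for this sketch.

```python
from typing import Optional

# Default weights from the formula above; the key names are assumed.
DEFAULT_WEIGHTS = {
    "semantic": 0.35,     # RRF hybrid search
    "connection": 0.15,   # graph connectivity
    "temporal": 0.25,     # time period match
    "query_match": 0.25,  # term matching
}


def final_score(
    scores: dict[str, float],
    weights: Optional[dict[str, float]] = None,
) -> float:
    """Weighted sum of the component scores, each expected in the 0-1 range."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(weights.values())
    # Assumed safeguard: keep the result in the 0-1 range.
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    return sum(weight * scores[name] for name, weight in weights.items())
```

Because the weights sum to 1.0, a node that scores 1.0 on every component gets a final score of 1.0.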
Query: "Who was the California Attorney General in 2020?"
Semantic search alone might return:
- "California" (high semantic match, but not a person)
- "Attorney General" (concept, not a specific person)
- "Kamala Harris" (person, but wrong time period)
Multi-factor search returns:
- "Xavier Becerra" (moderate semantic + perfect temporal + high query match)
  - Summary mentions "Attorney General" and "California"
  - Attributes: term_start=2017, term_end=2021 (includes 2020)
  - Connected to: California, Attorney General, etc.
Use a higher query_match_weight (around 0.3) when:
- Query has specific proper nouns (names, places)
- Looking for exact entity matches
- Query has technical terms or specific roles
Use the default query_match_weight (0.25) when:
- Balanced queries with some specific terms
- Want to combine term matching with semantic understanding
- General purpose queries
Use a lower query_match_weight (around 0.15) when:
- Very broad queries
- Conceptual questions
- Prioritizing semantic similarity over exact matches
Query: "Where did Kamala Harris work?"
Recommended: query_match_weight=0.3
Why: "Kamala Harris" is a proper noun that should match exactly
Query: "Who was the California Attorney General in 2020?"
Recommended: query_match_weight=0.25
Why: Multiple specific terms (California, Attorney General) plus temporal
Query: "What government positions exist in California?"
Recommended: query_match_weight=0.15
Why: Broad conceptual query, semantic understanding more important
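The recommendations above change only query_match_weight, and this document does not say how the other three weights should adjust. One plausible approach, shown here purely as an assumption, is to rescale the remaining weights proportionally so that the total stays at 1.0. The function name `rebalance_weights` and the dictionary keys are hypothetical.

```python
# Default weights from the final_score formula; the key names are assumed.
BASE_WEIGHTS = {
    "semantic": 0.35,
    "connection": 0.15,
    "temporal": 0.25,
    "query_match": 0.25,
}


def rebalance_weights(query_match_weight: float) -> dict[str, float]:
    """Set query_match_weight and rescale the other weights to sum to 1.0."""
    if not 0.0 <= query_match_weight <= 1.0:
        raise ValueError("query_match_weight must be between 0 and 1")
    others = {k: v for k, v in BASE_WEIGHTS.items() if k != "query_match"}
    # Assumption: the other factors keep their relative proportions.
    scale = (1.0 - query_match_weight) / sum(others.values())
    weights = {name: value * scale for name, value in others.items()}
    weights["query_match"] = query_match_weight
    return weights


proper_noun_weights = rebalance_weights(0.3)   # "Where did Kamala Harris work?"
broad_query_weights = rebalance_weights(0.15)  # broad conceptual query
```

Calling `rebalance_weights(0.25)` returns the default weights, up to floating-point rounding.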
Strengths:
- Precision: Finds entities that explicitly match query terms
- Proper Noun Handling: Correctly identifies named entities
- Multi-Property Search: Checks name, summary, attributes, connections
- Weighted Importance: Node name matches count more than label matches
- Complementary: Works alongside semantic search, not replacing it
Limitations:
- Requires exact or partial string matches (case-insensitive)
- May miss synonyms (e.g., "AG" vs "Attorney General")
- Stop words are removed, which can discard some context
- English-language focused
Potential improvements:
- Synonym expansion (AG → Attorney General)
- Fuzzy matching for misspellings
- Multi-language support
- Entity type awareness (Person vs Place vs Organization)
- Phrase matching (multi-word terms)
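As a rough illustration of the first two improvements, the sketch below expands terms through a synonym table and uses the standard library's `difflib.SequenceMatcher` for fuzzy matching. None of this exists in the current system as far as this document shows: the `SYNONYMS` table, the function names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real system would need a curated mapping.
SYNONYMS = {"ag": ["attorney", "general"]}


def expand_terms(terms: list[str]) -> list[str]:
    """Add known expansions for each term, keeping order and dropping duplicates."""
    expanded: list[str] = []
    for term in terms:
        for candidate in [term, *SYNONYMS.get(term.lower(), [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def fuzzy_contains(term: str, text: str, threshold: float = 0.8) -> bool:
    """True if any whitespace-separated word in text is similar enough to term."""
    term = term.lower()
    return any(
        SequenceMatcher(None, term, word).ratio() >= threshold
        for word in text.lower().split()
    )


terms = expand_terms(["california", "ag"])
misspelled_hit = fuzzy_contains("atorney", "Attorney General of California")
```

The threshold trades recall for precision: a lower value tolerates more misspellings but also admits more false matches, so it would need tuning against real queries.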