
perf(warm): warm the full search path at startup, not just the model#61

Merged
Neverdecel merged 1 commit into master from claude/warm-search-path, Jun 19, 2026

@Neverdecel


Context

The embed-vs-store badge from #60 paid off immediately. On the public demo, a query showed embed 26 ms vs store 363 ms over 548 chunks. A brute-force scan of 548 chunks is sub-millisecond, so the 363 ms isn't compute; it's a cold index load.

Root cause: warm() ran status() + embed_query(), so only the model was warmed. The store's vector/FTS/scalar indexes and LanceDB's query path load lazily on the first real query, and that entire cold-load lands in store_ms. warm()'s own docstring already promised the first query should "reflect warm performance" — the store half just wasn't being warmed.

Change

Run one representative search() in warm() so the retrieval indexes are resident before the first user query. Guarded and best-effort so warm-up can never block startup; a no-op on an empty index.
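The shape of the change can be sketched as follows. This is a minimal illustration, not the merged code: the `Retriever` class, its stubbed methods, the `chunks` key returned by `status()`, and the `probe` and `limit` parameters are all assumptions made so the example is self-contained. Only the names `warm()`, `status()`, `embed_query()`, and `search()` come from the description above.

```python
import logging

logger = logging.getLogger(__name__)


class Retriever:
    """Hypothetical stand-in for the project's retriever (illustrative only)."""

    def __init__(self, chunk_count, fail_search=False):
        self.chunk_count = chunk_count
        self.fail_search = fail_search
        self.calls = []  # records which steps ran, for demonstration

    def status(self):
        self.calls.append("status")
        return {"chunks": self.chunk_count}  # assumed shape

    def embed_query(self, text):
        self.calls.append("embed_query")
        return [0.0, 0.0, 0.0]  # placeholder embedding

    def search(self, text, limit=5):
        self.calls.append("search")
        if self.fail_search:
            raise RuntimeError("index unavailable")
        return []

    def warm(self, probe="warm-up"):
        """Warm the model and the store. Returns True if a search ran.

        Best-effort: any failure is logged and swallowed so that
        warm-up cannot abort startup.
        """
        try:
            info = self.status()
            self.embed_query(probe)  # warms the embedding model
            if info.get("chunks", 0) == 0:
                return False  # empty index: nothing to load, skip the search
            self.search(probe, limit=1)  # touches the store's query path
            return True
        except Exception:
            logger.warning("warm-up failed; continuing startup", exc_info=True)
            return False
```

Calling `Retriever(550).warm()` runs all three steps, while `Retriever(0).warm()` stops before `search()`. Catching a broad `Exception` is usually discouraged, but here it matches the stated goal that warm-up is optional and must not take the process down.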

Verification

Reproduced locally (~550 chunks, fresh process):

| query | total (ms) | embed (ms) | store (ms) | store breakdown: dense / lex / hydrate (ms) |
|---|---|---|---|---|
| cold first query | 35.4 | 0.2 | 35.0 | 14.4 / 9.4 / 11.2 |
| after warm search | 14.7 | 0.1 | 14.5 | 5.2 / 4.9 / 4.4 |

The cold penalty is spread across all three store stages (dense, lexical, and hydrate each run roughly 2x to 3x slower cold), which is the signature of cold index loading rather than one slow operation. On the demo's slower disk the same cold load is far larger (the observed ~360 ms). Tests: retrieval / store / surfaces / webui all pass; lint + format clean.
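As a quick check of that reading, the per-stage slowdown can be computed directly from the local measurements reported above. The dictionary and function names here are illustrative; the numbers are the ones from the verification run.

```python
# Per-stage store timings in milliseconds, taken from the local run above.
cold = {"dense": 14.4, "lex": 9.4, "hydrate": 11.2}
warm = {"dense": 5.2, "lex": 4.9, "hydrate": 4.4}


def slowdown(cold_ms, warm_ms):
    """Return the cold/warm ratio for each stage, rounded to one decimal."""
    return {stage: round(cold_ms[stage] / warm_ms[stage], 1) for stage in cold_ms}


ratios = slowdown(cold, warm)
cold_total = round(sum(cold.values()), 1)  # should match the reported store time
warm_total = round(sum(warm.values()), 1)

for stage, ratio in ratios.items():
    print(f"{stage}: {ratio}x slower cold")
print(f"store total: {cold_total} ms cold, {warm_total} ms warm")
```

No single stage dominates the penalty: every stage slows down by a similar factor, and the stage times sum to the reported store totals.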

🤖 Generated with Claude Code



warm() ran status() + embed_query(), so the store's vector/FTS/scalar indexes
and LanceDB's query path stayed cold until the first real query — which then
paid the entire index-load cost. With the new badge breakdown this is visible
as a large store_ms (e.g. embed 26ms vs store 363ms over 548 chunks on the
demo) while embed is already warm.

Run one representative search() in warm() so the retrieval indexes are resident
before the first user query. Measured locally (~550 chunks): first-query store
drops from ~35ms to ~14ms; on a slower deployed host the cold-load is far larger.
Best-effort and guarded so warm-up can never block startup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.


@Neverdecel Neverdecel merged commit 77c0ade into master Jun 19, 2026
13 checks passed