perf(warm): warm the full search path at startup, not just the model #61
Merged
warm() ran status() + embed_query(), so the store's vector/FTS/scalar indexes and LanceDB's query path stayed cold until the first real query, which then paid the entire index-load cost. With the new badge breakdown this is visible as a large store_ms (e.g. embed 26 ms vs store 363 ms over 548 chunks on the demo) while embed is already warm.
Run one representative search() in warm() so the retrieval indexes are resident before the first user query. Measured locally (~550 chunks): first-query store drops from ~35 ms to ~14 ms; on a slower deployed host the cold-load is far larger. Best-effort and guarded so warm-up can never block startup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y1DfHPqxHppXF6zEYgFKi3
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Context
The embed-vs-store badge from #60 paid off immediately. On the public demo a query showed embed 26 ms vs store 363 ms over 548 chunks. 548 chunks brute-forced is sub-millisecond, so 363 ms isn't compute: it's a cold index load.
Root cause: warm() ran status() + embed_query(), so only the model was warmed. The store's vector/FTS/scalar indexes and LanceDB's query path load lazily on the first real query, and that entire cold-load lands in store_ms. warm()'s own docstring already promised the first query should "reflect warm performance"; the store half just wasn't being warmed.
Change
Run one representative search() in warm() so the retrieval indexes are resident before the first user query. Guarded and best-effort so warm-up can never block startup; a no-op on an empty index.
Verification
Reproduced locally (~550 chunks, fresh process):
The cold penalty is spread evenly across dense/lexical/hydrate — the signature of cold index loading, not one slow op. On the demo's slower disk this cold-load is far larger (the observed ~360 ms). Tests: retrieval / store / surfaces / webui all pass; lint + format clean.
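For readers without the diff open, the shape of the change described under Change can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Retriever protocol, the "chunks" key in the status() result, the probe string, and the FakeRetriever stand-in are all assumptions made for the example, and the project's real signatures may differ.

```python
import logging
from typing import Protocol, Sequence

logger = logging.getLogger(__name__)


class Retriever(Protocol):
    """Hypothetical interface; the real project's signatures may differ."""

    def status(self) -> dict: ...
    def embed_query(self, text: str) -> Sequence[float]: ...
    def search(self, text: str, limit: int = 5) -> list: ...


def warm(retriever: Retriever, probe: str = "warm-up") -> bool:
    """Best-effort warm-up. Returns True if the full search path was exercised."""
    try:
        status = retriever.status()
        retriever.embed_query(probe)  # warms the embedding model
        # Assumption: status() reports a chunk count under the key "chunks".
        if status.get("chunks", 0) == 0:
            return False  # empty index: nothing to load, so skip the search
        retriever.search(probe, limit=1)  # forces the lazy index load
        return True
    except Exception:
        # Warm-up must never block startup: log the failure and carry on.
        logger.warning("warm-up failed; first query will be cold", exc_info=True)
        return False


class FakeRetriever:
    """Stand-in store whose index loads lazily, for demonstration only."""

    def __init__(self, chunks: int, fail: bool = False) -> None:
        self.chunks = chunks
        self.fail = fail
        self.index_loaded = False
        self.searches = 0

    def status(self) -> dict:
        return {"chunks": self.chunks}

    def embed_query(self, text: str) -> Sequence[float]:
        return [0.0]

    def search(self, text: str, limit: int = 5) -> list:
        if self.fail:
            raise RuntimeError("store unavailable")
        self.index_loaded = True  # the first search pays the load cost
        self.searches += 1
        return []
```

In this sketch a failing store only produces a log line, and an empty index skips the search entirely, which mirrors the "guarded and best-effort" and "no-op on an empty index" properties stated above.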
🤖 Generated with Claude Code