fix(evaluation): map local sequential ids to global conv ids in BM25/embedding index filenames (#127) #242
Open
Fearvox wants to merge 1 commit into
…nd-AI#127)

The eval index-building stage (stage 2) wrote BM25/Embedding index files keyed by a local range(num_conv) counter (conv_0..conv_{N-1}), while stage 1 (memcell writer) and the search stage (reader) both key off the GLOBAL conversation id extracted from conversation_id via _extract_conv_index (e.g. "locomo_5" -> "5"). On a sliced run (--from-conv 5 --to-conv 10) the sliced conversations keep their global ids, so stage 2 looked for memcell_list_conv_0..4.json (absent -> skipped), built nothing, and the search stage then failed to find bm25_index_conv_5..9.pkl, yielding empty retrieval. Reported in EverMind-AI#127.

Fix:
- stage2_index_building: build_bm25_index / build_emb_index iterate the global conv ids actually in play. The adapter passes the exact slice via a new conv_ids arg; a discover_conv_ids() helper derives ids from the memcell_list_conv_*.json filenames as a fallback for the CLI entry point.
- evermemos_adapter: _check_missing_indexes now probes by global conv ids (not range(num_conv)), so the skip-logic no longer always reports a slice's indexes as missing and rebuilds them under the wrong filenames.

Adds an offline regression test (no docker / LLM / network) that reproduces the slice and asserts the built index filenames are exactly the ones the search stage reads.

Prior art on the same mismatch by Jah-yee (EverMind-AI#136, broad 21-file fix; EverMind-AI#115, tangential stage3 rename); this change keeps the fix surgical.

Co-authored-by: Jah-yee <166608075+Jah-yee@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
When the evaluation pipeline is run on a slice (--from-conv 5 --to-conv 10), retrieval silently returns empty results. The root cause is a three-way filename-keying mismatch in the EverCore eval adapter:
- Stage 1 (memcell writer) keys by the global conversation id extracted from conversation_id via _extract_conv_index ("locomo_5" -> "5"), writing memcell_list_conv_5..9.json.
- The search stage (reader) keys by the same global id and looks for bm25_index_conv_5..9.pkl / embedding_index_conv_5..9.pkl.
- Stage 2 (index builder) keyed by a local range(config.num_conv) counter (0..N-1), so for a 5-conversation slice it looked for memcell_list_conv_0..4.json (absent → "File not found, skipping"), built nothing, and wrote indexes under the wrong names.
Net effect: a sliced run builds zero usable indexes and every search returns empty. This is exactly the symptom reported in #127 ([BUG] evaluation: BM25/Embedding index filenames mismatch when running with --from-conv/--to-conv, causing empty retrieval).
This PR aligns stage 2 (and the adapter's skip-logic probe) to the same global-id key:
- stage2_index_building.build_bm25_index / build_emb_index now iterate the global conv ids actually in play. The adapter passes the exact slice via a new conv_ids argument; a discover_conv_ids() helper derives ids from the memcell_list_conv_*.json filenames as a fallback for the standalone CLI entry point (main()).
- evermemos_adapter._check_missing_indexes now probes by global conv ids instead of range(num_conv), so the smart-skip logic no longer always reports a slice's indexes as missing and rebuilds them under the wrong filenames.
Why
range(num_conv) is a silent correctness bug for any non-full run: the slice's conversations keep their global ids, but the builder assumed a contiguous 0..N-1 local space. The full dataset (--from-conv 0) happened to work only because local and global ids coincide there, which is why the bug went unnoticed.
Tests
Adds methods/EverCore/tests/test_evaluation_index_filename_conv_ids.py, which reproduces the slice fully offline: no docker, no LLM, no network (BM25 path only; the bug is a filename-mapping bug):
- test_bm25_index_filenames_match_global_conv_ids: writes memcell_list_conv_5/6/7.json (a slice) with local num_conv=3, builds indexes, and asserts the produced files are bm25_index_conv_5/6/7.pkl (what search reads) and that the buggy conv_0/1/2.pkl are absent.
- test_built_filename_matches_search_lookup_key: asserts the filename the search stage computes via _extract_conv_index exists on disk after the build.
- test_check_missing_indexes_uses_global_conv_ids: asserts the skip-logic probe keyed by global ids reports nothing missing, while the old local-range keying would wrongly report ["0","1","2"].
- test_discover_conv_ids_*: the disk-discovery helper returns the global ids and sorts them numerically (2 before 10).
- test_real_adapter_check_missing_indexes_keys_by_global_id: exercises the real EverCoreAdapter method when importable.
Run locally:
    cd methods/EverCore
    uv sync
    PYTHONPATH=src pytest tests/test_evaluation_index_filename_conv_ids.py -v
Result in this build: 6 passed, 1 skipped.
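For reviewers who want to see what the disk-discovery fallback has to do, here is a minimal sketch of such a helper. It is not the code in this PR: the name discover_conv_ids_sketch, the regex, and the directory argument are assumptions made for illustration. It only shows the two properties the tests check: ids come from the memcell_list_conv_*.json filenames, and they sort numerically rather than lexically.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern; the real helper may parse filenames differently.
_MEMCELL_NAME = re.compile(r"^memcell_list_conv_(\d+)\.json$")


def discover_conv_ids_sketch(memcell_dir: Path) -> list[str]:
    """Return the global conv ids found in memcell_list_conv_*.json names."""
    ids = []
    for path in memcell_dir.glob("memcell_list_conv_*.json"):
        match = _MEMCELL_NAME.match(path.name)
        if match:  # ignore names whose suffix is not a plain integer
            ids.append(match.group(1))
    # Sort by integer value so "2" comes before "10".
    return sorted(ids, key=int)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for conv_id in ("10", "2", "5"):
            (root / f"memcell_list_conv_{conv_id}.json").write_text("[]")
        print(discover_conv_ids_sketch(root))  # ['2', '5', '10']
```

Returning the ids as strings keeps them directly usable in filename templates; whether the real helper returns strings or integers is not stated here.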
Before / after (mechanism)
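The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the adapter code: extract_conv_index_sketch stands in for _extract_conv_index and assumes the global id is the last "_"-separated segment of conversation_id, and the other function names are hypothetical. Only the filename patterns and the range(num_conv) keying come from the description above.

```python
def extract_conv_index_sketch(conversation_id: str) -> str:
    """Stand-in for _extract_conv_index: 'locomo_5' -> '5' (assumed parsing)."""
    return conversation_id.rsplit("_", 1)[-1]


def index_names_before(num_conv: int) -> list[str]:
    """Old keying: a local 0..N-1 counter, independent of the slice."""
    return [f"bm25_index_conv_{i}.pkl" for i in range(num_conv)]


def index_names_after(conv_ids: list[str]) -> list[str]:
    """New keying: the global conv ids actually in play."""
    return [f"bm25_index_conv_{conv_id}.pkl" for conv_id in conv_ids]


def missing_indexes(conv_ids: list[str], existing: set[str]) -> list[str]:
    """Skip-logic probe: which ids have no index file in `existing`."""
    return [c for c in conv_ids if f"bm25_index_conv_{c}.pkl" not in existing]


if __name__ == "__main__":
    # A slice such as --from-conv 5 --to-conv 10 keeps global ids 5..9.
    sliced = [f"locomo_{i}" for i in range(5, 10)]
    global_ids = [extract_conv_index_sketch(c) for c in sliced]
    search_reads = set(index_names_after(global_ids))

    before = set(index_names_before(len(sliced)))
    print(sorted(before & search_reads))  # [] : nothing the search stage can use

    after = set(index_names_after(global_ids))
    print(after == search_reads)  # True

    # The probe keyed by global ids finds everything; the old local keying does not.
    print(missing_indexes(global_ids, after))  # []
    print(missing_indexes([str(i) for i in range(5)], after))  # ['0', '1', '2', '3', '4']
```

On a full run starting at conversation 0, the two keyings produce identical names, which matches the observation that only sliced runs were affected.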
Scope
Surgical: two source files (evaluation/src/adapters/evermemos/stage2_index_building.py, evaluation/src/adapters/evermemos_adapter.py) plus one new regression test. No schema, no repository-layer, no docker-compose, no docs churn; deliberately narrower than the broad 21-file prior-art PR (#136).
Closes #127.
Credit
Prior community work diagnosed the same --from-conv/--to-conv filename mismatch. Co-author preserved on the commit:
- fix(evaluation): resolve BM25/Embedding index filename mismatch when using --from-conv/--to-conv (same root cause, broad 21-file scope)
- fix: rename stage3_memory_retrivel.py to stage3_memory_retrieval #115 (tangential stage-3 rename)
Co-authored-by: Jah-yee <166608075+Jah-yee@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 Generated with Claude Code