fix(evaluation): map local sequential ids to global conv ids in BM25/embedding index filenames (#127) #242

Open: Fearvox wants to merge 1 commit into EverMind-AI:main from Fearvox:fix/issue-127-index-filename-conv-ids

Fearvox commented Jun 3, 2026

What

When the evaluation pipeline runs on a slice (--from-conv 5 --to-conv 10), retrieval silently returns empty results. The root cause is a three-way filename-keying mismatch in the EverCore eval adapter:

  • Stage 1 (memcell writer) keys files by the global conversation id extracted from conversation_id via _extract_conv_index ("locomo_5" -> "5"), writing memcell_list_conv_5..9.json.
  • Search stage (reader) also keys by the global id, reading bm25_index_conv_5..9.pkl / embedding_index_conv_5..9.pkl.
  • Stage 2 (index builder) iterated a local range(config.num_conv) counter (0..N-1), so for a 5-conversation slice it looked for memcell_list_conv_0..4.json (absent → "File not found, skipping"), built nothing, and wrote indexes under the wrong names.
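
The mismatch can be reproduced in a few lines. This is an illustrative sketch only: extract_conv_index is a hypothetical stand-in for _extract_conv_index (assumed here to return the text after the last underscore), and the filename patterns are the ones quoted above.

```python
def extract_conv_index(conversation_id: str) -> str:
    """Hypothetical stand-in for _extract_conv_index: "locomo_5" -> "5"."""
    return conversation_id.rsplit("_", 1)[-1]


# A slice equivalent to --from-conv 5 --to-conv 10 (conversations 5..9).
slice_conversation_ids = [f"locomo_{i}" for i in range(5, 10)]
num_conv = len(slice_conversation_ids)

# Stage 1 writes files keyed by the global id (the search stage reads by the same key).
global_ids = [extract_conv_index(cid) for cid in slice_conversation_ids]
written_by_stage1 = {f"memcell_list_conv_{g}.json" for g in global_ids}

# The old stage 2 keyed its lookups by a local counter instead.
wanted_by_old_stage2 = {f"memcell_list_conv_{i}.json" for i in range(num_conv)}

# No overlap: every stage-2 lookup misses, so no usable index is built.
found = written_by_stage1 & wanted_by_old_stage2
print(sorted(found))  # []
```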

Net effect: a sliced run builds zero usable indexes and every search returns empty. This is exactly the symptom reported in #127 ([BUG] evaluation: BM25/Embedding index filenames mismatch when running with --from-conv/--to-conv, causing empty retrieval).

This PR aligns stage 2 (and the adapter's skip-logic probe) to the same global-id key:

  • stage2_index_building.build_bm25_index / build_emb_index now iterate the global conv ids actually in play. The adapter passes the exact slice via a new conv_ids argument; a discover_conv_ids() helper derives ids from the memcell_list_conv_*.json filenames as a fallback for the standalone CLI entry point (main()).
  • evermemos_adapter._check_missing_indexes now probes by global conv ids instead of range(num_conv), so the smart-skip logic no longer always reports a slice's indexes as missing and rebuilds them under the wrong filenames.
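
A minimal sketch of the global-id iteration described above, for illustration. The signatures and bodies are assumptions rather than the code in this PR: discover_conv_ids is assumed to take the memcell directory and to see only numeric ids, and planned_index_files is a hypothetical helper that lists the BM25 filenames a build would write instead of building anything.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional

_MEMCELL_RE = re.compile(r"^memcell_list_conv_(\d+)\.json$")


def discover_conv_ids(memcell_dir: Path) -> List[str]:
    """Derive global conv ids from memcell_list_conv_*.json filenames."""
    ids = []
    for path in memcell_dir.glob("memcell_list_conv_*.json"):
        match = _MEMCELL_RE.match(path.name)
        if match:  # ignore files whose suffix is not a plain integer
            ids.append(match.group(1))
    # Sort numerically so "2" comes before "10".
    return sorted(ids, key=int)


def planned_index_files(
    memcell_dir: Path, conv_ids: Optional[Iterable[str]] = None
) -> List[str]:
    """Hypothetical helper: BM25 index names a build would write, by global id."""
    # Prefer the exact slice passed by the caller; fall back to disk discovery.
    ids = list(conv_ids) if conv_ids is not None else discover_conv_ids(memcell_dir)
    return [
        f"bm25_index_conv_{cid}.pkl"
        for cid in ids
        if (memcell_dir / f"memcell_list_conv_{cid}.json").exists()
    ]
```

Passing conv_ids explicitly keeps the adapter path independent of whatever else is in the directory, while the discovery fallback covers callers that only know the directory.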

Why

range(num_conv) is a silent correctness bug for any non-full run: the slice's conversations keep their global ids, but the builder assumed a contiguous 0..N-1 local space. The full dataset (--from-conv 0) happened to work only because local and global ids coincide there, which is why it went unnoticed.

Tests

Adds methods/EverCore/tests/test_evaluation_index_filename_conv_ids.py, which reproduces the slice fully offline — no docker, no LLM, no network (BM25 path only; the bug is a filename-mapping bug):

  • test_bm25_index_filenames_match_global_conv_ids — writes memcell_list_conv_5/6/7.json (a slice) with local num_conv=3, builds indexes, and asserts the produced files are bm25_index_conv_5/6/7.pkl (what search reads) and that the buggy conv_0/1/2.pkl are absent.
  • test_built_filename_matches_search_lookup_key — asserts the filename the search stage computes via _extract_conv_index exists on disk after the build.
  • test_check_missing_indexes_uses_global_conv_ids — asserts the skip-logic probe keyed by global ids reports nothing missing, while the old local-range keying would wrongly report ["0","1","2"].
  • test_discover_conv_ids_* — the disk-discovery helper returns the global ids and sorts them numerically (2 before 10).
  • test_real_adapter_check_missing_indexes_keys_by_global_id — exercises the real EverCoreAdapter method when importable.
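
The difference between the two probe keyings can be shown with a small stand-alone sketch. missing_index_ids is a hypothetical function written for illustration, not the adapter's _check_missing_indexes, and it checks only the BM25 filename pattern quoted in this description.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_index_ids(index_dir: Path, conv_ids: Iterable[str]) -> List[str]:
    """Return the conv ids whose BM25 index file is absent from index_dir."""
    return [
        cid
        for cid in conv_ids
        if not (index_dir / f"bm25_index_conv_{cid}.pkl").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp)
    # Indexes for a slice, named by global conv id.
    for cid in ("5", "6", "7"):
        (index_dir / f"bm25_index_conv_{cid}.pkl").touch()

    # Probing by global ids finds every index.
    global_keyed = missing_index_ids(index_dir, ["5", "6", "7"])
    # Probing by a local range(num_conv) counter reports all of them missing.
    local_keyed = missing_index_ids(index_dir, [str(i) for i in range(3)])

print(global_keyed)  # []
print(local_keyed)   # ['0', '1', '2']
```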

Run locally:

cd methods/EverCore
uv sync
PYTHONPATH=src pytest tests/test_evaluation_index_filename_conv_ids.py -v

Result in this build: 6 passed, 1 skipped.

Skip note: the one skipped test imports the full EverCoreAdapter, whose import chain currently hits an unrelated, pre-existing import break on main: stage1_memcells_extraction.py imports ScenarioType from memory_layer.profile_manager, which no longer exports it. That break exists on clean origin/main and is not introduced by this PR; the test skips cleanly and auto-enables once that unrelated import is fixed. The core stage-2 reproduction does not touch that chain and runs fully.

Before / after (mechanism)

files on disk (stage1, global ids): memcell_list_conv_5/6/7.json
OLD stage2 looked for (local range): memcell_list_conv_0/1/2.json
OLD stage2 found: []  -> builds 0 indexes -> search reads conv_5/6/7.pkl (never written) -> EMPTY
NEW stage2 builds:  bm25_index_conv_5/6/7.pkl  == what search reads -> non-empty

Scope

Surgical: two source files (evaluation/src/adapters/evermemos/stage2_index_building.py, evaluation/src/adapters/evermemos_adapter.py) plus one new regression test. No schema, no repository-layer, no docker-compose, no docs churn — deliberately narrower than the broad 21-file prior-art PR (#136).

Closes #127.

Credit

Prior community work diagnosed the same --from-conv/--to-conv filename mismatch. Co-author preserved on the commit:

Co-authored-by: Jah-yee <166608075+Jah-yee@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with Claude Code

…nd-AI#127)

The eval index-building stage (stage 2) wrote BM25/Embedding index files
keyed by a local range(num_conv) counter (conv_0..conv_{N-1}), while
stage 1 (memcell writer) and the search stage (reader) both key off the
GLOBAL conversation id extracted from conversation_id via
_extract_conv_index (e.g. "locomo_5" -> "5").

On a sliced run (--from-conv 5 --to-conv 10) the sliced conversations keep
their global ids, so stage 2 looked for memcell_list_conv_0..4.json
(absent -> skipped), built nothing, and the search stage then failed to
find bm25_index_conv_5..9.pkl, yielding empty retrieval. Reported in EverMind-AI#127.

Fix:
- stage2_index_building: build_bm25_index / build_emb_index iterate the
  global conv ids actually in play. The adapter passes the exact slice via
  a new conv_ids arg; a discover_conv_ids() helper derives ids from the
  memcell_list_conv_*.json filenames as a fallback for the CLI entry point.
- evermemos_adapter: _check_missing_indexes now probes by global conv ids
  (not range(num_conv)), so the skip-logic no longer always reports a
  slice's indexes as missing and rebuilds them under the wrong filenames.

Adds an offline regression test (no docker / LLM / network) that
reproduces the slice and asserts the built index filenames are exactly the
ones the search stage reads.

Prior art on the same mismatch by Jah-yee (EverMind-AI#136, broad 21-file fix; EverMind-AI#115,
tangential stage3 rename); this change keeps the fix surgical.

Co-authored-by: Jah-yee <166608075+Jah-yee@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 02:41

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Development

Successfully merging this pull request may close these issues.

[BUG] evaluation: BM25/Embedding index filenames mismatch when running with --from-conv/--to-conv, causing empty retrieval

2 participants