Skip to content

perf(localized_reader): O(1) name lookup via HashMap index (#191)#253

Closed
SAY-5 wants to merge 2 commits intoTrueNine:devfrom
SAY-5:perf/localized-reader-name-index-191
Closed

perf(localized_reader): O(1) name lookup via HashMap index (#191)#253
SAY-5 wants to merge 2 commits intoTrueNine:devfrom
SAY-5:perf/localized-reader-name-index-191

Conversation

@SAY-5
Copy link
Copy Markdown
Contributor

@SAY-5 SAY-5 commented Apr 30, 2026

Closes #191.

scan_directory looked up the existing entry for a normalised full_name via entries.iter_mut().find(|e| e.name == full_name) on every file processed. With n total entries that's an O(n²) walk — fine for tiny prompt trees, painful once a workspace accumulates a few hundred .src.mdx / .mdx pairs across the zh / en / dist combinations.

Fix

Add a side-table HashMap<String, usize> mapping the normalised name to the entry's index in the Vec. The dedup branch becomes by_name.get(&full_name) (O(1)), and the new-entry branch inserts (full_name.clone(), entries.len()) before the push so the next lookup hits.

The seen HashSet is left in place to keep the diff minimal — it's addressed separately in #190 (PR #242).

Test plan

  • cargo build --manifest-path sdk/Cargo.toml — clean
  • cargo test --manifest-path sdk/Cargo.toml --lib repositories — 45/45 pass

TrueNine and others added 2 commits April 25, 2026 10:10
Fix two CI failures from previous merge
Closes TrueNine#191.

\`scan_directory\` looked up the existing entry for a normalised
\`full_name\` via \`entries.iter_mut().find(|e| e.name == full_name)\`
on every file processed. With \`n\` total entries that's an O(n²)
walk — fine for tiny prompt trees, painful once a workspace
accumulates a few hundred .src.mdx / .mdx pairs across the
zh / en / dist combinations.

Add a side-table \`HashMap<String, usize>\` mapping the normalised
name to the entry's index in the \`Vec\`. The dedup branch becomes
\`by_name.get(&full_name)\` (O(1)), and the new-entry branch inserts
\`(full_name.clone(), entries.len())\` before the push so the next
lookup hits.

The \`seen\` HashSet is left in place to keep the diff minimal; it's
addressed separately in TrueNine#190 (PR TrueNine#242). Once both land,
\`localized_reader\` is allocation-bounded by the number of entries
rather than the square of it.

\`cargo test --lib repositories\` is green at 45/45.
@SAY-5 SAY-5 requested a review from TrueNine as a code owner April 30, 2026 02:41
@TrueNine
Copy link
Copy Markdown
Owner

Thanks for the contribution on #253. Retargeting this to dev on our side, and I am adding explicit regression coverage so the behavior stays pinned down while we take the lookup optimization.

@TrueNine TrueNine changed the base branch from main to dev April 30, 2026 04:17
@TrueNine
Copy link
Copy Markdown
Owner

Thanks again for the contribution on #253.

I applied the lookup optimization onto dev on our side and added explicit regression coverage to pin down the intended behavior while taking the performance improvement.

What we added:

  • a regression test documenting that #253 keeps localized variants grouped into the same FlatFileEntry
  • coverage that a zh source, an en source, and a compiled mdx file with the same normalized name still merge into one entry

I verified the result with:
cargo test --manifest-path sdk/Cargo.toml --lib repositories

Current result: 46 passed.

So the optimization is now covered by a behavior-preserving test and is included on our dev branch for the next release. Since the change has already been applied manually on our side, I am closing this PR to keep the branch history clean. Thanks for the careful contribution.

@TrueNine TrueNine closed this Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Localized Reader] O(n²) 查找替代 HashSet

2 participants