perf(localized_reader): O(1) name lookup via HashMap index (#191) #253
SAY-5 wants to merge 2 commits into TrueNine:dev
Fix two CI failures from previous merge
Closes TrueNine#191.

`scan_directory` looked up the existing entry for a normalised `full_name` via `entries.iter_mut().find(|e| e.name == full_name)` on every file processed. With `n` total entries that's an O(n²) walk: fine for tiny prompt trees, painful once a workspace accumulates a few hundred .src.mdx / .mdx pairs across the zh / en / dist combinations.

Add a side-table `HashMap<String, usize>` mapping the normalised name to the entry's index in the `Vec`. The dedup branch becomes `by_name.get(&full_name)` (O(1)), and the new-entry branch inserts `(full_name.clone(), entries.len())` before the push so the next lookup hits.

The `seen` HashSet is left in place to keep the diff minimal; it's addressed separately in TrueNine#190 (PR TrueNine#242). Once both land, `localized_reader` is allocation-bounded by the number of entries rather than the square of it.

`cargo test --lib repositories` is green at 45/45.
Thanks for the contribution on #253. Retargeting this to |
Thanks again for the contribution on #253. I applied the lookup optimization onto What we added:
I verified the result with: Current result: 46 passed. So the optimization is now covered by a behavior-preserving test and is included on our |
Closes #191.
`scan_directory` looked up the existing entry for a normalised `full_name` via `entries.iter_mut().find(|e| e.name == full_name)` on every file processed. With `n` total entries that's an O(n²) walk: fine for tiny prompt trees, painful once a workspace accumulates a few hundred .src.mdx / .mdx pairs across the zh / en / dist combinations.

Fix

Add a side-table `HashMap<String, usize>` mapping the normalised name to the entry's index in the `Vec`. The dedup branch becomes `by_name.get(&full_name)` (O(1)), and the new-entry branch inserts `(full_name.clone(), entries.len())` before the push so the next lookup hits.

The `seen` HashSet is left in place to keep the diff minimal; it's addressed separately in #190 (PR #242).

Test plan

- `cargo build --manifest-path sdk/Cargo.toml`: clean
- `cargo test --manifest-path sdk/Cargo.toml --lib repositories`: 45/45 pass
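
For readers skimming the description, the side-table pattern can be sketched roughly as follows. This is a minimal illustration, not the actual `scan_directory` code: the `Entry` struct, its `sources` field, the `normalise` helper, and the `collect_entries` function are all hypothetical stand-ins, and only the `by_name` map and the insert-before-push ordering come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the reader's entry type; only `name` is
/// mentioned in the PR, `sources` is invented for illustration.
#[derive(Debug, PartialEq)]
struct Entry {
    name: String,
    sources: Vec<String>,
}

/// Hypothetical normalisation: strip a trailing `.src.mdx` or `.mdx`.
fn normalise(path: &str) -> String {
    path.strip_suffix(".src.mdx")
        .or_else(|| path.strip_suffix(".mdx"))
        .unwrap_or(path)
        .to_string()
}

/// Groups paths by normalised name, keeping first-seen order in the Vec.
fn collect_entries(paths: &[&str]) -> Vec<Entry> {
    let mut entries: Vec<Entry> = Vec::new();
    // Side table: normalised name -> index of its entry in `entries`.
    let mut by_name: HashMap<String, usize> = HashMap::new();

    for &path in paths {
        let full_name = normalise(path);
        if let Some(&idx) = by_name.get(&full_name) {
            // Dedup branch: average O(1) lookup instead of a linear `find`.
            entries[idx].sources.push(path.to_string());
        } else {
            // Record the index before the push; `entries.len()` is exactly
            // the slot the new entry is about to occupy.
            by_name.insert(full_name.clone(), entries.len());
            entries.push(Entry {
                name: full_name,
                sources: vec![path.to_string()],
            });
        }
    }
    entries
}
```

The stored indices stay valid only because entries are appended and never removed or reordered during the scan; if that ever changed, the map would need to be rebuilt or updated alongside the `Vec`.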