⚡️ Speed up function closest_matching_file_function_name by 175% in PR #1689 (consolidate-python-discovery) #1691
Runtime improvement (primary): The optimized version reduces the median runtime of the matcher from 24.7 ms to 8.99 ms — a 175% speedup. Line profiling shows total time in closest_matching_file_function_name dropped from ~0.18 s to ~0.037 s, and the hot cost inside full Levenshtein calls was cut dramatically.

What changed (concrete optimizations)
- Precompute candidate metadata: the code now builds a flattened candidates list of (file_path, function, fn_name_lower, fn_len). This removes repeated attribute lookups, .lower() and len() calls inside the inner loop.
- Bounded (banded) Levenshtein: many full-distance computations are replaced with _bounded_levenshtein(s1, s2, max_distance), which computes the distance only within a narrow band and returns a value > max_distance early when the result cannot be better. The algorithm also:
  - early-exits when the length difference alone already exceeds the bound,
  - restricts the DP computation to a band [start..end] per row, and
  - performs a row-level early exit when the minimum in the active band exceeds the bound.
- Local binding and micro-optimizations: a local binding _bounded = _bounded_levenshtein and a cached target_len reduce attribute lookups. The bounded implementation uses manual comparisons for the min-of-three to avoid tuple/min overhead in tight loops.
- The original levenshtein_distance is kept intact for external callers (no behavioral change for that API).

Why this speeds things up (mechanics)
- The original profile shows most time was spent inside levenshtein_distance calls (many full DP passes). The bounded approach avoids the O(n*m) DP for any candidate that cannot beat the current min_distance — in a large candidate set, most names are either skipped by length or exceed the bound early. That turns many expensive full-distance calls into quick checks or truncated DP runs.
- Eliminating repeated .lower() and len() calls removes Python-level work and attribute lookups inside tight loops, which are relatively expensive in Python.
- Restricting the DP to a narrow band reduces inner-loop iterations and memory reads/writes per row, giving far less Python-level loop overhead and fewer list operations.

Evidence from profiling and tests
- Line profiler: the original spent ~1.78e8 ns in distance calls; the optimized version shows ~3.25e7 ns in bounded calls — a large reduction in DP work.
- Annotated tests show the biggest wins for larger and near-match workloads (large_scale_exact_match, large_scale_near_match, choose_closest_of_multiple_candidates, special_characters_and_dots). Example: a large-scale search for one exact name dropped from ~13.3 ms to ~5.99 ms in one annotated test.
- The bounded variant still returns a number > max_distance when the true distance exceeds the bound, which preserves the selection logic (we only care about distances smaller than the current min_distance).

Trade-offs and regressions to be aware of
- Small overhead for tiny inputs: building the candidates list and the extra function-call overhead make a few microbenchmarks slightly slower (e.g., empty maps or single-item trivial cases). These regressions are minimal and expected given the additional precompute allocation; they are an acceptable trade-off for the large reductions when many candidates are present.
- Memory: the candidates list adds short-lived tuples (file_path, function, lower-name, length). This is small relative to the CPU savings when many names are checked.

Hot-path impact
- get_functions_to_optimize can call closest_matching_file_function_name when the exact function wasn't found; in common interactive/CLI flows this is a helpful path. The optimization yields the most benefit when many functions are present (e.g., scanning a repository, or when the user passes a mistyped name and there are many candidates). Because the heavy work (distance comparisons across many candidates) is now bounded and cheaper, interactive latency and batch throughput improve in those hot scenarios.
Summary
- Primary benefit: a 175% runtime speedup for the matcher, achieved by avoiding repeated costly full Levenshtein computations and reducing Python-level overhead inside the candidate loop.
- Side effects: a small memory allocation for candidate metadata and a few micro-regressions on trivial inputs — reasonable trade-offs given the large improvements for real workloads (many candidates / near-match searches).
- Correctness: the original external levenshtein_distance is kept intact; the bounded version preserves selection semantics (any distance >= the current min_distance is treated as not improving the match).
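The precompute-then-skip candidate loop described above could look roughly like this. The function name, data shapes, and the plain `_lev` stand-in (the PR uses the bounded variant instead) are all illustrative assumptions, not the PR's actual signatures:

```python
def _lev(a: str, b: str) -> int:
    # Plain full-matrix Levenshtein, a stand-in for the bounded variant.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def closest_matching_name(target: str, functions_by_file: dict):
    """Hypothetical shape: {file_path: [function-name strings]}."""
    target_lower = target.lower()
    target_len = len(target_lower)
    # Precompute candidate metadata once, outside the comparison loop,
    # so .lower() and len() are not re-evaluated per comparison.
    candidates = [
        (path, fn, fn.lower(), len(fn))
        for path, fns in functions_by_file.items()
        for fn in fns
    ]
    best, min_distance = None, float("inf")
    _dist = _lev  # local binding avoids repeated global lookups in the loop
    for path, fn, fn_lower, fn_len in candidates:
        if abs(fn_len - target_len) >= min_distance:
            continue  # cannot beat the current best: skip the DP entirely
        d = _dist(target_lower, fn_lower)
        if d < min_distance:
            best, min_distance = (path, fn), d
    return best
```

With a mistyped target like `"proces_data"`, only plausibly close names reach the DP; once a distance-1 match is found, every candidate whose length differs by 1 or more is rejected by the cheap length check alone.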
PR Review Summary

Prek Checks
- Fixed in commit
- mypy: No issues found.

Code Review
Issues found and fixed:
Observations (non-blocking):

Test Coverage
Coverage regression: -6% overall for this file. Details:
Pre-existing test failure (unrelated to this PR):

Last updated: 2026-02-27T18:20:00Z
@@ -936,3 +946,78 @@ def filter_files_optimized(file_path: Path, tests_root: Path, ignore_paths: list
    file_path in submodule_paths
    or any(file_path.is_relative_to(submodule_path) for submodule_path in submodule_paths)
Bug (fixed in latest commit): _bounded_levenshtein was defined twice, and the second definition silently shadowed the first. Both implementations were identical, so there was no functional bug, but the duplicate is dead code that should not be committed. Fixed by removing it in cca120dd.
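For context on why this only surfaced in review: a second `def` with the same name at the same scope rebinds the name without any runtime warning or error. A minimal illustration (stub bodies, not the real implementations):

```python
def _bounded_levenshtein(s1, s2, max_distance):  # first definition
    return 0

def _bounded_levenshtein(s1, s2, max_distance):  # silently shadows the first
    return 1

# Only the second binding survives; the first is unreachable dead code,
# which is why tests passed but the duplicate still needed removal.
result = _bounded_levenshtein("a", "b", 1)
```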
Duplicate Code Analysis

No duplicates detected. I analyzed all changed files in this PR:
All new/modified functions were checked against the codebase:
This PR actually reduces duplication by consolidating Python function discovery to use libcst consistently and removing the old AST-based
⚡️ Codeflash found optimizations for this PR

📄 995% (9.95x) speedup for
⚡️ This pull request contains optimizations for PR #1689
If you approve this dependent PR, these changes will be merged into the original PR branch consolidate-python-discovery.

📄 175% (1.75x) speedup for closest_matching_file_function_name in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime: 24.7 milliseconds → 8.99 milliseconds (best of 111 runs)

📝 Explanation and details
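As a sanity check on how the headline figures relate (assuming Codeflash's convention, consistent with the 175%/1.75x and 995%/9.95x pairings shown here, that the percentage is the relative improvement old/new − 1):

```python
old_ms, new_ms = 24.7, 8.99  # runtimes reported above

# Relative improvement: (old / new) - 1 ≈ 1.75, reported as "1.75x" / "175%".
improvement = old_ms / new_ms - 1
print(f"{improvement:.2f}x ≈ {improvement * 100:.0f}%")  # 1.75x ≈ 175%
```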
- What changed (concrete optimizations)
- Why this speeds things up (mechanics)
- Evidence from profiling and tests
- Trade-offs and regressions to be aware of
- Hot-path impact
- Summary
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes
`git checkout codeflash/optimize-pr1689-2026-02-27T17.52.22` and push.