⚡️ Speed up function `existing_unit_test_count` by 30% in PR #1660 (`unstructured-inference`) #1833
The optimization inlines `qualified_name_with_modules_from_root` and wraps the expensive `module_name_from_file_path` call, which performs `Path.resolve()` and `relative_to()` operations, in an LRU cache with 128 slots. This avoids redundant filesystem queries when the same `(file_path, project_root)` pairs recur. Line profiling shows that `module_name_from_file_path` consumed 98% of the original runtime; caching reduces per-call cost from ~173 µs to ~132 µs by eliminating repeated path resolution. Bounding the cache keeps memory use from growing without limit in long-running processes, a practical trade-off for the roughly 30% speedup (938 µs to 724 µs overall).
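The caching pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `cached_module_name` and `qualified_name` are hypothetical names, and the way the module name is derived from the path is assumed rather than taken from `codeflash`.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # bounded cache: 128 slots, as stated in the description
def cached_module_name(file_path: Path, project_root: Path) -> str:
    """Hypothetical stand-in for a cached module_name_from_file_path."""
    # resolve() and relative_to() are the costly steps; on a cache hit
    # neither runs again for the same (file_path, project_root) pair.
    relative = file_path.resolve().relative_to(project_root.resolve())
    return ".".join(relative.with_suffix("").parts)


def qualified_name(function_name: str, file_path: Path, project_root: Path) -> str:
    # Assumed shape of the inlined qualified_name_with_modules_from_root logic:
    # dotted module path followed by the function name.
    return f"{cached_module_name(file_path, project_root)}.{function_name}"


if __name__ == "__main__":
    root = Path("project")
    module_file = root / "pkg" / "mod.py"
    print(qualified_name("f", module_file, root))
    print(qualified_name("g", module_file, root))  # same pair: served from cache
    print(cached_module_name.cache_info())
```

`Path` objects are hashable, so they can be used directly as `lru_cache` keys. One caveat of this approach: a cached result is not refreshed if the filesystem layout (for example, a symlink) changes while the process is running.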
Claude finished @codeflash-ai[bot]'s task in 23s

PR Review Summary

Prek Checks
Fixed and committed (
Prek now passes cleanly.

Code Review
Critical bug fixed: duplicate function definition
The optimization PR introduced
Naming convention violation fixed
Per
Redundant local import removed
The original generated code imported
No other issues: the LRU cache approach itself is sound

Duplicate Detection
No duplicates detected.

Test Coverage
3486 passed, 57 skipped, 1 failed. The single failure (

Last updated: 2026-03-15
⚡️ This pull request contains optimizations for PR #1660
If you approve this dependent PR, these changes will be merged into the original PR branch
`unstructured-inference`.

📄 30% (0.30x) speedup for `existing_unit_test_count` in `codeflash/discovery/discover_unit_tests.py`

⏱️ Runtime: `938 microseconds` → `724 microseconds` (best of `20` runs)

✅ Correctness verification report:
⚙️ Existing Unit Tests

- `test_ranking_boost.py::test_mixed_test_types`
- `test_ranking_boost.py::test_multiple_existing_tests`
- `test_ranking_boost.py::test_no_matching_key`
- `test_ranking_boost.py::test_no_tests`
- `test_ranking_boost.py::test_only_replay_tests`
- `test_ranking_boost.py::test_parametrized_tests_deduplication`
- `test_ranking_boost.py::test_single_existing_test`
- `test_ranking_boost.py::test_truthiness_for_boolean_usage`

🌀 Generated Regression Tests
To edit these changes, `git checkout codeflash/optimize-pr1660-2026-03-15T01.04.08` and push.