⚡️ Speed up function collect_existing_class_names by 109% in PR #1660 (unstructured-inference)#1845
Conversation
The optimization replaced a large multi-type `isinstance()` check (13 AST node types constructed into a tuple on every iteration) with a single `hasattr(node, "body")` test, then conditionally checked for `orelse`, `finalbody`, and `handlers` only when `body` exists. Line profiler shows the original `isinstance` block consumed ~40% of runtime across 7327 calls, while the new `hasattr` checks are ~3× cheaper per call. The nested conditionals avoid calling `getattr` with default values when attributes are absent (e.g., `orelse` is missing in 85% of nodes), cutting wasted attribute lookups from four unconditional `getattr` calls to typically one or two `hasattr` checks plus direct accesses. Across 59 test runs processing ~7300 AST nodes each, this yields a 109% speedup with identical correctness.
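The before/after pattern can be sketched as follows. This is an illustrative sketch only: the function names, the node-type tuple, and the traversal are assumptions, not the PR's verbatim code. Note that expression nodes such as `ast.Lambda` also expose a `body` attribute (a single expression, not a list), so production code needs extra care there; this sketch assumes statement-level bodies.

```python
import ast

# Sketch of the optimization described above (names are assumptions).

_BLOCK_TYPES = (
    ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith,
    ast.Try, ast.ExceptHandler,
)

def collect_original(node: ast.AST, names: set) -> None:
    """Before: a wide isinstance() tuple check plus four unconditional getattr calls."""
    if isinstance(node, ast.ClassDef):
        names.add(node.name)
    if isinstance(node, _BLOCK_TYPES):
        children = (getattr(node, "body", []) + getattr(node, "orelse", [])
                    + getattr(node, "finalbody", []) + getattr(node, "handlers", []))
        for child in children:
            collect_original(child, names)

def collect_optimized(node: ast.AST, names: set) -> None:
    """After: one cheap hasattr() gate, then conditional checks for the rarer fields."""
    if isinstance(node, ast.ClassDef):
        names.add(node.name)
    if hasattr(node, "body"):
        for child in node.body:
            collect_optimized(child, names)
        if hasattr(node, "orelse"):        # absent on most node types
            for child in node.orelse:
                collect_optimized(child, names)
        if hasattr(node, "finalbody"):     # only ast.Try
            for child in node.finalbody:
                collect_optimized(child, names)
        if hasattr(node, "handlers"):      # only ast.Try
            for child in node.handlers:
                collect_optimized(child, names)
```

The win comes from replacing a tuple membership test over ~13 types with a single attribute probe, and only paying for `orelse`/`finalbody`/`handlers` lookups on the minority of nodes that can carry them.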

Claude finished @codeflash-ai[bot]'s task in 6s.
PR Review Summary
Prek Checks: All ruff checks passed. No formatting issues.
Mypy Fix (committed): The optimization introduced a mypy type error, fixed by adding an explicit type annotation.
Code Review: The optimization is correct and sound. Key observation: no correctness issues found.
Duplicate Detection: No duplicates detected.
Test Coverage: Full test suite was still running at review time (background process). The PR itself includes 52 generated regression tests + 2 replay tests with 100% coverage per the PR description.
Last updated: 2026-03-16T23:32
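The exact annotation committed by the review bot is not shown above. As a purely hypothetical illustration of this common error class, mypy cannot infer the element type of an empty collection, and an explicit annotation resolves it; the variable name and types here are assumptions:

```python
# Hypothetical sketch; the actual variable and annotation in the PR
# are not reproduced in this review summary.
def collect(names_source: list) -> set:
    # Without the annotation, mypy reports:
    #   error: Need type annotation for "seen" (hint: "seen: set[<type>] = ...")
    seen: set[str] = set()  # explicit annotation satisfies mypy
    for name in names_source:
        seen.add(str(name))
    return seen
```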
…existing_class_names
⚡️ This pull request contains optimizations for PR #1660
If you approve this dependent PR, these changes will be merged into the original PR branch of unstructured-inference.
📄 109% (1.09x) speedup for `collect_existing_class_names` in `codeflash/languages/python/context/code_context_extractor.py`
⏱️ Runtime: 3.63 milliseconds → 1.74 milliseconds (best of 5 runs)
⚡️ This change will improve the performance of the following benchmarks:
📝 Explanation and details
✅ Correctness verification report:
🌀 Generated Regression Tests
⏪ Replay Tests
benchmarks/codeflash_replay_tests_poa70mzd/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor_collect_existing_class_names_test_benchmark_extract
To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-16T23.30.46` and push.