⚡️ Speed up method CallGraph.ancestors by 37% in PR #1660 (unstructured-inference) #1834
Conversation
The optimization replaces the per-iteration `max_depth is not None and depth >= max_depth` check with a single upfront branch that runs two specialized BFS variants: one without depth tracking (storing plain `FunctionNode` in the queue) when `max_depth` is None, and one with depth tracking (storing `tuple[FunctionNode, int]`) when a limit is set. This eliminates tuple packing/unpacking and a conditional check on every loop iteration in the common unlimited-depth case. Line profiler shows the original `for edge in self.callers_of(current)` accounted for 91% of runtime; the optimized code caches `self.reverse` once and uses `reverse_map.get(current, [])` inline, avoiding 8309 redundant dictionary lookups. The trade-off is slightly longer code due to the two-path structure, but runtime improves 37% with negligible regressions in a few edge cases (max_depth=0 is 66% slower, but these are rare micro-benchmarks with sub-microsecond absolute deltas).
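As a rough illustration (not the project's actual code), the two specialized BFS variants described above can be sketched like this, using a stand-in `CallGraph` whose `reverse` attribute maps each node to its list of callers:

```python
from collections import deque

class CallGraph:
    """Minimal stand-in for the real class: `reverse` maps a function
    (here just a name) to the functions that call it."""
    def __init__(self, reverse):
        self.reverse = reverse

def ancestors(graph, start, max_depth=None):
    # Cache the reverse map once instead of re-reading it on every edge.
    reverse_map = graph.reverse
    seen = {start}
    out = []
    if max_depth is None:
        # Fast path: queue holds plain nodes, no tuple packing/unpacking
        # and no per-iteration depth check.
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for caller in reverse_map.get(current, []):
                if caller not in seen:
                    seen.add(caller)
                    out.append(caller)
                    queue.append(caller)
    else:
        # Depth-limited path: queue holds (node, depth) tuples, and the
        # depth check only exists in this branch.
        queue = deque([(start, 0)])
        while queue:
            current, depth = queue.popleft()
            if depth >= max_depth:
                continue
            for caller in reverse_map.get(current, []):
                if caller not in seen:
                    seen.add(caller)
                    out.append(caller)
                    queue.append((caller, depth + 1))
    return out

# a -> b -> c, so the callers (ancestors) of c are b, then a
g = CallGraph({"c": ["b"], "b": ["a"], "a": []})
```

The upfront branch means the common `max_depth=None` case never pays for depth bookkeeping, at the cost of duplicating the loop body once.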
Claude finished @codeflash-ai[bot]'s task in 7s — View job

PR Review Summary
- Prek Checks — Fixed: Ruff detected an issue and a fix was committed; the final diff now cleanly shows only the intended changes.
- Code Review — No bugs found.
- Duplicate Detection — No duplicates detected.
- Test Coverage — All 39 existing unit tests pass.

Last updated: 2026-03-15
The optimizer added a second `reverse` property at the bottom of `CallGraph` that shadowed the existing one (Ruff F811), leaving the original as dead code. Remove the duplicate and the unnecessary `_build_reverse` helper; the existing `reverse` property built via `_build_adjacency` is sufficient.
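This is why F811 matters here: in a class body, a later definition with the same name silently wins, so the first `property` becomes unreachable dead code rather than an error. A tiny illustration with made-up names:

```python
class Demo:
    @property
    def reverse(self):
        # Original definition -- after the redefinition below,
        # nothing can ever reach this body.
        return "from _build_adjacency"

    @property
    def reverse(self):
        # F811: redefinition shadows the property above.
        return "from _build_reverse"
```

Python executes a class body top to bottom, so the last binding of `reverse` is the only one the class keeps; only a linter like Ruff surfaces the problem.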
Applied directly in 77576ba — optimized `descendants`, `ancestors`, and `topological_order` together with the same pattern (local dict cache, split `max_depth` fast path, inlined `dict.get`).
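The "local dict cache" part of that pattern is simply hoisting a repeated property or attribute lookup out of the loop. A sketch with hypothetical names (the real methods differ):

```python
class Graph:
    """Hypothetical graph: `forward` is a property, so each access
    goes through the descriptor protocol and costs a call."""
    def __init__(self, adj):
        self._adj = adj

    @property
    def forward(self):
        return self._adj

def total_edges(graph):
    # Pay the property lookup once, then reuse the plain local dict;
    # dict.get is called inline instead of through a helper method.
    adj = graph.forward
    return sum(len(adj.get(node, [])) for node in adj)

g = Graph({"a": ["b", "c"], "b": ["c"], "c": []})
```

Local variable access in CPython is a fast array lookup, so inside a hot loop it beats re-evaluating a property or method call each iteration.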
⚡️ This pull request contains optimizations for PR #1660
If you approve this dependent PR, these changes will be merged into the original PR branch `unstructured-inference`.

📄 37% (0.37x) speedup for `CallGraph.ancestors` in `codeflash/models/call_graph.py`

⏱️ Runtime: 67.5 milliseconds → 49.1 milliseconds (best of 10 runs)

📝 Explanation and details
✅ Correctness verification report:
⚙️ Click to see Existing Unit Tests

- test_call_graph.py::TestAncestors.test_empty_for_root
- test_call_graph.py::TestAncestors.test_max_depth_limits_traversal
- test_call_graph.py::TestAncestors.test_transitive_ancestors

🌀 Click to see Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-15T02.05.53` and push.