⚡️ Speed up function collect_existing_class_names by 109% in PR #1660 (unstructured-inference)#1845
Conversation
The optimization replaced a large multi-type `isinstance()` check (13 AST node types constructed into a tuple on every iteration) with a single `hasattr(node, "body")` test, then conditionally checked for `orelse`, `finalbody`, and `handlers` only when `body` exists. Line profiler shows the original `isinstance` block consumed ~40% of runtime across 7327 calls, while the new `hasattr` checks are ~3× cheaper per call. The nested conditionals avoid calling `getattr` with default values when attributes are absent (e.g., `orelse` is missing in 85% of nodes), cutting wasted attribute lookups from four unconditional `getattr` calls to typically one or two `hasattr` checks plus direct accesses. Across 59 test runs processing ~7300 AST nodes each, this yields a 109% speedup with identical correctness.
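The before/after pattern can be sketched as follows. This is an illustrative sketch only: the function names, the node-type tuple, and the traversal are assumptions, not the PR's verbatim code. Note that expression nodes such as `ast.Lambda` also expose a `body` attribute (a single expression, not a list), so production code needs extra care there; this sketch assumes statement-level bodies.

```python
import ast

# Sketch of the optimization described above (names are assumptions).

_BLOCK_TYPES = (
    ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith,
    ast.Try, ast.ExceptHandler,
)

def collect_original(node: ast.AST, names: set) -> None:
    """Before: a wide isinstance() tuple check plus four unconditional getattr calls."""
    if isinstance(node, ast.ClassDef):
        names.add(node.name)
    if isinstance(node, _BLOCK_TYPES):
        children = (getattr(node, "body", []) + getattr(node, "orelse", [])
                    + getattr(node, "finalbody", []) + getattr(node, "handlers", []))
        for child in children:
            collect_original(child, names)

def collect_optimized(node: ast.AST, names: set) -> None:
    """After: one cheap hasattr() gate, then conditional checks for the rarer fields."""
    if isinstance(node, ast.ClassDef):
        names.add(node.name)
    if hasattr(node, "body"):
        for child in node.body:
            collect_optimized(child, names)
        if hasattr(node, "orelse"):        # absent on most node types
            for child in node.orelse:
                collect_optimized(child, names)
        if hasattr(node, "finalbody"):     # only ast.Try
            for child in node.finalbody:
                collect_optimized(child, names)
        if hasattr(node, "handlers"):      # only ast.Try
            for child in node.handlers:
                collect_optimized(child, names)
```

The win comes from replacing a tuple membership test over ~13 types with a single attribute probe, and only paying for `orelse`/`finalbody`/`handlers` lookups on the minority of nodes that can carry them.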

Claude finished @codeflash-ai[bot]'s task in 6s.
PR Review Summary
Prek Checks: All ruff checks passed. No formatting issues.
Mypy Fix (committed): The optimization introduced a mypy type error, fixed by adding an explicit type annotation.
Code Review: The optimization is correct and sound. Key observation: no correctness issues found.
Duplicate Detection: No duplicates detected.
Test Coverage: Full test suite was still running at review time (background process). The PR itself includes 52 generated regression tests + 2 replay tests with 100% coverage per the PR description.
Last updated: 2026-03-16T23:32
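The exact annotation committed by the review bot is not shown above. As a purely hypothetical illustration of this common error class, mypy cannot infer the element type of an empty collection, and an explicit annotation resolves it; the variable name and types here are assumptions:

```python
# Hypothetical sketch; the actual variable and annotation in the PR
# are not reproduced in this review summary.
def collect(names_source: list) -> set:
    # Without the annotation, mypy reports:
    #   error: Need type annotation for "seen" (hint: "seen: set[<type>] = ...")
    seen: set[str] = set()  # explicit annotation satisfies mypy
    for name in names_source:
        seen.add(str(name))
    return seen
```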
…existing_class_names
⚡️ This pull request contains optimizations for PR #1660
If you approve this dependent PR, these changes will be merged into the original PR branch of unstructured-inference.
📄 109% (1.09x) speedup for `collect_existing_class_names` in `codeflash/languages/python/context/code_context_extractor.py`
⏱️ Runtime: 3.63 milliseconds → 1.74 milliseconds (best of 5 runs)
⚡️ This change will improve the performance of the following benchmarks:
📝 Explanation and details
✅ Correctness verification report:
🌀 Generated Regression Tests
⏪ Replay Tests
benchmarks/codeflash_replay_tests_poa70mzd/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor_collect_existing_class_names_test_benchmark_extract
To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-16T23.30.46` and push.