⚡️ Speed up function collect_existing_class_names by 109% in PR #1660 (unstructured-inference) #1845

Merged: claude[bot] merged 2 commits into unstructured-inference from codeflash/optimize-pr1660-2026-03-16T23.30.46 on Mar 16, 2026

codeflash-ai bot commented Mar 16, 2026

⚡️ This pull request contains optimizations for PR #1660

If you approve this dependent PR, these changes will be merged into the original PR branch unstructured-inference.

This PR will be automatically closed if the original PR is merged.


📄 109% (1.09x) speedup for collect_existing_class_names in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime: 3.63 milliseconds → 1.74 milliseconds (best of 5 runs)
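
The percentage figures on this page appear to follow the convention `(original / optimized - 1) * 100`. That formula is an assumption inferred from the reported timings, not something the PR states, and the helper name `speedup_percent` is hypothetical. A minimal sketch:

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Percent speedup, assuming the convention (original / optimized - 1) * 100."""
    return (original / optimized - 1.0) * 100.0


# Headline runtimes from this PR, in milliseconds.
overall = speedup_percent(3.63, 1.74)
print(f"{overall:.1f}% faster")
```

With the rounded runtimes shown above this gives about 108.6%, which rounds to the reported 109%. The same formula matches the per-test annotations further down, for example 2.32μs -> 1.88μs is about 23.4% faster.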

⚡️ This change will improve the performance of the following benchmarks:

| Benchmark File :: Function | Original Runtime | Expected New Runtime | Speedup |
|---|---|---|---|
| tests.benchmarks.test_benchmark_code_extract_code_context::test_benchmark_extract | 16.4 seconds | 16.4 seconds | 0.00% |


📝 Explanation and details

The optimization replaced a large multi-type `isinstance()` check (13 AST node types constructed into a tuple on every iteration) with a single `hasattr(node, "body")` test, then conditionally checked for `orelse`, `finalbody`, and `handlers` only when `body` exists. Line profiler shows the original `isinstance` block consumed ~40% of runtime across 7327 calls, while the new `hasattr` checks are ~3× cheaper per call. The nested conditionals avoid calling `getattr` with default values when attributes are absent (e.g., `orelse` is missing in 85% of nodes), cutting wasted attribute lookups from four unconditional `getattr` calls to typically one or two `hasattr` checks plus direct accesses. Across 59 test runs processing ~7300 AST nodes each, this yields a 109% speedup with identical correctness.
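
The change described above can be sketched as follows. This is a reconstruction from the description only: the function names `collect_class_names_isinstance` and `collect_class_names_hasattr` are hypothetical, the tuple of node types is an illustrative subset rather than the 13 types mentioned, and the real code in `code_context_extractor.py` may differ.

```python
import ast


def collect_class_names_isinstance(tree: ast.Module) -> set[str]:
    """Assumed shape of the original: a type check plus four getattr calls."""
    names: set[str] = set()
    stack: list[ast.AST] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        # The tuple literal is rebuilt on every loop iteration.
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef,
                             ast.AsyncFunctionDef, ast.If, ast.For,
                             ast.AsyncFor, ast.While, ast.With,
                             ast.AsyncWith, ast.Try, ast.ExceptHandler)):
            # All four lookups run even when the attribute is absent.
            stack.extend(getattr(node, "body", []))
            stack.extend(getattr(node, "orelse", []))
            stack.extend(getattr(node, "finalbody", []))
            stack.extend(getattr(node, "handlers", []))
    return names


def collect_class_names_hasattr(tree: ast.Module) -> set[str]:
    """Assumed shape of the optimized version: one hasattr gate, then direct access."""
    names: set[str] = set()
    stack: list[ast.AST] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        if hasattr(node, "body"):
            stack.extend(node.body)
            # The remaining blocks are only probed for nodes that have a body.
            if hasattr(node, "orelse"):
                stack.extend(node.orelse)
            if hasattr(node, "finalbody"):
                stack.extend(node.finalbody)
            if hasattr(node, "handlers"):
                stack.extend(node.handlers)
    return names


SAMPLE = """
class Top:
    class Nested:
        pass

try:
    class InTry:
        pass
except Exception:
    class InExcept:
        pass
else:
    class InTryElse:
        pass
finally:
    class InFinally:
        pass

for _ in []:
    pass
else:
    class InForElse:
        pass
"""

sample_tree = ast.parse(SAMPLE)
before = collect_class_names_isinstance(sample_tree)
after = collect_class_names_hasattr(sample_tree)
```

Both sketches only ever push statement lists and exception handlers onto the stack, so expression nodes that also carry a `body` attribute (such as `ast.Lambda`) are never visited.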

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 52 Passed |
| ⏪ Replay Tests | 2 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Click to see Generated Regression Tests
import ast  # used to build/parse Python source into an AST

import pytest  # used for our unit tests
# import the function under test from the actual module path
from codeflash.languages.python.context.code_context_extractor import \
    collect_existing_class_names

def test_empty_module_returns_empty_set():
    # Parse an empty module (no statements)
    tree = ast.parse("")  # produces ast.Module with empty body
    # The function should return an empty set when there are no classes
    assert collect_existing_class_names(tree) == set() # 2.32μs -> 1.88μs (23.4% faster)

def test_top_level_and_nested_classes_collected():
    # Build source with:
    # - a top-level class Top
    # - a nested class Nested inside Top
    # - a class InsideFunc inside a function
    source = """
class Top:
    class Nested:
        pass

def some_func():
    class InsideFunc:
        pass
"""
    tree = ast.parse(source)
    # The function should find all three class names regardless of nesting
    expected = {"Top", "Nested", "InsideFunc"}
    assert collect_existing_class_names(tree) == expected # 7.46μs -> 5.07μs (47.2% faster)

def test_functions_and_variables_not_included():
    # Ensure the collector only collects class names and ignores functions/assignments
    source = """
def Foo(): 
    pass

bar = 123

class RealClass:
    pass
"""
    tree = ast.parse(source)
    # Only RealClass should be discovered, not 'Foo' or 'bar'
    assert collect_existing_class_names(tree) == {"RealClass"} # 6.41μs -> 4.49μs (42.9% faster)

def test_classes_in_try_except_finally_and_handlers():
    # Classes placed in try, except handler body, and finally should all be found.
    source = """
try:
    class InTry:
        pass
except Exception:
    class InExcept:
        pass
finally:
    class InFinally:
        pass
"""
    tree = ast.parse(source)
    expected = {"InTry", "InExcept", "InFinally"}
    assert collect_existing_class_names(tree) == expected # 9.07μs -> 5.63μs (61.0% faster)

def test_classes_in_orelse_and_nested_control_flow():
    # Classes inside if/else, for/else, while/else, with blocks should be found.
    # Also verify nested class names are captured uniquely.
    source = """
if True:
    class InIf:
        pass
else:
    class InElse:
        pass

for i in []:
    class InFor:
        pass
else:
    class InForElse:
        pass

while False:
    class InWhile:
        pass
else:
    class InWhileElse:
        pass

with (open if False else None):
    class InWith:
        pass
"""
    tree = ast.parse(source)
    expected = {
        "InIf",
        "InElse",
        "InFor",
        "InForElse",
        "InWhile",
        "InWhileElse",
        "InWith",
    }
    assert collect_existing_class_names(tree) == expected # 15.0μs -> 9.70μs (54.9% faster)

def test_async_function_and_async_for_and_async_with_nested_classes():
    # AsyncFunctionDef and AsyncFor/AsyncWith constructs are traversed.
    # Put various nested class statements inside async function bodies.
    source = """
async def afr():
    # async for inside an async function
    async for x in []:
        class InAsyncFor:
            pass

    # async with inside an async function
    async with (a if False else a):
        class InAsyncWith:
            pass

    # nested async def with its own class
    async def inner_async():
        class InInnerAsync:
            pass
"""
    tree = ast.parse(source)
    expected = {"InAsyncFor", "InAsyncWith", "InInnerAsync"}
    assert collect_existing_class_names(tree) == expected # 9.94μs -> 6.33μs (56.9% faster)

def test_duplicate_class_names_seen_once():
    # Multiple classes with the same name in different scopes should only appear once
    source = """
class Dup:
    pass

def f():
    class Dup:
        pass

if True:
    class Dup:
        pass
"""
    tree = ast.parse(source)
    # Set should contain single 'Dup'
    assert collect_existing_class_names(tree) == {"Dup"} # 8.65μs -> 5.82μs (48.5% faster)

def test_large_number_of_top_level_classes():
    # Generate 1000 top-level class definitions and ensure they are all found.
    # Using many small class definitions exercises linear traversal over many nodes.
    count = 1000
    source_lines = [f"class C{i}:\n    pass\n" for i in range(count)]
    source = "\n".join(source_lines)
    tree = ast.parse(source)

    found = collect_existing_class_names(tree) # 1.07ms -> 475μs (125% faster)
    # Build expected set of names C0..C{count-1}
    expected = {f"C{i}" for i in range(count)}
    # The result should exactly match the expected set
    assert found == expected
    # Also sanity-check that size matches
    assert len(found) == count

def test_large_nested_structure_mixed_blocks():
    # Create a nested structure repeatedly (depth 3 repeated many times) to stress traversal.
    # Each block contains a unique class name. This both increases the number of nodes and
    # verifies traversal across many container types.
    chunk = """
def make_block_{i}():
    if True:
        class A_{i}:
            pass
    try:
        class B_{i}:
            pass
    except:
        class C_{i}:
            pass
    for _ in []:
        class D_{i}:
            pass
"""
    times = 150  # 150 * 4 classes = 600 classes total; comfortably under 1000 but substantial
    source = "\n".join(chunk.format(i=i) for i in range(times))
    tree = ast.parse(source)
    found = collect_existing_class_names(tree) # 1.16ms -> 544μs (112% faster)
    # Build expected set for all generated class names
    expected = set()
    for i in range(times):
        expected.update({f"A_{i}", f"B_{i}", f"C_{i}", f"D_{i}"})
    assert found == expected
    # Ensure the number of classes matches
    assert len(found) == 4 * times
import ast

# imports
import pytest
# function to test
from codeflash.languages.python.context.code_context_extractor import \
    collect_existing_class_names

def test_single_class_at_module_level():
    """Test collection of a single class defined at module level."""
    code = "class MyClass:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.46μs -> 3.30μs (65.6% faster)
    assert result == {"MyClass"}

def test_multiple_classes_at_module_level():
    """Test collection of multiple classes at module level."""
    code = "class ClassA:\n    pass\nclass ClassB:\n    pass\nclass ClassC:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 7.67μs -> 4.93μs (55.5% faster)
    assert result == {"ClassA", "ClassB", "ClassC"}

def test_nested_class_in_class():
    """Test collection of nested classes within a class."""
    code = "class Outer:\n    class Inner:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.00μs -> 3.38μs (48.0% faster)
    assert result == {"Outer", "Inner"}

def test_class_inside_function():
    """Test collection of a class defined inside a function."""
    code = "def my_func():\n    class LocalClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.16μs -> 3.47μs (48.8% faster)
    assert result == {"LocalClass"}

def test_class_inside_async_function():
    """Test collection of a class defined inside an async function."""
    code = "async def my_async_func():\n    class LocalClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.61μs -> 4.08μs (37.6% faster)
    assert result == {"LocalClass"}

def test_class_inside_if_statement():
    """Test collection of a class defined inside an if statement."""
    code = "if True:\n    class ConditionalClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.92μs -> 3.18μs (54.9% faster)
    assert result == {"ConditionalClass"}

def test_class_inside_for_loop():
    """Test collection of a class defined inside a for loop."""
    code = "for i in range(1):\n    class LoopClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.84μs -> 3.47μs (39.6% faster)
    assert result == {"LoopClass"}

def test_class_inside_async_for_loop():
    """Test collection of a class defined inside an async for loop."""
    code = "async def f():\n    async for i in range(1):\n        class AsyncLoopClass:\n            pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.20μs -> 4.30μs (44.3% faster)
    assert result == {"AsyncLoopClass"}

def test_class_inside_while_loop():
    """Test collection of a class defined inside a while loop."""
    code = "while False:\n    class WhileClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.95μs -> 3.51μs (41.1% faster)
    assert result == {"WhileClass"}

def test_class_inside_with_statement():
    """Test collection of a class defined inside a with statement."""
    code = "with open('file.txt') as f:\n    class WithClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.09μs -> 3.65μs (39.6% faster)
    assert result == {"WithClass"}

def test_class_inside_async_with_statement():
    """Test collection of a class defined inside an async with statement."""
    code = "async def f():\n    async with open('file.txt') as f:\n        class AsyncWithClass:\n            pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.08μs -> 4.57μs (33.1% faster)
    assert result == {"AsyncWithClass"}

def test_class_inside_try_except():
    """Test collection of a class defined inside try-except block."""
    code = "try:\n    class TryClass:\n        pass\nexcept:\n    class ExceptClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 8.15μs -> 5.50μs (48.3% faster)
    assert result == {"TryClass", "ExceptClass"}

def test_class_inside_try_finally():
    """Test collection of a class defined inside try-finally block."""
    code = "try:\n    class TryClass:\n        pass\nfinally:\n    class FinallyClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 7.10μs -> 4.76μs (49.3% faster)
    assert result == {"TryClass", "FinallyClass"}

def test_complex_nested_structure():
    """Test collection from a complex nested structure with multiple levels."""
    code = """
class OuterClass:
    class InnerClass:
        pass
    
    def method(self):
        class MethodClass:
            pass

def outer_function():
    class FunctionClass:
        pass
    
    if True:
        class IfClass:
            pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 12.6μs -> 8.44μs (48.8% faster)
    assert result == {"OuterClass", "InnerClass", "MethodClass", "FunctionClass", "IfClass"}

def test_empty_module():
    """Test collection from an empty module."""
    code = ""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 2.37μs -> 1.74μs (36.2% faster)
    assert result == set()

def test_module_with_no_classes():
    """Test collection from a module with no class definitions."""
    code = "x = 1\ny = 2\nz = x + y"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.66μs -> 3.08μs (51.5% faster)
    assert result == set()

def test_class_with_special_characters_in_name():
    """Test collection of a class with underscores in name."""
    code = "class _PrivateClass:\n    pass\nclass __DunderClass__:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.33μs -> 4.03μs (57.2% faster)
    assert result == {"_PrivateClass", "__DunderClass__"}

def test_class_with_single_letter_name():
    """Test collection of classes with single-letter names."""
    code = "class A:\n    pass\nclass B:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.25μs -> 3.84μs (62.9% faster)
    assert result == {"A", "B"}

def test_class_with_long_name():
    """Test collection of a class with a very long name."""
    long_name = "VeryLongClassNameWith" + "A" * 100 + "Suffix"
    code = f"class {long_name}:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 3.83μs -> 2.57μs (48.7% faster)
    assert result == {long_name}

def test_deeply_nested_classes():
    """Test collection of deeply nested class definitions."""
    code = """
class Level1:
    class Level2:
        class Level3:
            class Level4:
                class Level5:
                    pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.92μs -> 4.10μs (68.9% faster)
    assert result == {"Level1", "Level2", "Level3", "Level4", "Level5"}

def test_class_in_if_else_elif():
    """Test collection of classes in if-elif-else branches."""
    code = """
if True:
    class IfClass:
        pass
elif False:
    class ElifClass:
        pass
else:
    class ElseClass:
        pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 8.63μs -> 5.62μs (53.5% faster)
    assert result == {"IfClass", "ElifClass", "ElseClass"}

def test_class_in_for_loop_else():
    """Test collection of a class in the else clause of a for loop."""
    code = "for i in []:\n    pass\nelse:\n    class ForElseClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.99μs -> 3.88μs (54.5% faster)
    assert result == {"ForElseClass"}

def test_class_in_while_loop_else():
    """Test collection of a class in the else clause of a while loop."""
    code = "while False:\n    pass\nelse:\n    class WhileElseClass:\n        pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.82μs -> 3.86μs (50.9% faster)
    assert result == {"WhileElseClass"}

def test_class_in_try_except_else():
    """Test collection of classes in try-except-else block."""
    code = """
try:
    class TryClass:
        pass
except:
    class ExceptClass:
        pass
else:
    class ElseClass:
        pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 8.99μs -> 5.52μs (62.8% faster)
    assert result == {"TryClass", "ExceptClass", "ElseClass"}

def test_class_in_try_except_finally():
    """Test collection of classes in try-except-finally block."""
    code = """
try:
    class TryClass:
        pass
except:
    class ExceptClass:
        pass
finally:
    class FinallyClass:
        pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 9.04μs -> 5.25μs (72.1% faster)
    assert result == {"TryClass", "ExceptClass", "FinallyClass"}

def test_class_in_multiple_except_handlers():
    """Test collection of classes in multiple except handlers."""
    code = """
try:
    pass
except ValueError:
    class ValueErrorClass:
        pass
except TypeError:
    class TypeErrorClass:
        pass
except:
    class GenericExceptClass:
        pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 10.7μs -> 6.13μs (73.7% faster)
    assert result == {"ValueErrorClass", "TypeErrorClass", "GenericExceptClass"}

def test_class_name_collision_same_scope():
    """Test that class names are stored in a set (no duplicates)."""
    # Redefining a class in the same scope is legal Python and yields one
    # ClassDef node per definition. This test parses a single class and only
    # verifies that the result is a set with exactly one entry.
    code = "class MyClass:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.05μs -> 2.63μs (53.6% faster)
    assert isinstance(result, set)
    assert len(result) == 1

def test_return_type_is_set():
    """Test that the function returns a set type."""
    code = "class A:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 3.82μs -> 2.60μs (46.5% faster)
    assert isinstance(result, set)

def test_class_names_are_strings():
    """Test that all returned class names are strings."""
    code = "class Class1:\n    pass\nclass Class2:\n    pass"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.81μs -> 4.11μs (41.5% faster)
    assert all(isinstance(name, str) for name in result)

def test_function_with_no_class():
    """Test that functions without nested classes don't add entries."""
    code = "def my_function():\n    x = 1\n    return x"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 4.97μs -> 3.69μs (34.8% faster)
    assert result == set()

def test_nested_function_with_class():
    """Test collection of classes in nested functions."""
    code = """
def outer():
    def inner():
        class NestedFunctionClass:
            pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.18μs -> 3.59μs (44.4% faster)
    assert result == {"NestedFunctionClass"}

def test_class_with_inherited_classes():
    """Test that inherited class names are still collected."""
    code = """
class Base:
    pass

class Derived(Base):
    pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 6.04μs -> 4.28μs (41.2% faster)
    assert result == {"Base", "Derived"}

def test_class_with_generic_type_parameters():
    """Test collection of classes with generic type parameters using typing.Generic."""
    code = """
from typing import TypeVar, Generic

T = TypeVar('T')

class GenericClass(Generic[T]):
    pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.78μs -> 4.07μs (42.1% faster)
    assert result == {"GenericClass"}

def test_class_with_decorators():
    """Test that decorated classes are still collected."""
    code = """
@decorator
class DecoratedClass:
    pass

@decorator1
@decorator2
class MultiDecoratedClass:
    pass
"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 5.84μs -> 4.12μs (41.9% faster)
    assert result == {"DecoratedClass", "MultiDecoratedClass"}

def test_empty_class():
    """Test collection of an empty class with only ellipsis."""
    code = "class EmptyClass:\n    ..."
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 3.80μs -> 2.88μs (32.1% faster)
    assert result == {"EmptyClass"}

def test_many_classes_at_module_level():
    """Test collection of 100 classes at module level."""
    # Generate code with 100 class definitions
    code_lines = [f"class Class{i}:\n    pass" for i in range(100)]
    code = "\n".join(code_lines)
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 113μs -> 52.8μs (116% faster)
    expected = {f"Class{i}" for i in range(100)}
    assert result == expected
    assert len(result) == 100

def test_many_nested_classes():
    """Test collection of many nested classes with proper body."""
    code = "class Level0:\n    pass\n"
    for i in range(1, 20):
        indent = "    " * i
        code += f"{indent}class Level{i}:\n{indent}    pass\n"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 25.7μs -> 12.8μs (101% faster)
    expected = {f"Level{i}" for i in range(20)}
    assert result == expected
    assert len(result) == 20

def test_many_classes_in_control_flow():
    """Test collection of classes distributed across control flow statements."""
    # Create 50 if statements each with a class
    code_parts = []
    for i in range(50):
        code_parts.append(f"if {i} % 2:\n    class Class{i}:\n        pass\n")
    code = "\n".join(code_parts)
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 90.1μs -> 43.9μs (105% faster)
    expected = {f"Class{i}" for i in range(50)}
    assert result == expected
    assert len(result) == 50

def test_many_classes_in_try_except():
    """Test collection of classes in multiple try-except blocks."""
    code = ""
    for i in range(10):
        code += f"""try:
    class TryClass{i}:
        pass
except:
    class ExceptClass{i}:
        pass
finally:
    class FinallyClass{i}:
        pass

"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 53.1μs -> 25.9μs (105% faster)
    assert len(result) == 30

def test_large_complex_nested_structure():
    """Test collection from a large complex nested structure."""
    code = """
class OuterClass1:
    class InnerClass1:
        pass
    
    def method1(self):
        class MethodClass1:
            pass
        if True:
            class IfInMethodClass1:
                pass
    
    def method2(self):
        class MethodClass2:
            pass
"""
    # Repeat this structure multiple times
    code = code * 10
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 72.9μs -> 35.4μs (106% faster)
    # Each repetition adds the same classes (no duplicates due to set)
    # So we expect just the unique classes
    assert "OuterClass1" in result
    assert "InnerClass1" in result
    assert "MethodClass1" in result
    assert "IfInMethodClass1" in result
    assert "MethodClass2" in result

def test_performance_with_mixed_constructs():
    """Test performance with a realistic large codebase structure."""
    code_parts = []
    
    # Add module-level classes
    for i in range(20):
        code_parts.append(f"class ModuleClass{i}:\n    pass\n")
    
    # Add functions with nested classes
    for i in range(20):
        code_parts.append(f"""
def function{i}():
    class FunctionClass{i}:
        pass
    
    if True:
        class FunctionIfClass{i}:
            pass
    
    for j in range(1):
        class FunctionForClass{i}:
            pass

""")
    
    # Add classes with nested structures
    for i in range(20):
        code_parts.append(f"""
class OuterClass{i}:
    class InnerClass{i}:
        pass
    
    def method{i}(self):
        class MethodClass{i}:
            pass

""")
    
    code = "\n".join(code_parts)
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 200μs -> 95.8μs (110% faster)
    
    # Verify we collected a significant number of classes
    assert len(result) > 100

def test_hundred_classes_single_if_block():
    """Test collection of 100 classes defined in a single if block."""
    code = "if True:\n"
    for i in range(100):
        code += f"    class Class{i}:\n        pass\n"
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 117μs -> 53.9μs (118% faster)
    expected = {f"Class{i}" for i in range(100)}
    assert result == expected
    assert len(result) == 100

def test_alternating_control_flow_with_classes():
    """Test collection with alternating if-else-try blocks containing classes."""
    code = ""
    for i in range(30):
        code += f"""if {i} % 2 == 0:
    class EvenClass{i}:
        pass
else:
    class OddClass{i}:
        pass

try:
    class TryClass{i}:
        pass
except:
    class ExceptClass{i}:
        pass

"""
    tree = ast.parse(code)
    result = collect_existing_class_names(tree) # 195μs -> 96.4μs (102% faster)
    assert len(result) == 30 * 4
⏪ Click to see Replay Tests
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| benchmarks/codeflash_replay_tests_poa70mzd/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor_collect_existing_class_names_test_benchmark_extract | 258μs | 116μs | 121%✅ |

To edit these changes, `git checkout codeflash/optimize-pr1660-2026-03-16T23.30.46` and push.

codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Mar 16, 2026
claude bot commented Mar 16, 2026

Claude finished @codeflash-ai[bot]'s task in 6s


PR Review Summary

  • Triage PR scope
  • Lint and typecheck
  • Resolve stale threads
  • Code review
  • Duplicate detection
  • Test coverage
  • Check optimization PRs

Prek Checks

All ruff checks passed. No formatting issues.

Mypy Fix (committed)

The optimization introduced a mypy type error: `stack = [tree]` was inferred as `list[ast.Module]`, but `node.body` returns `list[stmt]`, which is an incompatible type. The original code used `getattr(node, "body", [])`, which returns `Any` and silently bypassed this check.

Fixed by adding an explicit type annotation, `stack: list[ast.AST] = [tree]`, committed in a508ef6a.
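
The effect of the annotation can be illustrated with a stripped-down traversal. This is a sketch under assumptions: `walk_statement_bodies` is a hypothetical helper, not code from the PR, and it only follows `body` lists.

```python
import ast


def walk_statement_bodies(tree: ast.Module) -> int:
    """Count nodes reachable through `body` lists only."""
    # Without the annotation, a type checker infers list[ast.Module] from the
    # initial value, and extending the list with statement nodes is rejected.
    stack: list[ast.AST] = [tree]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        if hasattr(node, "body"):
            stack.extend(node.body)
    return visited


node_count = walk_statement_bodies(ast.parse("class A:\n    pass"))
```

Annotating the stack with the common base class `ast.AST` lets it hold the module, statements, and exception handlers alike.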

Code Review

The optimization is correct and sound. Key observations:

  • `hasattr(node, "body")` is a valid replacement for the `isinstance` check. All AST statement nodes that have a `body` attribute are compound statements that can contain nested `ClassDef` nodes, which is exactly the set the original `isinstance` check covered.
  • The new approach is actually more complete: it correctly handles `ast.TryStar` (Python 3.11+ `try`/`except*` syntax), which the original `isinstance` check missed.
  • The four `getattr(..., [])` calls in the original always ran unconditionally and paid the cost of attribute lookup even when attributes were absent. The new `hasattr` + direct access approach is cheaper for the common case.

No correctness issues found.
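
One way to check which statement types a `hasattr(node, "body")` test admits is to inspect the `_fields` of every `ast.stmt` subclass. The helper name `statement_types_with_body` is hypothetical, and the exact list depends on the Python version in use.

```python
import ast


def statement_types_with_body() -> list[str]:
    """Names of statement node classes that declare a `body` field."""
    return sorted(
        cls.__name__
        for cls in ast.stmt.__subclasses__()
        if "body" in cls._fields
    )


body_statement_types = statement_types_with_body()
print(body_statement_types)
```

On Python 3.11 and later the list should include `TryStar`. `ast.ExceptHandler` does not appear because it derives from `ast.excepthandler` rather than `ast.stmt`; it is reached through a `Try` node's `handlers` list. `ast.Match` is also absent, since it stores its blocks under `cases` rather than `body`.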

Duplicate Detection

No duplicates detected. `codeflash/languages/python/static_analysis/code_replacer.py:35` uses a similar `hasattr(node, "body")` pattern but for a different purpose (function body replacement), so it is not a duplicate.

Test Coverage

Full test suite was still running at review time (background process). The PR itself includes 52 generated regression tests + 2 replay tests with 100% coverage per the PR description.


Last updated: 2026-03-16T23:32

@claude claude bot merged commit c0577e5 into unstructured-inference Mar 16, 2026
27 checks passed
@claude claude bot deleted the codeflash/optimize-pr1660-2026-03-16T23.30.46 branch March 16, 2026 23:59