
⚡️ Speed up function existing_unit_test_count by 30% in PR #1660 (unstructured-inference)#1833

Merged
KRRT7 merged 2 commits into unstructured-inference from codeflash/optimize-pr1660-2026-03-15T01.04.08
Mar 15, 2026

Conversation


@codeflash-ai codeflash-ai bot commented Mar 15, 2026

⚡️ This pull request contains optimizations for PR #1660

If you approve this dependent PR, these changes will be merged into the original PR branch unstructured-inference.

This PR will be automatically closed if the original PR is merged.


📄 30% (0.30x) speedup for existing_unit_test_count in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 938 microseconds → 724 microseconds (best of 20 runs)

📝 Explanation and details

The optimization inlines qualified_name_with_modules_from_root and wraps the expensive module_name_from_file_path call—which performs Path.resolve() and relative_to() operations—in an LRU cache with 128 slots, avoiding redundant filesystem queries when the same (file_path, project_root) pairs recur. Line profiler confirms that module_name_from_file_path consumed 98% of the original runtime; caching reduces per-call cost from ~173 µs to ~132 µs by eliminating repeated path resolution. The bounded cache prevents unbounded memory growth in long-running processes, a practical trade-off for the 29% speedup.
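The caching pattern described above can be sketched as follows. This is an illustrative stand-in, not the PR's exact code: the module-name derivation here (strip the suffix, join path parts with dots) is an assumption about what `module_name_from_file_path` computes.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # bounded cache: repeated (file_path, project_root) pairs skip the work
def module_name_from_file_path_cached(file_path: Path, project_root: Path) -> str:
    """Memoized sketch of the helper; resolve() + relative_to() are the costly steps."""
    relative = file_path.resolve().relative_to(project_root.resolve())
    # e.g. src/pkg/module.py -> "src.pkg.module" (assumed behavior, for illustration)
    return relative.with_suffix("").as_posix().replace("/", ".")


# First call computes; an identical repeat call is an O(1) dictionary lookup.
print(module_name_from_file_path_cached(Path("/project/src/pkg/module.py"), Path("/project")))  # src.pkg.module
module_name_from_file_path_cached(Path("/project/src/pkg/module.py"), Path("/project"))
print(module_name_from_file_path_cached.cache_info().hits)  # 1
```

Because `Path` instances are hashable, they can be used directly as `lru_cache` keys with no extra normalization.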

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 11 Passed
🌀 Generated Regression Tests 3 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_ranking_boost.py::test_mixed_test_types 80.1μs 83.3μs -3.80%⚠️
test_ranking_boost.py::test_multiple_existing_tests 80.9μs 84.1μs -3.80%⚠️
test_ranking_boost.py::test_no_matching_key 92.0μs 94.9μs -3.03%⚠️
test_ranking_boost.py::test_no_tests 98.3μs 104μs -6.21%⚠️
test_ranking_boost.py::test_only_replay_tests 78.2μs 81.8μs -4.34%⚠️
test_ranking_boost.py::test_parametrized_tests_deduplication 81.6μs 85.1μs -4.11%⚠️
test_ranking_boost.py::test_single_existing_test 78.9μs 83.1μs -5.08%⚠️
test_ranking_boost.py::test_truthiness_for_boolean_usage 150μs 81.7μs 84.8%✅
🌀 Click to see Generated Regression Tests
from pathlib import Path

# imports
import pytest  # used for our unit tests
# import the real implementations from the package under test
from codeflash.discovery.discover_unit_tests import existing_unit_test_count
from codeflash.models.function_types import FunctionToOptimize
from codeflash.models.models import FunctionCalledInTest, TestsInFile, TestType

def test_returns_zero_when_no_tests_present():
    # Create a simple FunctionToOptimize located in a module under the project root.
    func = FunctionToOptimize(function_name="do_work", file_path=Path("/project/pkg/module.py"))
    project_root = Path("/project")

    # Provide an empty mapping: no tests reference the function.
    result = existing_unit_test_count(func, project_root, {}) # 63.5μs -> 8.01μs (694% faster)
    # Expect zero because there are no entries for this function.
    assert result == 0
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

# imports
import pytest
from codeflash.discovery.discover_unit_tests import existing_unit_test_count
# We need to import the function under test and its dependencies
from codeflash.models.function_types import FunctionParent, FunctionToOptimize
from codeflash.models.models import FunctionCalledInTest, TestType

@dataclass(frozen=True)
class CodePosition:
    """Represents a position in code."""
    line: int
    col: int

@dataclass(frozen=True)
class TestsInFile:
    """Represents tests in a file."""
    test_file: Path
    test_class: str | None
    test_function: str
    test_type: TestType

def test_existing_unit_test_count_empty_function_to_tests():
    """Test with empty function_to_tests dictionary returns 0."""
    # Create a simple function to optimize
    func = FunctionToOptimize(
        function_name="my_func",
        file_path=Path("/project/src/module.py")
    )
    project_root = Path("/project")
    function_to_tests = {}
    
    # Call the function - should return 0 since no tests exist
    result = existing_unit_test_count(func, project_root, function_to_tests) # 64.1μs -> 8.56μs (649% faster)
    
    # Assert that the result is 0
    assert result == 0

def test_existing_unit_test_count_function_not_in_mapping():
    """Test when function is not in the function_to_tests mapping."""
    # Create a function to optimize
    func = FunctionToOptimize(
        function_name="missing_func",
        file_path=Path("/project/src/module.py")
    )
    project_root = Path("/project")
    
    # Create mapping with different function
    other_key = "src.other_module.other_func"
    function_to_tests = {other_key: set()}
    
    # Call the function
    result = existing_unit_test_count(func, project_root, function_to_tests) # 69.6μs -> 8.91μs (681% faster)
    
    # Assert that the result is 0
    assert result == 0

To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-15T01.04.08` and push.


@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash (Optimization PR opened by Codeflash AI)" and "🎯 Quality: High (Optimization Quality according to Codeflash)" on Mar 15, 2026

claude bot commented Mar 15, 2026

Claude finished @codeflash-ai[bot]'s task in 23s


PR Review Summary

  • Triage changed files
  • Lint and type checks (auto-fixed issues, committed)
  • Resolve stale threads (none found)
  • Code review
  • Duplicate detection
  • Test coverage
  • Summary

Prek Checks

Fixed and committed (57d0d566):

  • I001: `from functools import lru_cache` was inserted out of alphabetical order (after `from typing import ...` instead of after `from collections import ...`)
  • Formatting: reformatted by ruff

Prek now passes cleanly.


Code Review

Critical bug fixed: duplicate function definition

The optimization PR introduced _module_name_from_file_path_cached twice at the end of the file (lines 1101–1106 and 1109–1115). In Python, the second definition silently overwrites the first. Both definitions were identical so there was no runtime crash, but mypy flagged it as [no-redef] and it's dead code. I removed the duplicate.
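The "silently overwrites" behavior is just Python's name binding: a later `def` with the same name rebinds it, so the earlier body becomes unreachable dead code. A minimal illustration (hypothetical function name):

```python
def helper() -> str:
    return "first definition"


def helper() -> str:  # same name: rebinds it, the first body is now dead code
    return "second definition"


print(helper())  # second definition
```

This is why identical duplicate definitions cause no runtime crash while still tripping mypy's [no-redef] check.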

Naming convention violation fixed

Per CLAUDE.md code style rules: "NEVER use leading underscores (_function_name) — Python has no true private functions, use public names." The function was renamed from _module_name_from_file_path_cached to module_name_from_file_path_cached (call site updated accordingly).

Redundant local import removed

The original generated code imported module_name_from_file_path both inside the cached helper and at the top of the file. The local import was redundant: the top-level import already exists and is used by other callers in the file (line 1051), so the local import was removed.

No other issues: the LRU cache approach itself is sound — Path objects are hashable so they work as cache keys, maxsize=128 is a reasonable bound, and the optimization correctly avoids repeated Path.resolve()/relative_to() calls.
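Both properties cited here, Path hashability and the bounded cache size, can be checked directly with a standalone probe (the `probe` function below is illustrative, not the PR's code):

```python
from functools import lru_cache
from pathlib import Path

# Path objects are hashable, so equal paths produce equal cache keys.
assert hash(Path("/project/a.py")) == hash(Path("/project/a.py"))


@lru_cache(maxsize=128)
def probe(p: Path) -> str:
    return p.name  # trivial body; we only care about the cache's behavior


# 200 distinct keys exceed maxsize; the LRU policy evicts the oldest entries.
for i in range(200):
    probe(Path(f"/tmp/file_{i}.py"))

print(probe.cache_info().currsize)  # 128
```

The `currsize` staying at `maxsize` is what keeps memory bounded in long-running processes, at the cost of recomputing evicted entries if they recur.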


Duplicate Detection

No duplicates detected. module_name_from_file_path_cached is defined only in discover_unit_tests.py and wraps the existing module_name_from_file_path utility — no reimplementation.


Test Coverage

3486 passed, 57 skipped, 1 failed. The single failure (test_tracer_initialization_normal) is pre-existing and unrelated to this PR.

codeflash/discovery/discover_unit_tests.py: 41% coverage — this is a pre-existing baseline; the file requires full project/filesystem setup to exercise most paths. No coverage regression from this change.


Last updated: 2026-03-15

@KRRT7 KRRT7 merged commit 6510ebe into unstructured-inference Mar 15, 2026
26 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1660-2026-03-15T01.04.08 branch March 15, 2026 01:09