
⚡️ Speed up function existing_unit_test_count by 30% in PR #1660 (unstructured-inference)#1833

Merged
KRRT7 merged 2 commits into unstructured-inference from codeflash/optimize-pr1660-2026-03-15T01.04.08
Mar 15, 2026

Conversation


@codeflash-ai codeflash-ai bot commented Mar 15, 2026

⚡️ This pull request contains optimizations for PR #1660

If you approve this dependent PR, these changes will be merged into the original PR branch unstructured-inference.

This PR will be automatically closed if the original PR is merged.


📄 30% (0.30x) speedup for existing_unit_test_count in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 938 microseconds → 724 microseconds (best of 20 runs)

📝 Explanation and details

The optimization inlines qualified_name_with_modules_from_root and wraps the expensive module_name_from_file_path call—which performs Path.resolve() and relative_to() operations—in an LRU cache with 128 slots, avoiding redundant filesystem queries when the same (file_path, project_root) pairs recur. Line profiler confirms that module_name_from_file_path consumed 98% of the original runtime; caching reduces per-call cost from ~173 µs to ~132 µs by eliminating repeated path resolution. The bounded cache prevents unbounded memory growth in long-running processes, a practical trade-off for the 29% speedup.
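The caching pattern described above can be sketched as follows. This is an illustrative stand-in, not the PR's exact code: the module-name derivation here (strip the suffix, join path parts with dots) is an assumption about what `module_name_from_file_path` computes.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # bounded cache: repeated (file_path, project_root) pairs skip the work
def module_name_from_file_path_cached(file_path: Path, project_root: Path) -> str:
    """Memoized sketch of the helper; resolve() + relative_to() are the costly steps."""
    relative = file_path.resolve().relative_to(project_root.resolve())
    # e.g. src/pkg/module.py -> "src.pkg.module" (assumed behavior, for illustration)
    return relative.with_suffix("").as_posix().replace("/", ".")


# First call computes; an identical repeat call is an O(1) dictionary lookup.
print(module_name_from_file_path_cached(Path("/project/src/pkg/module.py"), Path("/project")))  # src.pkg.module
module_name_from_file_path_cached(Path("/project/src/pkg/module.py"), Path("/project"))
print(module_name_from_file_path_cached.cache_info().hits)  # 1
```

Because `Path` instances are hashable, they can be used directly as `lru_cache` keys with no extra normalization.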

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 11 Passed
🌀 Generated Regression Tests 3 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_ranking_boost.py::test_mixed_test_types 80.1μs 83.3μs -3.80%⚠️
test_ranking_boost.py::test_multiple_existing_tests 80.9μs 84.1μs -3.80%⚠️
test_ranking_boost.py::test_no_matching_key 92.0μs 94.9μs -3.03%⚠️
test_ranking_boost.py::test_no_tests 98.3μs 104μs -6.21%⚠️
test_ranking_boost.py::test_only_replay_tests 78.2μs 81.8μs -4.34%⚠️
test_ranking_boost.py::test_parametrized_tests_deduplication 81.6μs 85.1μs -4.11%⚠️
test_ranking_boost.py::test_single_existing_test 78.9μs 83.1μs -5.08%⚠️
test_ranking_boost.py::test_truthiness_for_boolean_usage 150μs 81.7μs 84.8%✅
🌀 Click to see Generated Regression Tests
from pathlib import Path

# imports
import pytest  # used for our unit tests
# import the real implementations from the package under test
from codeflash.discovery.discover_unit_tests import existing_unit_test_count
from codeflash.models.function_types import FunctionToOptimize
from codeflash.models.models import FunctionCalledInTest, TestsInFile, TestType

def test_returns_zero_when_no_tests_present():
    # Create a simple FunctionToOptimize located in a module under the project root.
    func = FunctionToOptimize(function_name="do_work", file_path=Path("/project/pkg/module.py"))
    project_root = Path("/project")

    # Provide an empty mapping: no tests reference the function.
    result = existing_unit_test_count(func, project_root, {}) # 63.5μs -> 8.01μs (694% faster)
    # Expect zero because there are no entries for this function.
    assert result == 0
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

# imports
import pytest
from codeflash.discovery.discover_unit_tests import existing_unit_test_count
# We need to import the function under test and its dependencies
from codeflash.models.function_types import FunctionParent, FunctionToOptimize
from codeflash.models.models import FunctionCalledInTest, TestType

@dataclass(frozen=True)
class CodePosition:
    """Represents a position in code."""
    line: int
    col: int

@dataclass(frozen=True)
class TestsInFile:
    """Represents tests in a file."""
    test_file: Path
    test_class: str | None
    test_function: str
    test_type: TestType

def test_existing_unit_test_count_empty_function_to_tests():
    """Test with empty function_to_tests dictionary returns 0."""
    # Create a simple function to optimize
    func = FunctionToOptimize(
        function_name="my_func",
        file_path=Path("/project/src/module.py")
    )
    project_root = Path("/project")
    function_to_tests = {}
    
    # Call the function - should return 0 since no tests exist
    result = existing_unit_test_count(func, project_root, function_to_tests) # 64.1μs -> 8.56μs (649% faster)
    
    # Assert that the result is 0
    assert result == 0

def test_existing_unit_test_count_function_not_in_mapping():
    """Test when function is not in the function_to_tests mapping."""
    # Create a function to optimize
    func = FunctionToOptimize(
        function_name="missing_func",
        file_path=Path("/project/src/module.py")
    )
    project_root = Path("/project")
    
    # Create mapping with different function
    other_key = "src.other_module.other_func"
    function_to_tests = {other_key: set()}
    
    # Call the function
    result = existing_unit_test_count(func, project_root, function_to_tests) # 69.6μs -> 8.91μs (681% faster)
    
    # Assert that the result is 0
    assert result == 0

To edit these changes, run `git checkout codeflash/optimize-pr1660-2026-03-15T01.04.08` and push.


@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash (Optimization PR opened by Codeflash AI)" and "🎯 Quality: High (Optimization Quality according to Codeflash)" on Mar 15, 2026

claude bot commented Mar 15, 2026

Claude finished @codeflash-ai[bot]'s task in 23s


PR Review Summary

  • Triage changed files
  • Lint and type checks (auto-fixed issues, committed)
  • Resolve stale threads (none found)
  • Code review
  • Duplicate detection
  • Test coverage
  • Summary

Prek Checks

Fixed and committed (57d0d566):

  • I001: `from functools import lru_cache` was inserted out of alphabetical order (after `from typing import ...` instead of after `from collections import ...`)
  • Formatting: reformatted by ruff

Prek now passes cleanly.


Code Review

Critical bug fixed: duplicate function definition

The optimization PR introduced _module_name_from_file_path_cached twice at the end of the file (lines 1101–1106 and 1109–1115). In Python, the second definition silently overwrites the first. Both definitions were identical so there was no runtime crash, but mypy flagged it as [no-redef] and it's dead code. I removed the duplicate.
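The "silently overwrites" behavior is just Python's name binding: a later `def` with the same name rebinds it, so the earlier body becomes unreachable dead code. A minimal illustration (hypothetical function name):

```python
def helper() -> str:
    return "first definition"


def helper() -> str:  # same name: rebinds it, the first body is now dead code
    return "second definition"


print(helper())  # second definition
```

This is why identical duplicate definitions cause no runtime crash while still tripping mypy's [no-redef] check.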

Naming convention violation fixed

Per CLAUDE.md code style rules: "NEVER use leading underscores (_function_name) — Python has no true private functions, use public names." The function was renamed from _module_name_from_file_path_cached to module_name_from_file_path_cached (call site updated accordingly).

Redundant local import removed

The original generated code imported module_name_from_file_path both inside the cached helper and at the top of the file. The local import was redundant: the top-level import already exists and is used by other callers in the file (line 1051), so the local import was removed.

No other issues: the LRU cache approach itself is sound — Path objects are hashable so they work as cache keys, maxsize=128 is a reasonable bound, and the optimization correctly avoids repeated Path.resolve()/relative_to() calls.
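Both properties cited here, Path hashability and the bounded cache size, can be checked directly with a standalone probe (the `probe` function below is illustrative, not the PR's code):

```python
from functools import lru_cache
from pathlib import Path

# Path objects are hashable, so equal paths produce equal cache keys.
assert hash(Path("/project/a.py")) == hash(Path("/project/a.py"))


@lru_cache(maxsize=128)
def probe(p: Path) -> str:
    return p.name  # trivial body; we only care about the cache's behavior


# 200 distinct keys exceed maxsize; the LRU policy evicts the oldest entries.
for i in range(200):
    probe(Path(f"/tmp/file_{i}.py"))

print(probe.cache_info().currsize)  # 128
```

The `currsize` staying at `maxsize` is what keeps memory bounded in long-running processes, at the cost of recomputing evicted entries if they recur.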


Duplicate Detection

No duplicates detected. module_name_from_file_path_cached is defined only in discover_unit_tests.py and wraps the existing module_name_from_file_path utility — no reimplementation.


Test Coverage

3486 passed, 57 skipped, 1 failed. The single failure (test_tracer_initialization_normal) is pre-existing and unrelated to this PR.

codeflash/discovery/discover_unit_tests.py: 41% coverage — this is a pre-existing baseline; the file requires full project/filesystem setup to exercise most paths. No coverage regression from this change.


Last updated: 2026-03-15

@KRRT7 KRRT7 merged commit 6510ebe into unstructured-inference Mar 15, 2026
26 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1660-2026-03-15T01.04.08 branch March 15, 2026 01:09