perf: cache jedi project, batch test cache writes, fix Windows relative_to bug by KRRT7 · Pull Request #1852 · codeflash-ai/codeflash

KRRT7 · 2026-03-17T05:44:39Z

Summary

Ports the valuable improvements from #1846 that remain applicable after #1660 was merged:

Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 path bug)
Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
Batch TestsCache writes with flush() + executemany instead of per-row commit
Gracefully disable cache writes on sqlite3.OperationalError
Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
Derive all_defs from all_names via is_definition() to save a redundant Jedi call

Thanks to @aseembits93 for the original work in #1846 — this PR carries forward the pieces that are still applicable after the #1660 architecture changes.

Test plan

pytest tests/test_code_context_extractor.py — 104 passed
pytest tests/test_unit_test_discovery.py — 57 passed

claude · 2026-03-17T05:46:16Z

Claude finished @KRRT7's task in 3s —— View job

PR Review Summary

Prek Checks

ruff check and ruff format — both passed with no issues.

Mypy: One new error introduced by this PR was fixed and pushed (7cf183e7):

code_extractor.py:627 — Item "None" of Module | None has no attribute "visit" — The restructuring that hoisted parsed_dst_module = None to the top of the function broke mypy's type narrowing. Fixed by removing the None pre-assignment and adding else: parsed_dst_module = dst_module_code in the second branch, so mypy can see the variable is always cst.Module before .visit().

All remaining mypy errors in the changed files are pre-existing (present on main).

Code Review

code_context_extractor.py

@cache on get_jedi_project: Correct — str is hashable. Effective caching across 5+ call sites. Minor note: if callers normalize project_root_path differently (trailing slash, symlinks), cache misses will occur silently. Current code consistently uses str(project_root_path) so this is fine in practice.
relative_to() try/except (get_code_optimization_context): Good fix for Windows 8.3 short-path names. The ValueError is the correct exception for relative_to() when paths don't share a root.
jedi_project: object | None = None parameter type: Using object as the parameter type is technically correct (mypy accepts Any → object), but callers can't rely on any attributes of the value. Since get_jedi_project returns Any, this works at runtime. Acceptable for an opaque external type.
Pre-grouping in get_function_sources_from_jedi: This is a semantic change worth noting. The old code used belongs_to_function_qualified(ref, qn) which matched any ref whose full_name starts with qn + "." (including refs inside nested closures/lambdas). The new code groups by parent().full_name == qn — only immediate-parent matches. This is more precise: refs inside nested functions inside the target function are no longer included. Since nested functions are themselves analyzed, their helpers will be found via their own entries. This is an improvement in precision, not a regression.
Self-reference exclusion (ref_qn == parent_qn check): Correct — prevents recursive calls from being treated as external helpers.
all_defs derivation: [name for name in all_names if name.is_definition()] is semantically equivalent to script.get_names(all_scopes=True, definitions=True) — since all_names already includes definitions (Jedi's definitions parameter defaults to True). Saves one Jedi call per file. ✓

discover_unit_tests.py

TestsCache.flush() with executemany: Clean improvement. The batch is flushed per test file (inside the loop), which is a good balance between memory usage and write frequency.
Graceful writes_enabled disable on OperationalError: Good defensive programming. Note: if the error is transient, writes stay disabled for the rest of the session. For a CLI tool this is acceptable.
functions_to_optimize_by_name O(1) lookup: Clean refactor. The original O(n) linear scan over functions_to_optimize per unresolved name was a real inefficiency.
One potential issue — flush() on exception path: If an exception is raised mid-way through the test file loop (e.g., UnicodeDecodeError at line 1098), the try/except at line 1085 catches it and continues, but flush() is only called at the end of the with block (line 1110). If the process is interrupted (e.g., KeyboardInterrupt), unflushed rows will be lost. This is a pre-existing issue (the old code committed per-row so it wouldn't lose data on interruption). Not introduced by this PR but worth noting.

code_extractor.py

Early-exit for empty imports: Both early-exit paths (not has_module_level_imports and no gathered imports) correctly return dst_code_fallback. ✓
lstrip("\n") on Module.code for the fallback: The comment explains this aligns the fallback with the transformed_module.code.lstrip("\n") result. This is a subtle behavior change from the original (dst_module_code.code without stripping), but it was introduced intentionally to maintain consistency. ✓

Duplicate Detection

No duplicates detected. get_jedi_project, flush(), _maybe_strip_docstring (consolidated from two duplicate definitions in prior commits), and functions_to_optimize_by_name are all novel additions or consolidations.

Test Coverage

All relevant tests pass:

tests/test_add_needed_imports_from_module.py — 10/10 passed
tests/test_unit_test_discovery.py — 57/57 passed

Coverage for changed files (against these test suites):

File	Coverage
`discover_unit_tests.py`	12%
`code_context_extractor.py`	12%
`code_extractor.py`	33%

Low coverage is expected — these files depend on live Jedi analysis, SQLite, and API keys for most paths. The tests that can run without external dependencies all pass.

Fix committed: 7cf183e7 — mypy parsed_dst_module None narrowing
| Branch

…ve_to bug Port valuable improvements from #1846 that remain applicable after #1660: - Cache jedi.Project instances via @cache to avoid recreating across 5 call sites - Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths) - Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup - Batch TestsCache writes with flush() + executemany instead of per-row commit - Gracefully disable cache writes on sqlite3.OperationalError - Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files - Derive all_defs from all_names via is_definition() to save a redundant Jedi call

The optimization added early-exit logic to `add_needed_imports_from_module` that checks whether the source module contains any module-level imports before invoking the heavyweight `GatherImportsVisitor` and downstream import-merging machinery. In the common case where a pruned module has no imports (or only nested ones inside functions), line profiling showed the gatherer and two `AddImportsVisitor`/`RemoveImportsVisitor` transforms consumed 36% of original runtime; the early exit skips all three, falling back to the destination code immediately. A second early exit after gathering verifies the visitor actually collected imports, avoiding redundant CST transformations when the source is import-free. Combined, these checks eliminate ~99% of the work when imports are absent, yielding an 18222% speedup with no semantic change because the fallback path always returned the correct destination code.

codeflash-ai · 2026-03-17T06:54:27Z

⚡️ Codeflash found optimizations for this PR

📄 18,223% (182.23x) speedup for `re_extract_from_cache` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime : 634 milliseconds → 3.46 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function re_extract_from_cache by 18,223% in PR #1852 (cf-1846-port-perf-improvements) #1853

If you approve, it will be merged into this PR (branch cf-1846-port-perf-improvements).

…2026-03-17T06.54.17 ⚡️ Speed up function `re_extract_from_cache` by 18,223% in PR #1852 (`cf-1846-port-perf-improvements`)

codeflash-ai · 2026-03-17T07:49:06Z

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

⚡️ Speed up function re_extract_from_cache by 18,223% in PR #1852 (cf-1846-port-perf-improvements) #1853

KRRT7 · 2026-03-17T08:10:46Z

@claude fix failing CI

claude · 2026-03-17T08:11:18Z

Claude finished @KRRT7's task in 3m 34s —— View job

Fixed Failing CI

Root Cause

The mypy CI failure was a [no-redef] error in code_context_extractor.py:

codeflash/languages/python/context/code_context_extractor.py:1775:1: error:
Name "_maybe_strip_docstring" already defined on line 1595  [no-redef]

_maybe_strip_docstring was defined twice — once at line 1595 (simple version, delegating to remove_docstring_from_body) and again at line 1775 (more complete version added by this PR, handling ConcatenatedString and empty body edge cases).

Fix

Removed the first (weaker) definition at lines 1586–1598, along with its now-unused remove_docstring_from_body helper. The better definition at line 1761 (formerly 1775) is kept.

uv run mypy now reports: Success: no issues found in 1 source file
| Branch

…no-redef error Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>

…imports_from_module The generator expression yielded `isinstance(stmt, ...)` where `stmt` was already filtered to be `cst.SimpleStatementLine`, so it always returned False. This caused add_needed_imports_from_module to always skip adding imports, breaking 11 tests in test_code_context_extractor.py. Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>

…n flat output - Replace with in _maybe_strip_docstring — the | syntax for runtime isinstance is Python 3.10+ only and caused TypeError on 3.9 - Strip leading newlines from cst.Module.code in add_needed_imports_from_module fallback path so early-exit returns are consistent with the normal transformed_module.code.lstrip("\n") path, fixing extra blank lines after file headers in read_writable_code.flat output Co-Authored-By: Oz <oz-agent@warp.dev>

KRRT7 · 2026-03-17T21:36:01Z

@claude work on the identified issues in #1852 (comment)

claude · 2026-03-17T21:36:35Z

Claude finished @KRRT7's task in 1m 36s —— View job

Gather context and read relevant files
Fix type narrowing bug in add_needed_imports_from_module (code_extractor.py:619)
Fix silent .lstrip("\n") behavior change in fallback (code_extractor.py:561)
Fix get_jedi_project return type annotation (code_context_extractor.py:52)
Run prek to verify
Commit and push

Changes Made

code_extractor.py

Restored fallback behavior (line 561): Removed the silent .lstrip("\n") on the cst.Module fallback path — the original code used dst_module_code.code without stripping, so this restores that behavior for the three early-return paths.
Fixed type narrowing (line 619): Replaced if parsed_dst_module is None: with if isinstance(dst_module_code, str): — mypy can now correctly narrow dst_module_code to str for the cst.parse_module(dst_module_code) call, eliminating the mypy error.

code_context_extractor.py

Improved return type (line 52): Changed get_jedi_project return annotation from object to Any (added Any to the typing import). This is more accurate since object implies a restrictive type that doesn't allow attribute access on the result.

Committed as ba7feb07 — all prek checks pass.
| Branch

…mports_from_module - Remove silent .lstrip("\n") on fallback for cst.Module input (restores original behavior) - Replace `parsed_dst_module is None` with `isinstance(dst_module_code, str)` for correct mypy narrowing - Change get_jedi_project return type from object to Any for accuracy Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…wing The early `parsed_dst_module = None` assignment broke mypy's type narrowing. Add explicit `else: parsed_dst_module = dst_module_code` so mypy sees the variable is always a `cst.Module` before the `.visit()` call. Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>

KRRT7 force-pushed the cf-1846-port-perf-improvements branch from 27c8226 to 5060c08 Compare March 17, 2026 06:26

github-actions bot and others added 2 commits March 17, 2026 06:33

fix: correct pending_rows type annotation to int | TestType

9f20cdf

codeflash-ai bot mentioned this pull request Mar 17, 2026

⚡️ Speed up function re_extract_from_cache by 18,223% in PR #1852 (cf-1846-port-perf-improvements) #1853

Merged

Merge pull request #1853 from codeflash-ai/codeflash/optimize-pr1852-…

e820e11

…2026-03-17T06.54.17 ⚡️ Speed up function `re_extract_from_cache` by 18,223% in PR #1852 (`cf-1846-port-perf-improvements`)

github-actions bot and others added 4 commits March 17, 2026 08:14

fix: remove duplicate _maybe_strip_docstring definition causing mypy …

dc8548f

…no-redef error Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>

fix: remove extra blank line to pass ruff format

7e675e4

github-actions bot and others added 3 commits March 17, 2026 21:38

fix: normalize module fallback formatting for import merge

bf9adf2

KRRT7 merged commit 948bfed into main Mar 18, 2026
25 of 27 checks passed

KRRT7 deleted the cf-1846-port-perf-improvements branch March 18, 2026 00:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache jedi project, batch test cache writes, fix Windows relative_to bug#1852

perf: cache jedi project, batch test cache writes, fix Windows relative_to bug#1852
KRRT7 merged 11 commits intomainfrom
cf-1846-port-perf-improvements

KRRT7 commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

codeflash-ai bot commented Mar 17, 2026

⚡️ Speed up function `re_extract_from_cache` by 18,223% in PR #1852 (`cf-1846-port-perf-improvements`) #1853

Uh oh!

codeflash-ai bot commented Mar 17, 2026

Uh oh!

KRRT7 commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

KRRT7 commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KRRT7 commented Mar 17, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Summary

Prek Checks

Code Review

Duplicate Detection

Test Coverage

Uh oh!

codeflash-ai bot commented Mar 17, 2026

⚡️ Codeflash found optimizations for this PR

📄 18,223% (182.23x) speedup for re_extract_from_cache in codeflash/languages/python/context/code_context_extractor.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up function re_extract_from_cache by 18,223% in PR #1852 (cf-1846-port-perf-improvements) #1853

Uh oh!

codeflash-ai bot commented Mar 17, 2026

Uh oh!

KRRT7 commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixed Failing CI

Root Cause

Fix

Uh oh!

KRRT7 commented Mar 17, 2026

Uh oh!

claude bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes Made

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 17, 2026 •

edited

Loading

📄 18,223% (182.23x) speedup for `re_extract_from_cache` in `codeflash/languages/python/context/code_context_extractor.py`

⚡️ Speed up function `re_extract_from_cache` by 18,223% in PR #1852 (`cf-1846-port-perf-improvements`) #1853

claude bot commented Mar 17, 2026 •

edited

Loading

claude bot commented Mar 17, 2026 •

edited

Loading