Skip to content

perf: cache jedi project, batch test cache writes, fix Windows relative_to bug#1852

Merged
KRRT7 merged 11 commits intomainfrom
cf-1846-port-perf-improvements
Mar 18, 2026
Merged

perf: cache jedi project, batch test cache writes, fix Windows relative_to bug#1852
KRRT7 merged 11 commits intomainfrom
cf-1846-port-perf-improvements

Conversation

@KRRT7
Copy link
Copy Markdown
Collaborator

@KRRT7 KRRT7 commented Mar 17, 2026

Summary

Ports the valuable improvements from #1846 that remain applicable after #1660 was merged:

  • Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
  • Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 path bug)
  • Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
  • Batch TestsCache writes with flush() + executemany instead of per-row commit
  • Gracefully disable cache writes on sqlite3.OperationalError
  • Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
  • Derive all_defs from all_names via is_definition() to save a redundant Jedi call

Thanks to @aseembits93 for the original work in #1846 — this PR carries forward the pieces that are still applicable after the #1660 architecture changes.

Test plan

  • pytest tests/test_code_context_extractor.py — 104 passed
  • pytest tests/test_unit_test_discovery.py — 57 passed

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @KRRT7's task in 3s —— View job


PR Review Summary

  • Triage
  • Lint and type check
  • Resolve stale threads
  • Code review
  • Duplicate detection
  • Test coverage

Prek Checks

ruff check and ruff format — both passed with no issues.

Mypy: One new error introduced by this PR was fixed and pushed (7cf183e7):

code_extractor.py:627Item "None" of Module | None has no attribute "visit" — The restructuring that hoisted parsed_dst_module = None to the top of the function broke mypy's type narrowing. Fixed by removing the None pre-assignment and adding else: parsed_dst_module = dst_module_code in the second branch, so mypy can see the variable is always cst.Module before .visit().

All remaining mypy errors in the changed files are pre-existing (present on main).


Code Review

code_context_extractor.py

  • @cache on get_jedi_project: Correct — str is hashable. Effective caching across 5+ call sites. Minor note: if callers normalize project_root_path differently (trailing slash, symlinks), cache misses will occur silently. Current code consistently uses str(project_root_path) so this is fine in practice.

  • relative_to() try/except (get_code_optimization_context): Good fix for Windows 8.3 short-path names. The ValueError is the correct exception for relative_to() when paths don't share a root.

  • jedi_project: object | None = None parameter type: Using object as the parameter type is technically correct (mypy accepts Anyobject), but callers can't rely on any attributes of the value. Since get_jedi_project returns Any, this works at runtime. Acceptable for an opaque external type.

  • Pre-grouping in get_function_sources_from_jedi: This is a semantic change worth noting. The old code used belongs_to_function_qualified(ref, qn) which matched any ref whose full_name starts with qn + "." (including refs inside nested closures/lambdas). The new code groups by parent().full_name == qn — only immediate-parent matches. This is more precise: refs inside nested functions inside the target function are no longer included. Since nested functions are themselves analyzed, their helpers will be found via their own entries. This is an improvement in precision, not a regression.

  • Self-reference exclusion (ref_qn == parent_qn check): Correct — prevents recursive calls from being treated as external helpers.

  • all_defs derivation: [name for name in all_names if name.is_definition()] is semantically equivalent to script.get_names(all_scopes=True, definitions=True) — since all_names already includes definitions (Jedi's definitions parameter defaults to True). Saves one Jedi call per file. ✓

discover_unit_tests.py

  • TestsCache.flush() with executemany: Clean improvement. The batch is flushed per test file (inside the loop), which is a good balance between memory usage and write frequency.

  • Graceful writes_enabled disable on OperationalError: Good defensive programming. Note: if the error is transient, writes stay disabled for the rest of the session. For a CLI tool this is acceptable.

  • functions_to_optimize_by_name O(1) lookup: Clean refactor. The original O(n) linear scan over functions_to_optimize per unresolved name was a real inefficiency.

  • One potential issue — flush() on exception path: If an exception is raised mid-way through the test file loop (e.g., UnicodeDecodeError at line 1098), the try/except at line 1085 catches it and continues, but flush() is only called at the end of the with block (line 1110). If the process is interrupted (e.g., KeyboardInterrupt), unflushed rows will be lost. This is a pre-existing issue (the old code committed per-row so it wouldn't lose data on interruption). Not introduced by this PR but worth noting.

code_extractor.py

  • Early-exit for empty imports: Both early-exit paths (not has_module_level_imports and no gathered imports) correctly return dst_code_fallback. ✓
  • lstrip("\n") on Module.code for the fallback: The comment explains this aligns the fallback with the transformed_module.code.lstrip("\n") result. This is a subtle behavior change from the original (dst_module_code.code without stripping), but it was introduced intentionally to maintain consistency. ✓

Duplicate Detection

No duplicates detected. get_jedi_project, flush(), _maybe_strip_docstring (consolidated from two duplicate definitions in prior commits), and functions_to_optimize_by_name are all novel additions or consolidations.


Test Coverage

All relevant tests pass:

  • tests/test_add_needed_imports_from_module.py — 10/10 passed
  • tests/test_unit_test_discovery.py — 57/57 passed

Coverage for changed files (against these test suites):

File Coverage
discover_unit_tests.py 12%
code_context_extractor.py 12%
code_extractor.py 33%

Low coverage is expected — these files depend on live Jedi analysis, SQLite, and API keys for most paths. The tests that can run without external dependencies all pass.


Fix committed: 7cf183e7 — mypy parsed_dst_module None narrowing
| Branch

…ve_to bug

Port valuable improvements from #1846 that remain applicable after #1660:
- Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
- Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths)
- Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
- Batch TestsCache writes with flush() + executemany instead of per-row commit
- Gracefully disable cache writes on sqlite3.OperationalError
- Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
- Derive all_defs from all_names via is_definition() to save a redundant Jedi call
@KRRT7 KRRT7 force-pushed the cf-1846-port-perf-improvements branch from 27c8226 to 5060c08 Compare March 17, 2026 06:26
github-actions bot and others added 2 commits March 17, 2026 06:33
The optimization added early-exit logic to `add_needed_imports_from_module` that checks whether the source module contains any module-level imports before invoking the heavyweight `GatherImportsVisitor` and downstream import-merging machinery. In the common case where a pruned module has no imports (or only nested ones inside functions), line profiling showed the gatherer and two `AddImportsVisitor`/`RemoveImportsVisitor` transforms consumed 36% of original runtime; the early exit skips all three, falling back to the destination code immediately. A second early exit after gathering verifies the visitor actually collected imports, avoiding redundant CST transformations when the source is import-free. Combined, these checks eliminate ~99% of the work when imports are absent, yielding an 18222% speedup with no semantic change because the fallback path always returned the correct destination code.
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 17, 2026

⚡️ Codeflash found optimizations for this PR

📄 18,223% (182.23x) speedup for re_extract_from_cache in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 634 milliseconds 3.46 milliseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch cf-1846-port-perf-improvements).

Static Badge

…2026-03-17T06.54.17

⚡️ Speed up function `re_extract_from_cache` by 18,223% in PR #1852 (`cf-1846-port-perf-improvements`)
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 17, 2026

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

@KRRT7
Copy link
Copy Markdown
Collaborator Author

KRRT7 commented Mar 17, 2026

@claude fix failing CI

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @KRRT7's task in 3m 34s —— View job


Fixed Failing CI

  • Gather context
  • Check CI status and identify failures
  • Fix the issues
  • Run prek to verify
  • Commit and push

Root Cause

The mypy CI failure was a [no-redef] error in code_context_extractor.py:

codeflash/languages/python/context/code_context_extractor.py:1775:1: error:
Name "_maybe_strip_docstring" already defined on line 1595  [no-redef]

_maybe_strip_docstring was defined twice — once at line 1595 (simple version, delegating to remove_docstring_from_body) and again at line 1775 (more complete version added by this PR, handling ConcatenatedString and empty body edge cases).

Fix

Removed the first (weaker) definition at lines 1586–1598, along with its now-unused remove_docstring_from_body helper. The better definition at line 1761 (formerly 1775) is kept.

uv run mypy now reports: Success: no issues found in 1 source file
| Branch

github-actions bot and others added 4 commits March 17, 2026 08:14
…no-redef error

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
…imports_from_module

The generator expression yielded `isinstance(stmt, ...)` where `stmt` was
already filtered to be `cst.SimpleStatementLine`, so it always returned False.
This caused add_needed_imports_from_module to always skip adding imports,
breaking 11 tests in test_code_context_extractor.py.

Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>
…n flat output

- Replace  with
   in
  _maybe_strip_docstring — the | syntax for runtime isinstance is Python 3.10+
  only and caused TypeError on 3.9
- Strip leading newlines from cst.Module.code in add_needed_imports_from_module
  fallback path so early-exit returns are consistent with the normal
  transformed_module.code.lstrip("\n") path, fixing extra blank lines after
  file headers in read_writable_code.flat output

Co-Authored-By: Oz <oz-agent@warp.dev>
@KRRT7
Copy link
Copy Markdown
Collaborator Author

KRRT7 commented Mar 17, 2026

@claude work on the identified issues in #1852 (comment)

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @KRRT7's task in 1m 36s —— View job


  • Gather context and read relevant files
  • Fix type narrowing bug in add_needed_imports_from_module (code_extractor.py:619)
  • Fix silent .lstrip("\n") behavior change in fallback (code_extractor.py:561)
  • Fix get_jedi_project return type annotation (code_context_extractor.py:52)
  • Run prek to verify
  • Commit and push

Changes Made

code_extractor.py

  1. Restored fallback behavior (line 561): Removed the silent .lstrip("\n") on the cst.Module fallback path — the original code used dst_module_code.code without stripping, so this restores that behavior for the three early-return paths.

  2. Fixed type narrowing (line 619): Replaced if parsed_dst_module is None: with if isinstance(dst_module_code, str): — mypy can now correctly narrow dst_module_code to str for the cst.parse_module(dst_module_code) call, eliminating the mypy error.

code_context_extractor.py

  1. Improved return type (line 52): Changed get_jedi_project return annotation from object to Any (added Any to the typing import). This is more accurate since object implies a restrictive type that doesn't allow attribute access on the result.

Committed as ba7feb07 — all prek checks pass.
| Branch

github-actions bot and others added 3 commits March 17, 2026 21:38
…mports_from_module

- Remove silent .lstrip("\n") on fallback for cst.Module input (restores original behavior)
- Replace `parsed_dst_module is None` with `isinstance(dst_module_code, str)` for correct mypy narrowing
- Change get_jedi_project return type from object to Any for accuracy

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…wing

The early `parsed_dst_module = None` assignment broke mypy's type narrowing.
Add explicit `else: parsed_dst_module = dst_module_code` so mypy sees the
variable is always a `cst.Module` before the `.visit()` call.

Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>
@KRRT7 KRRT7 merged commit 948bfed into main Mar 18, 2026
25 of 27 checks passed
@KRRT7 KRRT7 deleted the cf-1846-port-perf-improvements branch March 18, 2026 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant