perf: optimize Java tracing agent (E2E reduction + serialization + writes) by KRRT7 · Pull Request #2058 · codeflash-ai/codeflash

KRRT7 · 2026-04-10T08:04:58Z

Summary

Serialization optimizations: ThreadLocal Kryo Output buffer reuse, inline fast-path for safe arg types, skip verification roundtrip for known-safe containers
Write optimizations: Batched SQLite inserts (256/txn), ArrayBlockingQueue, permanent autocommit-off, opt-in in-memory SQLite with VACUUM INTO at shutdown (enabled in CI via is_ci())
Profiling fixture: New ProfilingWorkload.java for benchmarking the tracer
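
The batched-insert idea above can be sketched in a few lines. This is not the agent's Java code: it is a minimal Python `sqlite3` analogue, and the `captures` table, its columns, and the `write_batched` helper are hypothetical names chosen for illustration. Only the 256-rows-per-transaction figure comes from the summary.

```python
import sqlite3

BATCH_SIZE = 256  # rows per transaction, the figure quoted in the summary


def write_batched(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows in fixed-size batches, committing once per batch.

    Returns the number of commits issued (hypothetical helper).
    """
    sql = "INSERT INTO captures (method, args) VALUES (?, ?)"
    commits = 0
    pending = []
    for row in rows:
        pending.append(row)
        if len(pending) >= batch_size:
            conn.executemany(sql, pending)  # one transaction for the whole batch
            conn.commit()
            commits += 1
            pending.clear()
    if pending:  # flush the final partial batch
        conn.executemany(sql, pending)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE captures (method TEXT, args BLOB)")  # hypothetical schema
rows = [("computeSum", b"\x00" * 8) for _ in range(1000)]
commits = write_batched(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM captures").fetchone()[0]
```

With 1000 rows the helper issues four commits instead of one per row, which is where a batching scheme of this kind would be expected to save time.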

Benchmark (50k captures)

Metric	Before	After	Improvement
onEntry	~5200ms	~1200ms	4.3x faster
avg/capture	0.43ms	0.02ms	21x faster
writes	~3200ms	~900ms	3.5x faster
dump to disk	N/A	~190ms	one-time cost
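
The improvement column is the ratio of the before and after timings. A short check of that arithmetic, using only the approximate figures from the table:

```python
# (before, after) timings in milliseconds, copied from the benchmark table
benchmarks = {
    "onEntry": (5200, 1200),
    "avg/capture": (0.43, 0.02),
    "writes": (3200, 900),
}

# Speedup factor = before / after
speedups = {name: before / after for name, (before, after) in benchmarks.items()}
```

The ratios come out at roughly 4.3, 21.5 and 3.6. Because the inputs are approximate, the table's 21x and 3.5x are consistent with these values.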

Test plan

245 Maven unit tests pass
8/8 E2E tests pass (disk-backed mode, local)
8/8 E2E tests pass (in-memory mode, simulating CI)
CI passes

Remove filterEvens and instanceMethod from the Workload fixture (4→2 functions) and reduce main() loop from 1000→100 rounds. The E2E test only needs to verify the tracer→optimizer pipeline works end-to-end; it doesn't need 4 functions or 1604 replay tests to prove that. Expected impact: ~2 functions × ~8 candidates × fewer replay tests should bring the job from ~75 min down to ~10-15 min.

Drop repeatString from the Workload fixture (2→1 function). computeSum alone exercises the full tracer→optimizer pipeline (trace → replay tests → optimize → evaluate → rank → explain → review). The second function added no additional pipeline coverage.

- Reuse ThreadLocal Kryo Output buffers (eliminates #1 allocation hotspot)
- Fast-path inline serialization for safe arg types (bypasses executor)
- Skip verification roundtrip for known-safe containers (ArrayList, HashMap, etc.)
- Batch SQLite inserts (256/txn) with permanent autocommit-off
- Switch to ArrayBlockingQueue (no per-element Node allocation)
- Add opt-in in-memory SQLite mode (VACUUM INTO at shutdown), enabled in CI
- Add timing instrumentation (onEntry, serialization, writes, dump)
- Add ProfilingWorkload fixture for benchmarking

Benchmark (50k captures): onEntry 5200ms→1200ms (4.3x), avg/capture 0.43ms→0.02ms (21x), writes 3200ms→900ms (3.5x) with in-memory mode.
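
The in-memory mode with a `VACUUM INTO` dump at shutdown can be illustrated with Python's `sqlite3` module. This is a sketch of the technique, not the agent's Java implementation. The `dump_in_memory_db` helper, the `captures` schema, and the `trace.db` filename are hypothetical. `VACUUM INTO` requires SQLite 3.27 or newer, and the destination file must not already exist.

```python
import os
import sqlite3
import tempfile


def dump_in_memory_db(mem_conn, dest_path):
    """Write a compacted copy of an in-memory database to dest_path."""
    mem_conn.commit()  # VACUUM cannot run inside an open transaction
    mem_conn.execute("VACUUM INTO ?", (dest_path,))


# All writes go to RAM while the workload runs.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE captures (method TEXT, args BLOB)")  # hypothetical schema
mem.executemany(
    "INSERT INTO captures VALUES (?, ?)",
    [("repeatString", b"x")] * 500,
)

# One-time dump at "shutdown", then verify the on-disk copy.
with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "trace.db")
    dump_in_memory_db(mem, dest)
    disk = sqlite3.connect(dest)
    rows_on_disk = disk.execute("SELECT COUNT(*) FROM captures").fetchone()[0]
    disk.close()
mem.close()
```

The trade-off is that nothing is durable until the dump runs, which is presumably why the PR makes the mode opt-in.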

The jdk.ExecutionSample#period=1ms syntax in -XX:StartFlightRecording is only supported on JDK 13+. On JDK 11 (CI), it causes "Failure when starting JFR on_create_vm_2" and no JFR file is created. The settings=profile preset still provides 10ms CPU sampling.

Set PYTHONHASHSEED=0 in test subprocess environments so original and candidate runs use identical hash behavior, eliminating a source of non-deterministic return-value comparisons. Also upgrade diff logging from debug to info level with actual types and repr values for DID_PASS, RETURN_VALUE, and STDOUT diffs.
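
A minimal sketch of pinning the hash seed for a child Python process. The `run_with_fixed_hash_seed` helper is hypothetical and is not taken from the codeflash source; it only illustrates the effect of `PYTHONHASHSEED=0`, which disables string hash randomization so that two runs agree.

```python
import os
import subprocess
import sys


def run_with_fixed_hash_seed(code, seed="0"):
    """Run a Python snippet in a subprocess with PYTHONHASHSEED pinned."""
    env = dict(os.environ, PYTHONHASHSEED=seed)  # copy parent env, override seed
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


snippet = "print(hash('codeflash'))"
first = run_with_fixed_hash_seed(snippet)
second = run_with_fixed_hash_seed(snippet)
```

Without the override, each interpreter start picks a random seed, so `hash()` of a string, and any behavior derived from it such as set iteration order, can differ between the original and candidate runs.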

JavaSupport.ensure_runtime_environment() was never called during the optimization flow, so _language_version stayed None and the backend received language_version=null. The LLM had no Java version constraint, causing it to generate Java 16+ APIs (e.g. Stream.toList()) for Java 11 projects.

Make TOTAL_LOOPING_TIME configurable via CODEFLASH_LOOPING_TIME env var (defaults to 10s). Set to 5s in Java E2E CI jobs to cut verification time per candidate. Also cache the codeflash-runtime JAR keyed on source hash to skip mvn install when unchanged.
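
One way to read such a setting is sketched below. Only the variable name `CODEFLASH_LOOPING_TIME` and the 10-second default come from the commit message. The `get_total_looping_time` helper and the choice to fall back to the default on empty, invalid, or non-positive values are assumptions.

```python
import os

DEFAULT_LOOPING_TIME_SECONDS = 10.0  # default stated in the commit message


def get_total_looping_time(environ=None):
    """Return the looping time in seconds, falling back to the default.

    Fallback on empty, invalid, or non-positive values is an assumption.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CODEFLASH_LOOPING_TIME")
    if raw is None or not raw.strip():
        return DEFAULT_LOOPING_TIME_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_LOOPING_TIME_SECONDS
    return value if value > 0 else DEFAULT_LOOPING_TIME_SECONDS


default_time = get_total_looping_time({})
ci_time = get_total_looping_time({"CODEFLASH_LOOPING_TIME": "5"})
bad_time = get_total_looping_time({"CODEFLASH_LOOPING_TIME": "fast"})
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.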

Combine JFR profiling and argument capture agent into one JAVA_TOOL_OPTIONS string, running the target program once instead of twice. JFR and javaagent are orthogonal JVM features that coexist without conflict. Keeps build_jfr_env/build_agent_env for standalone use.
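
A sketch of how the two options might be joined into one string. `-XX:StartFlightRecording` and `-javaagent` are standard JVM options, but the `build_combined_tool_options` helper, the paths, and the agent argument format below are hypothetical. The actual signatures of `build_jfr_env` and `build_agent_env` are not shown in this PR page.

```python
def build_combined_tool_options(jfr_file, agent_jar, agent_args=""):
    """Build one JAVA_TOOL_OPTIONS value enabling JFR and a Java agent.

    Assumes the paths contain no whitespace (hypothetical helper).
    """
    # settings=profile only; no per-event period override
    jfr_opt = f"-XX:StartFlightRecording=filename={jfr_file},settings=profile"
    agent_opt = f"-javaagent:{agent_jar}"
    if agent_args:
        agent_opt += f"={agent_args}"
    return f"{jfr_opt} {agent_opt}"


options = build_combined_tool_options(
    "/tmp/profile.jfr",            # hypothetical JFR output path
    "/opt/codeflash-runtime.jar",  # hypothetical agent JAR path
    "trace=/tmp/trace.db",         # hypothetical agent argument
)
env_overrides = {"JAVA_TOOL_OPTIONS": options}
```

Because the JVM reads `JAVA_TOOL_OPTIONS` at startup, a single launch of the target program picks up both the recording and the agent.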

KRRT7 added 2 commits April 10, 2026 03:04

ci: add java_tracer_e2e fixture path to e2e_java change detection

21f61ec

The fixture directory wasn't in the path filter, so changes to Workload.java didn't trigger the java E2E tests.

github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Apr 10, 2026

KRRT7 added 3 commits April 10, 2026 03:17

fix: update java tracer unit tests for reduced Workload fixture

46957e1

Remove assertions for filterEvens and instanceMethod which were removed from the Workload fixture. Adjust expected invocation counts accordingly.

KRRT7 changed the title ~~perf: reduce java-tracer E2E from ~75 min to ~15 min~~ perf: optimize Java tracing agent (E2E reduction + serialization + writes) Apr 10, 2026

KRRT7 added 13 commits April 10, 2026 05:05

fix: remove stale repeatString assertions from integration tests

e81f25f

repeatString was removed from Workload.java in the E2E reduction.

flexing

01e2215

Remove debug timing instrumentation from tracer

bfe6f3a

Strip AtomicLong accumulators, System.nanoTime() timing, and getTimingSummary() that were added for profiling. No functional change.

chore: add diagnostic logging to compare_test_results

e191f74

Temporary instrumentation to debug flaky futurehouse E2E test. Logs matched/skipped/timed-out counts and did_all_timeout state.

chore: remove diagnostic logging from compare_test_results

82ec301

chore: replace console.print with logger.info for Java project detection

b05561e

perf: use --effort low for java-tracer E2E to reduce CI time

151df77

fix: drop jdk.ExecutionSample#period from combined JFR opts (unsupported on Java 11)

013c83f

KRRT7 had a problem deploying to external-trusted-contributors April 10, 2026 17:55 — with GitHub Actions Error

KRRT7 deployed to external-trusted-contributors April 10, 2026 17:55 — with GitHub Actions Active

KRRT7 had a problem deploying to external-trusted-contributors April 10, 2026 17:55 — with GitHub Actions Error

KRRT7 added 5 commits April 10, 2026 12:58

fix: skip environment approval gate for trusted users on workflow_dispatch

cb87763

ci: add standalone Java E2E workflow for isolated testing

40f16b5

perf: trim tracer E2E workload to single function (repeatString)

5c778df

Keep only repeatString which reliably produces 284% improvement. Drop computeSum (marginal 16%), filterEvens and instanceMethod (no optimization found). Reduces tracer E2E from ~1h27m to ~21m.

fix: add --no-pr to codeflash optimize workflow to prevent CI-opened PRs

0cb67c1

fix: update test assertions to match simplified Workload fixture

b737f71

The Workload.java fixture was trimmed to only repeatString but test files still asserted computeSum, filterEvens, and instanceMethod.

KRRT7 enabled auto-merge April 10, 2026 21:22

KRRT7 merged commit 819a56c into main Apr 10, 2026
30 of 31 checks passed

KRRT7 deleted the perf/reduce-java-tracer-e2e branch April 10, 2026 23:44

KRRT7 merged 23 commits into main from perf/reduce-java-tracer-e2e
