perf: optimize Java tracing agent (E2E reduction + serialization + writes) #2058
Merged
Remove filterEvens and instanceMethod from the Workload fixture (4→2 functions) and reduce main() loop from 1000→100 rounds. The E2E test only needs to verify the tracer→optimizer pipeline works end-to-end; it doesn't need 4 functions or 1604 replay tests to prove that. Expected impact: ~2 functions × ~8 candidates × fewer replay tests should bring the job from ~75 min down to ~10-15 min.
The fixture directory wasn't in the path filter, so changes to Workload.java didn't trigger the java E2E tests.
Remove assertions for filterEvens and instanceMethod which were removed from the Workload fixture. Adjust expected invocation counts accordingly.
Drop repeatString from the Workload fixture (2→1 function). computeSum alone exercises the full tracer→optimizer pipeline (trace → replay tests → optimize → evaluate → rank → explain → review). The second function added no additional pipeline coverage.
- Reuse ThreadLocal Kryo Output buffers (eliminates #1 allocation hotspot)
- Fast-path inline serialization for safe arg types (bypasses executor)
- Skip verification roundtrip for known-safe containers (ArrayList, HashMap, etc.)
- Batch SQLite inserts (256/txn) with permanent autocommit-off
- Switch to ArrayBlockingQueue (no per-element Node allocation)
- Add opt-in in-memory SQLite mode (VACUUM INTO at shutdown), enabled in CI
- Add timing instrumentation (onEntry, serialization, writes, dump)
- Add ProfilingWorkload fixture for benchmarking

Benchmark (50k captures): onEntry 5200ms→1200ms (4.3x), avg/capture 0.43ms→0.02ms (21x), writes 3200ms→900ms (3.5x) with in-memory mode.
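The queue and batching changes can be illustrated with a small stdlib-only sketch. It is not the agent's actual code: `BatchDrainSketch`, `BatchSink`, and `drainInBatches` are hypothetical names, and the sink stands in for "one SQLite transaction" so the example runs without a JDBC driver. It only shows how a bounded `ArrayBlockingQueue` can be drained in groups of at most 256.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of batched draining from a bounded queue; all names are hypothetical. */
final class BatchDrainSketch {
    /** Mirrors the 256/txn figure from the commit message. */
    static final int BATCH_SIZE = 256;

    /** Stand-in for one database transaction: receives a whole batch at once. */
    interface BatchSink {
        void commit(List<byte[]> batch);
    }

    /**
     * Drains everything currently queued, handing at most BATCH_SIZE items
     * to the sink per call. Returns the number of commits performed.
     */
    static int drainInBatches(BlockingQueue<byte[]> queue, BatchSink sink) {
        List<byte[]> batch = new ArrayList<>(BATCH_SIZE);
        int commits = 0;
        // drainTo moves up to BATCH_SIZE elements in one call and returns how many it moved.
        while (queue.drainTo(batch, BATCH_SIZE) > 0) {
            sink.commit(new ArrayList<>(batch)); // copy so the sink may retain the list
            batch.clear();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // ArrayBlockingQueue is array-backed, so it allocates no per-element node objects.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
        for (int i = 0; i < 600; i++) {
            queue.offer(new byte[] {(byte) i});
        }
        List<Integer> sizes = new ArrayList<>();
        int commits = drainInBatches(queue, b -> sizes.add(b.size()));
        System.out.println(commits + " commits, batch sizes " + sizes);
    }
}
```

With 600 queued captures the sketch commits three times (256, 256, 88). In a real writer each `commit` would wrap the inserts in a single transaction with autocommit off.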
repeatString was removed from Workload.java in the E2E reduction.
Strip AtomicLong accumulators, System.nanoTime() timing, and getTimingSummary() that were added for profiling. No functional change.
The jdk.ExecutionSample#period=1ms syntax in -XX:StartFlightRecording is only supported on JDK 13+. On JDK 11 (CI), it causes "Failure when starting JFR on_create_vm_2" and no JFR file is created. The settings=profile preset still provides 10ms CPU sampling.
Temporary instrumentation to debug flaky futurehouse E2E test. Logs matched/skipped/timed-out counts and did_all_timeout state.
Set PYTHONHASHSEED=0 in test subprocess environments so original and candidate runs use identical hash behavior, eliminating a source of non-deterministic return-value comparisons. Also upgrade diff logging from debug to info level with actual types and repr values for DID_PASS, RETURN_VALUE, and STDOUT diffs.
JavaSupport.ensure_runtime_environment() was never called during the optimization flow, so _language_version stayed None and the backend received language_version=null. The LLM had no Java version constraint, causing it to generate Java 16+ APIs (e.g. Stream.toList()) for Java 11 projects.
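For context, the kind of incompatibility described here looks like the following. This is an illustrative sketch, not code from the PR; `Java11CompatSketch` and `evens` are made-up names. `Stream.toList()` was added in Java 16, so a Java 11 project needs the `Collectors.toList()` form.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: a Java 11-compatible way to collect a stream into a list. */
final class Java11CompatSketch {
    static List<Integer> evens(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)
                // Compiles on Java 8+. The shorter .toList() requires Java 16+
                // and fails to compile on a Java 11 toolchain.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(Arrays.asList(1, 2, 3, 4)));
    }
}
```

The two forms are not fully interchangeable even on newer JDKs: `Stream.toList()` returns an unmodifiable list, while `Collectors.toList()` makes no such guarantee.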
Make TOTAL_LOOPING_TIME configurable via CODEFLASH_LOOPING_TIME env var (defaults to 10s). Set to 5s in Java E2E CI jobs to cut verification time per candidate. Also cache the codeflash-runtime JAR keyed on source hash to skip mvn install when unchanged.
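The env-var override follows a common read-with-default pattern. The sketch below shows it in Java for consistency with the rest of this PR; it is not the actual implementation. `LoopingTimeSketch` and `loopingTimeSeconds` are hypothetical names, and falling back to the default on unparsable or non-positive values is an assumption the commit message does not confirm.

```java
import java.util.Collections;
import java.util.Map;

/** Sketch of a read-with-default env override; names and fallback rules are assumptions. */
final class LoopingTimeSketch {
    /** Default from the commit message: 10 seconds. */
    static final double DEFAULT_SECONDS = 10.0;

    /** Takes the environment as a map so the lookup can be exercised without real env vars. */
    static double loopingTimeSeconds(Map<String, String> env) {
        String raw = env.get("CODEFLASH_LOOPING_TIME");
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_SECONDS;
        }
        try {
            double value = Double.parseDouble(raw.trim());
            // Assumption: non-positive (or NaN) values are treated as unset.
            return value > 0 ? value : DEFAULT_SECONDS;
        } catch (NumberFormatException e) {
            // Assumption: unparsable values fall back to the default rather than failing.
            return DEFAULT_SECONDS;
        }
    }

    public static void main(String[] args) {
        System.out.println(loopingTimeSeconds(System.getenv()));
        System.out.println(loopingTimeSeconds(Collections.singletonMap("CODEFLASH_LOOPING_TIME", "5")));
    }
}
```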
Combine JFR profiling and argument capture agent into one JAVA_TOOL_OPTIONS string, running the target program once instead of twice. JFR and javaagent are orthogonal JVM features that coexist without conflict. Keeps build_jfr_env/build_agent_env for standalone use.
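A rough sketch of what a combined options string might look like, written in Java for illustration. `ToolOptionsSketch`, `combinedToolOptions`, and the paths in `main` are hypothetical, and the agent argument string is treated as opaque because its format is not given here. The two JVM flags themselves, `-XX:StartFlightRecording` and `-javaagent`, are standard.

```java
/** Sketch of composing JFR and javaagent flags into one JAVA_TOOL_OPTIONS value. */
final class ToolOptionsSketch {
    /**
     * Builds a single options string so the target program runs once with both
     * JFR recording and the capture agent attached. All arguments are placeholders.
     */
    static String combinedToolOptions(String jfrFile, String agentJar, String agentArgs) {
        // settings=profile is the built-in JFR preset mentioned in the commits above.
        String jfr = "-XX:StartFlightRecording=filename=" + jfrFile + ",settings=profile";
        // -javaagent takes the form jarpath[=options]; the options are opaque to this sketch.
        String agent = "-javaagent:" + agentJar + (agentArgs.isEmpty() ? "" : "=" + agentArgs);
        return jfr + " " + agent;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        String opts = combinedToolOptions("/tmp/run.jfr", "/tmp/agent.jar", "");
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.environment().put("JAVA_TOOL_OPTIONS", opts); // picked up by the child JVM at startup
        System.out.println(opts);
    }
}
```

Because the JVM reads `JAVA_TOOL_OPTIONS` at startup, setting it once in the child process environment is enough for both features to apply to the same run.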
Keep only repeatString which reliably produces 284% improvement. Drop computeSum (marginal 16%), filterEvens and instanceMethod (no optimization found). Reduces tracer E2E from ~1h27m to ~21m.
The Workload.java fixture was trimmed to only repeatString but test files still asserted computeSum, filterEvens, and instanceMethod.
Summary
- … is_ci())
- ProfilingWorkload.java for benchmarking the tracer

Benchmark (50k captures)
Test plan