feat(cli): Add standalone --server and --client modes #110
Conversation
This comment was marked as outdated.
dustinblack left a comment
Good first step toward #11. The scope is right — standalone server/client as independent processes without touching transport internals. A few things to think about as this develops:
- No integration with existing metrics/reporting. The standalone client does its own ad-hoc stats via `info!` logging (mean/P50/P95/P99) rather than using `ResultsManager`/`MetricsCollector`. This means no streaming output, no JSON/CSV, and no integration with the reporting pipeline or dashboard. Fine for an initial draft, but worth planning how to close that gap.
- Blocking/async code duplication. The blocking and async server/client paths are nearly identical except for `await`. Consider whether a shared structure or helper could reduce the duplication; the current pattern will be a maintenance burden as features are added.
- Two server modes. `--server` (new, user-facing) and `--internal-run-as-server` (existing, hidden) now coexist. The distinction is clear in code, but worth documenting so users don't stumble into the internal flag, and so future contributors understand when to use which.
- Duration mode not supported. The standalone client only supports message-count mode (`-i`). If someone passes `-d`, it will silently fall back to the default message count. Should either support it or reject it with a clear error.
Overall the approach is clean and well-tested. Nice to see 5 integration tests covering round-trip, one-way, ping/pong, retry, and shutdown.
This comment was marked as outdated.
- Empty message count panic: if `-i 0` or similar produces zero messages, the round-trip percentile computation will panic on an empty vector (`latencies[msg_count / 2]`).
- Ignored config options (`main.rs`): `concurrency`, `send_delay`, `include_first_message`, `percentiles`, and streaming/`output_file` are all silently ignored. Should either be supported or rejected.
- No shutdown message: the client relies on transport close for the server to detect disconnect. The internal server path uses an explicit `MessageType::Shutdown`, which is more deterministic.
- Duration mode is not honored in standalone client paths (`main.rs`): when `-d 5s` is passed, `BenchmarkConfig::from_args` sets `msg_count = None` (duration takes precedence). The standalone client then falls back to the default count via `unwrap_or`, and the `config.duration` field is never read anywhere in the standalone paths. So `--client -d 5s -m tcp --blocking` silently runs the default message count instead of a 5-second timed test.
- Heap allocation per message in measurement loop: `payload.clone()` allocates a new `Vec` on every iteration. Per the project's own guidelines ("No allocations in measurement loops"), this adds allocation overhead to every measurement sample. The same pattern appears in the round-trip loop and both async variants, and will skew latency numbers, particularly for small messages where allocation cost is proportionally large.
- Standalone client bypasses the existing reporting pipeline (`ResultsManager`/streaming JSON/CSV), unlike established benchmark flows.
- Test coverage claim mismatch: this branch diff does not add standalone-specific CLI/integration tests; new flags/paths need explicit coverage.

When I asked to compare against C2C branch:
This comment was marked as outdated.
Force-pushed f382afc to 93bfac7
This comment was marked as outdated.
Normal vs Standalone Benchmark Comparison

Manual comparison of benchmark results between normal mode (main branch) and standalone client/server mode (this PR). All tests run locally on the same machine with 1000 warmup iterations.

TCP Round-Trip (10000 msgs, 1024 bytes)
UDS Round-Trip (10000 msgs, 1024 bytes)
TCP Large Payload Round-Trip (5000 msgs, 8192 bytes)
TCP One-Way (10000 msgs, 1024 bytes)
Note: Standalone one-way reports throughput only on the client side; the server reports latency separately. Normal mode measures server-side latency via the latency file.

Summary
PR Summary

This PR adds standalone `--server` and `--client` modes.

Features
Reporting
Transport
Code Quality
Benchmark Comparison

Normal mode vs standalone mode results are within ~10-15% mean latency tolerance, with standalone often showing better tail latency (see comparison tables in PR comments).

Deferred Items

Two maintainability concerns were raised during review: the standalone logic adding ~2800 lines to …
This comment was marked as outdated.
dustinblack left a comment
Good progress on the implementation — CI is green, the benchmark comparison shows comparable performance, and the integration tests for blocking TCP paths are solid.
However, Matt's review from Mar 31 raised several correctness issues that still appear unaddressed. These need to be resolved before this can be approved:
- Panic on `-i 0`: `latencies[msg_count / 2]` will panic on an empty vector. Need either input validation rejecting `msg_count < 1` or a guard before the percentile computation.
- Duration mode silently ignored: `--client -d 5s` falls back to the default message count via `unwrap_or`. The `config.duration` field is never read in the standalone paths. This should either be supported or rejected with a clear error.
- `payload.clone()` in measurement loop: allocates a new `Vec<u8>` on every iteration. For a benchmarking tool, this adds measurable overhead, especially for small messages. The existing benchmark runner avoids this; standalone should too.
- Silently ignored config options: `concurrency`, `send_delay`, `include_first_message`, `percentiles`, `streaming-output-*`, and `output-file` are all accepted but do nothing. These should either be wired up or rejected so users aren't misled by silent no-ops.
- No shutdown message: the client relies on transport close for the server to detect disconnect. The internal server path uses an explicit `MessageType::Shutdown`, which is more reliable. Worth aligning these.
- Coverage at 7.3%: almost all standalone code is uncovered. The integration tests exercise the transport layer directly but don't go through `run_standalone_server`/`run_standalone_client`. Consider whether the tests added actually cover the code paths being introduced.
Thanks for the review! I want to make sure we're looking at the same code: these items were all addressed in commits after mcurrier2's original feedback on March 31. Could you confirm you're reviewing the latest state of the branch?

Here's where each item was addressed:

Happy to walk through any of these in more detail if needed.
dustinblack left a comment
First — apologies for the previous review. I reviewed against stale code and repeated issues that had already been addressed. That's on me. This review is against the current HEAD (c6f99c72).
The PR has come a long way — duration mode, concurrency, metrics integration, message reuse, shutdown messages, and streaming output are all properly wired up. 36 tests pass. Here are the remaining items:
High: Code duplication
The blocking/async client and server paths are near-identical copies — roughly 8 variants of similar measurement/server loops. Any future bug fix needs replication across all of them. This also ties into the coverage issue below.
High: Coverage and code placement
The 7.3% changed-line coverage is not purely a tarpaulin limitation — it's because ~800 lines of new logic live in main.rs (binary crate) rather than in the library. handle_client_connection, dispatch_server_message, effective_concurrency, build_standalone_transport_config, aggregate_and_print, and the measurement loops could all live in lib.rs or a new module exposed through it. This would:
- Let tarpaulin instrument the code, making the 36 existing tests show up in coverage reports
- Enable integration tests in `tests/` to exercise the standalone logic
- Reduce the blocking/async duplication by sharing testable core logic
Given that development on this project is heavily AI-assisted, measurable test coverage is especially important as a verification mechanism. I'd like to see the core logic moved to the library crate.
Medium items
- Server one-way latency includes deserialization. `get_monotonic_time_ns()` is called after `receive_blocking()` returns, so bincode deserialization time is included in the measurement. For large payloads this is non-trivial.
- Concurrent mode grace period bug. The 2-second `SERVER_ACCEPT_GRACE_PERIOD` could expire between the one-way and round-trip test phases, causing the server to exit mid-test when both tests are enabled with concurrency > 1.
- `--quiet` flag not honored. Standalone paths always init tracing to stderr.
- Mutex `unwrap()` in multi-threaded handlers. If a handler thread panics while holding the metrics lock, all other threads cascade-panic on the poisoned mutex.
- `--shm-direct` auto-blocking not applied. The `--shm-direct` → `--blocking` auto-enable happens after the standalone dispatch branch, so `--client --shm-direct` without `--blocking` hits the async path incorrectly.
Low items
- `from_stream()` bypasses the socket buffer tuning that the normal `start_server_blocking` applies; this could cause measurement differences.
- Integer division drops remainder messages in concurrent mode (`msg_count / concurrency`).
- Response messages always have empty payloads, making round-trips asymmetric. Worth documenting whether this is intentional.
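The integer-division item above has a straightforward fix: give the last worker the leftover messages. A sketch of the idea — the helper name is hypothetical, not taken from the codebase:

```rust
/// Split msg_count across workers; the last worker absorbs the remainder
/// so no messages are silently dropped by integer division.
fn per_worker_counts(msg_count: usize, concurrency: usize) -> Vec<usize> {
    let base = msg_count / concurrency;
    let mut counts = vec![base; concurrency];
    if let Some(last) = counts.last_mut() {
        *last += msg_count % concurrency;
    }
    counts
}
```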
Add the ability to run the benchmark server and client as independent processes, enabling cross-environment IPC testing (e.g., host and container). Relates to #11.

Standalone mode features:
- --server flag starts a server that listens for client connections
- --client flag connects to a running server with retry logic (100ms backoff, 30s timeout)
- Both async (Tokio) and blocking (std) execution modes supported
- Duration (-d) and message-count (-i) modes both supported
- Default transport endpoints work without extra flags
- Endpoint flags (--socket-path, --shared-memory-name, --message-queue-name) promoted to user-facing

Reporting integration:
- Full ResultsManager/MetricsCollector integration for structured output (JSON, streaming CSV, console summary with HDR percentiles)
- Server-side one-way latency measurement using monotonic clock (accurate for same-host and container scenarios)
- Round-trip latency with per-message streaming support

Code quality:
- Shared helpers: dispatch_server_message(), retry constants
- 25 tests covering CLI parsing, transport config, server dispatch, connection retry, shutdown, duration mode, one-way, round-trip
- Explicit MessageType::Shutdown on client disconnect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
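The retry logic described in this commit (100ms backoff, 30s timeout) can be sketched as a generic retry loop. This is an illustrative shape only, assuming a fallible `connect` closure; it is not the PR's actual implementation:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Retry `connect` with a fixed backoff until it succeeds or the
/// deadline passes, returning the last error on timeout.
fn connect_with_retry<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    backoff: Duration,
    timeout: Duration,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            // no time left for another sleep + attempt: give up
            Err(e) if Instant::now() + backoff >= deadline => return Err(e),
            Err(_) => thread::sleep(backoff),
        }
    }
}
```

With the PR's stated constants this would be called with `Duration::from_millis(100)` and `Duration::from_secs(30)`.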
- Add --send-delay support: inserts a configurable pause after each message send (blocking uses thread::sleep, async uses tokio::sleep)
- Add --include-first-message support: when false (default), sends a canary message before measurement to warm up the connection, matching the existing BenchmarkRunner behavior
- Applied to both one-way and round-trip tests in both blocking and async client paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reuse a single Message struct across loop iterations instead of calling Message::new() with payload.clone() on every send. The message id and timestamp are updated in-place before each send. This removes one Vec<u8> heap allocation per message in the measurement loop, reducing allocation overhead that can skew latency results, especially for small messages. Applied to both one-way and round-trip tests in both blocking and async client paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
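The in-place reuse pattern this commit describes can be sketched as follows. The `Message` struct and `send_all` helper here are simplified stand-ins, not the project's real types:

```rust
/// Minimal stand-in for the benchmark's message type (illustrative only).
struct Message {
    id: u64,
    timestamp_ns: u64,
    payload: Vec<u8>,
}

/// Send `n` messages while reusing one Message: the payload Vec is
/// allocated once up front, and id/timestamp are updated in place each
/// iteration, instead of constructing a new message with
/// payload.clone() on every send.
fn send_all(n: u64, payload: Vec<u8>, mut send: impl FnMut(&Message)) {
    let mut msg = Message { id: 0, timestamp_ns: 0, payload };
    for i in 0..n {
        msg.id = i;
        msg.timestamp_ns = i; // stand-in for a monotonic clock read
        send(&msg);
    }
}
```

The key point is that the hot loop performs no heap allocation, so allocator jitter stays out of the latency samples.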
Server-side multi-accept:
- TCP and UDS servers now accept multiple concurrent connections, spawning a handler thread per client with its own MetricsCollector
- Grace period after first client prevents premature server exit
- SHM and PMQ fall back to single-client mode with a warning
- Server aggregates one-way latency metrics across all handlers

Client-side multi-threaded execution:
- Blocking client spawns N worker threads, each with its own transport connection, MetricsCollector, and message loop
- Async client uses tokio::task::JoinSet for concurrent workers
- Results aggregated via MetricsCollector::aggregate_worker_metrics()
- Per-message streaming disabled for concurrent mode (aggregated only)

Transport additions:
- BlockingTcpSocket::from_stream() wraps pre-accepted TcpStream
- BlockingUnixDomainSocket::from_stream() wraps pre-accepted UnixStream

Shared helpers:
- handle_client_connection() -- per-client message dispatch and metrics
- aggregate_and_print_server_metrics() -- shared aggregation logic

Tests:
- test_standalone_concurrent_tcp_round_trip (3 concurrent clients)
- test_handle_client_connection_round_trip (dispatch correctness)
- test_handle_client_connection_one_way_metrics (metrics recording)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_standalone_concurrent_tcp_one_way: multi-accept server with 2 concurrent one-way clients, verifying server-side metrics recording
- test_tcp_from_stream_send_receive: BlockingTcpSocket::from_stream() full send/receive round-trip
- test_uds_from_stream_send_receive: BlockingUnixDomainSocket::from_stream() full send/receive round-trip (unix-only)
- test_concurrency_forced_to_one_for_shm: CLI parsing for SHM with concurrency > 1
- test_aggregate_and_print_empty_collectors: empty input edge case
- test_aggregate_and_print_single_collector: single collector with data

Total binary tests: 34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add multi-accept support for async TCP and UDS servers, matching the blocking server's concurrency support. Uses tokio::net listeners with spawn_blocking for per-client handler threads.
- Remove unused _args parameter from run_standalone_server_async
- Replace inline latency printing in async server with shared print_server_one_way_latency helper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace all 12 hardcoded test ports (18301-18314) with OS-assigned ports via get_free_port() helper (binds to port 0, extracts assigned port). Prevents port conflicts in parallel test runs and with other processes.
- Extract 2-second multi-accept grace period into SERVER_ACCEPT_GRACE_PERIOD constant with documentation explaining the behavior and limitation.
- Document the grace period in --server CLI help text so users know concurrent clients should connect within 2 seconds of each other.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
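The bind-to-port-0 trick this commit describes is small enough to sketch in full; the helper name follows the commit message, though the exact signature in the codebase may differ:

```rust
use std::net::TcpListener;

/// Bind to port 0 so the OS assigns a currently free port, then read
/// the assigned port back. The listener is dropped on return, freeing
/// the port for the test's own server to bind.
fn get_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}
```

There is an inherent small race (another process could grab the port between drop and rebind), but it is far less likely than collisions between hardcoded ports in parallel test runs.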
tokio::net::TcpStream::into_std() leaves the stream in non-blocking mode (set by tokio for epoll/kqueue). The blocking transport's read_exact/write_all calls then fail with WouldBlock errors, causing immediate disconnection. Fix: call set_nonblocking(false) on streams after into_std() in both TCP and UDS async multi-accept servers. Add test_standalone_async_concurrent_tcp_round_trip to exercise the async multi-accept path (tokio accept + spawn_blocking + from_stream + handle_client_connection). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_standalone_blocking_tcp_one_way: verify server received exact message count with correct sequential IDs, add shutdown message
- test_standalone_blocking_tcp_duration_round_trip: verify response IDs match requests, assert count > 10 for 200ms test, add shutdown
- test_standalone_blocking_tcp_duration_one_way: verify server received exact count with sequential IDs, assert count > 10 for 200ms test
- test_concurrency_forced_to_one_for_shm: test actual concurrency forcing logic instead of just CLI parsing
- test_standalone_concurrent_tcp_one_way: assert exact message count per handler instead of just "greater than zero"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Clean up garbled doc comment on async concurrent test (editing artifacts from multiple rewrites)
- Replace silent panic swallowing in async multi-accept servers: try_join_next().transpose() silently dropped JoinErrors from panicked handler tasks. Now logs warnings via warn!().
- Extract effective_concurrency() helper to deduplicate the concurrency-forcing logic (was copied in blocking client, async client, and test). Test now calls the actual helper instead of reimplementing the logic inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_standalone_large_payload_integrity: 4KB payloads with recognizable byte pattern, server echoes back, client verifies content byte-for-byte to catch corruption
- test_handle_client_connection_filters_canary: verifies warmup canary messages (id=u64::MAX) are excluded from one-way metrics
- test_handle_client_connection_mixed_message_types: interleaved OneWay and Request messages on a single connection, verifies correct metrics recording and response dispatch
- test_aggregate_and_print_multiple_collectors: aggregation across 2 collectors with different latency distributions
- test_effective_concurrency_all_mechanisms: covers UDS, PMQ, SHM, TCP, and concurrency=1 edge case

Total binary tests: 40.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accepted TCP/UDS streams inherit non-blocking mode from the listener (set for the accept poll loop). The handler threads need blocking mode for the transport's read_exact/write_all operations. This is the blocking-server equivalent of the async into_std fix in commit 8723429. Without this fix, standalone server handlers immediately disconnect from clients. Applies to both run_standalone_server_blocking_multi_accept_tcp and _uds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grace period bug: reset timer on every new connection, not just the first. Prevents premature server exit between one-way and round-trip test phases when using concurrency > 1. Applied to all four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set, suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner()) instead of unwrap() on mutex locks. If a handler thread panics while holding the lock, other threads can still push their metrics instead of cascade-panicking.
- Add defensive --shm-direct guard in standalone server and client: returns error if --shm-direct is used without --blocking. This is normally enforced by main() but the guard protects against future refactoring that might change the dispatch order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grace period bug: reset timer on every new connection, not just the first. Prevents premature server exit between one-way and round-trip test phases when using concurrency > 1. Applied to all four multi-accept servers (blocking TCP/UDS, async TCP/UDS).
- Honor --quiet flag in standalone server and client. When set, suppress all tracing output to stderr.
- Handle poisoned mutex gracefully: use unwrap_or_else(|e| e.into_inner()) instead of unwrap() on mutex locks in handler threads.
- Add defensive --shm-direct guard in standalone server and client.
- Add socket buffer tuning (recv/send buffer sizes) to multi-accept TCP servers to match normal transport behavior.
- Fix integer division remainder: last worker now receives any extra messages when msg_count is not evenly divisible by concurrency.
- Document empty response payloads as intentional design matching existing benchmark runner behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add receive_blocking_timed() to the BlockingTransport trait that captures a monotonic timestamp after raw bytes are read but before bincode deserialization. This excludes deserialization overhead from one-way latency measurements.

- Add default implementation on BlockingTransport trait (backward compatible -- captures timestamp after full receive)
- Override in TCP, UDS, and SHM blocking transports to place timestamp between raw I/O read and deserialization
- SHM-direct uses default (no bincode deserialization to exclude)
- Update handle_client_connection and standalone single-client server to use receive_blocking_timed

Impact is most visible with large payloads where deserialization is non-trivial. A 64KB one-way TCP test shows min latency dropped from 41us (post-deserialize) to 14us (pre-deserialize), a ~27us improvement representing the bincode deserialization time excluded from measurement. Mean dropped 28% (73us to 52us) and P99 dropped 14% (132us to 113us).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
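The timestamp placement this commit describes can be modeled with a small illustrative trait. Note this is a toy: the real trait is `BlockingTransport` with bincode deserialization, and per the commit the trait *default* timestamps after the full receive for backward compatibility, while the TCP/UDS/SHM overrides use the placement shown here (between raw read and decode):

```rust
use std::time::Instant;

/// Toy model of receive_blocking_timed(): capture the timestamp after
/// the raw bytes arrive but before decoding, so decode cost is
/// excluded from one-way latency.
trait TimedReceive {
    fn receive_raw(&mut self) -> Vec<u8>;

    fn receive_timed(&mut self) -> (u64, Instant) {
        let bytes = self.receive_raw();
        let received_at = Instant::now(); // before deserialization
        // Stand-in for bincode: decode an id from the first 8 bytes.
        let id = u64::from_le_bytes(bytes[..8].try_into().unwrap());
        (id, received_at)
    }
}

struct MockTransport;

impl TimedReceive for MockTransport {
    fn receive_raw(&mut self) -> Vec<u8> {
        42u64.to_le_bytes().to_vec()
    }
}
```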
PR #105 (Fix/streaming timestamps) added send_timestamp_ns parameter to MessageLatencyRecord::new(). Update all 4 call sites in standalone client to capture wall-clock timestamp at send time and pass it as the 6th argument. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed f320d06 to 3e17727
This comment was marked as outdated.
Move ~3100 lines of standalone client/server code from the binary crate (main.rs) into two new library modules, following the existing flat-file convention (benchmark.rs/benchmark_blocking.rs pattern).

Structure:
- standalone_server.rs (1982 lines): constants, shared helpers, server dispatch, multi-accept TCP/UDS, async server paths
- standalone_client.rs (1146 lines): retry helpers, client dispatch, single/concurrent blocking and async paths
- main.rs reduced from ~4200 to ~1120 lines (thin dispatch layer)

Additional changes:
- Promote logging.rs from binary-private to library-public module
- Move set_affinity() to utils.rs as pub function
- All standalone functions now pub for tarpaulin coverage measurement and integration test access

No behavioral changes. All 374 tests pass. Benchmark comparison across 3 runs confirms no performance regression (mean latencies within 2-5% run-to-run variance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Socket configuration failures (set_nonblocking, set_nodelay) in multi-accept servers now log a warning and skip the bad connection instead of crashing the entire server with ?
- Thread join panics in blocking multi-accept servers now logged with warn! instead of silently dropped with let _ =
- Streaming latency record failures in client now logged with debug! instead of silently swallowed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This comment was marked as outdated.
dustinblack left a comment
Thanks for the quick turnaround, Shawn — I can see the grace period, quiet flag, mutex safety, shm-direct guard, socket buffer tuning, and remainder message fixes all made it into the current code. Well done.
One remaining issue: coverage. Now that the standalone logic lives in library modules, tarpaulin can instrument it — so the 1.29% (standalone_client.rs) and 11.59% (standalone_server.rs) coverage isn't a tarpaulin limitation anymore. The helper functions are well-tested (build_standalone_transport_config, dispatch_server_message, connect_blocking_with_retry, handle_client_connection, aggregate_and_print_server_metrics, effective_concurrency), but none of the orchestration functions are called from tests:
- `run_standalone_server_blocking`/`_single`/`_multi_accept_tcp`/`_multi_accept_uds`
- `run_standalone_client_blocking_single`/`_concurrent`
- The async variants of all of the above
These contain the warmup logic, duration mode, metrics finalization, results output, concurrent worker spawning, and the grace period accept loop. Adding a few integration tests that call these entry points end-to-end (e.g., spawn run_standalone_server_blocking_single in a thread, run run_standalone_client_blocking_single against it with a small message count) should bring coverage up substantially and would give us confidence in the orchestration code — especially important given the AI-assisted development workflow on this project.
One minor note: response messages always use empty payloads (Vec::new()) regardless of request size, making round-trip measurements asymmetric. If that's intentional, a brief comment would help.
Server tests (standalone_server.rs, 82.7% coverage):
- test_multi_accept_tcp_server_direct: exercises multi-accept TCP directly
- test_single_server_direct: exercises blocking single-client directly
- test_server_blocking_dispatch: exercises dispatch logic
- test_server_blocking_dispatch_uds: exercises UDS dispatch branch
- test_async_multi_accept_tcp_full: exercises async multi-accept TCP
- test_async_single_server_path: exercises async single-client
- test_async_single_server_one_way_metrics: async one-way metrics
- test_async_multi_accept_uds_full: exercises async UDS multi-accept
- test_multi_accept_server_with_delayed_client: slow sender resilience
- test_multi_accept_server_duration_one_way: duration mode with multi-accept
- test_async_multi_accept_server_duration_one_way: async duration mode
- test_handle_client_connection_send_failure: client disconnect error path
- test_single_server_client_disconnect: single server send error path
- test_multi_accept_server_survives_bad_client: garbage input resilience
- test_handle_client_connection_garbage_input: deserialization error path
- test_run_standalone_server_full_dispatch: full entry point dispatch
- test_run_standalone_server_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_server_rejects_shm_direct: shm-direct guard
- test_run_standalone_server_verbose: -vv logging level branches
- test_aggregate_server_metrics_from_handlers: real handler data
- test_print_server_one_way_latency_with_data/zero: print paths

Client tests (standalone_client.rs, 86.3% coverage):
- test_client_blocking_tcp_round_trip/one_way: single client paths
- test_client_blocking_tcp_duration_round_trip/one_way: duration mode
- test_client_blocking_tcp_concurrent_round_trip/one_way: concurrent
- test_client_async_single_round_trip/one_way: async single
- test_client_async_duration_round_trip/one_way: async duration
- test_client_async_concurrent_round_trip/one_way: async concurrent
- test_client_blocking_with_send_delay: send_delay round-trip branch
- test_client_blocking_one_way_with_send_delay: send_delay one-way branch
- test_client_blocking_with_streaming_output: JSON streaming
- test_client_blocking_combined_streaming: combined mode streaming
- test_client_blocking_csv_streaming: CSV streaming
- test_client_blocking_concurrent_duration_one_way: concurrent duration
- test_client_async_concurrent_duration_one_way: async concurrent duration
- test_run_standalone_client_full_dispatch: full entry point dispatch
- test_run_standalone_client_rejects_all_via_dispatch: 'all' validation
- test_run_standalone_client_rejects_shm_direct: shm-direct guard
- test_connect_async_with_retry_succeeds: async retry path

Also: changed tracing .init() to .try_init() with eprintln fallback in both server and client for test compatibility.

Coverage: standalone_server 82.7%, standalone_client 86.3%, combined 84.8%
Total lib tests: 355.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📈 Changed lines coverage: 83.44% (927/1111)
🚨 Uncovered lines in this PR
📊 Code Coverage Summary
dustinblack left a comment
Coverage looks great — 83.44% changed-line coverage, with standalone_client.rs at 86% and standalone_server.rs at 84%. Nice work addressing all the review items.
Code-wise this is ready. Before I approve, I want to validate with hands-on testing:
- Existing benchmark scenarios (non-standalone) still work as expected (regression check)
- Standalone `--server`/`--client` mode on the same host
- Standalone mode across container boundaries (container-to-host, container-to-container)
I'll follow up once testing is complete.
Reduce the non-blocking accept loop sleep from 50ms to 5ms in both TCP and UDS multi-accept servers. This cuts connection acceptance latency by 10x with no portability concerns.

Discovered during hands-on validation testing of standalone concurrent mode, where the 50ms polling interval was the primary contributor to elevated tail latency under multi-client workloads.

Improvement with -c 4 concurrent clients:
- RT P95: -46% (65.9us -> 35.5us)
- RT P99: -49% (91.4us -> 46.9us)
- Throughput: +66% (94.9 -> 157.1 MB/s)

Single-client workloads also benefit from faster initial connection acceptance (P99 improved 4-7% across all test modes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 01e7e96 to 15f9c92
@dustinblack just a heads up: I found a performance issue during some hands-on testing and pushed a small fix for it, in case you're doing your own hands-on testing :)
📈 Changed lines coverage: 83.44% (927/1111)
🚨 Uncovered lines in this PR
📊 Code Coverage Summary
Add the ability to run the benchmark server and client as independent processes, enabling cross-environment IPC testing (e.g., host and container).
Relates to #11