
perf: benchmark zstd GRPC compression #749

Merged
anthony-swirldslabs merged 8 commits into main from 746-zstdBench
Mar 24, 2026

Conversation

@anthony-swirldslabs
Contributor

anthony-swirldslabs commented Mar 10, 2026

Description:
Update on 3/12/26: introduced TestBlockGrpcBench to test actual block data.

Adding a zstd GRPC compression benchmark in PBJ integration tests. To that end:

  • GrpcCompression.register*() APIs are added to support custom encodings.
  • PayloadWeight is updated to precompute the payloads, to make them semi-repeatable (to see some effects of compression), and a SUPER payload is added with 2M bytes.
  • An unofficial PbjGrpcCall.setNetworkBytesInspector(PbjGrpcNetworkBytesInspector) API is added to support network latency simulation.
  • A 1Gbps network is simulated in the benchmarks to show at least some benefit of the compression.
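The 1Gbps simulation boils down to delaying each send/receive by the payload's wire time. A minimal sketch of the idea follows; the callback shape is illustrative only, not the actual PbjGrpcNetworkBytesInspector interface:

```java
// Sketch only: simulate a 1 Gbps link by delaying in proportion to the
// number of bytes that crossed the wire. The method name and shape are
// illustrative, not the actual PBJ inspector API.
public final class OneGbpsSimulator {
    // 1 Gbps = 10^9 bits/s, i.e. 8 ns of "wire time" per byte.
    private static final long NANOS_PER_BYTE = 8;

    // Busy-wait for the simulated transmission time of the given bytes.
    public static void inspect(long bytes) {
        final long deadline = System.nanoTime() + NANOS_PER_BYTE * bytes;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait(); // precise sub-millisecond delays
        }
    }
}
```

At 1Gbps a byte occupies 8ns on the wire, so the 2M-byte SUPER payload adds roughly 16ms of simulated latency per direction.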

Only the unary benchmarks exercise the various compression methods: the streaming benchmarks are already very slow on their own, and compression shouldn't affect unary and streaming cases differently because every message is processed individually in both anyway.

See results below. The main outcomes:

  • A fast network negates the benefits of compression because it's faster to send the bytes than to spend CPU compressing them. Slowing the simulated network down (e.g. to 100Mbps or less) could artificially inflate the benefits of compression, but that doesn't seem fair.
  • Small(er) payloads don't benefit from compression. Only the 2M payload seems to benefit, and only sometimes. It's questionable whether this payload size is truly typical, though.
  • zstd is faster with those larger payloads, but a lot slower than gzip with smaller payloads. In turn, gzip is slower than identity with those smaller payloads.

Ultimately, compression is only beneficial for large requests/replies (>>8KB), assuming they contain enough compressible data. And if network speeds are very fast (>1Gbps), compression becomes irrelevant.

Related issue(s):

Fixes #746

Notes for reviewer:
Updated results on 3/17/2026:

Benchmark                              (encodings)  (maxBlockSize)   Mode  Cnt     Score     Error  Units  IPS
TestBlockGrpcBench.benchBidiStreaming     identity          102400  thrpt    3   900.908 ± 101.119  ops/s  315K
TestBlockGrpcBench.benchBidiStreaming     identity          524288  thrpt    3   176.079 ±  10.660  ops/s  317K
TestBlockGrpcBench.benchBidiStreaming     identity         2048000  thrpt    3    46.429 ±   3.484  ops/s  325K
TestBlockGrpcBench.benchBidiStreaming         gzip          102400  thrpt    3   573.893 ±  60.821  ops/s  201K
TestBlockGrpcBench.benchBidiStreaming         gzip          524288  thrpt    3   103.225 ±   3.426  ops/s  186K
TestBlockGrpcBench.benchBidiStreaming         gzip         2048000  thrpt    3    27.067 ±   1.305  ops/s  189K
TestBlockGrpcBench.benchBidiStreaming         zstd          102400  thrpt    3  1735.377 ± 789.987  ops/s  607K
TestBlockGrpcBench.benchBidiStreaming         zstd          524288  thrpt    3   323.607 ±  21.641  ops/s  582K
TestBlockGrpcBench.benchBidiStreaming         zstd         2048000  thrpt    3    91.564 ±   4.128  ops/s  641K
TestBlockGrpcBench.benchBidiStreaming       zstd10          102400  thrpt    3   781.124 ±  77.676  ops/s  273K
TestBlockGrpcBench.benchBidiStreaming       zstd10          524288  thrpt    3   148.500 ±  16.553  ops/s  267K
TestBlockGrpcBench.benchBidiStreaming       zstd10         2048000  thrpt    3    37.497 ±   1.823  ops/s  262K
TestBlockGrpcBench.benchBidiStreaming       zstd-5          102400  thrpt    3  1606.577 ± 246.504  ops/s  562K
TestBlockGrpcBench.benchBidiStreaming       zstd-5          524288  thrpt    3   280.603 ±  20.727  ops/s  505K
TestBlockGrpcBench.benchBidiStreaming       zstd-5         2048000  thrpt    3    73.976 ±  45.961  ops/s  518K
TestBlockGrpcBench.benchUnary             identity          102400  thrpt    3   726.609 ± 345.864  ops/s  254K
TestBlockGrpcBench.benchUnary             identity          524288  thrpt    3   166.334 ±  13.599  ops/s  299K
TestBlockGrpcBench.benchUnary             identity         2048000  thrpt    3    44.689 ±   3.642  ops/s  313K
TestBlockGrpcBench.benchUnary                 gzip          102400  thrpt    3   341.977 ±   7.401  ops/s  120K
TestBlockGrpcBench.benchUnary                 gzip          524288  thrpt    3    70.284 ±   5.135  ops/s  127K
TestBlockGrpcBench.benchUnary                 gzip         2048000  thrpt    3    18.919 ±   0.144  ops/s  132K
TestBlockGrpcBench.benchUnary                 zstd          102400  thrpt    3   708.404 ±  76.297  ops/s  248K
TestBlockGrpcBench.benchUnary                 zstd          524288  thrpt    3   186.305 ±  16.827  ops/s  335K
TestBlockGrpcBench.benchUnary                 zstd         2048000  thrpt    3    57.511 ±   2.269  ops/s  403K
TestBlockGrpcBench.benchUnary               zstd10          102400  thrpt    3   430.910 ±  28.807  ops/s  151K
TestBlockGrpcBench.benchUnary               zstd10          524288  thrpt    3   104.016 ±  10.584  ops/s  187K
TestBlockGrpcBench.benchUnary               zstd10         2048000  thrpt    3    28.662 ±   2.335  ops/s  201K
TestBlockGrpcBench.benchUnary               zstd-5          102400  thrpt    3   848.160 ± 112.154  ops/s  297K
TestBlockGrpcBench.benchUnary               zstd-5          524288  thrpt    3   218.547 ±  41.507  ops/s  393K
TestBlockGrpcBench.benchUnary               zstd-5         2048000  thrpt    3    61.696 ±   3.951  ops/s  432K

Earlier results (from the 3/12/26 update):

The bench has been rewritten to use real block data. The items in the sample are:

Test blocks of maxBlockSize 2048000:
   0: 7234 items, 2047972 bytes total, with average item at 283 bytes
   1: 6776 items, 2037785 bytes total, with average item at 300 bytes
   2: 7157 items, 2047982 bytes total, with average item at 286 bytes
...

So we get approximately the following item counts for the various block sizes:

maxBlockSize  items
102400          350
524288         1800
2048000        7000
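The IPS column is derived by multiplying each benchmark's ops/s score by the approximate item count for its block size. A quick sanity check of that arithmetic (IpsCalc is a hypothetical helper for illustration, not part of the PR):

```java
// Hypothetical helper: IPS ≈ benchmark ops/s × items per block.
public final class IpsCalc {
    public static long ips(double opsPerSec, long itemsPerBlock) {
        return Math.round(opsPerSec * itemsPerBlock);
    }

    public static void main(String[] args) {
        // identity at maxBlockSize 2048000: 35.170 ops/s × ~7000 items/block
        System.out.println(ips(35.170, 7000)); // 246190, i.e. ~246K items/s
    }
}
```

This lands close to the ~245K shown for that row, the small difference coming from the rounded item count.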

Based on the above math, here are the results with a manually added IPS (items per second) column:

Benchmark                              (encodings)  (maxBlockSize)   Mode  Cnt     Score     Error  Units  IPS
TestBlockGrpcBench.benchBidiStreaming     identity          102400  thrpt    3   571.208 ± 391.871  ops/s  199K
TestBlockGrpcBench.benchBidiStreaming     identity          524288  thrpt    3   108.676 ±  92.651  ops/s  194K
TestBlockGrpcBench.benchBidiStreaming     identity         2048000  thrpt    3    35.170 ±   1.578  ops/s  245K
TestBlockGrpcBench.benchBidiStreaming         gzip          102400  thrpt    3   576.079 ±  58.788  ops/s  201K
TestBlockGrpcBench.benchBidiStreaming         gzip          524288  thrpt    3   103.364 ±   0.980  ops/s  185K
TestBlockGrpcBench.benchBidiStreaming         gzip         2048000  thrpt    3    27.226 ±   1.710  ops/s  189K
TestBlockGrpcBench.benchBidiStreaming         zstd          102400  thrpt    3  1303.327 ± 150.537  ops/s  456K
TestBlockGrpcBench.benchBidiStreaming         zstd          524288  thrpt    3   262.027 ±  10.394  ops/s  471K
TestBlockGrpcBench.benchBidiStreaming         zstd         2048000  thrpt    3    73.482 ±   3.155  ops/s  511K
TestBlockGrpcBench.benchBidiStreaming        zstd0          102400  thrpt    3  1291.796 ± 107.170  ops/s  452K
TestBlockGrpcBench.benchBidiStreaming        zstd0          524288  thrpt    3   252.795 ±  14.902  ops/s  455K
TestBlockGrpcBench.benchBidiStreaming        zstd0         2048000  thrpt    3    72.517 ±   1.303  ops/s  507K
TestBlockGrpcBench.benchBidiStreaming       zstd-5          102400  thrpt    3  1170.695 ±  58.590  ops/s  409K
TestBlockGrpcBench.benchBidiStreaming       zstd-5          524288  thrpt    3   211.161 ±  69.433  ops/s  380K
TestBlockGrpcBench.benchBidiStreaming       zstd-5         2048000  thrpt    3    65.638 ±   2.561  ops/s  459K
TestBlockGrpcBench.benchUnary             identity          102400  thrpt    3   447.294 ± 357.286  ops/s  156K
TestBlockGrpcBench.benchUnary             identity          524288  thrpt    3   102.628 ±  49.036  ops/s  184K
TestBlockGrpcBench.benchUnary             identity         2048000  thrpt    3    30.458 ±   3.870  ops/s  213K
TestBlockGrpcBench.benchUnary                 gzip          102400  thrpt    3   298.344 ±  77.744  ops/s  104K
TestBlockGrpcBench.benchUnary                 gzip          524288  thrpt    3    66.626 ±   4.734  ops/s  120K
TestBlockGrpcBench.benchUnary                 gzip         2048000  thrpt    3    17.656 ±   2.364  ops/s  123K
TestBlockGrpcBench.benchUnary                 zstd          102400  thrpt    3   603.907 ±  21.967  ops/s  211K
TestBlockGrpcBench.benchUnary                 zstd          524288  thrpt    3   146.474 ±  18.600  ops/s  263K
TestBlockGrpcBench.benchUnary                 zstd         2048000  thrpt    3    48.994 ±   1.385  ops/s  343K
TestBlockGrpcBench.benchUnary                zstd0          102400  thrpt    3   604.090 ±  62.128  ops/s  244K
TestBlockGrpcBench.benchUnary                zstd0          524288  thrpt    3   146.230 ±  41.440  ops/s  263K
TestBlockGrpcBench.benchUnary                zstd0         2048000  thrpt    3    50.207 ±  31.698  ops/s  351K
TestBlockGrpcBench.benchUnary               zstd-5          102400  thrpt    3   691.045 ± 258.739  ops/s  242K
TestBlockGrpcBench.benchUnary               zstd-5          524288  thrpt    3   158.872 ±  74.167  ops/s  286K
TestBlockGrpcBench.benchUnary               zstd-5         2048000  thrpt    3    54.725 ±   2.770  ops/s  383K

Below are older results with the regular Greeter bench.

Comments in the issue show a wider variety of results, but below is the final run:

Benchmark                          (encodings)  (streamCount)  (weight)   Mode  Cnt     Score      Error  Units
PbjGrpcBench.benchUnary               identity            N/A     LIGHT  thrpt    3  7685.976 ± 1340.012  ops/s
PbjGrpcBench.benchUnary               identity            N/A    NORMAL  thrpt    3  6737.614 ±  576.778  ops/s
PbjGrpcBench.benchUnary               identity            N/A     HEAVY  thrpt    3  2730.245 ±  284.283  ops/s
PbjGrpcBench.benchUnary               identity            N/A     SUPER  thrpt    3    15.300 ±    6.864  ops/s
PbjGrpcBench.benchUnary                   gzip            N/A     LIGHT  thrpt    3  5635.441 ±  533.935  ops/s
PbjGrpcBench.benchUnary                   gzip            N/A    NORMAL  thrpt    3  5035.319 ±  748.946  ops/s
PbjGrpcBench.benchUnary                   gzip            N/A     HEAVY  thrpt    3  1995.138 ±  350.897  ops/s
PbjGrpcBench.benchUnary                   gzip            N/A     SUPER  thrpt    3     9.364 ±    4.476  ops/s
PbjGrpcBench.benchUnary                   zstd            N/A     LIGHT  thrpt    3  4267.069 ± 3984.615  ops/s
PbjGrpcBench.benchUnary                   zstd            N/A    NORMAL  thrpt    3  3922.068 ±  877.392  ops/s
PbjGrpcBench.benchUnary                   zstd            N/A     HEAVY  thrpt    3  2782.024 ±  332.568  ops/s
PbjGrpcBench.benchUnary                   zstd            N/A     SUPER  thrpt    3    23.687 ±   17.162  ops/s

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Anthony Petrov <anthony@swirldslabs.com>
@github-actions

github-actions bot commented Mar 10, 2026

JUnit Test Report

   78 files  ±0     78 suites  ±0   3m 44s ⏱️ +5s
1 352 tests ±0  1 348 ✅ ±0   4 💤 ±0  0 ❌ ±0 
7 234 runs  ±0  7 214 ✅ ±0  20 💤 ±0  0 ❌ ±0 

Results for commit f3b1981. ± Comparison against base commit 348053f.

This pull request removes 6 and adds 6 tests. Note that renamed tests count towards both.
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] FLOAT, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048ceb0@3b28ab9b, [0.1, 0.5, 100.0], 12, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d0c0@16c1345b
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] STRING, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04972a0@6ce8bf64, [string 1, testing here, testing there], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04974b0@6413eeb7
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] BYTES, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04976c0@4c678a1f, [010203, ff7f0f, 42da07370bff], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04978d0@217009bd
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] DOUBLE, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d2d0@1443539, [0.1, 0.5, 100.0, 1.7653472635472653E240], 32, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d4e0@5b160208
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [3] BOOL, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d6f0@16a15261, [true, false, false, true, true, true], 6, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d900@36ec4071
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [4] ENUM, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048db10@20d92f1e, [0, 2, 1], 3, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048dd20@3cf7433e
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] FLOAT, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d370@62cb977a, [0.1, 0.5, 100.0], 12, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d580@7db70494
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [1] STRING, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04976c0@58189132, [string 1, testing here, testing there], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c04978d0@2305aad0
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] BYTES, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c0497ae0@54cce500, [010203, ff7f0f, 42da07370bff], com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c0497cf0@755033c5
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [2] DOUBLE, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d790@36ec4071, [0.1, 0.5, 100.0, 1.7653472635472653E240], 32, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048d9a0@5d8112e6
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [3] BOOL, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048dbb0@3cf7433e, [true, false, false, true, true, true], 6, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048ddc0@68cc6319
com.hedera.pbj.runtime.ProtoWriterToolsTest ‑ [4] ENUM, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048dfd0@544733a4, [0, 2, 1], 3, com.hedera.pbj.runtime.ProtoWriterToolsTest$$Lambda/0x00000007c048e1e0@522f74a1

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Mar 10, 2026

Integration Test Report

    418 files  + 3      418 suites  +3   17m 52s ⏱️ - 1m 14s
114 977 tests +88  114 977 ✅ +88  0 💤 ±0  0 ❌ ±0 
115 219 runs  +88  115 219 ✅ +88  0 💤 ±0  0 ❌ ±0 

Results for commit f3b1981. ± Comparison against base commit 348053f.

This pull request removes 3 and adds 91 tests. Note that renamed tests count towards both.
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000713344c36bb0@1f368b6a
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000713344c36de0@5baff897
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [3] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000713344c37010@3f7f9015
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000799d27c431f0@41b9662c
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000799d27c43420@579f4595
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [3] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000799d27c43650@4465b66d
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [10] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [11] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [12] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [13] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [14] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [15] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
pbj.integration.tests.pbj.integration.tests.tests.TestBlockItemTest ‑ [16] NoToStringWrapper{pbj.integration.tests.pbj.integration.tests.TestBlockItem}
…

♻️ This comment has been updated with latest results.

@jasperpotts
Member

Concerns / Issues

  1. @Setup(Level.Invocation) in TestBlockGrpcBench.BenchState — Starting and stopping the server + client on every JMH invocation is very expensive and will dominate the measurement for smaller payloads. This is a significant benchmark methodology issue. Level.Trial or Level.Iteration would be more appropriate. The existing PbjGrpcBench uses Level.Trial for the server and Level.Iteration for the client — TestBlockGrpcBench should follow the same pattern.

  2. Network latency simulator is a global static side effect — NetworkLatencySimulator.simulate() installs a Thread.sleep()-based inspector via a global static field on PbjGrpcCall. Both PbjGrpcBench and TestBlockGrpcBench call this in their static {} blocks. If both benchmarks run in the same JVM fork, the second one overwrites the first's inspector (one has printSizes=false, the other printSizes=true). Since JMH uses separate forks by default (@Fork(1)), this is probably fine in practice, but it's fragile.

  3. Thread.sleep for network simulation is coarse — Thread.sleep(millis, nanos) typically has millisecond granularity on most JVMs/OSes. For a 1Gbps network and a 100-byte payload, the calculated sleep is ~800ns, which will round to 0ms. The simulator effectively does nothing for small payloads and only kicks in for large ones (>~125KB). This means the "1Gbps simulation" is mostly a no-op for the 102K block size. The PR description acknowledges this implicitly ("fast network negates benefits"), but it's worth noting the simulation is imprecise.

  4. GrpcCompression maps changed from immutable to mutable HashMap — The COMPRESSOR_MAP and DECOMPRESSOR_MAP are now plain HashMap with no synchronization. This is fine for benchmarks (registered once at startup), but since this is in pbj-runtime (production code), concurrent reads during registration could cause issues. A ConcurrentHashMap would be safer.

  5. Error swallowing in benchmarks — Both benchUnary and benchBidiStreaming catch Exception, print the stack trace, and continue. This means if compression is broken or the server errors out, the benchmark silently produces incorrect results with fewer actual operations than INVOCATIONS. The @OperationsPerInvocation(INVOCATIONS) will then report inflated throughput.

  6. Socket closed handling in PbjGrpcCall — The new UncheckedIOException / SocketException catch block with string matching (se.getMessage().contains("Socket closed")) is fragile. This is production code being changed to support a benchmark edge case. String-matching on exception messages is locale/JVM-dependent.
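The thread-safety concern in point 4 can be sketched as follows; the class and method names are illustrative, not the actual GrpcCompression API in pbj-runtime:

```java
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative registry only, not the real pbj-runtime class: shows the
// ConcurrentHashMap pattern suggested in point 4 for a register-once,
// read-often encoding map.
final class EncodingRegistry {
    // ConcurrentHashMap gives lock-free reads and safe concurrent writes.
    private static final Map<String, Function<OutputStream, OutputStream>> COMPRESSORS =
            new ConcurrentHashMap<>();

    static void register(String encoding, Function<OutputStream, OutputStream> factory) {
        COMPRESSORS.put(encoding, factory);
    }

    static Function<OutputStream, OutputStream> lookup(String encoding) {
        return COMPRESSORS.get(encoding);
    }
}
```

Reads stay as cheap as a plain HashMap.get() in the common case, while registration during startup cannot corrupt the map under concurrent access.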

An alternative to Thread.sleep()

For nanosecond-precision delays, a busy-wait spin loop is the standard approach in benchmarking:

private void sleep(long bytes) {
    final long nanos = nanosPerByte * bytes;
    final long deadline = System.nanoTime() + nanos;
    while (System.nanoTime() < deadline) {
        Thread.onSpinWait(); // hint to the CPU (JDK 9+)
    }
}

Thread.onSpinWait() emits a PAUSE instruction on x86 (or equivalent on ARM), which reduces power consumption and avoids starving sibling hyperthreads while spinning.

Why this works for benchmarks:

  • System.nanoTime() has sub-microsecond resolution on modern OSes
  • Thread.sleep(0, 800) typically sleeps for ~1ms due to OS scheduling granularity, which is 1000x too long for an 800ns target
  • In a JMH benchmark, burning CPU on a spin-wait is acceptable — you're already dedicating cores to the benchmark

Why Thread.sleep is wrong here:

  • At 1Gbps, 100KB = ~800µs. Thread.sleep can handle that, but barely.
  • At 1Gbps, 1KB = ~8µs. Thread.sleep will overshoot by 100x+.
  • At 1Gbps, 100B = ~800ns. Thread.sleep rounds to 0 or ~1ms — either a no-op or 1000x too much.

The tradeoff is that busy-wait consumes a full CPU core, but that's expected and acceptable in a JMH context.
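The granularity gap described above can be observed directly with a small standalone demo (a sketch, not benchmark code; absolute timings are OS-dependent, only the relative difference matters):

```java
// Compare coarse Thread.sleep with a spin-wait for an ~800 ns target delay.
public final class DelayGranularity {
    // Spin until targetNanos have elapsed; returns the actual elapsed time.
    static long spinNanos(long targetNanos) {
        final long start = System.nanoTime();
        final long deadline = start + targetNanos;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        final long t0 = System.nanoTime();
        Thread.sleep(0, 800); // requested 800 ns; typically rounds up to ~1 ms
        final long slept = System.nanoTime() - t0;
        final long spun = spinNanos(800); // typically within a few microseconds
        System.out.println("sleep(0, 800): " + slept + " ns; spin(800): " + spun + " ns");
    }
}
```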

@anthony-swirldslabs
Contributor Author

anthony-swirldslabs commented Mar 17, 2026

@jasperpotts :

  1. @Setup(Level.Invocation): this does not "dominate the measurement" for any payload size, because the benchmark state is initialized outside of the measured method, so there is no "benchmark methodology issue" here; the benchmark uses an idiomatic approach. I agree that using Level.Trial may be a bit more efficient in terms of the overall time it takes to run the benchmark. However, there's a flow-control-related issue described at Benchmark zstd-jni #746 (comment) / GRPC streaming server may die #758, and because of it the benchmark has to restart the server for every invocation in the bidi streaming case specifically (it may or may not affect the unary case as well, but I haven't observed it there). That issue is unclear and outside the scope of this benchmark, so I'm keeping this part as is.
  2. The NetworkLatencySimulator is a utility class exposing a static method to activate it. This is because the network inspector in PbjGrpcCall is static, which is by design: applications never have direct access to the Call object. I suppose we could move it into PbjGrpcClient as a mutable instance member, but that would have a slight negative performance impact because PbjGrpcCall would have to fetch that member from another object after sending/receiving every datagram. Also, this class is not designed to be used by several benchmarks running in parallel. Our current JMH setup does not run our benchmarks in parallel in the same JVM, and we have no plans to change that because it could spoil the measurement results. So this isn't an issue. However, I'll add a note to its javadoc to mention this.
  3. Thread.sleep(): sounds good, busy-waiting with onSpinWait() works for me. Updated. However, "The PR description acknowledges this implicitly ('fast network negates benefits'), but it's worth noting the simulation is imprecise" doesn't quite follow: the statement in the PR description is still true and doesn't depend on the precision of the sleep implementation.
  4. Compressor maps: good point. However, concurrent reads shouldn't cause any failures, and we don't want to synchronize reads for performance reasons. Writes to the maps do need to be synchronized, though. Adding synchronization.
  5. Errors: the errors are swallowed by design because these benchmarks rely on a real networking stack (albeit only over the loopback interface). Errors may occur sporadically, and we don't want to fail a long benchmark run because of a single broken connection. We have numerous integration tests verifying that the GRPC client and server work and don't throw exceptions randomly, so when running the benchmark we can be reasonably sure nothing is broken. And we don't want to fail a run or omit an iteration because of a random failure in the OS networking stack. So this part isn't changing.
  6. Socket closed: this is a weird one. It's not really a "benchmark edge case"; it's a real problem. However, I'm unsure how much it hurts real applications, and I agree that the current solution isn't very elegant (although I hardly see any alternative). To address the comment, I moved this logic over to the benchmark itself for now.

@anthony-swirldslabs
Contributor Author

@jasperpotts : an update regarding Level.Invocation vs. Level.Trial: I think I found the cause of that at #758, but it will be a separate fix. Once that's merged, I'll update the benchmarks to use Level.Trial, again in a separate future PR. For this PR, we go with the same approach currently used in the existing PbjGrpcBench and use Level.Invocation. As mentioned above, the state setup happens outside of the measurement and therefore doesn't affect the measurement itself. So it's not a critical issue by any means, as it doesn't affect the benchmark other than making it run a tiny bit longer.

anthony-swirldslabs merged commit 8f8b634 into main on Mar 24, 2026
15 checks passed
anthony-swirldslabs deleted the 746-zstdBench branch on March 24, 2026 at 00:29


Development

Successfully merging this pull request may close these issues.

Benchmark zstd-jni
