Conversation
stanbrub
commented
Mar 26, 2026
- Added "training tests": representative benchmarks for comparing JDK versions, GC types, Python versions, etc. They are meant to provide as much coverage as possible with the fewest tests.
- Added a LocalParquetGenerator to generate very large parquet files into the DHC data directory. The typical standard benchmarks generate data through DHC, which works well for small to mid-sized data sets.
- Tests added: AggBy, Filter, Join, UpdateBy, Formula.
cpwright
left a comment
All the benchmarks are going to do work. My concern is that we may do too much of it in the timestamp calculation and in traversing a UnionSourceManager (merge) where it isn't necessary.
```
merge([
    read('/data/timed.parquet').view(formulas=[${loadColumns}])${headRows}
] * ${scaleFactor}).update_view([
    'timestamp=timestamp.plusMillis((long)(ii / ${rows}) * ${rows})'
```
Is there a reason we can't use the timestamp from the file? I have a few worries about doing rowset calculation as part of the benchmark (to come up with ii).
For the actual test benchmarks, without a select we would prefer more/bigger parquet files, to avoid the overhead of going through the merge data structures. We might even be able to get away with symlinks so that the data just repeats itself.
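The symlink idea above might look something like this sketch (all paths are hypothetical, and a stand-in file replaces the real parquet data):

```shell
# Sketch: repeat one parquet file via symlinks so a directory-based read
# sees N "copies" without duplicating the data on disk.
WORK=$(mktemp -d)
SRC="$WORK/timed.parquet"
DEST="$WORK/timed_repeated"
touch "$SRC"              # stand-in for the real benchmark parquet file
mkdir -p "$DEST"
for i in 1 2 3 4; do
  ln -s "$SRC" "$DEST/part_$i.parquet"
done
ls "$DEST"                # four parquet "parts", one underlying file
```

A reader pointed at `$DEST` would then see the rows repeated four times while the merge data structures are bypassed entirely.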
For the "train" benchmarks, since we don't use Scale Factors, that section of code will not be hit. It is only used when we merge to simulate larger data sets. So for the nightly runs, this happens BEFORE the "select" into memory, which is not included in the measurement. For the "train" benchmarks, we only read timestamps directly from the parquet file(s), and only when the benchmark actually uses them (like rollingtime).
```
@Test
void filter1Col() {
    setup(40);
    var q = "timed.where_in(where_filter, cols=['key1 = set1']).where(['key1 < `4`'])";
```
I am a bit torn: for a "real" query we should be applying the key1 < `4` filter inside the where_in set table, because that is semantically equivalent and faster. Maybe we do want to bounce through the entire parquet file anyway, though.
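The equivalence described here can be sketched outside Deephaven with plain Python (the column name `key1`, the set contents, and the row data are invented for illustration):

```python
# Simulate where_in followed by a filter, vs. pre-filtering the set table.
rows = [{"key1": k} for k in ["1", "3", "5", "7"] * 3]
set1 = {"1", "5", "9"}  # hypothetical where_in set table

# As written in the test: where_in on set1, then key1 < '4' on the result.
a = [r for r in rows if r["key1"] in set1]
a = [r for r in a if r["key1"] < "4"]

# Alternative: shrink the set table first, then do where_in once.
small_set = {k for k in set1 if k < "4"}
b = [r for r in rows if r["key1"] in small_set]

assert a == b  # semantically equivalent; the second matches fewer rows
```

The pre-filtered version avoids materializing the intermediate rows that the later filter discards, which is the performance argument above.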
I've struggled with this one. If we are testing GC, it doesn't make sense to do separate operations that produce tables which just get GC'd while we are measuring; we are trying to understand the operations. Would it make sense to do multiple "where" operations from the same source, like we do in the DHE combo benchmarks, that match very little so we don't blow up memory? Or is it better to match as much as we can with the first "where" without blowing up memory, and then run the second one on that result?
Generating the intermediate rowsets is nice for creating garbage. I just get bothered by code that uses the system "wrong". If we were to filter on another column after the where_in, I would not be bothered; a range filter on key2, for example. The rowsets can actually be very interesting in terms of garbage when they have a large number of included rows which are non-contiguous.
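The shape of filter preferred here, and why it produces non-contiguous surviving rows, can be illustrated with plain Python (column names follow the snippet; the key2 range bounds and data are invented):

```python
# Simulate where_in on key1 followed by a range filter on key2.
# The surviving row indices are non-contiguous, which is what makes the
# resulting rowsets interesting garbage-wise.
rows = [{"key1": str(i % 8), "key2": i % 100} for i in range(1_000)]
set1 = {"1", "3", "5"}  # hypothetical set table

pass1 = [i for i, r in enumerate(rows) if r["key1"] in set1]
pass2 = [i for i in pass1 if 20 <= rows[i]["key2"] < 60]  # invented range

# Gaps between kept indices mean the rowset cannot be a single range.
gaps = any(b - a > 1 for a, b in zip(pass2, pass2[1:]))
print(len(pass2), gaps)
```

Many included rows separated by small gaps is exactly the case where rowset bookkeeping, rather than a single contiguous range, does real work.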