
Add initial dhat heap profile tests#20734

Open
cetra3 wants to merge 1 commit intoapache:mainfrom
pydantic:add_dhat_heap_profile_tests

Conversation

Contributor

@cetra3 cetra3 commented Mar 5, 2026

Which issue does this PR close?

Related to #20714, but doesn't actually fix any issues at the moment; it merely highlights that memory accounting is currently inaccurate.

Here's a table of measured heap profiles based on these tests:

| Test | Operator | Pool | Peak | Ratio | Assert |
|------|----------|------|------|-------|--------|
| heap_profile_repartition | RepartitionExec | 10MB | 20.7MB | 1.97x | TODO |
| heap_profile_hash_join | HashJoinExec | 40MB | 44.4MB | 1.06x | active |
| heap_profile_sort_merge_join | SortMergeJoinExec | 40MB | 33.3MB | 0.79x | active |
| heap_profile_sort | SortExec | 10MB | 65.9MB | 6.28x | TODO |
| heap_profile_hash_aggregate | GroupedHashAggregate | 10MB | 121.7MB | 11.60x | TODO |
| heap_profile_window | WindowAggExec | 10MB | 33.4MB | 3.18x | TODO |
| heap_profile_parquet_sort | Parquet + SortExec | 20MB | 66.7MB | 3.18x | TODO |
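For reference, the Ratio column works out to the measured peak divided by the pool limit, if one assumes the pool limits are binary megabytes (N * 1024 * 1024 bytes, as in the test constants) while the dhat peaks are decimal MB. A quick arithmetic check with a hypothetical helper (not from the PR):

```rust
/// Ratio of measured peak heap to the configured pool limit.
/// Pool limits are binary megabytes (MiB); dhat peaks are decimal MB.
fn pool_ratio(peak_mb: f64, pool_mib: usize) -> f64 {
    let peak_bytes = peak_mb * 1_000_000.0;
    let pool_bytes = (pool_mib * 1024 * 1024) as f64;
    peak_bytes / pool_bytes
}

fn main() {
    // RepartitionExec row: 20.7 MB peak against a 10 MiB pool
    println!("{:.2}x", pool_ratio(20.7, 10)); // 1.97x
    // SortExec row: 65.9 MB peak against a 10 MiB pool
    println!("{:.2}x", pool_ratio(65.9, 10)); // 6.28x
}
```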

Rationale for this change

This adds some preliminary heap profile testing using dhat-rs to record heap allocations and report on memory usage for a handful of canned queries. Each test allows some headroom, as there can be some overhead (1.1x) in other parts of the process.

While more test types can be added in the future, most of these heap profile tests already blow well past their configured memory limits.

For the failing tests, the assertions are commented out, with some information about why they might fail. Essentially, we should raise PRs to fix the memory accounting and then add the appropriate assertions.
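The assertion pattern described above can be sketched as a small helper: the measured peak may exceed the pool limit by a bounded headroom factor. This is a hypothetical illustration of the check, not code from the PR; the function name and 1.1x factor are taken from the description above.

```rust
/// Hypothetical shape of the per-test assertion: the dhat-measured peak
/// may exceed the pool limit by a small headroom factor (e.g. 1.1x) to
/// allow for allocations outside the memory pool's accounting.
fn within_headroom(peak_bytes: usize, pool_bytes: usize, headroom: f64) -> bool {
    (peak_bytes as f64) <= (pool_bytes as f64) * headroom
}

fn main() {
    const MIB: usize = 1024 * 1024;
    // HashJoinExec: 44.4 MB peak vs 40 MiB pool -> 1.06x, inside 1.1x headroom
    assert!(within_headroom(44_400_000, 40 * MIB, 1.1));
    // RepartitionExec: 20.7 MB peak vs 10 MiB pool -> well outside headroom,
    // which is why that assertion is still a TODO
    assert!(!within_headroom(20_700_000, 10 * MIB, 1.1));
    println!("ok");
}
```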

What changes are included in this PR?

  • Adds dhat as a dev dependency
  • Creates a handful of tests for the heap profile

Are these changes tested?

Yes, as they are only tests.

Are there any user-facing changes?

None at this stage.

@github-actions github-actions bot added the core Core DataFusion crate label Mar 5, 2026
@cetra3 cetra3 force-pushed the add_dhat_heap_profile_tests branch from 5bb1e8e to 9e5441d Compare March 5, 2026 20:49
@cetra3 cetra3 changed the title Add initial dhat profile tests Add initial dhat heap profile tests Mar 5, 2026
@cetra3 cetra3 force-pushed the add_dhat_heap_profile_tests branch from 9e5441d to f876543 Compare March 5, 2026 21:10
Contributor

comphead commented Mar 5, 2026

That's awesome, thanks @cetra3! Please help me read the table correctly:

heap_profile_repartition RepartitionExec 10MB 20.7MB 1.97x

Does it mean 10MB was accounted by the internal memory pool, but the real usage was 20.7MB?

Contributor Author

cetra3 commented Mar 5, 2026

Yes, that's right. The memory limit on the DataFusion test is 10MB, but the measured heap is actually twice that amount. You would expect a bit of overhead in data structures etc., but double the amount of memory is far too much, I think.

Contributor

comphead commented Mar 5, 2026

Yes, that's right. The memory limit on the DataFusion test is 10MB, but the measured heap is actually twice that amount. You would expect a bit of overhead in data structures etc., but double the amount of memory is far too much, I think.

True, in Comet we were also struggling with unexpected OOMs.

@andygrove @mbutrovich FYI

Let me go through the approach.


const MEMORY_LIMIT: usize = 10 * 1024 * 1024; // 10MB

#[tokio::test]
Member

I am not sure whether these new IT tests are going to stay, but if they do, then maybe the dhat dev-dependency should be optional and loaded only when some new feature is enabled, e.g. memory-profiling, and all IT tests should be gated behind this feature.
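One way the suggested gating could look, as a Cargo.toml sketch. The `memory-profiling` feature name is taken from the comment above; the test target name is hypothetical. Note that dev-dependencies themselves cannot be optional in Cargo, so dhat would move to an optional regular dependency, with the test target gated via `required-features`:

```toml
# Cargo.toml (sketch)
[dependencies]
dhat = { version = "0.3", optional = true }

[features]
memory-profiling = ["dep:dhat"]

[[test]]
name = "heap_profiles"          # hypothetical test target name
required-features = ["memory-profiling"]
```

With this layout, `cargo test --features memory-profiling` runs the heap profile tests, and a plain `cargo test` skips them entirely.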

);

// Start profiling after table creation
let _profiler = dhat::Profiler::builder().testing().build();
Member

Here you start the profiling after the setup, but in many of the other IT tests this is done before the setup.

Member

martin-g commented Mar 6, 2026

The memory limit on the DataFusion test is 10MB, but the measured heap is actually twice that amount

The limit is only for the FairSpillPool, not for the whole application.
DHAT provides heap dump functionality, which could be used to find what consumes the bigger chunks of the used memory.
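For reference, dhat-rs can produce such a dump when the profiler is built in its default (non-testing) mode; dropping the profiler writes a `dhat-heap.json` file that can be opened in the DHAT viewer to see which call sites account for the largest blocks at the global peak. A minimal sketch (requires the `dhat` crate and its global allocator, so not runnable standalone):

```rust
// Route all allocations through dhat so it can attribute them to call sites.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // In non-testing mode the profiler writes dhat-heap.json when dropped.
    let _profiler = dhat::Profiler::new_heap();

    // ... run the query under investigation ...
}
```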

