Skip to content

fix: account for initial hash table allocation in ArrowBytesMap/ArrowBytesViewMap#21249

Open
yashrb24 wants to merge 1 commit intoapache:mainfrom
yashrb24:fix/binary-map-initial-map-size
Open

fix: account for initial hash table allocation in ArrowBytesMap/ArrowBytesViewMap#21249
yashrb24 wants to merge 1 commit intoapache:mainfrom
yashrb24:fix/binary-map-initial-map-size

Conversation

@yashrb24
Copy link
Copy Markdown

@yashrb24 yashrb24 commented Mar 30, 2026

Which issue does this PR close?

Rationale for this change

ArrowBytesMap and ArrowBytesViewMap allocate their hash tables with HashTable::with_capacity(INITIAL_MAP_CAPACITY) but initialize map_size to 0. The insert_accounted method only tracks incremental growth beyond the current capacity, so the initial allocation from with_capacity is never counted. size() understates memory usage until the first resize.

What changes are included in this PR?

Initialize map_size with map.allocation_size() in both ArrowBytesMap::new and ArrowBytesViewMap::new to capture the pre-allocated memory.

Are these changes tested?

The change is a one-line init fix. Existing tests cover the size() method behavior.

Are there any user-facing changes?

No API changes. size() now returns the memory estimate that includes the initial hash table allocation.

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Mar 30, 2026
@yashrb24 yashrb24 force-pushed the fix/binary-map-initial-map-size branch from 03429d9 to 0673088 Compare March 30, 2026 12:08
@yashrb24 yashrb24 force-pushed the fix/binary-map-initial-map-size branch 3 times, most recently from ed98020 to 3076119 Compare April 30, 2026 17:19
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning origin/main
    Building datafusion-physical-expr-common v53.1.0 (current)
       Built [  19.624s] (current)
     Parsing datafusion-physical-expr-common v53.1.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-physical-expr-common v53.1.0 (baseline)
       Built [  18.951s] (baseline)
     Parsing datafusion-physical-expr-common v53.1.0 (baseline)
      Parsed [   0.021s] (baseline)
    Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.300s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure trait_method_missing: pub trait method removed or renamed ---

Description:
A trait method is no longer callable, and may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-item-signature
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/trait_method_missing.ron

Failed in:
  method expression_id of trait PhysicalExpr, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-origin_main/1b4aa23fc54dabface2da814e74fe26e0b84c6a8/datafusion/physical-expr-common/src/physical_expr.rs:464

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  39.880s] datafusion-physical-expr-common

…BytesViewMap

ArrowBytesMap and ArrowBytesViewMap allocate their hash tables with
HashTable::with_capacity(INITIAL_MAP_CAPACITY) but initialize map_size
to 0. insert_accounted only tracks incremental growth beyond the current
capacity, so the initial allocation is never counted and size() understates
memory usage until the first resize.

Initialize map_size with map.allocation_size() in both constructors,
and add regression tests covering both map types.

Closes apache#21248.
@yashrb24 yashrb24 force-pushed the fix/binary-map-initial-map-size branch from 3076119 to 5cafef0 Compare May 6, 2026 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ArrowBytesMap and ArrowBytesViewMap undercount memory by not accounting for initial hash table allocation

1 participant