feat: size_of over all live DistArrays of a type in a World #553

Merged: evaleev merged 1 commit into master from evaleev/feature/size-of-live-distarrays on May 22, 2026

@evaleev evaleev commented May 21, 2026

Summary

Adds a ground-truth memory-accounting facility: find every live DistArray of a given type in one or more Worlds by walking each World's WorldObject registry, instead of summing size_of over a set of handles.

Motivation: a DistArray is a shallow-copy handle, so summing size_of over handles double-counts arrays referenced from multiple places (e.g. the same tensor held both in an MPQC factory cache and in SeQuant's eval cache). Each array's tile storage is a single detail::DistributedStorage WorldObject, so walking the registry counts each array exactly once — usable as a cross-check against handle-based accounting.

API (TiledArray namespace, dist_array.h)

  • size_of<DistArrayT, S>(World&): walks world.get_object_ids(), recovers each registered pointer as the common polymorphic base madness::WorldObjectBase, dynamic_casts it to the DistributedStorage matching DistArrayT's tile type (non-matching objects are skipped), and sums DistributedStorage::local_size_in_bytes<S>().
  • size_of_live_distarrays<S, DistArrayTs...>(worlds): a [world][type] matrix of the above.

DistributedStorage gains local_size_in_bytes<S>() (distributed_storage.h): sums TiledArray::size_of<S>(tile) over locally-owned, set futures — the same tile set size_of(DistArray) iterates.

Type-safety

The registry stores void* (registered as static_cast<Derived*>(this)), so a blind static_cast to the wrong type would be UB. The cast goes through madness::WorldObjectBase (the polymorphic base of WorldObject<Derived>) and then dynamic_cast, which yields nullptr for non-matching types. This is sound as long as WorldObjectBase sits at offset 0 of every registered WorldObject — true for the single-inheritance class X : public WorldObject<X> idiom. Verified across MADNESS, TiledArray, and MPQC that no WorldObject-derived class lists WorldObject as a non-primary base.

Caveats

  • Counts only locally-owned tiles whose futures are set; excludes TiledRange/shape metadata and remote-tile caches. Call at a quiescent point (after a fence).
  • S is the leading template arg of the variadic form (it has a default but precedes the pack): size_of_live_distarrays<MemorySpace::Host, ArrayA, ArrayB>(worlds).

Test plan

  • array_suite/size_of_live_in_world (np=1): builds two distinct arrays plus a shallow copy of one; asserts that the walk equals the 2-array deduplicated tile total (not the 3-handle sum) and that a ToT-typed walk ignores regular-tensor arrays, and exercises the variadic matrix form. Full array_suite green at np=1.
  • np=2 — left to CI; the local macOS ASan+mpirun environment crashes all tests (verified an untouched pre-existing test crashes identically), so it can't validate np=2 locally.

@evaleev evaleev force-pushed the evaleev/feature/size-of-live-distarrays branch 2 times, most recently from 070c987 to 8c52609 May 22, 2026 06:57
@evaleev evaleev requested a review from Copilot May 22, 2026 07:20

Copilot AI left a comment


Pull request overview

Adds a “ground-truth” memory-accounting path for DistArray tile storage by walking a World’s WorldObject registry to find live detail::DistributedStorage<T> instances and summing locally-owned, already-set tiles (deduplicating shallow-copy handles). This complements existing handle-based size_of(DistArray) accounting, which can double-count shared storage.

Changes:

  • Add detail::DistributedStorage::for_each_local_tile() to iterate locally-owned, set tile futures without communication.
  • Add TiledArray::size_of(S, DistributedStorage), size_of_live_distarray_storage(World&), and a variadic size_of_live_distarrays_storage(worlds) “matrix” helper.
  • Add a unit test validating deduplication across shallow copies and type-filtering behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/dist_array.cpp | Adds coverage for registry-walk storage accounting vs handle-based accounting (including shallow-copy deduplication and variadic API). |
| src/TiledArray/distributed_storage.h | Introduces a tile-iteration helper over locally-owned, already-set futures to support local-only accounting. |
| src/TiledArray/dist_array.h | Adds public APIs to compute live DistributedStorage tile-data sizes per World and across multiple types/worlds. |

Adds a ground-truth tile-data accounting facility that finds every live
DistArray of a given type by walking a World's WorldObject registry,
rather than summing size_of over a set of handles. Because each array's
tile storage is a single DistributedStorage WorldObject, an array
referenced by N shallow-copy handles is counted exactly once -- handle
summation double-counts shared storage, which makes it unsuitable as a
cross-check.

API (TiledArray namespace, dist_array.h):
  - size_of<S>(const detail::DistributedStorage<T>&)
      Tile-data bytes of one storage object (sum of size_of<S>(tile) over
      locally-owned, set tiles).
  - size_of_live_distarray_storage<DistArrayT, S>(World&)
      Walks world.get_object_ids(), recovers each registered pointer as
      the common polymorphic base madness::WorldObjectBase, dynamic_casts
      to the DistributedStorage matching DistArrayT's tile type (others
      skipped), and sums the above.
  - size_of_live_distarrays_storage<S, DistArrayTs...>(worlds)
      [world][type] matrix of the above.

IMPORTANT: these report the DistributedStorage (tile-data) footprint
ONLY. They exclude the DistArray-level TiledRange, Shape, and Pmap --
those live in the owning ArrayImpl/TensorImpl, not the storage, and are
not reachable from the registered WorldObject. Under SparsePolicy the
Shape (per-tile Frobenius-norm table) can be sizeable, so the result is
NOT comparable term-for-term with a sum of size_of(const DistArray&)
over handles (which includes the shape). Names say "storage" to make
this explicit.

DistributedStorage gains for_each_local_tile(op): applies op to each
locally-owned, set tile -- the same tile set size_of(DistArray)
iterates. The size_of<S>(tile) summation is done by the size_of(storage)
overload in dist_array.h, where the tile-type overloads are visible
(they need not be visible at the point this low-level header is parsed).

Type-safety rests on WorldObjectBase sitting at offset 0 of every
registered WorldObject; verified that across MADNESS, TiledArray, and
MPQC no WorldObject-derived class has WorldObject as a non-primary base
(the single-inheritance "class X : public WorldObject<X>" idiom). The
recovered base is dynamic_cast, so a wrong type yields nullptr, not UB.

Counts only locally-owned set tiles; excludes remote-tile caches. Call
at a quiescent point (after a fence).

Test (array_suite/live_storage_size_in_world): builds two distinct
arrays plus a shallow copy of one, checks that the storage walk equals
the two-array (deduplicated) tile-data total rather than the
three-handle sum and that a ToT-typed walk does not pick up
regular-tensor arrays, and exercises the variadic matrix form. Passes
at np=1 and np=2 (CI).
@evaleev evaleev force-pushed the evaleev/feature/size-of-live-distarrays branch from 8c52609 to b5e1e83 May 22, 2026 07:40
@evaleev evaleev merged commit 00996ce into master May 22, 2026
8 of 9 checks passed
@evaleev evaleev deleted the evaleev/feature/size-of-live-distarrays branch May 22, 2026 07:54

2 participants