feat: size_of over all live DistArrays of a type in a World#553
Merged
Conversation
070c987 to
8c52609
Compare
There was a problem hiding this comment.
Pull request overview
Adds a “ground-truth” memory-accounting path for DistArray tile storage by walking a World’s WorldObject registry to find live detail::DistributedStorage<T> instances and summing locally-owned, already-set tiles (deduplicating shallow-copy handles). This complements existing handle-based size_of(DistArray) accounting, which can double-count shared storage.
Changes:
- Add
detail::DistributedStorage::for_each_local_tile()to iterate locally-owned, set tile futures without communication. - Add
TiledArray::size_of(S, DistributedStorage),size_of_live_distarray_storage(World&), and a variadicsize_of_live_distarrays_storage(worlds)“matrix” helper. - Add a unit test validating deduplication across shallow copies and type-filtering behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/dist_array.cpp |
Adds coverage for registry-walk storage accounting vs handle-based accounting (including shallow-copy deduplication and variadic API). |
src/TiledArray/distributed_storage.h |
Introduces a tile-iteration helper over locally-owned, already-set futures to support local-only accounting. |
src/TiledArray/dist_array.h |
Adds public APIs to compute live DistributedStorage tile-data sizes per World and across multiple types/worlds. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Adds a ground-truth tile-data accounting facility that finds every live
DistArray of a given type by walking a World's WorldObject registry,
rather than summing size_of over a set of handles. Because each array's
tile storage is a single DistributedStorage WorldObject, an array
referenced by N shallow-copy handles is counted exactly once -- handle
summation double-counts shared storage, which makes it unsuitable as a
cross-check.
API (TiledArray namespace, dist_array.h):
- size_of<S>(const detail::DistributedStorage<T>&)
Tile-data bytes of one storage object (sum of size_of<S>(tile) over
locally-owned, set tiles).
- size_of_live_distarray_storage<DistArrayT, S>(World&)
Walks world.get_object_ids(), recovers each registered pointer as
the common polymorphic base madness::WorldObjectBase, dynamic_casts
to the DistributedStorage matching DistArrayT's tile type (others
skipped), and sums the above.
- size_of_live_distarrays_storage<S, DistArrayTs...>(worlds)
[world][type] matrix of the above.
IMPORTANT: these report the DistributedStorage (tile-data) footprint
ONLY. They exclude the DistArray-level TiledRange, Shape, and Pmap --
those live in the owning ArrayImpl/TensorImpl, not the storage, and are
not reachable from the registered WorldObject. Under SparsePolicy the
Shape (per-tile Frobenius-norm table) can be sizeable, so the result is
NOT comparable term-for-term with a sum of size_of(const DistArray&)
over handles (which includes the shape). Names say "storage" to make
this explicit.
DistributedStorage gains for_each_local_tile(op): applies op to each
locally-owned, set tile -- the same tile set size_of(DistArray)
iterates. The size_of<S>(tile) summation is done by the size_of(storage)
overload in dist_array.h, where the tile-type overloads are visible
(they need not be at the point this low-level header is parsed).
Type-safety rests on WorldObjectBase sitting at offset 0 of every
registered WorldObject; verified that across MADNESS, TiledArray, and
MPQC no WorldObject-derived class has WorldObject as a non-primary base
(the single-inheritance "class X : public WorldObject<X>" idiom). The
recovered base is dynamic_cast, so a wrong type yields nullptr, not UB.
Counts only locally-owned set tiles; excludes remote-tile caches. Call
at a quiescent point (after a fence).
Test (array_suite/live_storage_size_in_world): builds two distinct
arrays plus a shallow copy of one, checks the storage walk equals the
two-array (deduplicated) tile-data total -- not the three-handle sum --
that a ToT-typed walk does not pick up regular-tensor arrays, and the
variadic matrix form. Passes at np=1 and np=2 (CI).
8c52609 to
b5e1e83
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a ground-truth memory-accounting facility: find every live
DistArrayof a given type in one or moreWorlds by walking each World'sWorldObjectregistry, instead of summingsize_ofover a set of handles.Motivation: a
DistArrayis a shallow-copy handle, so summingsize_ofover handles double-counts arrays referenced from multiple places (e.g. the same tensor held both in an MPQC factory cache and in SeQuant's eval cache). Each array's tile storage is a singledetail::DistributedStorageWorldObject, so walking the registry counts each array exactly once — usable as a cross-check against handle-based accounting.API (
TiledArraynamespace,dist_array.h)size_of<DistArrayT, S>(World&)— walkworld.get_object_ids(), recover each registered pointer as the common polymorphic basemadness::WorldObjectBase,dynamic_castto theDistributedStoragematchingDistArrayT's tile type (non-matching objects skipped), sumDistributedStorage::local_size_in_bytes<S>().size_of_live_distarrays<S, DistArrayTs...>(worlds)—[world][type]matrix of the above.DistributedStoragegainslocal_size_in_bytes<S>()(distributed_storage.h): sumsTiledArray::size_of<S>(tile)over locally-owned, set futures — the same tile setsize_of(DistArray)iterates.Type-safety
The registry stores
void*(registered asstatic_cast<Derived*>(this)), so a blindstatic_castto the wrong type would be UB. The cast goes throughmadness::WorldObjectBase(the polymorphic base ofWorldObject<Derived>) and thendynamic_cast, which yieldsnullptrfor non-matching types. This is sound as long asWorldObjectBasesits at offset 0 of every registeredWorldObject— true for the single-inheritanceclass X : public WorldObject<X>idiom. Verified across MADNESS, TiledArray, and MPQC that noWorldObject-derived class listsWorldObjectas a non-primary base.Caveats
TiledRange/shape metadata and remote-tile caches. Call at a quiescent point (after a fence).Sis the leading template arg of the variadic form (it has a default but precedes the pack):size_of_live_distarrays<MemorySpace::Host, ArrayA, ArrayB>(worlds).Test plan
array_suite/size_of_live_in_world(np=1): two distinct arrays + a shallow copy of one; asserts the walk equals the 2-array deduplicated tile total (not the 3-handle sum), a ToT-typed walk ignores regular-tensor arrays, and the variadic matrix form. Fullarray_suitegreen at np=1.