ENH: Out-of-core architecture rewrite and filter optimizations#1568

Draft
joeykleingers wants to merge 41 commits into BlueQuartzSoftware:develop from joeykleingers:worktree-ooc-architecture-rewrite

@joeykleingers joeykleingers commented Mar 24, 2026

Contributor

Summary

Rewrites the out-of-core (OOC) architecture in simplnx, replacing the old chunk-based API with a new bulk I/O design built around copyIntoBuffer/copyFromBuffer on AbstractDataStore, and adds the OOC-optimized algorithm variants for 30+ filters that build on it. The core infrastructure and the filter optimizations (previously tracked separately in #1575) are consolidated onto this branch, so this PR now contains both layers:

  1. Core OOC architecture — the bulk I/O API, store management, file import/recovery, and runtime dispatch infrastructure.
  2. Filter optimizations: DispatchAlgorithm-based in-core (Direct/BFS) vs OOC (Scanline/CCL) variants for 30+ filters, with benchmarks.

Core OOC Architecture

Core Architecture Changes

  • Removed old chunk API from AbstractDataStore / IDataStore (loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds, getChunkShape)
  • Added copyIntoBuffer / copyFromBuffer pure virtual bulk I/O methods to AbstractDataStore with implementations in DataStore, EmptyDataStore, and HDF5ChunkedStore (in SimplnxOoc plugin)
  • Added StoreType enum (InMemory, OutOfCore, Empty) to IDataStore; IsOutOfCore() now checks StoreType instead of getChunkShape()
  • HDF5ChunkedStore performs I/O via HDF5 hyperslab selections with Z-slice-aligned default chunk shape {1,Y,X} for 3D data
  • copyFromBuffer fast path: skips read-modify-write for tuple-aligned writes
  • copyIntoBuffer fast path: direct span-based readTuples for tuple-aligned reads
  • HDF5 DatasetIO gains readTuples/writeTuples for direct hyperslab-based bulk tuple I/O
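
To make the bulk I/O contract concrete, here is a minimal sketch of how a chunked pass over a store might look. It assumes simplified signatures (a start offset, a raw pointer, and an element count); BulkStoreSketch, InMemoryStoreSketch, and addToAll are hypothetical names for illustration, and the real AbstractDataStore signatures and return types may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the bulk I/O contract described above.
// The real AbstractDataStore signatures may differ.
template <typename T>
class BulkStoreSketch
{
public:
  virtual ~BulkStoreSketch() = default;
  virtual std::size_t size() const = 0;
  // Copy `count` elements starting at `start` out of the store into `dest`.
  virtual bool copyIntoBuffer(std::size_t start, T* dest, std::size_t count) const = 0;
  // Copy `count` elements from `src` into the store starting at `start`.
  virtual bool copyFromBuffer(std::size_t start, const T* src, std::size_t count) = 0;
};

// In-memory implementation; an OOC store would translate the same calls into file I/O.
template <typename T>
class InMemoryStoreSketch : public BulkStoreSketch<T>
{
public:
  explicit InMemoryStoreSketch(std::size_t n)
  : m_Data(n)
  {
  }
  std::size_t size() const override
  {
    return m_Data.size();
  }
  bool copyIntoBuffer(std::size_t start, T* dest, std::size_t count) const override
  {
    if(start > m_Data.size() || count > m_Data.size() - start)
    {
      return false; // requested range is out of bounds
    }
    std::copy_n(m_Data.begin() + static_cast<std::ptrdiff_t>(start), count, dest);
    return true;
  }
  bool copyFromBuffer(std::size_t start, const T* src, std::size_t count) override
  {
    if(start > m_Data.size() || count > m_Data.size() - start)
    {
      return false;
    }
    std::copy_n(src, count, m_Data.begin() + static_cast<std::ptrdiff_t>(start));
    return true;
  }

private:
  std::vector<T> m_Data;
};

// Chunked read-modify-write pass: two store calls per chunk instead of two per element.
template <typename T>
bool addToAll(BulkStoreSketch<T>& store, T delta, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::vector<T> buffer(chunkSize);
  for(std::size_t start = 0; start < store.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, store.size() - start);
    if(!store.copyIntoBuffer(start, buffer.data(), count))
    {
      return false;
    }
    for(std::size_t i = 0; i < count; i++)
    {
      buffer[i] += delta;
    }
    if(!store.copyFromBuffer(start, buffer.data(), count))
    {
      return false;
    }
  }
  return true;
}
```

The point of the pattern is that the scratch buffer is bounded by the chunk size, not by the array size, and the per-element work touches only plain memory.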

New Core Utilities

  • DispatchAlgorithm — Runtime dispatch between in-core (Direct) and OOC (Scanline/CCL) algorithm variants based on data store type
  • SliceBufferedTransfer — Type-dispatched Z-slice buffered tuple copy utility that eliminates per-element OOC overhead during morphological transfer phases
  • UnionFind — Vector-based disjoint set data structure with union-by-rank and path-halving compression for chunk-sequential CCL algorithms
  • SegmentFeatures OOC path — Z-slice CCL-based connected component labeling with UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data
  • AlignSections OOC path — Bulk slice read/write with AlignSectionsTransferDataOocImpl
  • DataArrayUtilities bulk I/O: ImportFromBinaryFile, AppendData, CopyData, and mirror swap_ranges updated with chunked bulk I/O (runtime OOC check preserves original in-core performance)
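
For readers unfamiliar with the disjoint-set structure mentioned above, the following is a minimal sketch of a vector-based union-find with union-by-rank and path halving. UnionFindSketch is a hypothetical name; the UnionFind class in this PR may expose a different API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Illustrative only: the UnionFind class in the PR may expose a different API.
class UnionFindSketch
{
public:
  explicit UnionFindSketch(std::size_t n)
  : m_Parent(n)
  , m_Rank(n, 0)
  {
    // Every element starts as the root of its own set.
    std::iota(m_Parent.begin(), m_Parent.end(), std::size_t{0});
  }

  // Returns the representative of x's set. Path halving: every visited node is
  // re-pointed at its grandparent, which shortens the chain for later lookups.
  std::size_t find(std::size_t x)
  {
    assert(x < m_Parent.size());
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  // Merges the sets containing a and b. Union by rank: the shallower tree is
  // attached under the deeper one so tree height grows slowly.
  void unite(std::size_t a, std::size_t b)
  {
    a = find(a);
    b = find(b);
    if(a == b)
    {
      return;
    }
    if(m_Rank[a] < m_Rank[b])
    {
      std::swap(a, b);
    }
    m_Parent[b] = a;
    if(m_Rank[a] == m_Rank[b])
    {
      m_Rank[a]++;
    }
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank;
};
```

Both arrays are flat vectors indexed by label, so the structure needs no per-node allocation and its memory grows with the number of provisional labels rather than with the voxel count.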

OOC Store Management

  • DataIOCollection / IDataIOManager — Updated for OOC store lifecycle management
  • ImportH5ObjectPathsAction — OOC-aware file import with recovery metadata
  • DataStoreIO — Detect OOC recovery attributes in ReadDataStore for safe data restoration
  • Legacy .dream3d support — Handle legacy file formats in OOC backfill operations

Filter Optimizations

Adds out-of-core (OOC) optimized algorithm variants for 30+ filters, using DispatchAlgorithm to select between in-core (Direct/BFS) and OOC (Scanline/CCL) code paths at runtime based on data store type. A preparatory rename commit gives git rename tracking so that GitHub shows meaningful diffs against the original algorithm code.
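
The dispatch idea can be sketched as follows. This is an illustration under stated assumptions, not the actual DispatchAlgorithm signature: StoreTypeSketch, ArraySketch, dispatchAlgorithmSketch, and the two toy algorithm classes are all hypothetical, and the real utility may inspect stores and forward arguments differently.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <utility>

// Hypothetical mirror of the StoreType enum described in this PR.
enum class StoreTypeSketch
{
  InMemory,
  OutOfCore,
  Empty
};

// Hypothetical stand-in for a data array that knows its store type.
struct ArraySketch
{
  StoreTypeSketch storeType;
};

// True if any participating array is backed by an out-of-core store.
inline bool anyOutOfCore(std::initializer_list<const ArraySketch*> arrays)
{
  for(const ArraySketch* array : arrays)
  {
    assert(array != nullptr);
    if(array->storeType == StoreTypeSketch::OutOfCore)
    {
      return true;
    }
  }
  return false;
}

// Constructs and runs exactly one of the two algorithm classes, chosen at runtime.
// Both classes must be constructible from the same arguments and return the same type.
template <typename InCoreAlgT, typename OocAlgT, typename... ArgsT>
auto dispatchAlgorithmSketch(std::initializer_list<const ArraySketch*> arrays, ArgsT&&... args)
{
  if(anyOutOfCore(arrays))
  {
    return OocAlgT(std::forward<ArgsT>(args)...)();
  }
  return InCoreAlgT(std::forward<ArgsT>(args)...)();
}

// Toy algorithm pair used only to show which branch ran.
struct ThresholdDirect
{
  int value;
  explicit ThresholdDirect(int v)
  : value(v)
  {
  }
  std::string operator()() const
  {
    return "direct:" + std::to_string(value);
  }
};

struct ThresholdScanline
{
  int value;
  explicit ThresholdScanline(int v)
  : value(v)
  {
  }
  std::string operator()() const
  {
    return "scanline:" + std::to_string(value);
  }
};
```

A single OOC-backed input is enough to select the OOC variant in this sketch, since one per-element access pattern against a chunked store can dominate the runtime of the whole filter.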

Algorithm Rename for Git Tracking

Renames 13 algorithm files to their in-core variant names before any logic changes, so that when dispatch variants are introduced, GitHub shows proper diffs against the original code instead of "new file" with no context.

Original | Renamed To
FillBadData | FillBadDataBFS
IdentifySample | IdentifySampleBFS
ComputeBoundaryCells | ComputeBoundaryCellsDirect
ComputeFeatureNeighbors | ComputeFeatureNeighborsDirect
ComputeSurfaceAreaToVolume | ComputeSurfaceAreaToVolumeDirect
ComputeSurfaceFeatures | ComputeSurfaceFeaturesDirect
SurfaceNets | SurfaceNetsDirect
QuickSurfaceMesh | QuickSurfaceMeshDirect
DBSCAN | DBSCANDirect
ComputeKMedoids | ComputeKMedoidsDirect
MultiThresholdObjects | MultiThresholdObjectsDirect
BadDataNeighborOrientationCheck | BadDataNeighborOrientationCheckWorklist
ComputeGBCDPoleFigure | ComputeGBCDPoleFigureDirect

Bug Fixes

OOC import of legacy SIMPL files with multi-dimensional component arrays

Legacy SIMPL .dream3d files store multi-dimensional component arrays (e.g., GBCD with componentShape [10,10,10,20,20,2]) with HDF5 physical dimensions in reversed order relative to the ComponentDimensions attribute.

Two fixes address this at different layers:

  • AbstractOocStore::readHdf5 (SimplnxOoc): Detects shape mismatch between logical and physical dimensions before the streaming import path. Falls back to flat bulk read (H5S_ALL) when shapes differ, preserving correct byte order.
  • ImportH5ObjectPathsAction::backfillReadOnlyOocStores (simplnx): The read-only reference store optimization creates stores pointing directly at the source file. For mismatched arrays, the N-D hyperslabs would be out-of-bounds. Detects the mismatch and creates a writable OOC store populated via readHdf5 (which triggers the flat-read fallback) instead of a read-only reference.

Filter Optimization Details

Group B — Face-Neighbor Filters (5 filters)

Split into Direct (in-core) and Scanline (OOC) algorithm classes using DispatchAlgorithm. Scanline variants use Z-slice rolling windows (prev/cur/next) for cross-slice neighbor access with zero per-element OOC overhead.

Filters: ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors, ComputeSurfaceAreaToVolume, BadDataNeighborOrientationCheck
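
The rolling-window pattern can be sketched as below. This is a simplified illustration, not the code of any Scanline class in this PR: SliceReader and countBoundaryFaces are hypothetical names, the reader callback stands in for one bulk slice read, and the result is collected in a single vector for brevity where an OOC variant would flush one output slice at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Fills `out` (already sized to dimX * dimY) with the feature ids of slice `z`.
using SliceReader = std::function<void(std::size_t z, std::vector<std::int32_t>& out)>;

// For each voxel, count how many of its 6 face neighbors carry a different feature id.
// Only three slices (prev/cur/next) are resident at any time.
std::vector<std::uint8_t> countBoundaryFaces(std::size_t dimX, std::size_t dimY, std::size_t dimZ, const SliceReader& readSlice)
{
  const std::size_t sliceSize = dimX * dimY;
  std::vector<std::uint8_t> result(sliceSize * dimZ, 0);
  if(sliceSize == 0 || dimZ == 0)
  {
    return result;
  }
  std::vector<std::int32_t> prev(sliceSize);
  std::vector<std::int32_t> cur(sliceSize);
  std::vector<std::int32_t> next(sliceSize);
  readSlice(0, cur);
  assert(cur.size() == sliceSize); // the reader must not resize the buffer

  for(std::size_t z = 0; z < dimZ; z++)
  {
    if(z + 1 < dimZ)
    {
      readSlice(z + 1, next); // one bulk read per slice
    }
    for(std::size_t y = 0; y < dimY; y++)
    {
      for(std::size_t x = 0; x < dimX; x++)
      {
        const std::size_t i = y * dimX + x;
        const std::int32_t id = cur[i];
        std::uint8_t count = 0;
        count += (x > 0 && cur[i - 1] != id) ? 1 : 0;
        count += (x + 1 < dimX && cur[i + 1] != id) ? 1 : 0;
        count += (y > 0 && cur[i - dimX] != id) ? 1 : 0;
        count += (y + 1 < dimY && cur[i + dimX] != id) ? 1 : 0;
        count += (z > 0 && prev[i] != id) ? 1 : 0;
        count += (z + 1 < dimZ && next[i] != id) ? 1 : 0;
        result[z * sliceSize + i] = count;
      }
    }
    // Rotate the window: cur becomes prev, next becomes cur, old prev becomes scratch.
    std::swap(prev, cur);
    std::swap(cur, next);
  }
  return result;
}
```

Each input slice is read exactly once, and every neighbor lookup inside the inner loops is a plain memory access.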

Group C — Morphological / Neighbor Replacement (5 filters)

Z-slice rolling buffers serve all 6 face-neighbor reads from RAM. SliceBufferedTransfer handles type-dispatched bulk tuple copy.

Filters: ErodeDilateBadData, ErodeDilateCoordinationNumber, ErodeDilateMask, ReplaceElementAttributesWithNeighborValues, NeighborOrientationCorrelation

Group D — CCL Segmentation (5 filters)

Chunk-sequential Connected Component Labeling using UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data.

Filters: ScalarSegmentFeatures, EBSDSegmentFeatures, CAxisSegmentFeatures, FillBadData, IdentifySample
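
As a rough picture of how scan-order labeling with equivalence tracking replaces flood fill, here is a minimal two-pass sketch. It is not the segmentation code from this PR: LabelTable and labelComponents are hypothetical, "same value" stands in for the filter-specific grouping criterion, and the labels are held in one vector for brevity where an OOC variant would keep only the previous and current slices resident during the first pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal equivalence table (path halving only) so the block is self-contained.
struct LabelTable
{
  std::vector<std::int32_t> parent{0}; // label 0 is reserved for "unlabeled"

  std::int32_t makeLabel()
  {
    parent.push_back(static_cast<std::int32_t>(parent.size()));
    return parent.back();
  }
  std::int32_t find(std::int32_t x)
  {
    while(parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }
  void unite(std::int32_t a, std::int32_t b)
  {
    a = find(a);
    b = find(b);
    if(a < b)
    {
      parent[b] = a; // keep the smaller label as the root
    }
    else if(b < a)
    {
      parent[a] = b;
    }
  }
};

// Labels face-connected regions of equal value, visiting voxels in storage order.
std::vector<std::int32_t> labelComponents(const std::vector<std::int32_t>& values, std::size_t dimX, std::size_t dimY, std::size_t dimZ)
{
  assert(values.size() == dimX * dimY * dimZ);
  const std::size_t sliceSize = dimX * dimY;
  std::vector<std::int32_t> labels(values.size(), 0);
  LabelTable table;

  // Pass 1: only the already-visited -X, -Y, -Z neighbors are inspected.
  for(std::size_t z = 0; z < dimZ; z++)
  {
    for(std::size_t y = 0; y < dimY; y++)
    {
      for(std::size_t x = 0; x < dimX; x++)
      {
        const std::size_t i = z * sliceSize + y * dimX + x;
        std::int32_t label = 0;
        auto visit = [&](std::size_t n) {
          if(values[n] != values[i])
          {
            return;
          }
          if(label == 0)
          {
            label = labels[n];
          }
          else
          {
            table.unite(label, labels[n]); // two provisional labels meet
          }
        };
        if(x > 0)
        {
          visit(i - 1);
        }
        if(y > 0)
        {
          visit(i - dimX);
        }
        if(z > 0)
        {
          visit(i - sliceSize);
        }
        labels[i] = (label != 0) ? label : table.makeLabel();
      }
    }
  }

  // Pass 2: resolve each provisional label to its root and renumber from 1.
  std::vector<std::int32_t> finalId(table.parent.size(), 0);
  std::int32_t nextId = 1;
  for(std::int32_t& label : labels)
  {
    const std::int32_t root = table.find(label);
    if(finalId[root] == 0)
    {
      finalId[root] = nextId++;
    }
    label = finalId[root];
  }
  return labels;
}
```

Both passes read and write sequentially, which is the access pattern chunked storage handles well; flood fill instead jumps wherever the region grows.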

Group E — AlignSections Family (4 filters)

Bulk slice read/write via AlignSectionsTransferDataOocImpl. Per-filter OOC findShifts with 2-slice buffers and bulk mask reads.

Filters: AlignSectionsMisorientation, AlignSectionsMutualInformation, AlignSectionsFeatureCentroid, AlignSectionsListFilter

QuickSurfaceMesh

DispatchAlgorithm<QuickSurfaceMeshDirect, QuickSurfaceMeshScanline>. Scanline eliminates the O(volume) nodeIds array (7.5 GB for 1000³) with rolling 2-plane node buffers (16 MB). Two-pass architecture: counting pass + mesh creation pass. All output arrays (triangle connectivity, faceLabels, vertex coordinates, nodeTypes) buffered per z-slice and flushed with copyFromBuffer. Batch quickSurfaceTransferBatch API added to TupleTransfer for bulk source-read/dest-write of cell and feature data.

SurfaceNets

DispatchAlgorithm<SurfaceNetsDirect, SurfaceNetsScanline>. Scanline is a complete reimplementation (881 lines) eliminating the O(n) Cell[] array — uses O(surface) hash map + vertex vectors with slice-by-slice FeatureIds reading. All output arrays (vertices, nodeTypes, triangle connectivity, faceLabels) buffered and flushed with copyFromBuffer. Batch surfaceNetsTransferBatch API added to TupleTransfer for bulk I/O.

Mesh Infrastructure (RepairTriangleWinding + GeometryHelpers)

  • RepairTriangleWinding: Bulk-reads triangle face list and faceLabels into local buffers; all BFS work operates on local memory; modified triangles written back via copyFromBuffer.
  • FindElementsContainingVert / FindElementNeighbors (GeometryHelpers.hpp): Chunked bulk I/O with 65K-element chunks for sequential passes. Random neighbor lookups check if candidate is in the current chunk (cache hit) before falling back to per-element copyIntoBuffer. Together with RepairTriangleWinding buffering, this reduced SurfaceNets Winding from 515s to 2.9s.

Clustering Filters (3 filters)

  • DBSCAN: DispatchAlgorithm<DBSCANDirect, DBSCANScanline> — chunked grid construction, on-demand per-grid-cell coordinate reads in canMerge. 653s → 12s (54x)
  • ComputeKMedoids: DispatchAlgorithm<Direct, Scanline> — chunked findClusters, per-cluster optimizeClusters with O(max_cluster_size) peak memory. 74s → 13s (5.7x)
  • ComputeFeatureClustering: Single implementation with feature-level array caching. 203s → 77s (2.6x)

Pipeline Prerequisite Filters (2 filters)

  • MultiThresholdObjects: DispatchAlgorithm<Direct, Scanline> — eliminates O(n) tempResultVector in OOC path
  • ConvertOrientations: Single implementation with chunked bulk I/O in macro-generated Convertor classes (4096-tuple chunks)

Together these reduced the AlignSectionsMisorientation pipeline test from 635s to 5.9s (107x).

OrientationAnalysis Misc (10 filters)

  • ComputeTwinBoundaries: Bulk-read all face/feature/ensemble arrays into local vectors. 179s → 44s (4x)
  • ComputeKernelAvgMisorientations: Slab-based bulk I/O with cached CrystalStructures
  • ComputeAvgCAxes: Already OOC-optimized (chunked reads, cached feature output). Compute-bound.
  • ReadH5Ebsd: copyFromBuffer in CopyData template, phase copy, Euler interleaving. 463s → 241s (1.9x)
  • ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline> — Direct caches full GBCD, Scanline caches only the phase-of-interest slice (bounded by bin resolution, not cell count). 853s → 0.9s (948x)
  • ComputeFeatureReferenceCAxisMisorientations: Z-slice buffered I/O for all cell-level arrays (featureIds, cellPhases, quats, output). Cached ensemble/feature-level arrays (crystalStructures, avgCAxes). 196s → 5.4s (36x)
  • ComputeFeatureNeighborCAxisMisalignments: Bulk-read all feature-level arrays (featurePhases, featureAvgQuat, crystalStructures) and buffered avgCAxisMisalignment output.
  • MergeTwins: Chunked bulk I/O for voxel-level parent ID fill and assignment loop. Feature-level featureParentIds cached locally for lookup. 67s → 1.8s (37x)
  • ReadCtfData: Bulk copyFromBuffer for all cell arrays (phases, euler angles, bands, error, MAD, BC, BS, X, Y). Euler angle interleave uses chunked 64K buffer. Crystal structures cached locally for hex correction. 231s → 0.25s (924x)
  • ReadAngData: Same bulk copyFromBuffer pattern. Phase validation done in-place on EbsdLib buffer before single bulk write. Euler interleave chunked.

Pipeline-Critical Filters (6 filters)

Optimizations targeting the filters responsible for OOC pipeline timeouts (4 of 5 timed-out pipelines blocked by ComputeIPFColors):

  • ComputeIPFColors: DispatchAlgorithm<ComputeIPFColorsDirect, ComputeIPFColorsScanline>. Direct keeps the parallel ParallelDataAlgorithm path for in-core data; Scanline uses chunked sequential bulk I/O (65K-tuple chunks) with locally cached crystal structures. ForceOocAlgorithmGuard added to test. 1,937ms → 90ms (21.5x)
  • ComputeFeatureSizes: DispatchAlgorithm<ComputeFeatureSizesDirect, ComputeFeatureSizesScanline> — Direct keeps the parallel in-core accumulation (tbb::combinable Kahan summation); Scanline uses chunked copyIntoBuffer for featureIds (ImageGeom path) and featureIds + elemSizes (RectGrid path, Kahan summation preserved). 813ms → 28ms (29x) on the OOC/Scanline path.
  • ComputeAvgOrientations: Chunked featureIds/phases/quats reads, locally cached crystal structures and avgQuats (feature-level). Bulk copyFromBuffer for output arrays.
  • ComputeFeatureReferenceMisorientations: Chunked all cell-level arrays (featureIds, phases, quats, GB distances, output misorientations). Locally cached crystal structures, avgQuats, and center quaternions (all feature/ensemble-level). 106ms → 1ms (106x)
  • ComputeFeatureCentroids: Replaced AbstractDataStore intermediate arrays (sum, center, count, rangeX/Y/Z) with plain std::vector — eliminates ~119M virtual dispatch calls per run. Chunked featureIds reads. Inline coordinate computation from spacing/origin. 39,724ms → 25ms (1,589x)
  • RequireMinimumSizeFeatures: Three-part optimization:
    • removeSmallFeatures: Chunked featureIds read/write (65K-tuple batches)
    • assignBadVoxels: 3-slice rolling slab buffer for neighbor voting scan (O(slice) memory), sparse changed-voxel tracking to skip full-volume transfer when few/no voxels changed. 14,592ms → 142ms (103x)
    • RemoveInactiveObjects (shared utility in DataGroupUtilities.cpp): Chunked featureIds renumbering with copyIntoBuffer/copyFromBuffer. 5,573ms → 50ms (111x)
    • Combined: 20,184ms → 210ms (96x)

Additional Filters

  • ComputeEuclideanDistMap: Bulk-read featureIds and distance stores into local vectors; flood-fill operates on local memory; bulk-write output. 116s → 1.1s (105x)
  • AppendImageGeometry: Bulk I/O for mirror operations (scanline-based reversal instead of per-tuple swaps). 469s → 113s (4.2x)

GBCD Filter Group (5 filters)

All five GBCD filters optimized for OOC with zero cell-level O(n) allocations, cancel checking, and progress messaging:

  • ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline> with ForceOocAlgorithmGuard in test. Scanline caches only the phase-of-interest GBCD slice via copyIntoBuffer.
  • WriteGBCDGMTFile: Phase-of-interest GBCD slice cached via copyIntoBuffer; crystal structures cached locally.
  • WriteGBCDTriangleData: Chunked triangle I/O (8K chunks), feature-level euler cache, buffered file output via fmt::format_to + fmt::memory_buffer.
  • ComputeGBCD: Feature-level caching (eulers, phases, crystalStructures), chunked triangle array reads per 50K-triangle iteration, GBCD output accumulated in local buffer (bounded by phases × bins) then written back via copyFromBuffer.
  • ComputeGBCDMetricBased: Eliminated O(n) triIncluded allocation (replaced with per-chunk sequential area accumulation). Feature-level caching (phases, eulers, crystalStructures, featureFaceLabels). Chunked triangle I/O in totalFaceArea scan. Raw pointer access in parallel TrianglesSelector worker.

HDF5 Import + Pole Figure Filters (3 filters)

  • FillOocDataStore (shared infrastructure): Streaming chunked HDF5 hyperslab reads + copyFromBuffer, with zero O(n) temp allocations — batched reads even for partial hyperslabs. Benefits all HDF5 import paths.
  • ReadH5EspritData: copyFromBuffer bulk writes from raw HDF5 reader buffers, replacing 9+ per-element operator[] writes per point.
  • WritePoleFigure: Chunked iteration over eulerAngles/phases/mask per-phase using bounded buffers (no O(n) pre-caching); bulk-write intensity and image outputs via copyFromBuffer.
  • ReadHDF5Dataset: Cancel checking + per-dataset progress messages.
  • Test comparison loops in WritePoleFigureTest and ReadHDF5DatasetTest optimized with copyIntoBuffer.

Core Utilities + Geometry Filters

  • ImportFromBinaryFile: copyFromBuffer instead of per-element writes. ReadRawBinary Case1: 1076s → 29s (37x)
  • CropImageGeometry: Row-based bulk I/O. 27s → 2.6s (10x)
  • RandomizeFeatureIds (ClusteringUtilities): Chunked bulk I/O for both overloads — benefits all callers (segmentation filters, SharedFeatureFace, MergeTwins).
  • AppendData/CopyData/mirror swaps: Runtime OOC check — chunked bulk I/O for OOC, original code for in-core (verified zero in-core regression)
  • TupleTransfer: Added quickSurfaceTransferBatch and surfaceNetsTransferBatch batch APIs with bulk copyIntoBuffer/copyFromBuffer for source reads and destination writes. Used by QuickSurfaceMeshScanline and SurfaceNetsScanline.

Cancel + Progress Messaging

All in-core and OOC algorithm variants now have:

  • m_ShouldCancel checks at the top of major outer loops
  • ThrottledMessenger-based progress reporting with descriptive phase messages and percentage completion

OOC Performance Results

All benchmarks on arm64 Release build with forceOocData = true.

Mesh Generation Filters (full ctest wall-clock, OOC build)

Test | Before (s) | After (s) | Speedup
QuickSurfaceMesh: Base | 11.30 | 0.19 | 59x
QuickSurfaceMesh: Winding | 22.70 | 0.22 | 103x
QuickSurfaceMesh: Problem Voxels | 11.18 | 0.19 | 59x
QuickSurfaceMesh: Winding+PV | 21.96 | 0.22 | 100x
SurfaceNets: Default | 176 | 2.40 | 73x
SurfaceNets: Smoothing | 224 | 2.62 | 85x
SurfaceNets: Winding | 515 | 2.86 | 180x
SurfaceNets: Winding Smoothing | 416 | 3.22 | 129x

Groups B–E (200³ dataset, filter.execute() only)

Filter | Before (s) | After (s) | Speedup
ComputeBoundaryCells | 6.69 | 0.25 | 27x
ComputeSurfaceFeatures | 4.01 | 0.28 | 14x
ComputeFeatureNeighbors | 8.93 | 0.81 | 11x
ComputeSurfaceAreaToVolume | 8.59 | 0.24 | 36x
BadDataNeighborOrientationCheck | 97.1 | 5.25 | 18x
ErodeDilateBadData | 25.09 | 3.80 | 7x
ErodeDilateCoordinationNumber | 12.43 | 2.30 | 5x
ErodeDilateMask | 6.43 | 0.40 | 16x
ReplaceElementAttrsWithNeighborValues | 6.05 | 4.00 | 1.5x
NeighborOrientationCorrelation | 67.94 | 5.70 | 12x
ScalarSegmentFeatures | 708.3 | 1.77 | 400x
EBSDSegmentFeatures | 972.6 | 2.10 | 463x
CAxisSegmentFeatures | 824.1 | 1.39 | 593x
FillBadData | 8.6 | 2.26 | 4x
IdentifySample | 825.0 | 0.27 | 3056x
AlignSectionsMisorientation | 32.89 | 0.80 | 41x
AlignSectionsMutualInformation | 15.61 | 0.81 | 19x
AlignSectionsFeatureCentroid | 8.41 | 0.39 | 22x
AlignSectionsListFilter | 7.50 | 0.39 | 19x

Pipeline-Critical Filters (filter.execute() only, OOC build)

Filter | Before | After | Speedup
ComputeFeatureCentroids | 39.7s | 25ms | 1,589x
RequireMinimumSizeFeatures | 20.2s | 210ms | 96x
ComputeIPFColors | 1.94s | 90ms | 21.5x
ComputeFeatureSizes | 813ms | 28ms | 29x
ComputeFeatureReferenceMisorientations (AvgOri) | 106ms | 1ms | 106x
ComputeFeatureReferenceMisorientations (EuclDist) | 136ms | 1ms | 136x

OrientationAnalysis Filters (full ctest wall-clock, OOC build)

Filter | Before (s) | After (s) | Speedup
ComputeFeatureReferenceCAxisMisorientations | 196 | 5.4 | 36x
ComputeEuclideanDistMap | 116 | 1.1 | 105x

GBCD Filter Group (full ctest wall-clock)

Filter | Before (s) | After (s) | Speedup
ComputeGBCDPoleFigure | 833 (fail) | 2.4 | 350x
ComputeGBCD | 1500 (timeout) | ~10 | 150x
WriteGBCDGMTFile | 162 (fail) | 6.0 | 27x
ComputeGBCDMetricBased | 38.1 | 28.9 | 1.3x
WriteGBCDTriangleData | 23.5 | 19.2 | 1.2x

HDF5 Import + Pole Figure Filters (full ctest wall-clock)

Filter | Before (s) | After (s) | Speedup
WritePoleFigure (3 tests) | 4500 (timeout) | 11.7 | 385x
ReadH5EspritData (3 tests) | 2060 (timeout) | 6.8 | 303x
ReadHDF5Dataset | 1500 (timeout) | 6.7 | 224x

Additional Optimizations (full ctest wall-clock)

Filter | Before (s) | After (s) | Speedup
ReadRawBinary (Case1) | 1076 | 29 | 37x
ComputeGBCDPoleFigure | 853 | 0.9 | 948x
DBSCAN 3D | 653 | 12 | 54x
AlignSectionsMisorientation Pipeline | 635 | 5.9 | 107x
ReadH5Ebsd | 463 | 2.1 | 220x
ReadCtfData | 231 | 0.25 | 924x
AppendImageGeometry | 469 | 113 | 4.2x
ComputeFeatureClustering | 203 | 77 | 2.6x
ComputeTwinBoundaries | 179 | 44 | 4x
MergeTwins | 67 | 1.8 | 37x
ComputeKMedoids | 74 | 13 | 5.7x
CropImageGeometry (X) | 27 | 2.6 | 10x
WriteAvizoRectilinear | 22.8 | 2.3 | 10x
WriteAvizoUniform | 22.3 | 2.0 | 11x

Test Infrastructure

  • CompareDataArrays rewritten to use copyIntoBuffer in 40K-element chunks instead of per-element operator[]
  • Comparison function bulk I/O: CompareFloatArraysWithNans, CompareArrays, and CompareDataArraysByComponent (UnitTestCommon.hpp) rewritten with chunked copyIntoBuffer reads (40K elements/chunk). Per-element operator[] access caused extreme slowdowns on OOC-backed arrays; this change alone reduced the ComputeGBCD test from 1500s (timeout) to ~10s (the filter itself runs in ~3s).
  • Rotation filter bulk I/O: RotateSampleRefFrame uses slab-based copyIntoBuffer/copyFromBuffer in RotateImageGeometryWithNearestNeighbor (no O(n) allocation); RotateEulerRefFrame uses chunked 65K-tuple I/O (19.5s → 4.8s, 4x). Together these reduced ReadH5Ebsd from 241s to 2.1s (117x).
  • ForceOocAlgorithmGuard coverage in all optimized filter tests for both algorithm paths
  • SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control
  • Programmatic test data builders with Z-slice batched bulk writes for OOC efficiency

Related PRs

Test Plan

  • Tests pass on in-core build (SIMPLNX_TEST_ALGORITHM_PATH=2)
  • Tests pass on out-of-core build (SIMPLNX_TEST_ALGORITHM_PATH=1)
  • Tests pass with both algorithm paths (SIMPLNX_TEST_ALGORITHM_PATH=0)
  • All optimized filters produce correct results on both algorithm paths
  • In-core performance verified: no regression on utility changes (CopyData, AppendData, mirror swaps)

@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from b4ef97f to 99b49ed Compare March 24, 2026 18:13
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 2 times, most recently from b4ef97f to bb09048 Compare March 24, 2026 18:51
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 102c436 to b4c1358 Compare April 2, 2026 00:55
@joeykleingers joeykleingers changed the title WIP: OOC architecture rewrite — new bulk I/O API, SimplnxOoc plugin, and filter optimizations ENH: OOC architecture rewrite — new bulk I/O API and infrastructure Apr 2, 2026
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 6 times, most recently from 2bd614a to 110c054 Compare April 8, 2026 17:41
@joeykleingers joeykleingers changed the title ENH: OOC architecture rewrite — new bulk I/O API and infrastructure WIP: ENH: OOC architecture rewrite — new bulk I/O API and infrastructure Apr 8, 2026
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 35aecd0 to 3a88bbf Compare April 16, 2026 13:03
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from bdfed87 to 6fbfc8d Compare April 27, 2026 18:18
joeykleingers and others added 28 commits June 12, 2026 10:55
* Replace per-triangle getFaceCoordinates() random reads with a
  chunked pipeline: bulk-read 65K triangle connectivity indices per
  pass, determine the referenced vertex-index span, and bulk-load
  that vertex range into a local buffer
* Parallelize the area compute on the local buffer (reads/writes
  touch plain C++ arrays only, so threads are safe — no DataStore
  access inside the parallel region)
* Bulk-write the chunk's area output in one call
* Guard against pathological meshes whose vertex indexing spans
  more than 16M entries per chunk with a serial per-triangle
  fallback; filter-generated meshes stay well under this cap

CT_align (mesh-scale triangle areas): 26 s -> <1 s (~26x).

Tests: 1/1 pass on both in-core and OOC builds.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
… inputs

Bump the FeatureId (and RectGrid element-size) bulk-read chunk size
from 64K to 256K tuples. The voxel-counting pass is I/O-bound on
OOC-backed stores; larger chunks reduce copyIntoBuffer() round-trip
overhead on datasets with tens of thousands of chunks while keeping
per-chunk working-set memory bounded (1 MB for the int32 buffer,
and an additional 1 MB for the float32 element-size buffer on the
RectGrid path).

CT_align (1.97 B voxels, Image path): 14 s -> 13 s.

Tests: 9/9 pass on the OOC build.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
* Rewrite the markdown Algorithm section to explain the crop as a 3D
  subarray copy from first principles, teach the Z-slice-batched
  bulk I/O strategy step-by-step, and quantify why batching by
  K Z-slices collapses HDF5 chunk-op overhead
* Add a Doxygen block on CropImageGeomDataArray describing the
  per-pass pipeline (bulk read slab -> in-memory extract -> bulk
  write) and the O(slab), non-O(volume) memory bound

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Rewrite the Algorithm section so a reader unfamiliar with the filter
can follow the two-phase pipeline end-to-end:

* Phase 1 (feature removal): motivate why small features get pruned,
  describe the 64K-tuple chunked scan, and explain the "skip write
  when chunk unchanged" optimization
* Phase 2 (gap fill by majority-vote): teach the rolling 3-slice
  buffer scan, the sparse parallel vectors that replace the old
  O(n) dense index array, the per-array ChunkedTransferWorker with
  its +/-1 Z-margin slab read + interior-only write-back, and the
  outer ParallelTaskAlgorithm across arrays
* Add a memory-footprint summary clarifying that every data
  structure is O(slice) or O(iteration bad count), never O(volume)

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
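
The Phase 1 "skip write when chunk unchanged" idea from the commit above can be sketched as follows. This is an illustration under stated assumptions, not the filter's code: removeSmallFeaturesSketch is a hypothetical name, a plain vector stands in for the OOC-backed FeatureIds store, and resetting removed features to id 0 is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero out ids of features that are not kept, one chunk at a time.
// `store` is a plain vector standing in for an out-of-core FeatureIds store; the two
// std::copy calls mark where the bulk read and the bulk write would happen.
// Returns the number of chunks that actually had to be written back.
std::size_t removeSmallFeaturesSketch(std::vector<std::int32_t>& store, const std::vector<bool>& keepFeature, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::vector<std::int32_t> buffer(chunkSize);
  std::size_t chunksWritten = 0;
  for(std::size_t start = 0; start < store.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, store.size() - start);
    const auto first = store.begin() + static_cast<std::ptrdiff_t>(start);
    std::copy(first, first + static_cast<std::ptrdiff_t>(count), buffer.begin()); // bulk read

    bool changed = false;
    for(std::size_t i = 0; i < count; i++)
    {
      const std::int32_t id = buffer[i];
      assert(id >= 0 && static_cast<std::size_t>(id) < keepFeature.size());
      if(id > 0 && !keepFeature[static_cast<std::size_t>(id)])
      {
        buffer[i] = 0;
        changed = true;
      }
    }
    if(changed)
    {
      // Only dirty chunks pay for a write; untouched chunks are read-only traffic.
      std::copy(buffer.begin(), buffer.begin() + static_cast<std::ptrdiff_t>(count), first); // bulk write
      chunksWritten++;
    }
  }
  return chunksWritten;
}
```

When few features fall below the size threshold, most chunks stay clean and the pass costs roughly one sequential read of the array.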
Add a new Algorithm section that teaches the filter from scratch:

* Explain conceptually which triangles are kept (all three vertices
  inside the user-specified node-type range) and what the output
  geometry looks like (compact vertex list, compact triangle list,
  remapped connectivity)
* Document the downstream-invariant that forces vertNewIndex to stay
  a dense per-vertex map (triangle 0's three fresh vertices land at
  new indices 0..2 in traversal order)
* Explain the triMask bitset + triPrefixSum sparse popcount table
  that replaces the legacy dense triangle map for ~6.4x memory
  savings, and how remapIndex() turns an O(1) table lookup plus a
  small popcount into each triangle's compact new index
* Walk the six streaming passes (vertex-ok mask, triangle scan +
  vertex-index assignment, prefix-sum build, vertex copy, triangle
  remap copy, per-vertex/per-triangle attached-array copy)
* Summarize the memory footprint so the vertNewIndex dominance is
  clear on very large meshes

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
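
The bitset plus prefix-sum remapping described in the commit above can be pictured with this minimal sketch. The names follow the commit message loosely, but SparseRemapSketch and its layout (64-bit words with one running count per word) are assumptions for illustration; the filter's actual word size and table granularity may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per original triangle plus one running count per 64-bit word.
class SparseRemapSketch
{
public:
  explicit SparseRemapSketch(std::size_t numElements)
  : m_Words((numElements + 63) / 64, 0)
  {
  }

  // Mark original element i as kept.
  void keep(std::size_t i)
  {
    m_Words[i / 64] |= (std::uint64_t{1} << (i % 64));
  }

  bool isKept(std::size_t i) const
  {
    return ((m_Words[i / 64] >> (i % 64)) & 1U) != 0;
  }

  // Must be called after all keep() calls and before remapIndex().
  void buildPrefixSums()
  {
    m_Prefix.resize(m_Words.size());
    std::uint64_t running = 0;
    for(std::size_t w = 0; w < m_Words.size(); w++)
    {
      m_Prefix[w] = running; // kept elements in all earlier words
      running += std::bitset<64>(m_Words[w]).count();
    }
  }

  // Compact index of kept element i: the count of kept elements in earlier words
  // (table lookup) plus the kept bits below i inside its own word (small popcount).
  std::uint64_t remapIndex(std::size_t i) const
  {
    assert(isKept(i));
    const std::uint64_t below = m_Words[i / 64] & ((std::uint64_t{1} << (i % 64)) - 1);
    return m_Prefix[i / 64] + std::bitset<64>(below).count();
  }

private:
  std::vector<std::uint64_t> m_Words;
  std::vector<std::uint64_t> m_Prefix;
};
```

Compared with a dense old-index to new-index array, this stores one bit per element plus one counter per word, while each lookup remains constant time.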
Add a comprehensive Algorithm section covering both the node-geometry
and image-geometry paths from first principles:

* Describe how every supported transform (rotation, scale, manual
  matrix, etc.) collapses to a single 4x4 homogeneous matrix M and
  how M composes with prior transforms
* Node geometries: walk the 16K-vertex chunked read -> multiply ->
  write pipeline and explain why in-place topology+attribute data
  is correct
* Image geometries: teach the re-gridding problem (why output voxels
  need to look up source values via M^-1), and contrast nearest-
  neighbor vs. trilinear interpolation
* Z-slice slab cache: analytically derive the per-output-slice
  source-Z range and the +/-2 trilinear margin
* Sliding-window slab updates via memmove + delta copyIntoBuffer
  reads when consecutive output slices overlap heavily
* Intra-slice parallelism via ParallelDataAlgorithm with thread
  safety argued from shared-read + disjoint-write access patterns
  and per-thread pValues scratch

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
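
For the node-geometry path described above, the core step is applying one composed 4x4 homogeneous matrix to a chunk of vertices. A minimal sketch, assuming a row-major layout and hypothetical helper names (Mat4, multiply, transformPoint, transformChunk); the filter's own matrix type and chunk handling are not shown in this PR description:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<double, 16>; // row-major 4x4 (layout is an assumption)
using Vec3 = std::array<double, 3>;

// Returns a * b, i.e. the transform that applies b first and then a.
Mat4 multiply(const Mat4& a, const Mat4& b)
{
  Mat4 result{};
  for(std::size_t i = 0; i < 4; i++)
  {
    for(std::size_t j = 0; j < 4; j++)
    {
      for(std::size_t k = 0; k < 4; k++)
      {
        result[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
      }
    }
  }
  return result;
}

// Applies m to the point (x, y, z, 1) and divides by the resulting w.
Vec3 transformPoint(const Mat4& m, const Vec3& p)
{
  const double x = m[0] * p[0] + m[1] * p[1] + m[2] * p[2] + m[3];
  const double y = m[4] * p[0] + m[5] * p[1] + m[6] * p[2] + m[7];
  const double z = m[8] * p[0] + m[9] * p[1] + m[10] * p[2] + m[11];
  const double w = m[12] * p[0] + m[13] * p[1] + m[14] * p[2] + m[15];
  assert(w != 0.0);
  return {x / w, y / w, z / w};
}

// Transforms one chunk of vertices in place; in a chunked pipeline this sits
// between the bulk read of the chunk and the bulk write back to the same range.
void transformChunk(const Mat4& m, std::vector<Vec3>& chunk)
{
  for(Vec3& p : chunk)
  {
    p = transformPoint(m, p);
  }
}
```

Because only vertex coordinates change, the chunk can be written back to the same range it was read from.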
Add an Algorithm section that walks the chunked pipeline step-by-step
for a reader unfamiliar with the optimization:

* Establish the closed-form per-triangle math (0.5 * |(A-B) x (A-C)|)
  so there is no confusion about the compute
* Quantify the naive access pattern (six OOC chunk-cache hits per
  triangle, hundreds of millions of virtual dispatches on CT-scale
  meshes) to motivate the chunking
* Walk the five-step per-chunk pipeline: bulk triangle connectivity
  read -> analyze vertex-index span -> span-bounded bulk vertex
  coords read -> parallel compute on plain buffers -> bulk area
  write
* Explain the 16M-vertex span cap and the serial per-triangle
  fallback for pathological meshes
* Summarize memory footprint (bounded O(chunk), not O(mesh))

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
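
The per-chunk pipeline described above can be sketched as follows. It is a simplified illustration, not the filter's implementation: areasForChunk is a hypothetical name, a plain vector stands in for the vertex store, double precision is used for clarity, and the span cap with its serial fallback is omitted.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec3 = std::array<double, 3>;

// Closed-form area: 0.5 * |(A - B) x (A - C)|
double triangleArea(const Vec3& a, const Vec3& b, const Vec3& c)
{
  const Vec3 u = {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
  const Vec3 v = {a[0] - c[0], a[1] - c[1], a[2] - c[2]};
  const Vec3 n = {u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]};
  return 0.5 * std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
}

// One chunk of the pipeline. `tris` holds 3 vertex indices per triangle (step 1, the
// bulk connectivity read, has already happened); `allVerts` stands in for the
// possibly out-of-core vertex store.
std::vector<double> areasForChunk(const std::vector<std::uint64_t>& tris, const std::vector<Vec3>& allVerts)
{
  assert(tris.size() % 3 == 0);
  if(tris.empty())
  {
    return {};
  }
  // Step 2: find the span of vertex indices this chunk references.
  const auto [minIt, maxIt] = std::minmax_element(tris.begin(), tris.end());
  const std::uint64_t lo = *minIt;
  const std::uint64_t hi = *maxIt;
  assert(hi < allVerts.size());

  // Step 3: load that span once (stands in for a single bulk vertex read).
  const std::vector<Vec3> local(allVerts.begin() + static_cast<std::ptrdiff_t>(lo), allVerts.begin() + static_cast<std::ptrdiff_t>(hi) + 1);

  // Step 4: compute on plain local buffers; this loop is the part that can run in parallel.
  std::vector<double> areas(tris.size() / 3);
  for(std::size_t t = 0; t < areas.size(); t++)
  {
    areas[t] = triangleArea(local[tris[3 * t] - lo], local[tris[3 * t + 1] - lo], local[tris[3 * t + 2] - lo]);
  }
  return areas; // Step 5: the caller bulk-writes this buffer to the output array.
}
```

The local vertex buffer is bounded by the chunk's index span, which is why the commit caps that span and falls back to per-triangle reads for meshes whose indexing is too scattered.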
Rewrite the Algorithm section to fully teach the filter:

* State what the three output arrays (NumElements, Volume,
  EquivalentDiameter) represent and show the spherical/circular
  diameter formulas
* Image Geometry path: explain the uniform-voxel-volume shortcut
  that lets the filter skip per-voxel volume computations, then
  walk the 256K-tuple chunked count pass and the per-feature
  output pass; cover the 2D fallback rules and the
  two-empty-dimensions preflight error
* RectGrid path: contrast with the Image case, describe the
  lockstep FeatureIds + elementSizes chunked read, and explain
  why Kahan summation is needed to avoid float32 rounding error
  on billion-voxel volumes
* Justify the 256K chunk size choice based on HDF5 chunk-lookup
  overhead vs. L2 cache residency
* Summarize memory footprint

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
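
Two pieces of the description above lend themselves to a short sketch: compensated (Kahan) accumulation of float32 element sizes per feature, and the equivalent-diameter formulas. The helper names are hypothetical, and the formulas shown are the standard equal-volume-sphere and equal-area-circle definitions, which is an assumption about what the filter documents.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr double k_Pi = 3.14159265358979323846;

// Compensated (Kahan) float32 accumulator: `comp` carries the low-order bits that a
// plain `sum += v` drops once `sum` is much larger than `v`.
struct KahanSum
{
  float sum = 0.0f;
  float comp = 0.0f;

  void add(float v)
  {
    const float y = v - comp;
    const float t = sum + y;
    comp = (t - sum) - y;
    sum = t;
  }
};

// Processes one chunk read in lockstep from the FeatureIds and element-size arrays.
void accumulateChunk(const std::int32_t* featureIds, const float* elemSizes, std::size_t count, std::vector<KahanSum>& volumes)
{
  for(std::size_t i = 0; i < count; i++)
  {
    const std::int32_t id = featureIds[i];
    assert(id >= 0 && static_cast<std::size_t>(id) < volumes.size());
    volumes[static_cast<std::size_t>(id)].add(elemSizes[i]);
  }
}

// Diameter of the sphere with the same volume: d = 2 * cbrt(3V / (4 * pi)).
double equivalentDiameter3D(double volume)
{
  return 2.0 * std::cbrt(3.0 * volume / (4.0 * k_Pi));
}

// Diameter of the circle with the same area: d = 2 * sqrt(A / pi).
double equivalentDiameter2D(double area)
{
  return 2.0 * std::sqrt(area / k_Pi);
}
```

The accumulator state lives per feature, so it persists across chunks while the chunk buffers themselves stay small.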
Three sites in the algorithm multiplied double-precision resolution and
angle values by k_PiOver180F (a float constant). Float promotion to
double preserves the quantized float bits rather than recovering the
true M_PI/180.0, introducing a ~1e-10 deviation from the legacy
SIMPL algorithm (which uses double k_PiOver180). Over ~756k triangles
times ~2300 symmetry operations the deviation flipped two near-boundary
triangles in/out of the selected set, shifting a handful of
distribution bin values by ~3e-2.

Switch the three multiplications to the existing k_PiOver180D double
constant so the resolution thresholds and fixed-misorientation angle
are computed at full double precision.
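
The precision effect can be reproduced in isolation. In this minimal sketch the constant names follow the commit message, but their definitions here are stand-ins (the simplnx definitions are not shown in this PR), and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Stand-in definitions; the names mirror the commit message, the values are assumed.
inline constexpr double k_PiOver180D = 3.14159265358979323846 / 180.0;
inline constexpr float k_PiOver180F = static_cast<float>(k_PiOver180D);

// The float constant is promoted to double for the multiply, but promotion cannot
// restore the bits that were rounded away when the constant was stored as float.
double toRadiansFloatConstant(double degrees)
{
  assert(std::isfinite(degrees));
  return degrees * k_PiOver180F;
}

// Full double precision from constant to result.
double toRadiansDoubleConstant(double degrees)
{
  assert(std::isfinite(degrees));
  return degrees * k_PiOver180D;
}
```

The gap between the two constants is small, but it scales with the angle and is then compared against a fixed threshold, so values sitting near that threshold can land on either side depending on which constant was used.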

The stored 6_6_find_gbcd_metric_based.tar.gz exemplar was generated by
the original float-precision DREAM3D FindGBCDMetricBased filter and no
longer matches the simplnx algorithm after this fix. Publish a fresh
exemplar from the double-precision legacy pipeline and repoint the
tests at it.

* Rename archive and top-level folder from 6_6_find_gbcd_metric_based
  to compute_gbcd_metric_based (drops the legacy 6_6_ prefix in
  accordance with current archive-naming conventions).
* Drop the 6_6_ prefix from the stored .dat exemplar filenames;
  input .dream3d filename follows the folder name.
* ComputeGBPDMetricBasedTest's InValid section reuses the GBCD archive
  (for crystal-structures and mesh input); update its paths too.
* CMakeLists.txt download_test_data entry bumped to the new archive
  name and SHA512.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…x bool-mask bulk I/O

Three logically related changes that finish reconciling the rebased
branch with Nathan Young's PR BlueQuartzSoftware#1590 (ENH: Standardize 2D Image Handling)
and fix one resulting OOC perf cliff:

1. Wholesale port of PR BlueQuartzSoftware#1590's two algorithm rewrites into the renamed
   in-core dispatch variants:
   - ComputeFeatureNeighborsDirect.cpp gets Nathan's templated
     ComputeFeatureNeighborsFunctor<ImageDimensionStateT> and ProcessVoxels
     dispatcher in place of the OOC-commit-era custom in-core logic.
   - IdentifySampleBFS.cpp gets Nathan's templated IdentifySampleFunctor
     plus the corresponding ProcessVoxels dispatch.
   The Scanline OOC variant of ComputeFeatureNeighbors is updated to
   reference the namespaced VoxelNeighbors<Image3D>:: constants while
   preserving its Z-slice rolling-window bulk-I/O structure.

2. Reapply PR BlueQuartzSoftware#1590's constexpr/const cleanups across the algorithm
   files where the rebase took --theirs (the OOC commit version) at the
   2aa00ee conflict and dropped Nathan's small adjustments:
     SimplnxCore: ComputeBoundaryCellsDirect, ErodeDilateBadData,
       ErodeDilateCoordinationNumber, ErodeDilateMask,
       ReplaceElementAttributesWithNeighborValues,
       RequireMinimumSizeFeatures
     OrientationAnalysis: BadDataNeighborOrientationCheckWorklist,
       NeighborOrientationCorrelation
   The pattern is uniform: promote the inlined `6` neighbor-array sizes
   to use VoxelNeighbors<Image3D>::k_FaceNeighborCount via a local
   k_NumFaceNeighbors alias, make neighborVoxelIndexOffsets const,
   make faceNeighborInternalIdx constexpr, make isValidFaceNeighbor
   const where it is not mutated, drop the now-unused DataGroup.hpp
   include, and const-ify NeighborOrientationCorrelation's orientationOps.
   ComputeFeatureNeighborsFilter.md picks up Nathan's all-dimension
   note about user-set spacing for shared surface area calculation.

3. Fix a per-element OOC fallback in BadDataNeighborOrientationCheckScanline
   that was triggered whenever the input mask was a BoolArray rather
   than a UInt8Array. The previous code routed bool masks through
   maskCompare->isTrue / maskCompare->setValue per voxel per Z-slice,
   causing chunk thrashing under chunked OOC storage. The Small_IN100
   pipeline test (a 189x201x117 volume with a bool mask produced by
   MultiThresholdObjects) ran in 4.7 s on simplnx-Rel but 3+ minutes
   on simplnx-ooc-Rel. AbstractDataStore<bool> already exposes
   copyIntoBuffer/copyFromBuffer just like AbstractDataStore<uint8>;
   the comment claiming otherwise was stale. Resolve a typed
   AbstractDataStore<bool>* alongside the existing uint8 store pointer
   and route both load and write-back through bulk I/O, with a small
   per-slice std::unique_ptr<bool[]> scratch buffer bridging between
   the algorithm's uint8 slice buffers and the bool data store's typed
   bulk API. With this change Small_IN100 OOC drops to roughly 1.6x the
   4.7 s in-core time, in line with normal OOC overhead.
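
A minimal sketch of the bool-mask bridging described in item 3, assuming a simplified store type. `BoolStoreSketch`, `loadMaskSlice`, and `storeMaskSlice` are hypothetical names, and the pointer-plus-count signatures are an assumption; the real `AbstractDataStore<bool>` bulk API may be shaped differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a typed bool store with bulk I/O.
class BoolStoreSketch
{
public:
  explicit BoolStoreSketch(std::size_t size)
  : m_Data(std::make_unique<bool[]>(size)) // value-initialized: all false
  , m_Size(size)
  {
  }

  // Bulk read: copy `count` values starting at `start` into `dest`.
  void copyIntoBuffer(std::size_t start, bool* dest, std::size_t count) const
  {
    assert(start + count <= m_Size);
    std::copy_n(m_Data.get() + start, count, dest);
  }

  // Bulk write: copy `count` values from `src` into the store at `start`.
  void copyFromBuffer(std::size_t start, const bool* src, std::size_t count)
  {
    assert(start + count <= m_Size);
    std::copy_n(src, count, m_Data.get() + start);
  }

private:
  std::unique_ptr<bool[]> m_Data;
  std::size_t m_Size;
};

// One bulk read per Z-slice, then widen bool -> uint8 in memory.
void loadMaskSlice(const BoolStoreSketch& store, std::size_t sliceIndex, std::size_t sliceSize, bool* scratch, std::vector<std::uint8_t>& slice)
{
  store.copyIntoBuffer(sliceIndex * sliceSize, scratch, sliceSize);
  slice.resize(sliceSize);
  std::transform(scratch, scratch + sliceSize, slice.begin(), [](bool b) { return static_cast<std::uint8_t>(b ? 1 : 0); });
}

// Narrow uint8 -> bool in memory, then one bulk write per Z-slice.
void storeMaskSlice(BoolStoreSketch& store, std::size_t sliceIndex, std::size_t sliceSize, bool* scratch, const std::vector<std::uint8_t>& slice)
{
  assert(slice.size() == sliceSize);
  std::transform(slice.begin(), slice.end(), scratch, [](std::uint8_t v) { return v != 0; });
  store.copyFromBuffer(sliceIndex * sliceSize, scratch, sliceSize);
}
```

The scratch buffer is allocated once per slice size and reused, so the store sees one bulk call per slice in each direction instead of one call per voxel.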

Tests updated:
  - IdentifySampleTest.cpp adopts Nathan's PR BlueQuartzSoftware#1590 hand-built 2D Empty
    Z/Y/X Non-Square regression tests plus the parameterized
    identify_sample_v2 exemplar test and the SIMPL Backwards Compatibility
    test, all wrapped with the OOC dual-path pattern (ForceOocAlgorithmGuard
    + GENERATE(from_range(k_ForceOocTestValues))). The pre-existing
    200x200x200 large-scale OOC validation test is retained.

Verified: simplnx-Rel and simplnx-ooc-Rel preset builds both clean.
All 43 affected-filter tests pass on simplnx-Rel; all 86 affected-filter
tests pass on simplnx-ooc-Rel (regex covering ComputeFeatureNeighbors,
IdentifySample, BadDataNeighborOrientation, ComputeBoundaryCells,
ErodeDilate*, NeighborOrientationCorrelation,
ReplaceElementAttributesWithNeighborValues, RequireMinimumSizeFeatures).
* Replace CreateDataStore + CreateResolvedDataStore with a single
  resolver-aware CreateDataStore(DataStructure, DataPath, ...) that
  always consults the registered format resolver. Old explicit-format
  overload deleted.
* Replace CreateListStore similarly so NeighborList backing storage
  is OOC-eligible when the OOC plugin is loaded and thresholds permit.
* Inline action-layer caller in ArrayCreationUtilities::CreateArray
  using GetIOCollection().createDataStoreWithType directly.
* Migrate 23 CreateResolvedDataStore call sites (mechanical rename).
* Migrate 13 cell-level test fixtures that were silently in-memory in
  OOC builds to the resolver-aware path so OOC builds actually exercise
  OOC stores.
* Migrate 6 in-memory non-test callers (ComputeFeatureCentroids scratch
  buffers, HDF5 readers in DataStoreIO and DatasetIO) to direct
  std::make_shared<DataStore<T>> since they have no DataStructure
  context.
* Migrate 2 NeighborListIO HDF5 readers to std::make_shared<ListStore<T>>
  for the same reason (in-core branch of the import pipeline).
* Wire CreateNeighbors action helper through the resolver-aware
  CreateListStore.
* Rewrite IOFormat.cpp tests to exercise the resolver path.

ImageGeom and RectGridGeom findElementSizes now route through the new
CreateDataStore so the voxel-sizes array can go OOC for very large
structured grids. RectGridGeom's inner loop is also refactored from
per-voxel setValue calls to per-axis precompute + Z-slice
copyFromBuffer to avoid catastrophic OOC perf when the array is
OOC-backed.
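
The per-axis precompute plus Z-slice write can be sketched as follows. This is an illustration under assumptions, not the RectGridGeom code: `axisWidthsSketch`, `computeElementSizesSketch`, and `SliceWriterSketch` are hypothetical names, and the writer callback stands in for a store's copyFromBuffer call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Per-axis cell widths from rectilinear bounds (N cells need N + 1 bounds).
std::vector<float> axisWidthsSketch(const std::vector<float>& bounds)
{
  assert(bounds.size() >= 2);
  std::vector<float> widths(bounds.size() - 1);
  for(std::size_t i = 0; i < widths.size(); i++)
  {
    widths[i] = bounds[i + 1] - bounds[i];
  }
  return widths;
}

// Hypothetical sink standing in for a bulk write of one slice at `startIndex`.
using SliceWriterSketch = std::function<void(std::size_t startIndex, const std::vector<float>& slice)>;

// Fill one Z-slice in memory, then hand it to the store in a single call.
void computeElementSizesSketch(const std::vector<float>& xBounds, const std::vector<float>& yBounds, const std::vector<float>& zBounds, const SliceWriterSketch& writeSlice)
{
  const std::vector<float> dx = axisWidthsSketch(xBounds);
  const std::vector<float> dy = axisWidthsSketch(yBounds);
  const std::vector<float> dz = axisWidthsSketch(zBounds);

  const std::size_t sliceSize = dx.size() * dy.size();
  std::vector<float> slice(sliceSize);
  for(std::size_t z = 0; z < dz.size(); z++)
  {
    for(std::size_t y = 0; y < dy.size(); y++)
    {
      for(std::size_t x = 0; x < dx.size(); x++)
      {
        slice[y * dx.size() + x] = dx[x] * dy[y] * dz[z];
      }
    }
    writeSlice(z * sliceSize, slice); // one bulk write per Z-slice
  }
}
```

The number of store calls drops from one per voxel to one per Z-slice, which is what matters when each call may touch disk.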

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
… Format tests

* DataIOCollection's constructor registers the OOC format via
  SimplnxOoc::registerIOManager under SIMPLNX_USE_OOC, so getManager("HDF5-OOC")
  resolves in the compile-time-switch OOC build.
* IOFormat: guard the in-core large-data-format preference tests to
  #ifndef SIMPLNX_USE_OOC (the OOC-build defaults are covered by SimplnxOoc's
  DataFormatPreferenceTest) and update the "not configured" assertion to the
  seeded k_InMemoryFormat default.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
The rebase onto upstream/develop brought in the parallelized in-core
ComputeFeatureSizes (tbb::combinable thread-local accumulation), while the
OOC branch had replaced that loop with serial chunked bulk I/O. Rather than
discard either, split the algorithm into the established Direct/Scanline
dispatch pattern so each storage backing uses its optimal strategy.

* ComputeFeatureSizesDirect: in-core parallel accumulation (the upstream
  ParallelDataAlgorithm + tbb::combinable Kahan-summation implementation)
* ComputeFeatureSizesScanline: out-of-core chunked copyIntoBuffer streaming
  (renamed from the former single ComputeFeatureSizes implementation)
* ComputeFeatureSizes: thin DispatchAlgorithm<Direct, Scanline> dispatcher
  selecting on whether the FeatureIds array is out-of-core
* Register both new algorithm units in the SimplnxCore CMakeLists
* Exercise both paths in the existing tests via ForceOocAlgorithmGuard +
  GENERATE(from_range(k_ForceOocTestValues))
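
A rough sketch of the Direct/Scanline split, assuming a simplified array type. `FeatureIdsSketch`, `CountCellsDirect`, `CountCellsScanline`, and `dispatchAlgorithmSketch` are hypothetical; the real DispatchAlgorithm signature is not shown in this commit, and the Direct variant here is serial rather than parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical array stand-in that knows whether its store is out-of-core.
struct FeatureIdsSketch
{
  std::vector<std::int32_t> values;
  bool outOfCore = false;
};

// In-core variant: walks the data directly.
struct CountCellsDirect
{
  std::vector<std::uint64_t> operator()(const FeatureIdsSketch& ids, std::size_t numFeatures) const
  {
    std::vector<std::uint64_t> counts(numFeatures, 0);
    for(const std::int32_t id : ids.values)
    {
      assert(id >= 0 && static_cast<std::size_t>(id) < numFeatures);
      counts[static_cast<std::size_t>(id)]++;
    }
    return counts;
  }
};

// OOC variant: streams fixed-size chunks through a local buffer.
struct CountCellsScanline
{
  std::vector<std::uint64_t> operator()(const FeatureIdsSketch& ids, std::size_t numFeatures) const
  {
    constexpr std::size_t k_ChunkTuples = 4; // tiny on purpose for the sketch
    std::vector<std::uint64_t> counts(numFeatures, 0);
    std::vector<std::int32_t> buffer(k_ChunkTuples);
    for(std::size_t start = 0; start < ids.values.size(); start += k_ChunkTuples)
    {
      const std::size_t count = std::min(k_ChunkTuples, ids.values.size() - start);
      // std::copy_n stands in for a bulk copyIntoBuffer read.
      std::copy_n(ids.values.begin() + static_cast<std::ptrdiff_t>(start), count, buffer.begin());
      for(std::size_t i = 0; i < count; i++)
      {
        assert(buffer[i] >= 0 && static_cast<std::size_t>(buffer[i]) < numFeatures);
        counts[static_cast<std::size_t>(buffer[i])]++;
      }
    }
    return counts;
  }
};

// Thin dispatcher: pick the variant from the storage backing.
template <class DirectT, class ScanlineT, class... ArgsT>
auto dispatchAlgorithmSketch(bool anyOutOfCore, ArgsT&&... args)
{
  if(anyOutOfCore)
  {
    return ScanlineT{}(std::forward<ArgsT>(args)...);
  }
  return DirectT{}(std::forward<ArgsT>(args)...);
}

std::vector<std::uint64_t> countCellsSketch(const FeatureIdsSketch& ids, std::size_t numFeatures)
{
  return dispatchAlgorithmSketch<CountCellsDirect, CountCellsScanline>(ids.outOfCore, ids, numFeatures);
}
```

Both variants must produce identical results, which is what running each test under both forced paths checks.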

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
The in-core build previously forced SIMPLNX_TEST_ALGORITHM_PATH to InCoreOnly
whenever SIMPLNX_USE_OOC was OFF, on the assumption that no out-of-core paths
exist to test. That is no longer true: the Direct/Scanline dispatch classes
are always compiled into the plugins, and forcing the Scanline (OOC) path runs
it against in-core data via copyIntoBuffer (a plain std::copy here), staying
fast while verifying the OOC algorithm matches the in-core result.

* Only coerce the nonsensical OocOnly (1) to InCoreOnly (2) when OOC is off
* Allow Both (0) so a single in-core build can validate both algorithm paths

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
DatasetIO::createEmptyDataset created the dataset with no creation
property list, so out-of-core arrays large enough to take the two-step
streaming write (createEmptyDataset + hyperslab writes) were always
written contiguous and uncompressed, even when WriteOptions requested
compression. The single-shot writeSpan path already applied it.

* Build the dataset creation property list via BuildChunkedDeflateDcpl
  in createEmptyDataset, matching writeSpan
* Preserves the existing fall-throughs to contiguous storage for
  compression level 0 and for arrays below the small-array threshold

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
The "Recovery file with all in-core data" test hardcoded
StoreType::InMemory and relied on ambient preferences, so it failed
whenever forceOocData was set: under forceOoc, the recovery file's
inline arrays correctly load as out-of-core stores backed by the
recovery file itself.

* Assert the expected store type via RequireExpectedStoreType, which
  tracks the active large-data preferences (OutOfCore under forceOoc,
  InMemory otherwise)
* Correct stale comments that claimed OOC was not compiled in

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Add MemoryBudgetManager::maxBudgetBytes() (max(min(total-6GiB, 0.95*total), 1GiB)) and make setBudgetBytes() clamp the upper bound and report whether it clamped. Deduplicate the platform total-RAM ifdef through Memory::GetTotalMemory().

Apply the --memory-budget override to the manager in nxrunner so the cap and the override actually take effect in headless runs (previously the override was written only to Preferences, which the manager never reads in CLI mode).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the SIMPLNX_USE_OOC compile-time switch and the format/preference
plumbing that depended on it with runtime interfaces: an injectable store-format
resolver, IO-manager lifecycle hooks fanned out by DataIOCollection, and a
tri-state DataStorageMode preference. simplnx core no longer references any OOC
symbol or on-disk format name; out-of-core capability is supplied entirely by a
registered IO manager at runtime. 50 files changed, +1175 / -643 lines.

================================================================================
1. Store-format resolver abstraction
================================================================================
Files: src/simplnx/DataStructure/IO/Generic/IDataStoreFormatResolver.{hpp,cpp},
InMemoryFormatResolver.hpp, src/simplnx/Utilities/ArrayCreationUtilities.{hpp,cpp},
DataStoreUtilities.hpp, DataArrayUtilities.hpp, DataStructure.{hpp,cpp},
Filter/Actions/CreateNeighborListAction.{hpp,cpp}

Introduce IDataStoreFormatResolver, a const, thread-safe policy interface that
decides which registered format a soon-to-be-created array uses, returning ""
for the in-memory default. InMemoryFormatResolver is the trivial default policy.
ArrayCreationUtilities::ResolveStorageFormat becomes the single decision point
shared by every creation call site, applying a fixed order: the authoritative
unstructured/poly-geometry gate (ParentGeometrySupportsOoc) forces in-core,
then an explicit per-filter override wins, then the DataStructure's resolver
decides. DataStructure carries a per-instance resolver plus a lazily-seeded
process-wide default, neither serialized. CreateArray, CreateListStore, and
CreateNeighborListAction route through this helper; CreateNeighborListAction and
CreateNeighbors now thread an explicit dataFormat override through to the store.
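
The fixed decision order can be sketched as below. Everything here is hypothetical: `FormatResolverSketch`, `ThresholdResolverSketch`, the `resolveFormat` signature, and the use of `std::optional` for the per-filter override are assumptions made for illustration, not the actual IDataStoreFormatResolver interface.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical policy interface; "" means the in-memory default.
class FormatResolverSketch
{
public:
  virtual ~FormatResolverSketch() = default;
  virtual std::string resolveFormat(std::size_t numBytes) const = 0;
};

// Trivial default policy: always in-memory.
class InMemoryResolverSketch final : public FormatResolverSketch
{
public:
  std::string resolveFormat(std::size_t) const override
  {
    return "";
  }
};

// Example policy: send arrays at or above a byte threshold to a named format.
class ThresholdResolverSketch final : public FormatResolverSketch
{
public:
  ThresholdResolverSketch(std::size_t thresholdBytes, std::string format)
  : m_ThresholdBytes(thresholdBytes)
  , m_Format(std::move(format))
  {
  }

  std::string resolveFormat(std::size_t numBytes) const override
  {
    return numBytes >= m_ThresholdBytes ? m_Format : std::string();
  }

private:
  std::size_t m_ThresholdBytes;
  std::string m_Format;
};

// Single decision point, applying the order described above.
std::string resolveStorageFormatSketch(bool parentGeometrySupportsOoc, const std::optional<std::string>& filterOverride, const FormatResolverSketch* resolver, std::size_t numBytes)
{
  if(!parentGeometrySupportsOoc)
  {
    return ""; // 1. geometry gate forces in-core
  }
  if(filterOverride.has_value())
  {
    return *filterOverride; // 2. explicit per-filter override wins
  }
  if(resolver == nullptr)
  {
    return ""; // no policy installed: in-memory default
  }
  return resolver->resolveFormat(numBytes); // 3. resolver decides
}
```

Keeping the order in one function means a creation call site cannot accidentally skip the geometry gate.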

================================================================================
2. Runtime IO-manager lifecycle hooks and DataIOCollection fan-out
================================================================================
Files: src/simplnx/DataStructure/IO/Generic/IDataIOManager.hpp,
DataIOCollection.{hpp,cpp}, IO/HDF5/DataStructureWriter.{hpp,cpp},
DataStructure/StringArray.{hpp,cpp}

Add no-op virtual lifecycle hooks to IDataIOManager (finalizesImport,
onImportFinalize, onRecoveryWrite, onFinalizeStores, setBaseDirectory,
shutdownManager) so an OOC manager can participate in import finalization,
recovery writes, store read-only transition, and shutdown without core knowing
the specifics. DataIOCollection aggregates these: finalizeStores now fans out to
every manager's hook instead of forwarding to a compiled-in SimplnxOoc call, and
new anyManagerFinalizesImport / onImportFinalize / onRecoveryWrite /
setBaseDirectory / shutdownManagers dispatch to the registered managers. The
HDF5 writer's recovery-write path calls the collection hook rather than a direct
SimplnxOoc function. StringArray gains an isPlaceholder() override.
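
A minimal sketch of the hook fan-out, assuming a reduced hook set. `IoManagerSketch`, `RecordingManagerSketch`, and `IoCollectionSketch` are hypothetical stand-ins; only three of the hooks named above are modeled, and their real signatures may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base manager: every lifecycle hook is a no-op by default.
class IoManagerSketch
{
public:
  virtual ~IoManagerSketch() = default;
  virtual bool finalizesImport() const
  {
    return false;
  }
  virtual void onFinalizeStores()
  {
  }
  virtual void shutdownManager()
  {
  }
};

// Example manager that opts in and records which hooks were called.
class RecordingManagerSketch final : public IoManagerSketch
{
public:
  bool finalizesImport() const override
  {
    return true;
  }
  void onFinalizeStores() override
  {
    calls.push_back("onFinalizeStores");
  }
  void shutdownManager() override
  {
    calls.push_back("shutdownManager");
  }

  std::vector<std::string> calls;
};

// Collection: aggregates the hooks without knowing any manager's specifics.
class IoCollectionSketch
{
public:
  void addManager(std::shared_ptr<IoManagerSketch> manager)
  {
    assert(manager != nullptr);
    m_Managers.push_back(std::move(manager));
  }

  bool anyManagerFinalizesImport() const
  {
    return std::any_of(m_Managers.begin(), m_Managers.end(), [](const std::shared_ptr<IoManagerSketch>& manager) { return manager->finalizesImport(); });
  }

  void finalizeStores()
  {
    for(const auto& manager : m_Managers)
    {
      manager->onFinalizeStores();
    }
  }

  void shutdownManagers()
  {
    for(const auto& manager : m_Managers)
    {
      manager->shutdownManager();
    }
  }

private:
  std::vector<std::shared_ptr<IoManagerSketch>> m_Managers;
};
```

With no opting-in manager registered, every call degrades to a no-op, which is how an in-core-only build would behave under this design.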

================================================================================
3. DataStorageMode tri-state preference with legacy migration
================================================================================
Files: src/simplnx/Core/Preferences.{hpp,cpp}

Replace the largeDataFormat / forceOocData preference surface with a single
canonical DataStorageMode enum (Adaptive, ForceInCore, ForceOutOfCore) persisted
as an integer under data_storage_mode. The enum is deliberately OOC-vocabulary-
free: core states user intent, the OOC build maps it onto a concrete format.
dataStorageMode() is the single source of truth and migrates older preference
files from the retained legacy keys; useOocData() becomes a convenience view
(true unless ForceInCore). The cached m_UseOoc flag and checkUseOoc() are
removed.
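
A sketch of the tri-state preference and its convenience view. The numeric enum values, `PreferenceFileSketch`, and in particular the legacy mapping (a true forceOocData becoming ForceOutOfCore, anything else becoming Adaptive) are assumptions for illustration; the commit does not spell out the actual migration rules.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Integer values are assumed; only the three states come from the text.
enum class DataStorageModeSketch : std::int32_t
{
  Adaptive = 0,
  ForceInCore = 1,
  ForceOutOfCore = 2
};

// Hypothetical view of a preference file: the canonical key plus one legacy key.
struct PreferenceFileSketch
{
  std::optional<std::int32_t> dataStorageMode;  // data_storage_mode
  std::optional<bool> legacyForceOocData;       // retained legacy key
};

// Single source of truth: canonical key first, then the assumed legacy mapping.
DataStorageModeSketch readDataStorageModeSketch(const PreferenceFileSketch& prefs)
{
  if(prefs.dataStorageMode.has_value())
  {
    const std::int32_t raw = *prefs.dataStorageMode;
    if(raw >= 0 && raw <= 2)
    {
      return static_cast<DataStorageModeSketch>(raw);
    }
    return DataStorageModeSketch::Adaptive; // out-of-range value: fall back
  }
  if(prefs.legacyForceOocData.value_or(false))
  {
    return DataStorageModeSketch::ForceOutOfCore; // assumed migration rule
  }
  return DataStorageModeSketch::Adaptive;
}

// Convenience view: true unless the user forced in-core storage.
bool useOocDataSketch(DataStorageModeSketch mode)
{
  return mode != DataStorageModeSketch::ForceInCore;
}
```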

================================================================================
4. Remove the SIMPLNX_USE_OOC compile-time switch from core
================================================================================
Files: CMakeLists.txt, cmake/SimplnxConfig.hpp.in, IO/HDF5/DataStoreIO.hpp,
Utilities/Parsing/DREAM3D/Dream3dIO.cpp

Drop the SIMPLNX_USE_OOC option, the SIMPLNX_OOC_SOURCE_DIR compile-in of the
private SimplnxOoc sources, and the OOC test-suite wiring from CMake. The
generated config header no longer defines the macro. All previously #ifdef'd
creation, spill-to-disk, and import code paths are now unconditional and route
through the runtime interfaces; the import path decides eager-vs-deferred load
via anyManagerFinalizesImport() instead of the macro.

================================================================================
5. Resolver-aware load overloads
================================================================================
Files: src/simplnx/Utilities/Parsing/DREAM3D/Dream3dIO.{hpp,cpp}

Add LoadDataStructure and LoadDataStructureArrays overloads that stamp a
per-DataStructure resolver before import finalization runs, so a caller (e.g. a
read-only visualization load) can direct arrays to disk-backed stores for fast
first-show. nullptr matches the existing no-resolver behavior.

================================================================================
6. Test migration and new coverage
================================================================================
Files: test/DataStoreFormatResolverTest.cpp, test/DataStorageModeMigrationTest.cpp,
test/CMakeLists.txt, test/IOFormat.cpp, test/Dream3dLoadingApiTest.cpp,
test/UnitTestCommon/UnitTestCommon.{hpp,cpp}, and the SimplnxCore /
OrientationAnalysis filter tests.

Migrate every test from setForceOocData/setLargeDataFormat to setDataStorageMode
/ DataStorageMode. PreferencesSentinel takes a DataStorageMode; the UnitTestCommon
load helper consults dataStorageMode() and anyManagerFinalizesImport(). New tests
cover the resolver (InMemoryFormatResolver, per-instance vs process-default
isolation, ParentGeometrySupportsOoc) and the legacy-key migration.

================================================================================
Verification
================================================================================
No build or test run was performed as part of this squash. The squash is
verified content-faithful: the squashed commit's tree is confirmed identical to
the original range tip (step 8).
…d_alloc -272 net)

Add a non-blocking preflight warning (-271) when an in-core array would exceed
currently-available RAM (OOC arrays excluded; EmptyDataStore::memoryUsage() now
reports 0 for out-of-core placeholders, format resolved in preflight via a shared
ResolveStorageFormat helper), and a std::bad_alloc safety net (-272) at the single
IFilter::execute -> executeImpl boundary so an out-of-memory condition is a clean
pipeline error instead of a crash. Additive to the existing -264 total-RAM hard block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After the out-of-core resolver refactor (f68689d), Dream3dIO.cpp exceeds MSVC's
default COMDAT section limit and fails to compile in Debug with C1128. /bigobj
raises the limit; Release was unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the inline "Automatic" / "In Memory" strings in DataIOCollection with
k_AutomaticDisplayName / k_InMemoryDisplayName constants so the labels have a
single definition. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SliceBufferedTransfer.hpp and SimplnxCore's IdentifySampleCommon.hpp were added on
this branch but never listed in any CMake, so they were invisible in the IDE. Add
them to SIMPLNX_HDRS and PLUGIN_EXTRA_SOURCES respectively, where the existing
source_group(TREE ...) auto-groups them (simplnx/Utilities, Filters/Algorithms).

Also lift simplnx_test's inline source list into a variable and source_group it
under "test" / "Generated", mirroring the plugin-test convention in
cmake/Plugin.cmake. No build/behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
zlib was previously present only transitively, through the hdf5
dependency's "zlib" feature. The out-of-core layer that consumers
compile into libsimplnx (via SimplnxOoc's OOC.cmake) now uses zlib
directly: DeflateChunkLoader inflates raw HDF5 deflate chunks with
uncompress() so chunk decompression can run in parallel, off the
global HDF5 mutex, and OOC.cmake links ZLIB::ZLIB via
find_package(ZLIB). Declaring zlib as a direct dependency records
that direct use in the manifest and keeps the build from silently
breaking if hdf5's feature set ever drops the transitive pull-in.
* Append each plugin's unit-test target to the SIMPLNX_UNIT_TEST_TARGETS
  global property in create_simplnx_plugin_unit_test()
* Lets consumer builds that include simplnx via add_subdirectory attach
  additional sources or settings to the test executables without simplnx
  knowing about the consumer

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
* Replace the per-cell loop over FeatureIds and CellPhases with
  chunk-sequential copyIntoBuffer reads (bounded 64K-tuple buffers) and
  a single bulk copyFromBuffer write of the feature-level result
* Replace the per-cell std::map lookup with feature-level vectors;
  warning semantics and output are unchanged (warning set membership is
  identical under previous-value comparison, and the last phase seen
  still wins)
* Move the cancel check to the per-chunk loop and add throttled
  progress messaging
* Request disk-backed stores in the filter test via PreferencesSentinel
  so the OOC build exercises the filter against HDF5-backed data
* Benchmark (748,800 cells): disk-backed stores 11.7 s -> 35 ms,
  in-memory stores 8 ms -> 1 ms
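
The chunk-sequential loop with feature-level vectors can be sketched as follows. `computeFeaturePhasesSketch` and `FeaturePhasesResultSketch` are hypothetical names, plain vectors stand in for the data stores, and `std::copy_n` stands in for copyIntoBuffer; the warning rule shown (compare against the previously seen phase, last phase wins) follows the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FeaturePhasesResultSketch
{
  std::vector<std::int32_t> featurePhases;  // one entry per feature
  std::vector<std::int32_t> warnedFeatures; // features seen with more than one phase
};

FeaturePhasesResultSketch computeFeaturePhasesSketch(const std::vector<std::int32_t>& featureIds, const std::vector<std::int32_t>& cellPhases, std::size_t numFeatures, std::size_t chunkTuples = 65536)
{
  assert(featureIds.size() == cellPhases.size());
  assert(chunkTuples > 0);

  std::vector<std::int32_t> phases(numFeatures, 0);
  std::vector<std::uint8_t> seen(numFeatures, 0);
  std::vector<std::uint8_t> warned(numFeatures, 0);
  std::vector<std::int32_t> idBuffer(chunkTuples);
  std::vector<std::int32_t> phaseBuffer(chunkTuples);

  for(std::size_t start = 0; start < featureIds.size(); start += chunkTuples)
  {
    const std::size_t count = std::min(chunkTuples, featureIds.size() - start);
    // Two bounded bulk reads per chunk instead of two store accesses per cell.
    std::copy_n(featureIds.begin() + static_cast<std::ptrdiff_t>(start), count, idBuffer.begin());
    std::copy_n(cellPhases.begin() + static_cast<std::ptrdiff_t>(start), count, phaseBuffer.begin());

    for(std::size_t i = 0; i < count; i++)
    {
      const std::int32_t id = idBuffer[i];
      if(id < 0 || static_cast<std::size_t>(id) >= numFeatures)
      {
        continue; // ignore ids outside the feature range in this sketch
      }
      const std::size_t f = static_cast<std::size_t>(id);
      if(seen[f] != 0 && phases[f] != phaseBuffer[i])
      {
        warned[f] = 1; // previous-value comparison
      }
      phases[f] = phaseBuffer[i]; // last phase seen wins
      seen[f] = 1;
    }
  }

  FeaturePhasesResultSketch result;
  result.featurePhases = phases; // would be one bulk copyFromBuffer write
  for(std::size_t f = 0; f < numFeatures; f++)
  {
    if(warned[f] != 0)
    {
      result.warnedFeatures.push_back(static_cast<std::int32_t>(f));
    }
  }
  return result;
}
```

The result does not depend on the chunk size, so the buffer bound can be tuned for memory without changing output.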

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…base

* Pass the ColorKey choice through the ComputeIPFColorsScanline
  generateIPFColor call so non-TSL color keys reach the OOC dispatch
  path (the in-core Direct path already forwarded it)
* Port the V&V ColorKey plumbing test to the LoadDataStructure API
  that replaced ImportDataStructureFromFile on this branch
* Drop a stale comment reference to the removed
  ImportDataStructureFromFile in Dream3dIO.cpp

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from ec3a5e7 to 3c8903e (June 12, 2026 15:32)
* defaultBudgetBytes() returned 50% of RAM without checking
  maxBudgetBytes(), so on machines under 12 GiB the 6 GiB reserve made
  the cap smaller than the default (e.g. 7 GiB CI runners: 3.5 GiB
  default vs 1 GiB cap), failing the cap-and-clamping unit test
* The constructor seeds the budget directly from the default, so such
  machines also ran with an over-cap budget at startup
* Clamp the default to the cap; machines with 12 GiB or more are
  unaffected
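
The cap and the clamped default can be worked through in a few lines. This is a sketch of the arithmetic only: the function names are hypothetical, and the integer approximation of 95% (`total - total / 20`) and the saturating subtraction are choices made here, not taken from MemoryBudgetManager.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t k_GiB = 1024ULL * 1024ULL * 1024ULL;

// Cap: max(min(total - 6 GiB, 0.95 * total), 1 GiB).
constexpr std::uint64_t maxBudgetBytesSketch(std::uint64_t total)
{
  // Saturate at 0 so machines with less than 6 GiB do not underflow.
  const std::uint64_t afterReserve = total > 6 * k_GiB ? total - 6 * k_GiB : 0;
  const std::uint64_t ninetyFivePercent = total - total / 20;
  return std::max(std::min(afterReserve, ninetyFivePercent), k_GiB);
}

// Default: 50% of RAM, clamped to the cap.
constexpr std::uint64_t defaultBudgetBytesSketch(std::uint64_t total)
{
  return std::min(total / 2, maxBudgetBytesSketch(total));
}

struct ClampResultSketch
{
  std::uint64_t budgetBytes;
  bool clamped;
};

// Setter behavior: clamp the upper bound and report whether clamping happened.
constexpr ClampResultSketch clampBudgetSketch(std::uint64_t requested, std::uint64_t total)
{
  const std::uint64_t cap = maxBudgetBytesSketch(total);
  return requested > cap ? ClampResultSketch{cap, true} : ClampResultSketch{requested, false};
}

// 7 GiB runner: the unclamped default would be 3.5 GiB, but the cap is 1 GiB.
static_assert(maxBudgetBytesSketch(7 * k_GiB) == k_GiB, "cap on a 7 GiB machine");
static_assert(defaultBudgetBytesSketch(7 * k_GiB) == k_GiB, "default is clamped to the cap");
```

At exactly 12 GiB the 50% default and the `total - 6 GiB` term are both 6 GiB, which is why machines at or above that size see no change.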

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
3 participants