Skip to content

[Feature] Base/Row alignment support for Tensors#136

Open
zacharyvincze wants to merge 43 commits intoROCm:developfrom
zacharyvincze:zv/feature/add-memalign-class
Open

[Feature] Base/Row alignment support for Tensors#136
zacharyvincze wants to merge 43 commits intoROCm:developfrom
zacharyvincze:zv/feature/add-memalign-class

Conversation

@zacharyvincze
Copy link
Copy Markdown
Contributor

PR Description

General

  • Adds padding (row/base pointer alignment) support for roccv::Tensor.
  • Adds MemAlign class which is used as a parameter when allocating memory for Tensors. This specifies what the row alignment and base alignment should be for newly created tensors.

Samples

  • Changes support reading/writing images from host <-> device with tensor padding in mind.
  • Improves help messages and command line argument parsing across all samples.

Tensors

  • Stride calculation logic changed to take non-contiguous data (row padding) into account.
  • Reshape method implementation changed to take into account non-contiguous tensors. Ensures that the shape cannot be changed across the row padding boundary, as this is supposed to be a zero-copy operation which retains the initial memory layout.
  • Quality of life methods for checking whether a tensor is contiguous, and for returning the amount of non-padded (non-garbage) bytes being used by the tensor.
  • Adds copyToHost and copyFromHost methods to the tensor, which accepts and produces contiguous host buffers. Since padding makes copying memory from the host to the device non-trivial, this allows the user to perform a copy which is agnostic to the underlying row padding that a tensor may have.

Python bindings (rocpycv)

  • Adds correctness changes to DLPack conversion. Checks whether strides are provided by DLPack first, if so, uses them directly after converting from element-wise to byte-wise strides, otherwise, assumes a contiguous tensor is being provided by DLPack and recalculates with that assumption.

Tests

  • Helpers which copy from host to device now use Tensor::copyFromHost, Tensor::copyToHost directly instead of using duplicate code.
  • Adds Tensor unit tests to ensure padding is calculated properly, and that any additional methods added to roccv::Tensor are covered. Special attention is paid to the reshape method, since this has undergone the most changes.

@zacharyvincze zacharyvincze self-assigned this Mar 10, 2026
@zacharyvincze zacharyvincze added the enhancement New feature or request label Mar 10, 2026
@simonCatBot
Copy link
Copy Markdown

Review: [Feature] Base/Row alignment support for Tensors

Major feature addition for Tensor memory layout:

Key additions:

  • MemAlign class for memory allocation control
  • Row/base pointer alignment support
  • Non-contiguous tensor handling with proper stride calculations
  • copyToHost/copyFromHost for padded tensor handling
  • Updated reshape logic respecting padding boundaries
  • DLPack stride conversion fixes
  • Comprehensive unit tests

Assessment: Needs Review - Large architectural change.

This is a substantial PR (+2010/-1165 lines across 24 files) that changes core tensor semantics. The approach looks solid - explicit handling of padding makes the API clearer. Questions for maintainers:

  1. Backward compatibility with existing code?
  2. Performance impact of stride calculations?
  3. Memory overhead of the alignment system?

Well documented and includes tests. Ready for deep review.

@zacharyvincze zacharyvincze marked this pull request as ready for review March 23, 2026 22:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds base-pointer and per-row alignment (padding) support to roccv::Tensor, updating stride/reshape semantics and introducing host<->device copy helpers that are padding-aware. It also updates samples, Python bindings, benchmarks, and tests to operate correctly with non-contiguous (row-padded) tensors.

Changes:

  • Introduce MemAlignment and update Tensor allocation/stride/reshape logic to support row/base alignment and non-contiguous layouts.
  • Add Tensor::copyFromHost / Tensor::copyToHost and refactor tests/helpers/samples to use padding-aware copies.
  • Update Python DLPack import to honor provided strides (and convert element-strides to byte-strides), with a contiguous fallback.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/core/tensor.cpp Core implementation updates: alignment-aware stride calc, reshape for non-contiguous tensors, host copy helpers.
include/core/tensor.hpp Public API updates for alignment-aware construction + new copy/contiguity helpers.
include/core/mem_alignment.hpp / src/core/mem_alignment.cpp New alignment configuration type used during tensor allocation.
include/core/utils.hpp New low-level helpers (power-of-two + align-up) used for alignment/stride logic.
src/core/tensor_storage.cpp / include/core/tensor_storage.hpp Expose allocator used by TensorStorage.
src/op_non_max_suppression.cpp Update reshape call sites to new reshape(dtype, shape) signature.
python/src/py_tensor.cpp / python/include/py_tensor.hpp DLPack import correctness fixes + documentation/typing cleanup.
tests/roccv/cpp/src/tests/core/tensor/test_tensor.cpp Add/adjust tensor tests for padding/reshape/copy helpers.
tests/roccv/cpp/include/test_helpers.hpp Refactor host<->device copy helpers to use new tensor copy APIs.
samples/common/utils.hpp Add batch image load/write utilities that respect tensor padding.
samples/*.cpp (multiple) Update CLI parsing + use new image IO helpers (padding-aware).
benchmarks/src/roccv/roccv_bench_helpers.cpp Update benchmark filling logic to account for padded allocation sizes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.87234% with 52 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/core/tensor.cpp 76.47% 33 Missing and 15 partials ⚠️
include/core/utils.hpp 78.57% 3 Missing ⚠️
src/core/tensor_storage.cpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #136      +/-   ##
===========================================
+ Coverage    73.88%   73.97%   +0.09%     
===========================================
  Files           77       80       +3     
  Lines         2998     3189     +191     
  Branches       645      683      +38     
===========================================
+ Hits          2215     2359     +144     
- Misses         339      373      +34     
- Partials       444      457      +13     
Files with missing lines Coverage Δ
include/core/mem_alignment.hpp 100.00% <100.00%> (ø)
include/core/tensor.hpp 62.50% <ø> (ø)
src/core/mem_alignment.cpp 100.00% <100.00%> (ø)
src/op_non_max_suppression.cpp 41.18% <100.00%> (ø)
src/core/tensor_storage.cpp 81.48% <0.00%> (-3.13%) ⬇️
include/core/utils.hpp 78.57% <78.57%> (ø)
src/core/tensor.cpp 72.53% <76.47%> (+1.23%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:precheckin enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants