Calculate num_strings based on all dimensions for OrtStringViewTensor… by bhkumar007 · Pull Request #1038 · microsoft/onnxruntime-extensions

bhkumar007 · 2026-03-23T20:37:52Z

Summary
OrtStringViewTensorStorage computes num_strings using only the first shape dimension (shape[0]) instead of the product of all dimensions. This causes a heap buffer overflow when processing string tensors with rank ≥ 2.

Bug
In custom_op_lite.h, the OrtStringViewTensorStorage constructor calculates the number of strings as:

size_t num_strings = 1;
if ((*shape_).size() > 0) {
    num_strings = static_cast<size_t>((*shape_)[0]);  // ← only first dimension
}

This under-counts the total number of strings for any tensor with rank ≥ 2. The under-sized offsets vector is then passed to GetStringTensorContent, which writes beyond its bounds.

For example, a string tensor with shape [2, 3] contains 6 strings, but this code allocates offsets for only 2.

Note: the sibling class OrtStringTensorStorage computes this correctly by multiplying all dimensions.

Fix
Replace the partial dimension calculation with a simple loop that multiplies all shape dimensions.
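The diff itself is not reproduced in this description. As a minimal sketch of what such a loop could look like, assuming the shape is available as a vector of int64_t dimensions (the helper name CountStrings is hypothetical and does not come from custom_op_lite.h):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not the code from the PR.
// Returns the number of strings implied by a tensor shape, i.e. the
// product of all dimensions.
inline std::size_t CountStrings(const std::vector<int64_t>& shape) {
  std::size_t num_strings = 1;  // a rank-0 (scalar) tensor holds one string
  for (int64_t dim : shape) {
    assert(dim >= 0);  // assumption: the shape is fully resolved
    num_strings *= static_cast<std::size_t>(dim);
  }
  return num_strings;
}
```

With this sketch, a shape of [2, 3] yields 6 and a rank-1 shape of [4] still yields 4, so the rank-1 case behaves as before.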

While string tensors in ONNX are often rank 1 in practice, the ORT API doesn't care about intent. GetStringTensorContent expects offsets_count to equal the actual total number of strings in the tensor. If ORT has a [2, 3] tensor with 6 strings and you pass an offsets buffer of size 2, it writes 6 entries into a 2-slot buffer. There's no graceful "I only want the first dimension's worth" mode; it's an unconditional overflow. Also, the sibling class OrtStringTensorStorage handles all dimensions, and both classes implement the same IStringTensorStorage interface, so callers have no way to know which backing class they're hitting.
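The size contract described above can be stated as a small check. The following is only a sketch under stated assumptions: OffsetsMatchShape is a hypothetical helper, not part of the ORT API or of custom_op_lite.h, and it only compares an offsets buffer against the element count implied by the shape.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical check for illustration only.
// Returns true when the offsets buffer has exactly one slot per string
// in a tensor of the given shape.
inline bool OffsetsMatchShape(const std::vector<int64_t>& shape,
                              const std::vector<std::size_t>& offsets) {
  // Product of all dimensions; an empty shape (scalar) gives 1.
  const int64_t total = std::accumulate(shape.begin(), shape.end(),
                                        int64_t{1}, std::multiplies<int64_t>());
  return total >= 0 && offsets.size() == static_cast<std::size_t>(total);
}
```

For the [2, 3] example, a 2-slot offsets buffer fails this check and a 6-slot buffer passes it.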


bhkumar007 requested a review from a team as a code owner March 23, 2026 20:37

bhkumar007 wants to merge 1 commit into microsoft:main from bhkumar007:patch-3
