Skip to content

Fix escaped string parsing in numpy headers#6380

Open
JanuszL wants to merge 1 commit into
NVIDIA:mainfrom
JanuszL:fix-numpy-header-escape-oob
Open

Fix escaped string parsing in numpy headers#6380
JanuszL wants to merge 1 commit into
NVIDIA:mainfrom
JanuszL:fix-numpy-header-escape-oob

Conversation

@JanuszL
Copy link
Copy Markdown
Contributor

@JanuszL JanuszL commented Jun 1, 2026

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Fixes malformed .npy header parsing when a string value ends with an
unfinished backslash escape.

ParseStringValue advanced to the string terminator while handling the
escape and then allowed the loop increment to move past the terminator. The
parser now rejects a trailing escape before continuing, so malformed headers
fail cleanly instead of reading past the header buffer.

Additional information:

Affected modules and functionalities:

  • dali/util/numpy.cc: reject an unfinished escape sequence in numpy header
    string values.
  • dali/util/numpy_test.cc: add parser coverage for a descr field ending
    with a backslash escape.

Key points relevant for the review:

Please check that the pointer advancement in ParseStringValue still
preserves the existing behavior for supported escape sequences while rejecting
the unterminated escape case before the loop increment runs.

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Ran:

cmake --build compile/dali --target dali_test -j"$(nproc)"
DALI_EXTRA_PATH=/home/jlisiecki/Dali/DALI_extra \
LD_LIBRARY_PATH=/home/jlisiecki/Dali/dali/compile/dali/python/nvidia/dali \
compile/dali/python/nvidia/dali/test/dali_test.bin --gtest_filter=NumpyLoaderTest.*

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@JanuszL JanuszL added the important-fix Fixes an important issue in the software or development environment. label Jun 1, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR fixes a buffer-overread in ParseStringValue (in dali/util/numpy.cc) that occurred when a .npy header string ended with a lone backslash escape. The original switch (*++input) advanced past the null terminator into uninitialized memory; the fix splits the increment from the switch and adds a DALI_ENFORCE guard before the switch.

  • dali/util/numpy.cc: Increments input explicitly, then asserts *input != '\\0' before dispatching the switch — all supported escape sequences (\\\\, \\', \ , \ , \\\") and the default path are behaviorally unchanged.
  • dali/util/numpy_test.cc: Adds \"{'descr':'<f4\\\\" (trailing backslash) to the error-case vector and updates the copyright year to 2026.

Confidence Score: 5/5

The change is a minimal, targeted fix to a single parser function with no side effects on callers or data paths.

The fix correctly splits the combined increment-and-dereference into two steps, adds a null-terminator guard before the switch, and leaves all existing escape-sequence paths unchanged. The new GTest entry directly exercises the previously crashing input. No regressions are expected.

No files require special attention.

Important Files Changed

Filename Overview
dali/util/numpy.cc Fixes out-of-bounds read in ParseStringValue by checking for null terminator after incrementing past a backslash escape before entering the switch; logic and all existing escape sequences are preserved.
dali/util/numpy_test.cc Adds a regression test for the trailing-escape bug and updates the copyright year; the new entry is correctly comma-separated in the error-case vector.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["for loop: *input != '\\0'"] --> B{"*input == '\\\\'?"}
    B -- No --> C{"*input == delim_end?"}
    C -- Yes --> D[break]
    C -- No --> E["out += *input"]
    E --> A
    B -- Yes --> F["input++ (advance past backslash)"]
    F --> G{"*input == '\\0'?"}
    G -- Yes --> H["DALI_ENFORCE fails\n(new guard — trailing escape rejected)"]
    G -- No --> I["switch(*input)\n(process escape sequence)"]
    I --> A
    D --> J["DALI_ENFORCE *input == delim_end"]
    J --> K[return out]
Loading

Reviews (3): Last reviewed commit: "Fix escaped string parsing in numpy head..." | Re-trigger Greptile

Reject a trailing escape in numpy header string values before the
parser advances past the null terminator. This prevents malformed
.npy headers from triggering an out-of-bounds read during header
parsing.

Add a focused parser regression for a descr field ending with an
escape character.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL force-pushed the fix-numpy-header-escape-oob branch from 177e4a5 to 8cd7b9b Compare June 1, 2026 06:03
@JanuszL
Copy link
Copy Markdown
Contributor Author

JanuszL commented Jun 1, 2026

!build

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [53264998]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [53264998]: BUILD PASSED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

important-fix Fixes an important issue in the software or development environment.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants