fix(pnm,jxl): Prevent PNM/JXL readers from loading arbitrarily large non-image files by maxwelliverson · Pull Request #5203 · AcademySoftwareFoundation/OpenImageIO

maxwelliverson · 2026-05-16T23:49:06Z

Description

PNM

The PNM reader did not provide an optimized valid_file implementation. While PNMInput::open would perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with having open unconditionally read arbitrarily large files into memory, this also meant that any call to valid_file would necessarily do the same (as the default implementation delegates to open when ioproxy support is present).

To fix these issues, the following changes were made:

Provide an efficient implementation of valid_file for PNMInput that only loads the header, and then validates magic number/dimensions.
In PNMInput::open, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid.
Added a rough limit to header read size of 1KiB.
Added a rough limit to full file read size of 1GiB.

Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing altogether.

JPEGXL

The JXL reader already provided an efficient valid_file override, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated in JxlInput::open before attempting to decode the file.
The process of JXL decoding appears to read the entire contents of the file to memory (at least for some non-JXL inputs). Taken in combination, this means that when JXLInput::open was called on arbitrarily large non-JXL inputs, the entire file was being read into memory before open would fail.

As a simple fix for this, JxlInput::open now checks the result of valid_file before attempting any decoding.

Tests

I did not add a new testsuite case for this, but did test on my end that for both PNM and JPEGXL, large invalid files are no longer being read into memory, while valid images (of their respective types) are still read/opened successfully.

Checklist:

I have read the guidelines on contributions and code review procedures.
I have read the Policy on AI Coding Assistants
and if I used AI coding assistants, I have an Assisted-by: TOOL / MODEL
line in the pull request description above.
I have updated the documentation if my PR adds features or changes
behavior.
I am sure that this PR's changes are tested in the testsuite.
I have run and passed the testsuite in CI before submitting the
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
My code follows the prevailing code style of this project and I
fixed any problems reported by the clang-format CI test.
If I added or modified a public C++ API call, I have also amended the
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.

…memory The PNM reader did not provide an efficient `valid_file` implementation. While `PNMInput::open` would perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with having `open` unconditionally read arbitrarily large files into memory, this also meant that any call to `valid_file` would necessarily do the same (as the default implementation delegates to `open` when ioproxy support is present). To fix these issues, the following changes were made: - Provide an efficient implementation of `valid_file` for `PNMInput` that only loads the header, and then validates magic number/dimensions. - In `PNMInput::open`, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid. - Added a rough limit to header read size of 1KiB. - Added a rough limit to full file read size of 1GiB. Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing alltogether. Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

…memory The JXL reader already provided an efficient `valid_file` override, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated in `JxlInput::open` before attempting to decode the file. The process of JXL decoding appears to read the entire contents of the file to memory (at least when provided non-JXL input). Taken in combination, this means that when `JXLInput::open` was called on arbitrarily large non-JXL inputs, the entire file was being read into memory before open would fail. As a simple fix for this, `JxlInput::open` now checks the result of `valid_file` before attempting any decoding. Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

linux-foundation-easycla · 2026-05-16T23:49:12Z

The committers listed above are authorized under a signed CLA.

✅ login: maxwelliverson / name: maxwelliverson (0e6281d, 22bd8de, 473b6fe, 4b9adb6, 5eb6016, 6ce2997, 9812245, 9b1a08b, df4869c)

maxwelliverson · 2026-05-17T00:18:14Z

Running into issues at the moment with easycla; following the "please click here to be authorized" link above takes me to a page that starts to load something before quickly going all white. Checking the console indicates that there's possibly some authentication error of some kind?
I've enabled and authorized the github integration, ensured that third party cookies are enabled in case needed for this service to work, tried on different browsers on different hosts, all the same....

Help I've been unpersoned by the Linux Foundation

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

lgritz · 2026-05-17T00:43:17Z

@maxwelliverson I checked, you're definitely on the list for this project. Are you sure that the email you used (your home gmail) on the commits is the same as what GitHub itself thinks you use? If so, then try the "Please click here to be authorized" again? Maybe paste that link into an incognito window just in case your normal browser has some cookie or plugin that's interfering with the site somehow?

If all that fails, then we'll poke LF for help. Sometimes something get wedged.

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

maxwelliverson · 2026-05-17T01:02:13Z

Are you sure that the email you used (your home gmail) on the commits is the same as what GitHub itself thinks you use?

Yup definitely, this is the primary email associated with this account.

Maybe paste that link into an incognito window just in case your normal browser has some cookie or plugin that's interfering with the site somehow?

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor. Unfortunately, it then also warns me that the site needs cookies to work and is likely to not be functional in a private window, and indeed the "proceed" buttons are greyed-out and disabled....

Given this seems like some issue with browser state, I'm going to try purging some caches/disabling plugins to see if that does the trick.

… defs Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

lgritz · 2026-05-17T00:51:36Z

+
+
+
+using PNMBasicInfo = std::tuple<PNMType, int, int>; // type, width, height


Is there a reason not to just define a tiny helper struct that's more self-documenting? Like

struct PNMBasicInfo { PNMtype type; int width; int height; };

You could also add an Err value to the enum, and then you wouldn't need optional<> at all because the struct would tell you it's invalid because of an error.

No real strong reason other than a personal affinity for the following construct

if (auto result = returns_optional_tuple()) { // unpack *result }

But I think you're right that the self documenting nature of a simple struct makes more sense here.

I've made a change to swap out the tuple for a struct, but before I commit/push, I wanted to ask if there was any particular reason we want to avoid using std::optional? I personally think it does make for clearer code than having type double as an error indicator, but I can appreciate there may be reasons to not want to introduce it (eg. avoid reliance on c++17-features-not-named-string_view)

I think it's mostly historical. std::optional is a C++17 feature, and it's only fairly recently (OIIO 3.0) that we raised our minimum supported C++ to 17. So it really just hasn't come up, since all the major APIs were in place already. I guess that I, in particular, just still haven't worked it into my standard toolbox.

lgritz · 2026-05-17T01:00:04Z

+static string_view
+read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,
+                          string_view remaining)


Command and slight renaming suggestion:

Suggested change

static string_view

read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,

string_view remaining)

// buffer contains at most the first 1K of the file. At this point, we know

// the file seems valid. Read the rest in, appending to what we have, and

// return the adjusted string_view of the conents.

static string_view

append_remainder_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,

string_view remaining)

I'm also curious why you didn't make this a method of PNMInput. Would that simplify anything? Including making it so that you didn't need to pull those other methods out of the class and turn them into standalone functions when they needed no other changes?

This was something I was conscious of, but hadn't managed to convince myself 100% one way or the other.
My personal preference for code organization tends towards something along the lines of "a function should only be a (possibly static) method if it requires access to class internals, or if it forms part of the interface of the class". For this reason, I made read_header_into_buffer a free function (as the buffer in question would not necessarily be m_file_contents). Then, for parity with read_header_into_buffer, read_image_data_to_buffer was also made a free function with a nearly identical declaration. It could, however, just as easily have been a private method, and perhaps does make more sense as such, as it is only ever called on the private data members of the class anyways...

I think your instinct is mostly correct, and for fresh code I would also lean toward free functions for things that don't need class internals. Though the counter-weight to that is a desire to use methods as a kind of scoping -- if it's never called by anyone but methods of this one class, keep it in the most local scope possible.

I don't feel strongly about it. It doesn't need access to class internals, but it is operating on data members (as it's used, in the one and only place it's called). Do whatever you think makes the code simplest and easiest to read.

I think that makes a lot of sense. I just pushed another commit that I think strikes a decent balance: both functions are now static methods of PNMInput, but are otherwise unchanged. I also moved their definitions further down so they're just before valid_file and open.

lgritz · 2026-05-17T01:08:20Z

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor.

I can't tell if you're just telling me how far you got, or expressing that you don't know which to choose. Choose corporate.

maxwelliverson · 2026-05-17T01:12:08Z

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor.

I can't tell if you're just telling me how far you got, or expressing that you don't know which to choose. Choose corporate.

Sorry, I should have been more clear there, I do know that I need to choose corporate. When I was saying that the "proceed" buttons were greyed out/disabled, I meant that those two buttons specifically were disabled, ie. I'm unable to choose either

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

…unctions Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

…:open Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

maxwelliverson · 2026-05-17T03:12:43Z

Finally managed to get the CLA sorted out! It seemed to have been some weird issue with my linux foundation account in general; going to any linux foundation login page gave me the same blank screen no matter which device/browser I tried from.

The only thing that eventually worked was.... downloading a brand new browser I had never used before and using that

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

…thods Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

maxwelliverson added 2 commits May 16, 2026 16:19

fix: Fix failed std::min template type deduction on MacOS

4b9adb6

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

lgritz reviewed May 17, 2026

View reviewed changes

Comment thread src/pnm.imageio/pnminput.cpp Outdated

style: Adjusted PNMInput changes to better adhere to clang-format rules

0e6281d

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

style: Removed unnecessary protection against windows.h min/max macro…

5eb6016

… defs Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

lgritz reviewed May 17, 2026

View reviewed changes

Comment thread src/pnm.imageio/pnminput.cpp Outdated

maxwelliverson added 4 commits May 16, 2026 18:39

style: Fixed a couple clang-format issues that were previously missed

df4869c

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

style: Added comments/symbol rename to better document new PNM read f…

22bd8de

…unctions Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

fix: Fixed missed reference to renamed append_remainder_to_buffer symbol

9b1a08b

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

refactor: Move definition of PNMInput::valid_file to before PNMInput:…

9812245

…:open Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

maxwelliverson added 3 commits May 16, 2026 20:39

refactor: Change PNMBasicInfo from std::tuple alias to POD struct

6f318c7

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

style: Conform to clang-format once and for all

246780f

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>

refactor: Move PNM reading functions into PNMInput class as static me…

0fc67b1

…thods Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>




		using PNMBasicInfo = std::tuple<PNMType, int, int>; // type, width, height

Conversation

maxwelliverson commented May 16, 2026

Description

PNM

JPEGXL

Tests

Checklist:

Uh oh!

linux-foundation-easycla Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maxwelliverson commented May 17, 2026

Uh oh!

lgritz commented May 17, 2026

Uh oh!

Uh oh!

maxwelliverson commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgritz May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgritz commented May 17, 2026

Uh oh!

Uh oh!

maxwelliverson commented May 17, 2026

Uh oh!

maxwelliverson commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

linux-foundation-easycla Bot commented May 16, 2026 •

edited

Loading

maxwelliverson commented May 17, 2026 •

edited

Loading

lgritz May 17, 2026 •

edited

Loading