Skip to content

fix(pnm,jxl): Prevent PNM/JXL readers from loading arbitrarily large non-image files#5203

Open
maxwelliverson wants to merge 12 commits into
AcademySoftwareFoundation:mainfrom
maxwelliverson:pnm-jxl-input-safeguards
Open

fix(pnm,jxl): Prevent PNM/JXL readers from loading arbitrarily large non-image files#5203
maxwelliverson wants to merge 12 commits into
AcademySoftwareFoundation:mainfrom
maxwelliverson:pnm-jxl-input-safeguards

Conversation

@maxwelliverson
Copy link
Copy Markdown

Description

PNM

The PNM reader did not provide an optimized valid_file implementation. While PNMInput::open would perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with having open unconditionally read arbitrarily large files into memory, this also meant that any call to valid_file would necessarily do the same (as the default implementation delegates to open when ioproxy support is present).

To fix these issues, the following changes were made:

  • Provide an efficient implementation of valid_file for PNMInput that only loads the header, and then validates magic number/dimensions.
  • In PNMInput::open, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid.
  • Added a rough limit to header read size of 1KiB.
  • Added a rough limit to full file read size of 1GiB.

Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing altogether.

JPEGXL

The JXL reader already provided an efficient valid_file override, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated in JxlInput::open before attempting to decode the file.
The process of JXL decoding appears to read the entire contents of the file to memory (at least for some non-JXL inputs). Taken in combination, this means that when JXLInput::open was called on arbitrarily large non-JXL inputs, the entire file was being read into memory before open would fail.

As a simple fix for this, JxlInput::open now checks the result of valid_file before attempting any decoding.

Tests

I did not add a new testsuite case for this, but did test on my end that for both PNM and JPEGXL, large invalid files are no longer being read into memory, while valid images (of their respective types) are still read/opened successfully.

Checklist:

  • I have read the guidelines on contributions and code review procedures.
  • I have read the Policy on AI Coding Assistants
    and if I used AI coding assistants, I have an Assisted-by: TOOL / MODEL
    line in the pull request description above.
  • I have updated the documentation if my PR adds features or changes
    behavior.
  • I am sure that this PR's changes are tested in the testsuite.
  • I have run and passed the testsuite in CI before submitting the
    PR, by pushing the changes to my fork and seeing that the automated CI
    passed there. (Exceptions: If most tests pass and you can't figure out why
    the remaining ones fail, it's ok to submit the PR and ask for help. Or if
    any failures seem entirely unrelated to your change; sometimes things break
    on the GitHub runners.)
  • My code follows the prevailing code style of this project and I
    fixed any problems reported by the clang-format CI test.
  • If I added or modified a public C++ API call, I have also amended the
    corresponding Python bindings. If altering ImageBufAlgo functions, I also
    exposed the new functionality as oiiotool options.

…memory

The PNM reader did not provide an efficient `valid_file` implementation. While `PNMInput::open` would perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with having `open` unconditionally read arbitrarily large files into memory, this also meant that any call to `valid_file` would necessarily do the same (as the default implementation delegates to `open` when ioproxy support is present).

To fix these issues, the following changes were made:
- Provide an efficient implementation of `valid_file` for `PNMInput` that only loads the header, and then validates magic number/dimensions.
- In `PNMInput::open`, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid.
- Added a rough limit to header read size of 1KiB.
- Added a rough limit to full file read size of 1GiB.

Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing alltogether.

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…memory

The JXL reader already provided an efficient `valid_file` override, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated in `JxlInput::open` before attempting to decode the file.
The process of JXL decoding appears to read the entire contents of the file to memory (at least when provided non-JXL input). Taken in combination, this means that when `JXLInput::open` was called on arbitrarily large non-JXL inputs, the entire file was being read into memory before open would fail.

As a simple fix for this, `JxlInput::open` now checks the result of `valid_file` before attempting any decoding.

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented May 16, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

@maxwelliverson
Copy link
Copy Markdown
Author

Running into issues at the moment with easycla; following the "please click here to be authorized" link above takes me to a page that starts to load something before quickly going all white. Checking the console indicates that there's possibly some authentication error of some kind?
I've enabled and authorized the github integration, ensured that third party cookies are enabled in case needed for this service to work, tried on different browsers on different hosts, all the same....

Help I've been unpersoned by the Linux Foundation

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
@lgritz
Copy link
Copy Markdown
Collaborator

lgritz commented May 17, 2026

@maxwelliverson I checked, you're definitely on the list for this project. Are you sure that the email you used (your home gmail) on the commits is the same as what GitHub itself thinks you use? If so, then try the "Please click here to be authorized" again? Maybe paste that link into an incognito window just in case your normal browser has some cookie or plugin that's interfering with the site somehow?

If all that fails, then we'll poke LF for help. Sometimes something get wedged.

Comment thread src/pnm.imageio/pnminput.cpp Outdated
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
@maxwelliverson
Copy link
Copy Markdown
Author

maxwelliverson commented May 17, 2026

Are you sure that the email you used (your home gmail) on the commits is the same as what GitHub itself thinks you use?

Yup definitely, this is the primary email associated with this account.

Maybe paste that link into an incognito window just in case your normal browser has some cookie or plugin that's interfering with the site somehow?

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor. Unfortunately, it then also warns me that the site needs cookies to work and is likely to not be functional in a private window, and indeed the "proceed" buttons are greyed-out and disabled....

Given this seems like some issue with browser state, I'm going to try purging some caches/disabling plugins to see if that does the trick.

… defs

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Comment thread src/pnm.imageio/pnminput.cpp Outdated



using PNMBasicInfo = std::tuple<PNMType, int, int>; // type, width, height
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to just define a tiny helper struct that's more self-documenting? Like

struct PNMBasicInfo {
    PNMtype type;
    int width;
    int height;
};

You could also add an Err value to the enum, and then you wouldn't need optional<> at all because the struct would tell you it's invalid because of an error.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real strong reason other than a personal affinity for the following construct

if (auto result = returns_optional_tuple()) {
    // unpack *result
}

But I think you're right that the self documenting nature of a simple struct makes more sense here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a change to swap out the tuple for a struct, but before I commit/push, I wanted to ask if there was any particular reason we want to avoid using std::optional? I personally think it does make for clearer code than having type double as an error indicator, but I can appreciate there may be reasons to not want to introduce it (eg. avoid reliance on c++17-features-not-named-string_view)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's mostly historical. std::optional is a C++17 feature, and it's only fairly recently (OIIO 3.0) that we raised our minimum supported C++ to 17. So it really just hasn't come up, since all the major APIs were in place already. I guess that I, in particular, just still haven't worked it into my standard toolbox.

Comment thread src/pnm.imageio/pnminput.cpp Outdated
Comment thread src/pnm.imageio/pnminput.cpp Outdated
Comment on lines +125 to +127
static string_view
read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,
string_view remaining)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command and slight renaming suggestion:

Suggested change
static string_view
read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,
string_view remaining)
// buffer contains at most the first 1K of the file. At this point, we know
// the file seems valid. Read the rest in, appending to what we have, and
// return the adjusted string_view of the conents.
static string_view
append_remainder_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io,
string_view remaining)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious why you didn't make this a method of PNMInput. Would that simplify anything? Including making it so that you didn't need to pull those other methods out of the class and turn them into standalone functions when they needed no other changes?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something I was conscious of, but hadn't managed to convince myself 100% one way or the other.
My personal preference for code organization tends towards something along the lines of "a function should only be a (possibly static) method if it requires access to class internals, or if it forms part of the interface of the class". For this reason, I made read_header_into_buffer a free function (as the buffer in question would not necessarily be m_file_contents). Then, for parity with read_header_into_buffer, read_image_data_to_buffer was also made a free function with a nearly identical declaration. It could, however, just as easily have been a private method, and perhaps does make more sense as such, as it is only ever called on the private data members of the class anyways...

Copy link
Copy Markdown
Collaborator

@lgritz lgritz May 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your instinct is mostly correct, and for fresh code I would also lean toward free functions for things that don't need class internals. Though the counter-weight to that is a desire to use methods as a kind of scoping -- if it's never called by anyone but methods of this one class, keep it in the most local scope possible.

I don't feel strongly about it. It doesn't need access to class internals, but it is operating on data members (as it's used, in the one and only place it's called). Do whatever you think makes the code simplest and easiest to read.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes a lot of sense. I just pushed another commit that I think strikes a decent balance: both functions are now static methods of PNMInput, but are otherwise unchanged. I also moved their definitions further down so they're just before valid_file and open.

@lgritz
Copy link
Copy Markdown
Collaborator

lgritz commented May 17, 2026

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor.

I can't tell if you're just telling me how far you got, or expressing that you don't know which to choose. Choose corporate.

Comment thread src/pnm.imageio/pnminput.cpp Outdated
@maxwelliverson
Copy link
Copy Markdown
Author

Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor.

I can't tell if you're just telling me how far you got, or expressing that you don't know which to choose. Choose corporate.

Sorry, I should have been more clear there, I do know that I need to choose corporate. When I was saying that the "proceed" buttons were greyed out/disabled, I meant that those two buttons specifically were disabled, ie. I'm unable to choose either

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…unctions

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…:open

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
@maxwelliverson
Copy link
Copy Markdown
Author

Finally managed to get the CLA sorted out! It seemed to have been some weird issue with my linux foundation account in general; going to any linux foundation login page gave me the same blank screen no matter which device/browser I tried from.

The only thing that eventually worked was.... downloading a brand new browser I had never used before and using that

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…thods

Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants