fix(pnm,jxl): Prevent PNM/JXL readers from loading arbitrarily large non-image files#5203
Conversation
…memory The PNM reader did not provide an efficient `valid_file` implementation. While `PNMInput::open` would perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with having `open` unconditionally read arbitrarily large files into memory, this also meant that any call to `valid_file` would necessarily do the same (as the default implementation delegates to `open` when ioproxy support is present). To fix these issues, the following changes were made: - Provide an efficient implementation of `valid_file` for `PNMInput` that only loads the header, and then validates magic number/dimensions. - In `PNMInput::open`, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid. - Added a rough limit to header read size of 1KiB. - Added a rough limit to full file read size of 1GiB. Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing alltogether. Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…memory The JXL reader already provided an efficient `valid_file` override, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated in `JxlInput::open` before attempting to decode the file. The process of JXL decoding appears to read the entire contents of the file to memory (at least when provided non-JXL input). Taken in combination, this means that when `JXLInput::open` was called on arbitrarily large non-JXL inputs, the entire file was being read into memory before open would fail. As a simple fix for this, `JxlInput::open` now checks the result of `valid_file` before attempting any decoding. Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
|
Running into issues at the moment with easycla; following the "please click here to be authorized" link above takes me to a page that starts to load something before quickly going all white. Checking the console indicates that there's possibly some authentication error of some kind? Help I've been unpersoned by the Linux Foundation |
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
|
@maxwelliverson I checked, you're definitely on the list for this project. Are you sure that the email you used (your home gmail) on the commits is the same as what GitHub itself thinks you use? If so, then try the "Please click here to be authorized" again? Maybe paste that link into an incognito window just in case your normal browser has some cookie or plugin that's interfering with the site somehow? If all that fails, then we'll poke LF for help. Sometimes something get wedged. |
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Yup definitely, this is the primary email associated with this account.
Just tried this and it actually got me to a non-blank page, which then prompts me to choose whether to proceed as an individual contributor or a corporate contributor. Unfortunately, it then also warns me that the site needs cookies to work and is likely to not be functional in a private window, and indeed the "proceed" buttons are greyed-out and disabled.... Given this seems like some issue with browser state, I'm going to try purging some caches/disabling plugins to see if that does the trick. |
… defs Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
|
|
||
|
|
||
|
|
||
| using PNMBasicInfo = std::tuple<PNMType, int, int>; // type, width, height |
There was a problem hiding this comment.
Is there a reason not to just define a tiny helper struct that's more self-documenting? Like
struct PNMBasicInfo {
PNMtype type;
int width;
int height;
};
You could also add an Err value to the enum, and then you wouldn't need optional<> at all because the struct would tell you it's invalid because of an error.
There was a problem hiding this comment.
No real strong reason other than a personal affinity for the following construct
if (auto result = returns_optional_tuple()) {
// unpack *result
}
But I think you're right that the self documenting nature of a simple struct makes more sense here.
There was a problem hiding this comment.
I've made a change to swap out the tuple for a struct, but before I commit/push, I wanted to ask if there was any particular reason we want to avoid using std::optional? I personally think it does make for clearer code than having type double as an error indicator, but I can appreciate there may be reasons to not want to introduce it (eg. avoid reliance on c++17-features-not-named-string_view)
There was a problem hiding this comment.
I think it's mostly historical. std::optional is a C++17 feature, and it's only fairly recently (OIIO 3.0) that we raised our minimum supported C++ to 17. So it really just hasn't come up, since all the major APIs were in place already. I guess that I, in particular, just still haven't worked it into my standard toolbox.
| static string_view | ||
| read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io, | ||
| string_view remaining) |
There was a problem hiding this comment.
Command and slight renaming suggestion:
| static string_view | |
| read_image_data_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io, | |
| string_view remaining) | |
| // buffer contains at most the first 1K of the file. At this point, we know | |
| // the file seems valid. Read the rest in, appending to what we have, and | |
| // return the adjusted string_view of the conents. | |
| static string_view | |
| append_remainder_to_buffer(std::vector<char>& buffer, Filesystem::IOProxy* io, | |
| string_view remaining) |
There was a problem hiding this comment.
I'm also curious why you didn't make this a method of PNMInput. Would that simplify anything? Including making it so that you didn't need to pull those other methods out of the class and turn them into standalone functions when they needed no other changes?
There was a problem hiding this comment.
This was something I was conscious of, but hadn't managed to convince myself 100% one way or the other.
My personal preference for code organization tends towards something along the lines of "a function should only be a (possibly static) method if it requires access to class internals, or if it forms part of the interface of the class". For this reason, I made read_header_into_buffer a free function (as the buffer in question would not necessarily be m_file_contents). Then, for parity with read_header_into_buffer, read_image_data_to_buffer was also made a free function with a nearly identical declaration. It could, however, just as easily have been a private method, and perhaps does make more sense as such, as it is only ever called on the private data members of the class anyways...
There was a problem hiding this comment.
I think your instinct is mostly correct, and for fresh code I would also lean toward free functions for things that don't need class internals. Though the counter-weight to that is a desire to use methods as a kind of scoping -- if it's never called by anyone but methods of this one class, keep it in the most local scope possible.
I don't feel strongly about it. It doesn't need access to class internals, but it is operating on data members (as it's used, in the one and only place it's called). Do whatever you think makes the code simplest and easiest to read.
There was a problem hiding this comment.
I think that makes a lot of sense. I just pushed another commit that I think strikes a decent balance: both functions are now static methods of PNMInput, but are otherwise unchanged. I also moved their definitions further down so they're just before valid_file and open.
I can't tell if you're just telling me how far you got, or expressing that you don't know which to choose. Choose corporate. |
Sorry, I should have been more clear there, I do know that I need to choose corporate. When I was saying that the "proceed" buttons were greyed out/disabled, I meant that those two buttons specifically were disabled, ie. I'm unable to choose either |
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…unctions Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…:open Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
|
Finally managed to get the CLA sorted out! It seemed to have been some weird issue with my linux foundation account in general; going to any linux foundation login page gave me the same blank screen no matter which device/browser I tried from. The only thing that eventually worked was.... downloading a brand new browser I had never used before and using that |
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
…thods Signed-off-by: maxwelliverson <maxwelliverson@gmail.com>
Description
PNM
The PNM reader did not provide an optimized
valid_fileimplementation. WhilePNMInput::openwould perform basic validation of the PNM header, it would only do this after having read the entire contents of the file into memory. Besides the obvious risks/issues with havingopenunconditionally read arbitrarily large files into memory, this also meant that any call tovalid_filewould necessarily do the same (as the default implementation delegates toopenwhen ioproxy support is present).To fix these issues, the following changes were made:
valid_fileforPNMInputthat only loads the header, and then validates magic number/dimensions.PNMInput::open, first load only the header to memory, and subsequently, only load the rest of the image data if the header is parsed successfully, and the data contained within is valid.Note that, for now, if the file size exceeds the 1GiB limit, the read is simply truncated to 1GiB, rather than failing altogether.
JPEGXL
The JXL reader already provided an efficient
valid_fileoverride, implemented by validating the 128 byte JXL signature. The signature, however, was not being validated inJxlInput::openbefore attempting to decode the file.The process of JXL decoding appears to read the entire contents of the file to memory (at least for some non-JXL inputs). Taken in combination, this means that when
JXLInput::openwas called on arbitrarily large non-JXL inputs, the entire file was being read into memory beforeopenwould fail.As a simple fix for this,
JxlInput::opennow checks the result ofvalid_filebefore attempting any decoding.Tests
I did not add a new testsuite case for this, but did test on my end that for both PNM and JPEGXL, large invalid files are no longer being read into memory, while valid images (of their respective types) are still read/opened successfully.
Checklist:
and if I used AI coding assistants, I have an
Assisted-by: TOOL / MODELline in the pull request description above.
behavior.
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
fixed any problems reported by the clang-format CI test.
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.