feat(c++): add configurable deserialization size guardrails#3455
feat(c++): add configurable deserialization size guardrails#3455shivendra-dev54 wants to merge 5 commits intoapache:mainfrom
Conversation
5388e92 to
88e9197
Compare
|
|
||
| // 2. Convert bytes to element count and check max_collection_size | ||
| size_t elem_count = total_bytes_u32 / sizeof(T); | ||
| if (FORY_PREDICT_FALSE(elem_count > ctx.config().max_collection_size)) { |
There was a problem hiding this comment.
Could we use binary size for check instead? The collectionm with basic numeric types are taken as binary array?
There was a problem hiding this comment.
I have updated the arithmetic vector path to use max_binary_size instead of max_collection_size, as suggested. I've also updated the unit tests to verify this behavior.
| Error::invalid_data("Collection length exceeds max_collection_size")); | ||
| return std::vector<T, Alloc>(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Nice addition on the read() path. One gap remains: several read_data() paths still trust payload size without max_collection_size checks (for example vector non-arithmetic, vector, list, deque, and forward_list). Those paths are used in native/nested deserialization, so guardrails can still be bypassed there.
| Error::invalid_data("Collection length exceeds max_collection_size")); | ||
| return std::deque<T, Alloc>(); | ||
| } | ||
|
|
There was a problem hiding this comment.
There is still an unguarded preallocation in forward_list::read: temp.reserve(length) happens before any max_collection_size validation. A malicious length can force a large allocation before we fail. Please validate length immediately after reading it, before reserve.
|
|
||
| // Test max_collection_size using objects (e.g., strings) | ||
| TEST(CollectionSerializerTest, MaxCollectionSizeGuardrail) { | ||
| auto fory = Fory::builder() |
There was a problem hiding this comment.
These tests are helpful, but they only cover xlang(true). The new limits should also be validated in xlang(false)/read_data paths, otherwise we can miss regressions in native mode (especially for vector/list/deque/forward_list).
cpp/fory/serialization/context.h
Outdated
| void reset(); | ||
|
|
||
| /// get associated configuration. | ||
| inline const Config &config() const { return *config_; } |
There was a problem hiding this comment.
Read-side code now uses config(), but I do not see write-side usage of WriteContext::config(). If no write-path consumer is planned, we can drop this accessor to keep the API surface smaller.
There was a problem hiding this comment.
If it do not have any use then I think dropping it is a better idea
|
@chaokunyang |
Why?
We currently don't have any size limits for incoming payloads in the C++ implementation. This is a security risk because a malicious or malformed payload can claim to have a massive collection or binary length, forcing the system to pre-allocate gigabytes of memory (via
.reserve()or constructors) before actually reading the data. This makes the system vulnerable to simple Out-of-Memory (OOM) Denial-of-Service attacks.What does this PR do?
This PR adds two essential security guardrails to the deserialization path:
max_binary_sizeandmax_collection_size.Changes included:
Config & API: Added the two new limits to
serialization::Configand updatedForyBuilderso users can easily set these at runtime. Defaults are 64MB for binary and 1M entries for collections.Security Enforcement:
Integrated checks into all sensitive pre-allocation paths, including
std::vector,std::list,std::deque,std::set, andstd::unordered_set.Added entry-count validation for Maps (both fast and slow paths).
Specifically handled arithmetic vectors by converting byte-lengths to element counts to ensure
max_collection_sizeis respected.Context Access: Exposed a public
config()accessor inReadContextandWriteContextso internal serializers can reach these settings.Tests: Added new test cases in
collection_serializer_test.ccandmap_serializer_test.ccto verify that deserialization fails immediately with a descriptive error when limits are exceeded.Related issues
Fixes #3408
Does this PR introduce any user-facing change?
Yes, it adds two new methods (
max_binary_sizeandmax_collection_size) to theForyBuilder.Benchmark
The performance impact is negligible. The checks are simple integer comparisons performed once per collection/binary read, occurring right before the expensive allocation phase. All 30 existing C++ test targets pass with no measurable change in execution time.