Skip to content

feat(c++): add configurable deserialization size guardrails#3455

Open
shivendra-dev54 wants to merge 5 commits intoapache:mainfrom
shivendra-dev54:configurable_size_guardrails_for_untrusted_payloads
Open

feat(c++): add configurable deserialization size guardrails#3455
shivendra-dev54 wants to merge 5 commits intoapache:mainfrom
shivendra-dev54:configurable_size_guardrails_for_untrusted_payloads

Conversation

@shivendra-dev54
Copy link

Why?

We currently don't have any size limits for incoming payloads in the C++ implementation. This is a security risk because a malicious or malformed payload can claim to have a massive collection or binary length, forcing the system to pre-allocate gigabytes of memory (via .reserve() or constructors) before actually reading the data. This makes the system vulnerable to simple Out-of-Memory (OOM) Denial-of-Service attacks.

What does this PR do?

This PR adds two essential security guardrails to the deserialization path: max_binary_size and max_collection_size.

Changes included:

  • Config & API: Added the two new limits to serialization::Config and updated ForyBuilder so users can easily set these at runtime. Defaults are 64MB for binary and 1M entries for collections.

  • Security Enforcement:

  • Integrated checks into all sensitive pre-allocation paths, including std::vector, std::list, std::deque, std::set, and std::unordered_set.

  • Added entry-count validation for Maps (both fast and slow paths).

  • Specifically handled arithmetic vectors by converting byte-lengths to element counts to ensure max_collection_size is respected.

  • Context Access: Exposed a public config() accessor in ReadContext and WriteContext so internal serializers can reach these settings.

  • Tests: Added new test cases in collection_serializer_test.cc and map_serializer_test.cc to verify that deserialization fails immediately with a descriptive error when limits are exceeded.

Related issues

Fixes #3408

Does this PR introduce any user-facing change?

Yes, it adds two new methods (max_binary_size and max_collection_size) to the ForyBuilder.

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

The performance impact is negligible. The checks are simple integer comparisons performed once per collection/binary read, occurring right before the expensive allocation phase. All 30 existing C++ test targets pass with no measurable change in execution time.

@shivendra-dev54 shivendra-dev54 force-pushed the configurable_size_guardrails_for_untrusted_payloads branch from 5388e92 to 88e9197 Compare March 4, 2026 12:28

// 2. Convert bytes to element count and check max_collection_size
size_t elem_count = total_bytes_u32 / sizeof(T);
if (FORY_PREDICT_FALSE(elem_count > ctx.config().max_collection_size)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use binary size for check instead? The collectionm with basic numeric types are taken as binary array?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the arithmetic vector path to use max_binary_size instead of max_collection_size, as suggested. I've also updated the unit tests to verify this behavior.

Error::invalid_data("Collection length exceeds max_collection_size"));
return std::vector<T, Alloc>();
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition on the read() path. One gap remains: several read_data() paths still trust payload size without max_collection_size checks (for example vector non-arithmetic, vector, list, deque, and forward_list). Those paths are used in native/nested deserialization, so guardrails can still be bypassed there.

Error::invalid_data("Collection length exceeds max_collection_size"));
return std::deque<T, Alloc>();
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still an unguarded preallocation in forward_list::read: temp.reserve(length) happens before any max_collection_size validation. A malicious length can force a large allocation before we fail. Please validate length immediately after reading it, before reserve.


// Test max_collection_size using objects (e.g., strings)
TEST(CollectionSerializerTest, MaxCollectionSizeGuardrail) {
auto fory = Fory::builder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are helpful, but they only cover xlang(true). The new limits should also be validated in xlang(false)/read_data paths, otherwise we can miss regressions in native mode (especially for vector/list/deque/forward_list).

void reset();

/// get associated configuration.
inline const Config &config() const { return *config_; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read-side code now uses config(), but I do not see write-side usage of WriteContext::config(). If no write-path consumer is planned, we can drop this accessor to keep the API surface smaller.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it do not have any use then I think dropping it is a better idea

@shivendra-dev54
Copy link
Author

@chaokunyang
I have gone through all the points you mentioned.
please check if I am wrong somewhere or there are any additional things i should include

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] configurable size guardrails for untrusted payloads

2 participants