Replace nested Stream.concat with direct iteration in expression merging#6725
Draft
Replace nested Stream.concat with direct iteration in expression merging#6725
Conversation
…he previous value associated with key
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



[NEEDS MORE TEST COVERAGE TO MERGE]
Overview
This PR targets two bottlenecks in
UpdateExpressionConverter.toExpressionas part of the Update operation request pipeline.toExpressionscales and becomes the most noticable bottleneck the bigger the input is: ~40% cpu time in SMALL datasets, and ~74% in HUGE_FLAT datasets.Interactive Flamegraph
1. String.format overhead (~18% CPU time)
groupExpressionsuses String.format to build the update expression for each attribute. Can replace with a simple string concatenation which the JVM optimizes heavily.2. Nested
Stream.concat()inmergeExpressionNames()andmergeExpressionValues()(15-50% CPU time)Current implementation:
The problem is that
Stream.reduce()operation callsExpression.joinNames()for each action type (SET, ADD, REMOVE, DELETE). This method creates a new HashMap and copies all existing entries on every iteration:Example: With N attributes distributed across 4 action types, Stream.reduce() creates 3 intermediate HashMaps. For example, with 20 attributes (5 per action type), it copies 5 entries, then 10, then 15, before producing the final 20 entry result. The cost scales poorly with dataset size.
Instead we can iterate directly on each action and use a single HashMap:
This eliminates intermediate HashMap allocations and theredundant entry copying. The same proposed optimization also applies to
mergeExpressionValues().Results:
toExpressionwent from ~40% CPU time to ~15%groupExpressionswent from 18% CPU time to 6%mergeExpressionValuesandmergeExpressionNameswent down from 18% to ~4%