Allow Spark partial / Comet final for compatible aggregates#2994
Allow Spark partial / Comet final for compatible aggregates#2994Shekharrajak wants to merge 5 commits intoapache:mainfrom
Conversation
f2e6748 to
51869b1
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2994 +/- ##
============================================
- Coverage 56.12% 54.58% -1.54%
- Complexity 976 1256 +280
============================================
Files 119 167 +48
Lines 11743 15505 +3762
Branches 2251 2571 +320
============================================
+ Hits 6591 8464 +1873
- Misses 4012 5822 +1810
- Partials 1140 1219 +79 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| val sparkPlan = | ||
| createSparkPlan( | ||
| spark, | ||
| "SELECT MIN(id), MAX(id), COUNT(*) FROM test_data GROUP BY (id % 3)") |
There was a problem hiding this comment.
It would be nice to add cases for the bit*** functions as well
91e0de7 to
274f38b
Compare
|
Sorry for the late review @Shekharrajak. This LGTM except for the missing end-to-end tests for bitwise aggregates that @parthchandra already stated. I will go ahead and add those tests and push to this branch if permissions allow, or create a new branch from this one. |
Something is wrong with the git history on this branch so I cannot rebase or upmerge. @Shekharrajak let me know if you are still interested in working on this. If not, I will create a new PR based on your changes. |
|
Thanks for checking, Let me work on this. |
274f38b to
863ba03
Compare
863ba03 to
141ba57
Compare
andygrove
left a comment
There was a problem hiding this comment.
LGTM pending CI. Thanks @Shekharrajak
Which issue does this PR close?
Closes #2894
Rationale for this change
Comet currently falls back to Spark for ALL final hash aggregates when there's no Comet partial aggregate in the child plan. This is overly conservative because some aggregates have compatible intermediate buffer formats between Spark and Comet.
For example, MIN, MAX, COUNT, and bitwise aggregates (BIT_AND, BIT_OR, BIT_XOR) have simple intermediate buffers (single value) that are compatible between Spark and Comet. These can safely run with "Spark partial / Comet final" execution.
Other aggregates like SUM, AVG, VARIANCE, etc. have known incompatibilities (e.g., decimal overflow handling differences, complex intermediate buffers) and should continue to fall back when there's no Comet partial aggregate.
What changes are included in this PR?
Added supportsSparkPartialCometFinal method to CometAggregateExpressionSerde trait - Default is false
Added helper function - aggSupportsMixedExecution() in QueryPlanSerde
How are these changes tested?
"CometExecRule should not allow Spark partial and Comet final for unsafe aggregates" - Verifies SUM still falls back to Spark
"CometExecRule should allow Spark partial and Comet final for safe aggregates" - Verifies MIN/MAX/COUNT can use Comet final with Spark partial