[CALCITE-7608] Introduce a SelectMany operator#5031
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
We believe this PR is quite important, so I’d really appreciate it if you could take another look when you have a moment. @zabetak @julianhyde
# Validated on Postgres
SELECT n, a, b
FROM (SELECT 'test' AS n, ARRAY[1, 2, 3] AS arr1, ARRAY[10, 20] AS arr2) AS u,
     UNNEST(u.arr1, u.arr2) AS t(a, b);
How about adding ORDER BY a to make the output deterministic?
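For readers unfamiliar with the multi-argument form: PostgreSQL documents that UNNEST with several arrays in the FROM clause expands them in lockstep and pads the shorter arrays with NULLs. The sketch below mimics that behavior for the query above in plain Java. The class and method names are hypothetical, and it illustrates only the PostgreSQL semantics, not how this PR implements the operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnnestZipSketch {
  /** Expands two arrays in lockstep, padding the shorter one with nulls. */
  static List<List<Object>> unnestZip(Object n, List<Integer> arr1, List<Integer> arr2) {
    int len = Math.max(arr1.size(), arr2.size());
    List<List<Object>> rows = new ArrayList<>();
    for (int i = 0; i < len; i++) {
      Integer a = i < arr1.size() ? arr1.get(i) : null;
      Integer b = i < arr2.size() ? arr2.get(i) : null;
      // Arrays.asList permits null elements, unlike List.of.
      rows.add(Arrays.<Object>asList(n, a, b));
    }
    return rows;
  }

  public static void main(String[] args) {
    // Mirrors: 'test' AS n, ARRAY[1, 2, 3] AS arr1, ARRAY[10, 20] AS arr2
    System.out.println(unnestZip("test", Arrays.asList(1, 2, 3), Arrays.asList(10, 20)));
    // prints [[test, 1, 10], [test, 2, 20], [test, 3, null]]
  }
}
```

Rows come out here in array order only because the loop is sequential; a SQL engine gives no such guarantee without an ORDER BY, which is the point of the suggestion.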
Thanks a lot for introducing SelectMany; this is a much cleaner abstraction than the current Correlate + Uncollect pattern. You note in the description that "in the future this rewrite could be moved into SqlToRelConverter as well." I'd like to make the case for doing (at least an initial version of) that in this PR, because I think the rule-based approach has an inherent limitation.

A pattern-matching rule only fires when the plan matches the exact shape it expects, and that shape is easily perturbed: by an intervening Project or pushed-down Filter, a trait change, or decorrelation running first. When that happens, a perfectly convertible query silently misses the rewrite and falls back to the less efficient Correlate form, and trying to generalize the pattern to cover every shape tends to become an endless game of whack-a-mole.

Generating LogicalSelectMany directly in convertUnnest() sidesteps this. The converter already has everything it needs: the input row, the array expressions, withOrdinality, and the join type. By projecting both the pass-through columns and the collection columns onto the same input row, the array expressions stay as ordinary input references (no correlation variable is ever introduced), and SelectMany can be emitted deterministically, before any optimization can disturb the shape. This makes FROM t CROSS/LEFT JOIN UNNEST(t.arr) skip both the Correlate intermediate form and the subsequent decorrelation step entirely, and guarantees the operator is produced whenever it's applicable, rather than "whenever the rule happens to match."

The two approaches are complementary, not competing: the rule is still valuable for plans that already contain Correlate + Unnest (e.g. coming from other frontends), while direct generation covers the SQL path. To stay backwards-compatible, the sql2rel path could be gated behind a SqlToRelConverter.Config flag (off by default), mirroring how you disabled the rewrite rule by default.

Happy to help with this if you think it's in scope; otherwise it could make a good follow-up.
There are so many config flags in Calcite... Thinking about it, having a way to do this in SqlToRelConverter is probably very useful, so we should offer this possibility. SelectMany is actually not a very good name: the "real" selectMany is much more general, since it takes an arbitrary function which returns an iterator. We could call it LogicalUnnest. Let's see if there are other comments which I can address in an updated commit. I will try to add it to SqlToRel, but I'm not sure I will get to it during the weekend. Happy to get other naming suggestions as well.
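To make the naming concern concrete, here is a minimal sketch of the general selectMany shape described above: an arbitrary function maps each input element to an iterator, and the results are concatenated. The class and method names are illustrative only and are not taken from Calcite or from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class GeneralSelectManySketch {
  /** Applies fn to each input element and concatenates the iterators it returns. */
  static <T, R> List<R> selectMany(List<T> input, Function<T, Iterator<R>> fn) {
    List<R> out = new ArrayList<>();
    for (T row : input) {
      Iterator<R> it = fn.apply(row);
      while (it.hasNext()) {
        out.add(it.next());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // An arbitrary function, not tied to any stored column: n yields n copies of n.
    List<Integer> result = selectMany(
        Arrays.asList(1, 2, 3),
        n -> Collections.nCopies(n, n).iterator());
    System.out.println(result); // prints [1, 2, 2, 3, 3, 3]
  }
}
```

Unnesting a collection column is the special case where the function simply returns an iterator over a field of the row, which is why a narrower name such as LogicalUnnest may describe the operator better.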



Jira Link
CALCITE-7608
Changes Proposed
This PR introduces a new operator SelectMany, which generalizes Uncollect. The operator combines Correlate+Unnest and has two forms: inner join and left join.

The PR is divided into four commits which should probably be left separate:
- UNNEST, which only accepts INNER and LEFT JOINs
- SelectMany and its Logical variant and add support to the RelBuilder
- Correlate+Unnest into SelectMany. The rule is not enabled by default for backwards compatibility. This rule should probably be executed before the decorrelator. In the future this rewrite could be moved into SqlToRelConverter as well.
- SelectMany and the associated Java code generation

This operator is strictly more expressive than Uncollect. (I believe that the existing Uncollect does not even support LEFT JOINs.) In the long term Uncollect should ideally be deprecated in favor of this operator.
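
As a rough illustration of the two forms, the sketch below flattens a collection column in plain Java. It assumes standard SQL semantics for joining against UNNEST: the inner form drops rows whose collection is empty or null, while the left form keeps them with a NULL element. The Row class and method names are hypothetical and do not reflect the operator's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SelectManyJoinSketch {
  /** Hypothetical input row: a name plus a collection column. */
  static final class Row {
    final String name;
    final List<Integer> items;
    Row(String name, List<Integer> items) {
      this.name = name;
      this.items = items;
    }
  }

  /**
   * Emits one (name, item) row per element of each input row's collection.
   * With leftJoin = true, a row whose collection is empty or null still
   * produces one output row, with null in the item position.
   */
  static List<List<Object>> selectMany(List<Row> input, boolean leftJoin) {
    List<List<Object>> out = new ArrayList<>();
    for (Row row : input) {
      List<Integer> items =
          row.items == null ? Collections.<Integer>emptyList() : row.items;
      if (items.isEmpty()) {
        if (leftJoin) {
          out.add(Arrays.<Object>asList(row.name, null));
        }
        continue;
      }
      for (Integer item : items) {
        out.add(Arrays.<Object>asList(row.name, item));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> input = Arrays.asList(
        new Row("x", Arrays.asList(1, 2)),
        new Row("y", Collections.<Integer>emptyList()));
    System.out.println(selectMany(input, false)); // prints [[x, 1], [x, 2]]
    System.out.println(selectMany(input, true));  // prints [[x, 1], [x, 2], [y, null]]
  }
}
```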