Skip to content

[SPARK-56164][SQL] Fix SPJ merged key ordering#54961

Open
peter-toth wants to merge 5 commits intoapache:masterfrom
peter-toth:SPARK-56164-fix-spj-merged-key-ordering
Open

[SPARK-56164][SQL] Fix SPJ merged key ordering#54961
peter-toth wants to merge 5 commits intoapache:masterfrom
peter-toth:SPARK-56164-fix-spj-merged-key-ordering

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Mar 23, 2026

What changes were proposed in this pull request?

Fix a bug in EnsureRequirements where the ordering used to merge and dedup partition keys after reduction was based on the original partition key types rather than the reduced types.

When compatible partition transform reducers are applied (e.g. reducing days keys to years keys), the resulting partition key values may have a different DataType than the originals.

Please note that #54884 fixed the issue when left and right side reduced key types are not equal, but this PR fixes the issue when the common reduced types differ to the left side original types.

Why are the changes needed?

Without this fix, when reducers change the data type of partition keys (e.g. DateTypeLongType), the ordering used to merge partition values is built for the wrong type, which can produce incorrect merge results or runtime failures during storage-partitioned joins with compatible transform reducers.

Does this PR introduce any user-facing change?

This PR fixes a theoretical bug, a real world example is unlikely to exist.

How was this patch tested?

Added a new test case.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

@peter-toth
Copy link
Contributor Author

cc @szehon-ho , @dongjoon-hyun

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @peter-toth . This is only for 4.2.0, right?

This PR fixes a theoretical bug, a real world example is unlikely to exist.

@peter-toth
Copy link
Contributor Author

peter-toth commented Mar 23, 2026

Thank you, @peter-toth . This is only for 4.2.0, right?

This PR fixes a theoretical bug, a real world example is unlikely to exist.

Yes, I intend to fix in 4.2 only. I'm working on a new feature that can make this bug appear in real world scenarios.

fn
}
catalog.createFunction(id, fn)
def withFunctions[T](fns: UnboundFunction*)(f: => T): T = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use plural form for the helper functions, do we? The previous name withFunction looks better to me.

protected def withTable(tableNames: String*)(f: => Unit): Unit = {

protected def withDatabase(dbNames: String*)(f: => Unit): Unit = {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed in 9e3085f.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (with only one nit naming comment).

Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @peter-toth !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants