[SPARK-56168][PS][TESTS] Relax groupby diff-length assertions for pandas 3 #54969

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issuse/SPARK-56168/diff_len

ueshin (Member) commented Mar 23, 2026

What changes were proposed in this pull request?

This PR updates the pandas-on-Spark diff-frame groupby length test in test_groupby_diff_len.py.

The changes are test-only:

  • restructure the test with subTest to make the failing case easier to identify
  • use almost=True for pandas 3.x
  • add a short comment explaining that pandas 3 includes external group keys for as_index=False and can widen their dtype after aligning mismatched lengths
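The restructuring described in the list above can be sketched with plain pandas and `unittest`. This is only an illustration under stated assumptions, not the code from the PR: the class name, the data, and the `assert_frames_match` helper are hypothetical, and the helper merely stands in for the test suite's own `almost=True` comparison by relaxing dtype and exactness checks in `pandas.testing.assert_frame_equal`.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal

# Assumption: the major version is the leading component of pd.__version__.
PANDAS_3_OR_LATER = int(pd.__version__.split(".")[0]) >= 3


class GroupByDiffLenSketch(unittest.TestCase):
    def assert_frames_match(self, actual, expected, almost=False):
        # Hypothetical stand-in for an assert_eq(..., almost=...) helper:
        # when almost=True, dtype differences and tiny float errors are tolerated.
        assert_frame_equal(
            actual, expected, check_dtype=not almost, check_exact=not almost
        )

    def test_sum_by_key(self):
        pdf = pd.DataFrame({"k": [1, 1, 2, 2], "v": [10, 20, 30, 40]})
        # Hand-built reference result, playing the role of the other implementation.
        reference = pd.DataFrame({"k": [1, 2], "v": [30, 70]})

        for as_index in (True, False):
            # Each case is reported separately, so a failure names its as_index value.
            with self.subTest(as_index=as_index):
                actual = pdf.groupby("k", as_index=as_index).sum()
                expected = reference.set_index("k") if as_index else reference
                # Strict comparison on pandas 2, relaxed comparison on pandas 3.
                self.assert_frames_match(actual, expected, almost=PANDAS_3_OR_LATER)
```

Saved to a file, the sketch can be run with `python -m unittest`. If one `as_index` case fails, the report shows the `subTest` parameters for that case while the other case still runs.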

Why are the changes needed?

The test behavior differs across pandas versions for mismatched-length external group keys.

With pandas 3.x, groupby(..., as_index=False) can include the external group key in the result and widen its dtype after alignment. That makes the strict equality used by this test fail due to dtype differences even though the grouped values still match.
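The dtype widening itself is easy to reproduce in isolation. The following sketch does not call `groupby` and makes no claim about what pandas 3 does internally; it only shows, with made-up data, that aligning an integer key against a longer index introduces missing values and therefore a float dtype, so a strict comparison fails on dtype alone while the values still match.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# An integer key that is shorter than the index it gets aligned against.
key = pd.Series([1, 2, 3], name="key", dtype="int64")
aligned = key.reindex(range(5))  # rows 3 and 4 have no key and become NaN

print(key.dtype, "->", aligned.dtype)  # int64 -> float64

expected = pd.Series([1, 2, 3], name="key", dtype="int64")
actual = aligned.dropna().reset_index(drop=True)  # same values, still float64

try:
    assert_series_equal(actual, expected)  # strict: dtypes must match
    strict_ok = True
except AssertionError:
    strict_ok = False

print("strict comparison passes:", strict_ok)

# Relaxed comparison: dtype is ignored, values are compared with a tolerance.
assert_series_equal(actual, expected, check_dtype=False)
```

The relaxed call succeeds because only the dtype differs, which is the kind of mismatch the relaxed assertion in this PR is meant to tolerate.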

This patch keeps the existing pandas 2 behavior untouched and relaxes the assertions only for pandas 3.x in this localized test.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Updated the related tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

ueshin (Member, Author) commented Mar 23, 2026

HyukjinKwon (Member) commented

Merged to master.
