Skip to content

fix: round large UInt64 values without narrowing#23033

Open
Kevin-Li-2025 wants to merge 1 commit into
apache:mainfrom
Kevin-Li-2025:kevin/fix-spark-round-large-uint64
Open

fix: round large UInt64 values without narrowing#23033
Kevin-Li-2025 wants to merge 1 commit into
apache:mainfrom
Kevin-Li-2025:kevin/fix-spark-round-large-uint64

Conversation

@Kevin-Li-2025

Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

Spark round currently narrows UInt64 inputs through i64. Values above i64::MAX therefore fail even when a non-negative scale makes rounding a no-op. The narrowing also prevents correct negative-scale rounding for the upper half of the UInt64 domain.

What changes are included in this PR?

  • Add a native u64 HALF_UP rounding path with the same ANSI overflow and non-ANSI wrapping behavior as the signed implementation.
  • Use it for both scalar and array UInt64 inputs.
  • Add helper-level coverage for large values and overflow behavior.
  • Add SQL regression coverage for u64::MAX with default/positive scales and a large negative-scale case.

Are these changes tested?

Yes:

  • cargo test -p datafusion-spark function::math::round::tests --lib
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- spark/math/round --test-threads 1
  • cargo clippy -p datafusion-spark --features core --all-targets -- -D warnings
  • cargo fmt --all -- --check

Are there any user-facing changes?

Yes. Spark round now accepts UInt64 values above i64::MAX and returns correct results instead of reporting a narrowing error. There are no public API changes.

Signed-off-by: Kevin-Li-2025 <2242139@qq.com>
@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) spark labels Jun 19, 2026

@viirya viirya left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — this correctly fixes the narrowing path. A few notes from reviewing it:

The new round_unsigned_integer faithfully mirrors round_integer, and correctly omits the remainder <= -threshold branch since value % factor is always non-negative for u64 — that's a valid simplification, not a missing case. The HALF_UP tie-breaking, ANSI checked-overflow vs. non-ANSI wrapping, and the factor overflow → 0 short-circuit all match the signed version.

I verified the overflow test's expected value by hand: round(u64::MAX, -1, non-ansi)remainder = (2^64-1) % 10 = 5, 5 >= 5 rounds up → wrapping_sub(5).wrapping_add(10) = 4, which is what the test asserts.

I ran both unit tests locally and they pass. (I couldn't run the SLT cases in my environment, but they exercise the same round_unsigned_integer paths the unit tests cover, and the expected values check out by hand.)

One optional nit: round_integer hoists let threshold = factor / 2; into a local and reuses it; the unsigned version inlines factor / 2 in the comparison. Matching the sibling function's style would read a bit more consistently — minor, take it or leave it.

@Jefffrey

Copy link
Copy Markdown
Contributor

Fyi we have an existing PR:

This PR seems to fix only spark, while the original issue seems to describe for datafusion function. Also worth noting Spark doesn't support unsigned types in the first place (I made a comment to this affect on the linked PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spark sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

round() mishandles large Int64 and UInt64 values

3 participants