fix: return error instead of panicking on zero-dimension fixed-size-list columns#7247
Open
DanielMao1 wants to merge 1 commit into
Open
fix: return error instead of panicking on zero-dimension fixed-size-list columns#7247DanielMao1 wants to merge 1 commit into
DanielMao1 wants to merge 1 commit into
Conversation
b47648a to
06de9c9
Compare
…ist columns A fixed-size-list column with dimension 0 used to panic with 'attempt to divide by zero' (rust/lance-encoding/src/data.rs) on every write path and when reading datasets persisted by older writers. Two guards, following the maintainer guidance in lance-format#5102 (error, not panic): - Schema::validate() rejects zero-dimension fixed-size-list fields, turning every write into a clean schema error. validate() is invoked from Schema::try_from(&ArrowSchema), so all write entry points are covered. - The decoder field-scheduler builders reject zero-dimension fixed-size lists with an invalid-input error, so datasets persisted by old writers fail cleanly instead of panicking at scheduling time. Closes lance-format#5102
143f1e1 to
240a45a
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #5102
Problem
A fixed-size-list column with dimension 0 panics with
attempt to divide by zero(
rust/lance-encoding/src/data.rs,FixedSizeListBlock::num_values). As of pylance 7.0.0the panic fires on write for every storage version (
stable/2.1/2.2), and readingdatasets persisted by older writers (which accepted such columns) panics as well.
Reproduction details are in the issue comment:
#5102 (comment)
Approach
Following the maintainer guidance in #5102 (error, not panic), this adds two small guards at
boundaries that already return
Result, instead of changingDataBlock::num_values()toreturn
Result(the approach that made #5159 balloon across the whole encoding crate):Schema::validate()rejects zero-dimension fixed-size-list fields(including nested ones).
validate()runs insideSchema::try_from(&ArrowSchema),so every write entry point surfaces a clean schema error instead of a panic. Writes
currently panic on every storage version, so no working flow changes behavior.
zero-dimension fixed-size lists with an invalid-input error, so datasets persisted by
old writers fail cleanly at scheduling time instead of crashing the process.
How the guards sit in the data flow
Two facts that shape the design:
Schema::try_from(&ArrowSchema)callsvalidate()internally and every write path performsthis conversion, so guard 1 in one place covers all write entry points.
under the
stable(2.0) storage version; reading those files must not crash the process.Tests
lance-core:Schema::try_fromrejects zero-dim FSL at top level and nested in a struct;positive dimensions still validate.
lance-encoding: the scheduler guard rejects zero-dim FSL, including FSL-nested-in-FSL,and accepts positive dimensions.
legacy/stable/2.1,write_datasetnow raises a cleanOSError(same mapping as other schema validation errors) instead ofPanicException.