Skip to content

Add support for FixedSizeList to variant_to_arrow#9663

Merged
alamb merged 13 commits into
apache:mainfrom
rishvin:fixed_size_list_variant_to_arrow
May 20, 2026
Merged

Add support for FixedSizeList to variant_to_arrow#9663
alamb merged 13 commits into
apache:mainfrom
rishvin:fixed_size_list_variant_to_arrow

Conversation

@rishvin
Copy link
Copy Markdown
Contributor

@rishvin rishvin commented Apr 6, 2026

Rationale for this change

Add support for FixedSizeList when invoking variant_to_arrow.

What changes are included in this PR?

  • Introduces a new builder VariantToFixedSizeListArrowRowBuilder.
  • Adds test cases for shredding and getting variant by FixedSizeList.

Are these changes tested?

By adding few test cases.

Are there any user-facing changes?

N/A.

@github-actions github-actions Bot added the parquet-variant parquet-variant* crates label Apr 6, 2026
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 6, 2026

@sdf-jkl could you help review this PR?

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rishvin, I left some suggestions.

Comment thread parquet-variant-compute/src/variant_to_arrow.rs Outdated
Comment thread parquet-variant-compute/src/variant_to_arrow.rs Outdated
Comment thread parquet-variant-compute/src/variant_to_arrow.rs
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs
@rishvin rishvin requested a review from sdf-jkl April 11, 2026 21:24
@rishvin
Copy link
Copy Markdown
Contributor Author

rishvin commented Apr 11, 2026

Thanks @sdf-jkl for reviewing changes. I have addressed the comments, could you re-review ?

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @rishvin! Everything looks great, I left some nits.

}

impl<'a> ListElementBuilder<'a> {
fn append_null(&mut self) -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment mentioning this is only used for FixedSizeList builder?

Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/variant_get.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Comment thread parquet-variant-compute/src/shred_variant.rs
rishvin and others added 2 commits April 13, 2026 11:53
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
@rishvin rishvin requested a review from sdf-jkl April 13, 2026 19:24
@rishvin
Copy link
Copy Markdown
Contributor Author

rishvin commented Apr 13, 2026

Thanks @sdf-jkl addressed new comments.

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny change.

Comment thread parquet-variant-compute/src/shred_variant.rs Outdated
Co-authored-by: Konstantin Tarasov <33369833+sdf-jkl@users.noreply.github.com>
@rishvin rishvin requested a review from sdf-jkl April 14, 2026 04:25
@rishvin
Copy link
Copy Markdown
Contributor Author

rishvin commented Apr 14, 2026

Thanks again @sdf-jkl ! Addressed it.

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rishvin! LGTM.

@scovich could you please check this when available? Thanks!

Comment on lines +334 to +336
// With `safe` cast option set to false, appending list of wrong size to
// `typed_value_builder` of type `FixedSizeList` will result in an error. In such a
// case, the provided list should be appended to the `value_builder.
Copy link
Copy Markdown
Contributor

@scovich scovich Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the variant shredding spec allows for shredding as a fixed size list, if the resulting layout differs physically from a normal list?

Arrays can be shredded by using a 3-level Parquet list for typed_value.

If the value is not an array, typed_value must be null. If the value is an array, value must be null.

It looks to me like any attempt to shred as fixed-sized list must either succeed (if the size is correct) or hard-fail (because value as fallback is not allowed).

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It differs physically on the Arrow side, but once we write it to Parquet it'd be same as other ListLikeArrays. But this leads to further discussion on adding FixedSizeList support for VariantArray as well as implementing other types, currently not supported in spec.

We're keeping value because we consider this a cast from Variant to FixedSizeList. The extra len check is there because there is no Variant::FixedSizeList enum to match to. If len is incorrect we consider the cast failed and proceed following the safe cast option as if typed_value is Null.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When casting from variant to arrow, we can do whatever we want.

But this code here is about going from binary variant to shredded variant. And the variant shredding spec directly forbids value to contain a variant array, when shredding as array.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I think the core issue is that Parquet currently has only one logical LIST type. If Parquet had a dedicated logical type for FixedSizeList, the spec wording could be more explicit.

Btw, there’s ongoing work on this too: apache/parquet-format#241 (recently revived).

Given the current spec text:

Arrays can be shredded by using a 3-level Parquet list for typed_value.

If the value is not an array, typed_value must be null. If the value is an array, value must be null.

I read “array” as "a value matching the specific list shape we’re shredding into". For List/LargeList/ListView it's List values, for FixedSizeList array it's a FixedSizeList value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, the variant spec neither knows nor cares about the intricacies of arrow array types (it also doesn't care about spark or SQL). If we're shredding to a 3-level parquet list, and we encounter a variant array value, the resulting value column entry must be null.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the use case should be the first thing we think about before implementing.

FSL could be useful for embeddings and tensors but the question is why would anyone put them inside a Variant for such a heavy compute work? Maybe coordinates inside a log struct?

The fact that we can't store FSL in Parquet is a big reason to not do it.

FixedSizeBinary actually does have a Parquet representation - FIXED_LEN_BYTE_ARRAY and is used for shredding into a UUID, but other than UUID, what could be a use case for storing inside a Variant?

Overall, I think we'd not gain much by including FSL because it's so frail and the use cause is not defined. Isn't the whole point of Variant is to host semi-structured data, if we have fixed-len-something it'd be better to keep a separate column for it. If fixed-length is a logical constraint rather than a physical one, enforcement belongs downstream.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think we'd not gain much by including FSL because it's so frail and the use cause is not defined. Isn't the whole point of Variant is to host semi-structured data, if we have fixed-len-something it'd be better to keep a separate column for it. If fixed-length is a logical constraint rather than a physical one, enforcement belongs downstream.

Agree. I just realized that somebody could create a FSL of [shredded] variant, if the use case just needs flexible typing of N list members. But that also seems somewhat contrived, to have such strongly-typed structure but sufficient uncertainty about the list element type to merit using variant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we then keep it hard fail and leave it to the user?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching up late on this. I'm aligned with keeping it a hard fail and leaving it to the user.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for reviewing these changes and apologies for delays.
Could you please re-review.

Copy link
Copy Markdown
Contributor

@sdf-jkl sdf-jkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rishvin

@rishvin rishvin requested a review from scovich May 20, 2026 16:25
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rishvin and @sdf-jkl -- looks good to me

Copy link
Copy Markdown
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +336 to +337
self.typed_value_builder
.append_value(&Variant::List(list))?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: a bit surprised fmt liked this line break?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes fmt seems to give up for some reason 🤷

@alamb alamb merged commit dc821a9 into apache:main May 20, 2026
17 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 20, 2026

🚀

@rishvin rishvin deleted the fixed_size_list_variant_to_arrow branch May 20, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Add variant_to_arrow FixedSizeList type support

5 participants