Skip to content

[Variant] Support ['fieldName'] in VariantPath parser#9276

Open
klion26 wants to merge 7 commits intoapache:mainfrom
klion26:variant-path-field-in-bracket
Open

[Variant] Support ['fieldName'] in VariantPath parser#9276
klion26 wants to merge 7 commits intoapache:mainfrom
klion26:variant-path-field-in-bracket

Conversation

@klion26
Copy link
Member

@klion26 klion26 commented Jan 27, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Add [fieldName] support in VariantPath parser, will throw an error if the parser fails.

Also support escaping \ inside brackets. If we force users to use brackets when the field contains special characters, maybe we can also close #8954

Sample behaviors(read more on the code doc)

  • [foo] -> filed foo
  • [2] -> index 2
  • [a.b] -> field a.b
  • [a\]b] -> field a]b
  • [a\xb] -> field axb

Are these changes tested?

Added tests

Are there any user-facing changes?

Yes, there are some user-facing changes, but parquet-variant is still experient for now, so maybe we don't need to wait for a major version.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Jan 27, 2026
@klion26 klion26 force-pushed the variant-path-field-in-bracket branch from 88e1964 to 3201b0e Compare January 27, 2026 08:50
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea looks good to me -- thank you @klion26

I think other than the term unchecked this PR looks reasonable to me

cc @scovich in case you have other thoughts


/// Parses a path string, panics on invalid input.
/// Only use for tests for known-valid input.
pub fn from_str_unchecked(s: &'a str) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rust and other parts of this crate the "unchecked" word is often used to denote unsafe code -- specifically that could lead to undefined behavior

I suggest we don't call it "unchecked"

I think just from_str is probably fine ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. changed to from_str_or_panic. because clippy discourage to implement a from_str manually, and the std::str::FromStr::from_str returns Result<Self, Self::Err>; which is the same as TryFrom.

 error: method `from_str` can be confused for the standard trait method `std::str::FromStr::from_str

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just using try_from("foo").unwrap() ?

@scovich
Copy link
Contributor

scovich commented Jan 29, 2026

Also support escaping \ inside brackets. If we force users to use brackets when the field contains special characters, maybe we can also close #8954

Out of curiosity, why would we need to force escaping anything other than ]? Even a \. wouldn't be enough to make splitting on . safe, so we'd just need a simple parser that scans the string looking for matching ] after encountering a [?

@klion26 klion26 force-pushed the variant-path-field-in-bracket branch from d76d9e7 to 8e2dcdb Compare February 9, 2026 12:24
@klion26 klion26 force-pushed the variant-path-field-in-bracket branch from 8e2dcdb to 634f236 Compare February 9, 2026 12:38
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb @scovich thanks for the review and sorry for the late reply, I've been busy with other things lately.

Also support escaping \ inside brackets. If we force users to use brackets when the field contains special characters, maybe we can also close #8954

Out of curiosity, why would we need to force escaping anything other than ]? Even a \. wouldn't be enough to make splitting on . safe, so we'd just need a simple parser that scans the string looking for matching ] after encountering a [?

@scovich Sorry for the confusion. In the description, I want to say we only support special characters like '.' between '[' and ']', if the user wants to use special characters, he/she nees to use them between '[' and ']', so the path parser is simpler and easier to maintain. If we prefer to support special characters in addition to '[' and ']', we can continue with #8954 and then implement it.


/// Parses a path string, panics on invalid input.
/// Only use for tests for known-valid input.
pub fn from_str_unchecked(s: &'a str) -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. changed to from_str_or_panic. because clippy discourage to implement a from_str manually, and the std::str::FromStr::from_str returns Result<Self, Self::Err>; which is the same as TryFrom.

 error: method `from_str` can be confused for the standard trait method `std::str::FromStr::from_str

@alamb
Copy link
Contributor

alamb commented Feb 11, 2026

I'll try and review this one over the next day or two to get it into the 58 release

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klion26 -- this is looking good. I am sorry for the delay in reviewing.

/// let shredding_type = ShreddedSchemaBuilder::default()
/// // store the "time" field as a separate UTC timestamp
/// .with_path("time", (&DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".into())), true))
/// .with_path(VariantPath::from_str_or_panic("time"), (&DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".into())), true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so ugly compared to the previous version

It seems if we are going to do this, we should probably also change the signature of with_path because now P: Into<VariantPath<'a>>, is unlikely to match anything other than VariantPath

What if we changed the signature to take a TryFrom and return the correct error?

Then the example could look like

/// let shredding_type = ShreddedSchemaBuilder::default()
///     // store the "time" field as a separate UTC timestamp
///     .with_path("time", &DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".into())), true))?  <-- note the ? here
/// ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, this's indeed a better way.

single_variant_get_test(
r#"{"some_field": 1234}"#,
VariantPath::from("some_field"),
VariantPath::from_str_or_panic("some_field"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer to be honest:

Suggested change
VariantPath::from_str_or_panic("some_field"),
VariantPath::try_from("some_field").unwrap(),

Or maybe a helper like

fn parse_path(p: &str) -> VariantPath<'a> {
    VariantPath::try_from(p).unwrap(),
}

And then

            parse_path("some_field")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to try_from(..).unwrap()

#[test]
fn test_variant_path_empty_str() {
let path = VariantPath::from("");
let path = VariantPath::from_str_or_panic("");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

        let path = VariantPath::try_from("").unwrap()

Is no less verbose and is easier to understand (as it uses the standard rust traits)

@klion26 klion26 force-pushed the variant-path-field-in-bracket branch from 6e3c4ce to 7a1d447 Compare February 12, 2026 15:58
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I've updated the code, please take another look, thanks.

let path: VariantPath<'a> = path.into();
let path: VariantPath<'a> = path
.try_into()
.map_err(|e| ArrowError::InvalidArgumentError(format!("{:?}", e)))?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this line and L544, because we'll have VaraintPath -> VariantPath, which is Infallible, and there is no From<Infallible> for ArrowError, ArrowError is located in arrow-schema, not sure if it's appropriate to make the change there. If add From<Infallible> for ArrowError is better, I can change it.

@alamb alamb added the api-change Changes to the arrow API label Feb 14, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klion26 -- I think this change looks good to me

FYI @scovich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] support .. and ['fieldName'] syntax in the VariantPath parser [Variant] Support escaping mechanism for VariantPath parsing

3 participants