[Variant] Implement VariantArray::value for shredded variants#8105
[Variant] Implement VariantArray::value for shredded variants#8105alamb merged 8 commits intoapache:mainfrom
VariantArray::value for shredded variants#8105Conversation
There was a problem hiding this comment.
Comment out this for the draft version; I will remove it in the future.
parquet-variant-compute/src/lib.rs
Outdated
There was a problem hiding this comment.
this is an interesting idea -- to have two variations of each macro. It is probably a reasonable place to start
However, I think it would be nice to avoid having macros in the root lib.rs file -- what do you think about making a new module like parquet-variant-compute/src/arrow_types.rs or parquet-variant-compute/src/type_conversion.rs to hold the code to convert back and forth between Arrow types and Variant types
@scovich added similar code in VariantToPrimitive here: #8122
So I am thinking having a module to start collecting such code would be useful
There was a problem hiding this comment.
Place the macros in parquet-variant-compute/src/type_conversion.rs is better, will check #8122 and fix the remainings
There was a problem hiding this comment.
looks good -- we can probably change the DataType::Int32 branch to use this macro too, right?
Also, I think we should add a test
There was a problem hiding this comment.
Yes, only Int16 is implemented to confirm the direction
Indeed, we need to add tests to cover this logic.
parquet-variant-compute/src/lib.rs
Outdated
There was a problem hiding this comment.
Place the macros in parquet-variant-compute/src/type_conversion.rs is better, will check #8122 and fix the remainings
There was a problem hiding this comment.
Yes, only Int16 is implemented to confirm the direction
Indeed, we need to add tests to cover this logic.
|
marking as draft so it is clear this one isn't waiting on a review |
caa9420 to
d682bfd
Compare
19e0b2b to
363f9ad
Compare
|
@alamb, please help review this when you have time. Thanks. This is ready to review now. |
There was a problem hiding this comment.
Changed to a separate macro other than a macro rules in the same macro as before, I think the current implementation is better for the user to choose which one they need.
There was a problem hiding this comment.
👍
I was thinking about if there is some way to avoid defining 2 macros for each type of array, but given the branches are different (array.is_null --> null or Variant::Null ) I suspect the complexity of refactoring required would outweight any "don't repeat yourself" benefits
There was a problem hiding this comment.
Will add the remaining types in a follow-up pr
alamb
left a comment
There was a problem hiding this comment.
Thank you @klion26 -- I think this makes sense to me and moves us towards a nice set of "convert arrow --> variant" and back again
cc @carpecodeum @mprammer and @scovich
There was a problem hiding this comment.
👍
I was thinking about if there is some way to avoid defining 2 macros for each type of array, but given the branches are different (array.is_null --> null or Variant::Null ) I suspect the complexity of refactoring required would outweight any "don't repeat yourself" benefits
|
This PR has a conflict. Can you resolve it @klion26 ? Thank you! |
|
FYI @scovich |
There was a problem hiding this comment.
nit: why not just:
| continue; | |
| } | |
| $builder.append_variant(Variant::from(array.value(i))); | |
| } else { | |
| $builder.append_variant(Variant::from(array.value(i))); | |
| } |
There was a problem hiding this comment.
I'm fine with the ruturn/break/continue as soon as possible way, but if we have a strong preference here, I can change it.
There was a problem hiding this comment.
Definitely not a strong preference, no need to change if you like it as-is.
(I personally prefer early return/continue/etc any time the else block would be more than one line because it starts to save a lot of indentation. For one line, it's about the same either way)
There was a problem hiding this comment.
It seems like this is the only "real" macro, and the other two are just special cases?
But for that to work, this macros shouldn't take $input and $method as separate arguments, so caller is e.g.
- non_generic_conversion_array!(as_boolean, |v| v, input, builder);
+ non_generic_conversion_array!(input.as_boolean(), |v| v, builder);If that's acceptable then we can do:
macro_rules! non_generic_conversion_array {
($array:expr, $cast_fn:expr, $builder:expr) => {{
let array = $array;
...
}};
}and then
macro_rules! generic_conversion_array {
($t:ty, $method:ident, $cast_fn:expr, $input:expr, $builder:expr) => {
non_generic_conversion_array!($input.$method::<$t>, $cast_fn, $builder)
};
}and then
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}A similar factoring should work for the xxx_conversion_scalar macros.
There was a problem hiding this comment.
Added in a separate commit. The code looks cleaner than before. In the new version, there's something changed for the user. If we want to use the primitive_conversion macro, we may need to import the three macros(encounter this in parquet-variant-compute/src/variant_get/output.variant.rs).
There was a problem hiding this comment.
Oh! I forgot about that. You just have to use fully qualified paths, starting with $crate:
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
$crate::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}etc
There was a problem hiding this comment.
And if you want the macro to be usable outside the current crate, then instead:
use self as parquet_variant_compute;
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
parquet_variant_compute::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}... which allows the same path to work both inside and outside the crate.
Tho I suspect it would still break if somebody brought in the crate with a different name?
There was a problem hiding this comment.
I'm not sure those definitions should even be public tho? Why not ditch the #[macro_export] and:
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
$crate::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}
pub(crate) use primitive_conversion;There was a problem hiding this comment.
variant doesn't support negative scale?
There was a problem hiding this comment.
Oh, I see -- you're manually applying the scale and producing a scale=0 decimal result.
There was a problem hiding this comment.
| // For example: 123 with scale -2 becomes 12300 | |
| // For example: 123 with scale -2 becomes 12300 with scale 0 |
There was a problem hiding this comment.
| let multiplier = (10 as $value_type).pow((-*$scale) as u32); | |
| let multiplier = <$value_type>::pow(10, (-*$scale) as u32); |
There was a problem hiding this comment.
Are these fully accurate boundary checks? I worry that the truncating integer division could have corner cases.
I think we can do:
v.checked_mul(multiplier)
.and_then(|v| <$variant_type>::try_new(v, 0).ok())
.map_or(Variant::Null, Into::into)There was a problem hiding this comment.
| if *$scale < 0 { | |
| // For negative scale, we need to multiply the value by 10^|scale| | |
| // For example: 123 with scale -2 becomes 12300 | |
| let multiplier = (10 as $value_type).pow((-*$scale) as u32); | |
| // Check for overflow | |
| if $v > 0 && $v > <$value_type>::MAX / multiplier { | |
| return Variant::Null; | |
| } | |
| if $v < 0 && $v < <$value_type>::MIN / multiplier { | |
| return Variant::Null; | |
| } | |
| <$variant_type>::try_new($v * multiplier, 0) | |
| .map(|v| v.into()) | |
| .unwrap_or(Variant::Null) | |
| } else { | |
| <$variant_type>::try_new($v, *$scale as u8) | |
| .map(|v| v.into()) | |
| .unwrap_or(Variant::Null) | |
| } | |
| let (v, scale) = if *$scale < 0 { | |
| // For negative scale, we need to multiply the value by 10^-scale | |
| // For example: 123 with scale -2 becomes 12300 with scale 0 | |
| let multiplier = <$value_type>::pow(10, (-*$scale) as u32); | |
| ($v.checked_mul($v), 0u8) | |
| } else { | |
| (Some($v), *$scale as u8) | |
| }; | |
| $v.and_then(|v| <$variant_type>::try_new(v, scale).ok()) | |
| .map_or(Variant::Null, Into::into) |
There was a problem hiding this comment.
Isn't this just non_generic_conversion_array?
There was a problem hiding this comment.
Yes, I've removed the redundant macros. My bad for not figuring it out when copying it.
There was a problem hiding this comment.
Taking the advice above for non_generic_conversion_array:
| let array = $input.$method::<$offset_type>(); | |
| for i in 0..array.len() { | |
| if array.is_null(i) { | |
| $builder.append_null(); | |
| continue; | |
| } | |
| let cast_value = $cast_fn(array.value(i)); | |
| $builder.append_variant(Variant::from(cast_value)); | |
| } | |
| non_generic_conversion_array!($input.$method::<$offset_type>(), $cast_fn, $builder) |
There was a problem hiding this comment.
| macro_rules! primitive_conversion { | |
| macro_rules! primitive_conversion_array { |
(to match the others?)
783cb22 to
f813b08
Compare
There was a problem hiding this comment.
I'm fine with the ruturn/break/continue as soon as possible way, but if we have a strong preference here, I can change it.
There was a problem hiding this comment.
Yes, I've removed the redundant macros. My bad for not figuring it out when copying it.
There was a problem hiding this comment.
Added in a separate commit. The code looks cleaner than before. In the new version, there's something changed for the user. If we want to use the primitive_conversion macro, we may need to import the three macros(encounter this in parquet-variant-compute/src/variant_get/output.variant.rs).
scovich
left a comment
There was a problem hiding this comment.
Looks good, but please double check that return in primitive_conversion_single_value -- I'm pretty sure it's incorrect and only works by accident today.
| generic_conversion_array!( | ||
| Float16Type, | ||
| as_primitive, | ||
| |v: f16| -> f32 { v.into() }, |
There was a problem hiding this comment.
aside:
| |v: f16| -> f32 { v.into() }, | |
| f32::from, |
| generic_conversion_array!( | ||
| Decimal256Type, | ||
| as_primitive, | ||
| |v: i256| { |
There was a problem hiding this comment.
aside: L237-241 below could simplify to just:
v.to_i128().map_or(
Variant::Null,
decimal_to_variant_decimal!(v, scale, i128, VariantDecimal16,
)There was a problem hiding this comment.
We can't change this, because compile thinks that v passed into the macro is i256, and seems we can't cast it when calling the macro.
There was a problem hiding this comment.
Oh, I had a silly typo, sorry --
v.to_i128().map_or(
Variant::Null,
|v| decimal_to_variant_decimal!(v, scale, i128, VariantDecimal16),
)(missing |v| in the closure)
| generic_conversion_array!( | ||
| Time64NanosecondType, | ||
| as_primitive, | ||
| |v| NaiveTime::from_num_seconds_from_midnight_opt( |
There was a problem hiding this comment.
aside: @alamb -- I'm not sure CI is running fmt against this file? At least, I've never seen it willing to omit trailing commas for non-macro invocations (L412), and it always formats multi-line lambdas with curly braces even tho I'd personally prefer it didn't:
|v| {
NaiveTime::foo(
a,
b,
)
}| ($t:ty, $input:expr, $index:expr) => {{ | ||
| let array = $input.as_primitive::<$t>(); | ||
| if array.is_null($index) { | ||
| return Variant::Null; |
There was a problem hiding this comment.
return from a macro seems dangerous/wrong? It would return from whatever function invoked the macro which is probably not what the caller expected? Is it even what the macro writer intended? To return Variant::Null from the function on NULL, but the macro invocation produces a normal Variant otherwise?
There was a problem hiding this comment.
Yes, return from a macro seems wrong, changed the implementation.
|
@scovich Thanks for the quick review. I've addressed the comments. Please take another look. |
|
Fixed in the latest commit
|
| @@ -54,7 +55,7 @@ macro_rules! non_generic_conversion_single_value { | |||
| #[macro_export] | |||
There was a problem hiding this comment.
nit: There's no point (publicly) exporting a macro that references $crate -- compilation will fail for any use sites outside this crate. Can do pub(crate) use generic_conversion_array to make it visible everywhere inside the crate.
Ah! The latter is in the |
| macro_rules! generic_conversion_single_value { | ||
| ($t:ty, $method:ident, $cast_fn:expr, $input:expr, $index:expr) => {{ | ||
| let array = $input.$method::<$t>(); |
There was a problem hiding this comment.
Can this not be expressed in terms of non_generic_conversion_single_value?
| macro_rules! primitive_conversion_single_value { | ||
| ($t:ty, $input:expr, $index:expr) => {{ | ||
| let array = $input.as_primitive::<$t>(); |
There was a problem hiding this comment.
Can this not be expressed in terms of generic_conversion_single_value?
|
Nothing but nits left at this point, can fix them before or after merge. |
|
@scovich Thank you very much for the detailed and patient review. Addressed the comments in the last commit. One more thing: I changed all the macros to |
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
VariantArray::valuefor shredded variants #8091 .Rationale for this change
Implement
VariantArray::valuefor some more shredded variants(eg. primitive_conversion/generic_conversion/non_generic_conversion).What changes are included in this PR?
macroRulesto a separate moduletype_conversion.rsvariant valueAre these changes tested?
Covered by the existing test
Are there any user-facing changes?
No