Add Bound DType Cast Functions and Extension Cast Hooks
Motivation
Cast currently handles canonical casts with hard-coded dispatch, and ExtensionArray only supports same-extension/nullability casts or unwrapping to storage. This makes metadata-sensitive casts like timestamp(ms) -> timestamp(ns) hard to express, even though the extension dtype metadata contains the information needed to decide whether the cast is valid.
Proposal
Keep casts represented as ScalarFnArray(Cast, target_dtype, child) so existing pushdown and ExecuteParent behavior continues to work.
Refactor cast execution around a type-erased cast function:
pub type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;
The returned function is already bound to the concrete source and target dtypes, so it can capture the target dtype and any extension metadata it needs. CastFn is only the callable execution body; binding remains owned by the Cast scalar function.
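As a rough illustration of what "already bound" means, the sketch below builds a CastFn-shaped closure that owns its target dtype. DType, Array, ArrayRef, ExecutionCtx and VortexResult here are toy stand-ins rather than the real Vortex definitions, and bind_widen is a hypothetical helper that is not part of this proposal.

```rust
use std::sync::Arc;

// Toy stand-ins for the Vortex types named above (assumptions, not the real definitions).
#[derive(Clone, Debug, PartialEq)]
pub enum DType {
    I32,
    I64,
}

#[derive(Debug, PartialEq)]
pub struct Array {
    pub dtype: DType,
    pub values: Vec<i64>,
}

pub type ArrayRef = Arc<Array>;
pub struct ExecutionCtx;
pub type VortexResult<T> = Result<T, String>;

pub type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

/// Hypothetical binder: checks the dtype pair once, then returns a closure
/// that has captured everything it needs to run the cast later.
pub fn bind_widen(source: &DType, target: &DType) -> Option<CastFn> {
    match (source, target) {
        (DType::I32, DType::I64) => {
            // The closure owns a copy of the target dtype.
            let target = target.clone();
            let f: CastFn = Arc::new(
                move |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> {
                    Ok(Arc::new(Array {
                        dtype: target.clone(),
                        values: array.values.clone(),
                    }))
                },
            );
            Some(f)
        }
        // Any other pair is left unbound in this sketch.
        _ => None,
    }
}
```

Because the dtype pair is checked at bind time, the returned closure does not need to re-inspect dtypes on each call.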
Design
Add a dtype-level binder as an inherent method on Cast, used by Cast::execute:
impl Cast {
    pub(crate) fn bind_dtype_cast(
        source: &DType,
        target: &DType,
    ) -> VortexResult<Option<CastFn>>;
}
Binding order (the first rule that yields a cast function wins):
- Exact dtype: no-op cast.
- Same dtype ignoring nullability: nullability cast.
- Source extension hook: source_ext.cast_to(target).
- Target extension hook: target_ext.cast_from(source).
- Built-in non-extension casts: primitive, decimal, bool, varbin, list, struct, etc.
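The binding order above could be sketched as a chain of early returns. This is a simplified model, not the proposed implementation: it assumes the first matching rule wins, reports which rule matched instead of returning a CastFn, and uses a toy DType. The three *_accepts functions and the extension ids "ts_ms", "ts_ns" and "uuid" are made up for illustration.

```rust
/// Toy dtype: a kind plus a nullability flag. `Ext` names an extension by id.
#[derive(Clone, Debug, PartialEq)]
pub enum Kind {
    I64,
    Utf8,
    Ext(&'static str),
}

#[derive(Clone, Debug, PartialEq)]
pub struct DType {
    pub kind: Kind,
    pub nullable: bool,
}

/// Which rule in the binding order produced the cast.
#[derive(Debug, PartialEq)]
pub enum Rule {
    NoOp,
    Nullability,
    SourceExtHook,
    TargetExtHook,
    Builtin,
}

// Hypothetical stand-ins for "the hook returned Ok(Some(_))".
fn source_hook_accepts(ext_id: &str, target: &DType) -> bool {
    ext_id == "ts_ms" && target.kind == Kind::Ext("ts_ns")
}

fn target_hook_accepts(ext_id: &str, source: &DType) -> bool {
    ext_id == "uuid" && source.kind == Kind::Utf8
}

fn builtin_accepts(source: &DType, target: &DType) -> bool {
    matches!((&source.kind, &target.kind), (Kind::I64, Kind::Utf8))
}

/// Walks the rules in order and returns the first one that applies.
pub fn first_matching_rule(source: &DType, target: &DType) -> Option<Rule> {
    if source == target {
        return Some(Rule::NoOp);
    }
    if source.kind == target.kind {
        return Some(Rule::Nullability);
    }
    if let Kind::Ext(id) = &source.kind {
        if source_hook_accepts(id, target) {
            return Some(Rule::SourceExtHook);
        }
    }
    if let Kind::Ext(id) = &target.kind {
        if target_hook_accepts(id, source) {
            return Some(Rule::TargetExtHook);
        }
    }
    if builtin_accepts(source, target) {
        return Some(Rule::Builtin);
    }
    None
}
```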
Add optional extension hooks:
fn cast_to(ext_dtype: &ExtDType<Self>, target: &DType) -> VortexResult<Option<CastFn>> {
    Ok(None)
}
fn cast_from(ext_dtype: &ExtDType<Self>, source: &DType) -> VortexResult<Option<CastFn>> {
    Ok(None)
}
These hooks receive the concrete ExtDType, so implementations can inspect metadata.
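To make the metadata point concrete, here is a minimal sketch of a timestamp-like extension overriding cast_to while another extension keeps the defaults. Every type is a toy stand-in: the real ExtVTable, ExtDType and DType are shaped differently, arrays are modelled as a plain Vec<i64>, and TimestampMeta with its ticks_per_second field is an assumption made for this example. The sketch only handles scaling to a finer unit, and it returns Ok(None) on a timezone mismatch, which is one possible reading of "fails".

```rust
use std::sync::Arc;

// Toy stand-ins; the real Vortex types differ.
pub type VortexResult<T> = Result<T, String>;
pub struct ExecutionCtx;
pub type ArrayRef = Arc<Vec<i64>>;
pub type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

#[derive(Clone, Debug, PartialEq)]
pub struct TimestampMeta {
    pub ticks_per_second: i64,
    pub tz: Option<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum DType {
    I64,
    Timestamp(TimestampMeta), // simplified: metadata stored inline
}

pub struct ExtDType<V: ExtVTable> {
    pub metadata: V::Metadata,
}

pub trait ExtVTable: Sized {
    type Metadata;

    // Defaults mirror the proposal: no cast unless the extension opts in.
    fn cast_to(_ext_dtype: &ExtDType<Self>, _target: &DType) -> VortexResult<Option<CastFn>> {
        Ok(None)
    }
    fn cast_from(_ext_dtype: &ExtDType<Self>, _source: &DType) -> VortexResult<Option<CastFn>> {
        Ok(None)
    }
}

/// An extension that keeps both default hooks.
pub struct Opaque;
impl ExtVTable for Opaque {
    type Metadata = ();
}

pub struct Timestamp;
impl ExtVTable for Timestamp {
    type Metadata = TimestampMeta;

    fn cast_to(ext_dtype: &ExtDType<Self>, target: &DType) -> VortexResult<Option<CastFn>> {
        let src = &ext_dtype.metadata;
        let DType::Timestamp(dst) = target else {
            return Ok(None);
        };
        // Only bind when timezones match and the target unit is a whole multiple finer.
        if src.ticks_per_second <= 0
            || src.tz != dst.tz
            || dst.ticks_per_second % src.ticks_per_second != 0
        {
            return Ok(None);
        }
        let factor = dst.ticks_per_second / src.ticks_per_second;
        let f: CastFn = Arc::new(
            move |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> {
                let scaled = array
                    .iter()
                    .map(|v| v.checked_mul(factor).ok_or_else(|| format!("overflow scaling {v}")))
                    .collect::<VortexResult<Vec<i64>>>()?;
                Ok(Arc::new(scaled))
            },
        );
        Ok(Some(f))
    }
}
```

The scale factor is computed once at bind time from the two metadata values, so the closure only carries a single i64.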
Extension Behavior
- ext -> storage/non-ext: source-side hook may unwrap storage and recursively bind storage-to-target.
- storage/non-ext -> ext: no universal default; extension must opt in because storage dtype alone may not prove semantic validity.
- ext -> ext: source-side hook handles metadata-sensitive casts, for example timestamp unit scaling when timezone metadata matches.
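The first two cases can be illustrated with a small recursive binder. This is a sketch under simplifying assumptions: the DType here is a toy enum, a Plan (a list of step descriptions) stands in for a CastFn, every extension is assumed to use the unwrap-to-storage behavior on the source side, and no extension opts in on the target side. The bind and bind_builtin functions and the "date" extension id are hypothetical. Metadata-sensitive ext -> ext casts are not modelled.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum DType {
    I32,
    I64,
    /// Extension dtype identified by id, wrapping a storage dtype.
    Ext { id: &'static str, storage: Box<DType> },
}

/// Describes the steps a bound cast would run (stand-in for a CastFn).
pub type Plan = Vec<String>;

fn bind_builtin(source: &DType, target: &DType) -> Option<Plan> {
    match (source, target) {
        (DType::I32, DType::I64) => Some(vec!["widen i32 -> i64".to_string()]),
        _ => None,
    }
}

pub fn bind(source: &DType, target: &DType) -> Option<Plan> {
    if source == target {
        return Some(vec![]);
    }
    // ext -> storage/non-ext: unwrap the storage, then bind storage -> target recursively.
    if let DType::Ext { id, storage } = source {
        let mut plan = vec![format!("unwrap {id} to storage")];
        plan.extend(bind(storage, target)?);
        return Some(plan);
    }
    // storage/non-ext -> ext: no default; an extension would have to opt in via cast_from.
    if let DType::Ext { .. } = target {
        return None;
    }
    bind_builtin(source, target)
}
```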
Implementation Steps
- Introduce CastFn and Cast::bind_dtype_cast.
- Move existing canonical cast dispatch behind bind_builtin_cast.
- Update Cast::execute to bind from input.dtype() to target dtype, then call the returned CastFn with the input array and execution context.
- Add cast_to and cast_from hooks to ExtVTable and erased dispatch.
- Implement timestamp extension casts for compatible unit conversions.
- Keep CastReduce and CastExecuteAdaptor unchanged for pushdown and fused execution paths.
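The Cast::execute step could look roughly like the following. It is a sketch with toy types, not the real Vortex code: execute_cast is a hypothetical free function, the bind_dtype_cast shown here only knows two casts, and both the "unsupported cast" error for an unbound pair and the output dtype check are assumptions about how the binder's Ok(None) and the dtype guarantee might be handled.

```rust
use std::sync::Arc;

// Toy stand-ins for the Vortex types (assumptions, not the real definitions).
#[derive(Clone, Debug, PartialEq)]
pub enum DType {
    I32,
    I64,
    Utf8,
}

#[derive(Debug, PartialEq)]
pub struct Array {
    pub dtype: DType,
    pub values: Vec<i64>,
}

pub type ArrayRef = Arc<Array>;
pub struct ExecutionCtx;
pub type VortexResult<T> = Result<T, String>;
pub type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

/// Simplified binder: exact dtype is a no-op, i32 -> i64 widens, anything else is unbound.
fn bind_dtype_cast(source: &DType, target: &DType) -> VortexResult<Option<CastFn>> {
    if source == target {
        let f: CastFn = Arc::new(
            |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> { Ok(array) },
        );
        return Ok(Some(f));
    }
    match (source, target) {
        (DType::I32, DType::I64) => {
            let target = target.clone();
            let f: CastFn = Arc::new(
                move |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> {
                    Ok(Arc::new(Array { dtype: target.clone(), values: array.values.clone() }))
                },
            );
            Ok(Some(f))
        }
        _ => Ok(None),
    }
}

/// Bind from the input dtype to the target dtype, then run the bound function.
pub fn execute_cast(
    input: ArrayRef,
    target: &DType,
    ctx: &mut ExecutionCtx,
) -> VortexResult<ArrayRef> {
    let cast_fn = bind_dtype_cast(&input.dtype, target)?
        .ok_or_else(|| format!("unsupported cast: {:?} -> {:?}", input.dtype, target))?;
    let output = cast_fn(input, ctx)?;
    // Guard the requirement that bound casts produce the requested dtype.
    if &output.dtype != target {
        return Err(format!("cast produced {:?}, expected {:?}", output.dtype, target));
    }
    Ok(output)
}
```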
Tests
- Existing cast conformance continues to pass.
- timestamp(ms, tz=UTC) -> timestamp(ns, tz=UTC) succeeds and scales values.
- timestamp(ms, tz=UTC) -> timestamp(ns, tz=Other) fails.
- extension -> storage still works.
- storage -> extension fails unless the extension explicitly supports it.
- Returned cast functions produce arrays with the requested target dtype.
Non-Goals
- No standalone cast registry yet.
- No new CastArray; keep using ScalarFnArray(Cast).
- No public exhaustive list of all supported extension casts, since support can depend on extension metadata.