Casting #8570

@gatesn

Add Bound DType Cast Functions and Extension Cast Hooks

Motivation

Cast currently handles canonical casts with hard-coded dispatch, and ExtensionArray supports only two kinds of cast: to the same extension dtype with different nullability, or unwrapping to storage. This makes metadata-sensitive casts like timestamp(ms) -> timestamp(ns) hard to express, even though the extension dtype metadata contains the information needed to decide whether the cast is valid.

Proposal

Keep casts represented as ScalarFnArray(Cast, target_dtype, child) so existing pushdown and ExecuteParent behavior continues to work.

Refactor cast execution around a type-erased cast function:

pub type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

The returned function is already bound to the concrete source and target dtypes, so it can capture the target dtype and any extension metadata it needs. CastFn is only the callable execution body; binding remains owned by the Cast scalar function.
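
A minimal sketch of what "already bound" could look like in practice. The `DType`, `ArrayRef`, `ExecutionCtx`, and `VortexResult` definitions below are simplified stand-ins so the example compiles on its own, not the real Vortex types, and `bind_i32_to_i64` is a hypothetical binder used only for illustration.

```rust
use std::sync::Arc;

// Simplified stand-ins (assumptions), not the real Vortex definitions.
#[derive(Clone, Debug, PartialEq)]
enum DType {
    I32,
    I64,
}

#[derive(Clone, Debug, PartialEq)]
enum ArrayRef {
    I32(Vec<i32>),
    I64(Vec<i64>),
}

impl ArrayRef {
    fn dtype(&self) -> DType {
        match self {
            ArrayRef::I32(_) => DType::I32,
            ArrayRef::I64(_) => DType::I64,
        }
    }
}

struct ExecutionCtx;
type VortexResult<T> = Result<T, String>;

// Same shape as the proposed alias.
type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

// Hypothetical binder: dtype decisions happen once here, and the closure
// captures the target dtype so the call site only passes the array and context.
fn bind_i32_to_i64(target: DType) -> CastFn {
    Arc::new(
        move |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> {
            match (array, &target) {
                (ArrayRef::I32(values), DType::I64) => {
                    Ok(ArrayRef::I64(values.into_iter().map(i64::from).collect()))
                }
                (other, target) => {
                    Err(format!("cannot cast {:?} to {:?}", other.dtype(), target))
                }
            }
        },
    )
}
```

Because the closure owns its captured state and is `Send + Sync`, the same bound function can be cloned cheaply and reused across batches that share the source and target dtypes.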

Design

Add a dtype-level binder as an inherent method on Cast, used by Cast::execute:

impl Cast {
    pub(crate) fn bind_dtype_cast(
        source: &DType,
        target: &DType,
    ) -> VortexResult<Option<CastFn>>;
}

Binding order:

  1. Exact dtype: no-op cast.
  2. Same dtype ignoring nullability: nullability cast.
  3. Source extension hook: source_ext.cast_to(target).
  4. Target extension hook: target_ext.cast_from(source).
  5. Built-in non-extension casts: primitive, decimal, bool, varbin, list, struct, etc.
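
The binding order can be sketched as a chain of early returns. Everything here is an illustrative assumption: `Kind`, `DType`, and `BoundBy` are toy stand-ins, `BoundBy` replaces the returned CastFn so the chosen rule is visible, and the two hook functions model hypothetical extensions named "timestamp" and "checked_id".

```rust
// Toy dtypes (assumptions); the real DType carries far more information.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    I32,
    I64,
    Ext(&'static str), // extension id only, metadata omitted
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct DType {
    kind: Kind,
    nullable: bool,
}

// Records which rule produced the binding; stands in for the returned CastFn.
#[derive(Debug, PartialEq)]
enum BoundBy {
    Noop,
    Nullability,
    SourceExt,
    TargetExt,
    Builtin,
}

// Hypothetical source-side hook: a timestamp may unwrap to its i64 storage.
fn ext_cast_to(ext_id: &str, target: &DType) -> bool {
    ext_id == "timestamp" && target.kind == Kind::I64
}

// Hypothetical target-side hook: only "checked_id" opts in to casts from i64.
fn ext_cast_from(ext_id: &str, source: &DType) -> bool {
    ext_id == "checked_id" && source.kind == Kind::I64
}

fn bind_dtype_cast(source: &DType, target: &DType) -> Option<BoundBy> {
    // 1. Exact dtype.
    if source == target {
        return Some(BoundBy::Noop);
    }
    // 2. Same dtype ignoring nullability.
    if source.kind == target.kind {
        return Some(BoundBy::Nullability);
    }
    // 3. Source extension hook.
    if let Kind::Ext(id) = source.kind {
        if ext_cast_to(id, target) {
            return Some(BoundBy::SourceExt);
        }
    }
    // 4. Target extension hook.
    if let Kind::Ext(id) = target.kind {
        if ext_cast_from(id, source) {
            return Some(BoundBy::TargetExt);
        }
    }
    // 5. Built-in casts between non-extension dtypes.
    match (source.kind, target.kind) {
        (Kind::I32, Kind::I64) | (Kind::I64, Kind::I32) => Some(BoundBy::Builtin),
        _ => None,
    }
}
```

In this sketch an extension that declines simply falls through to the next rule, so an unsupported pair ends as `None` rather than as an error raised by the hook.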

Add optional extension hooks:

fn cast_to(ext_dtype: &ExtDType<Self>, target: &DType) -> VortexResult<Option<CastFn>> {
    Ok(None)
}

fn cast_from(ext_dtype: &ExtDType<Self>, source: &DType) -> VortexResult<Option<CastFn>> {
    Ok(None)
}

These hooks receive the concrete ExtDType, so implementations can inspect metadata.

Extension Behavior

  • ext -> storage/non-ext: source-side hook may unwrap storage and recursively bind storage-to-target.
  • storage/non-ext -> ext: no universal default; extension must opt in because storage dtype alone may not prove semantic validity.
  • ext -> ext: source-side hook handles metadata-sensitive casts, for example timestamp unit scaling when timezone metadata matches.
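
A rough sketch of the ext -> ext case for timestamps, under several assumptions: `TimeUnit` and `TimestampMeta` are hypothetical metadata types, the `CastFn` alias is simplified to operate on raw `Vec<i64>` storage with no execution context, and the hook only binds lossless coarse-to-fine scaling. Whether a timezone mismatch should decline with `Ok(None)` or return an error is a choice this sketch makes, not something the proposal fixes.

```rust
use std::sync::Arc;

// Hypothetical timestamp metadata, not the real extension definition.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimeUnit {
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

impl TimeUnit {
    fn ticks_per_second(self) -> i64 {
        match self {
            TimeUnit::Seconds => 1,
            TimeUnit::Milliseconds => 1_000,
            TimeUnit::Microseconds => 1_000_000,
            TimeUnit::Nanoseconds => 1_000_000_000,
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
struct TimestampMeta {
    unit: TimeUnit,
    tz: Option<String>,
}

type VortexResult<T> = Result<T, String>;

// Simplified alias (assumption): raw i64 storage in and out, no ExecutionCtx.
type CastFn = Arc<dyn Fn(Vec<i64>) -> VortexResult<Vec<i64>> + Send + Sync>;

// Source-side hook sketch: inspect both metadata values, then either decline
// with Ok(None) or return a function bound to the scaling factor.
fn timestamp_cast_to(
    source: &TimestampMeta,
    target: &TimestampMeta,
) -> VortexResult<Option<CastFn>> {
    // Different timezones: this sketch declines instead of reinterpreting values.
    if source.tz != target.tz {
        return Ok(None);
    }
    let from = source.unit.ticks_per_second();
    let to = target.unit.ticks_per_second();
    // Fine-to-coarse casts would lose precision, so the sketch declines them too.
    if to < from {
        return Ok(None);
    }
    let factor = to / from;
    let cast: CastFn = Arc::new(move |values: Vec<i64>| -> VortexResult<Vec<i64>> {
        values
            .into_iter()
            .map(|v| {
                // checked_mul reports overflow instead of wrapping silently.
                v.checked_mul(factor)
                    .ok_or_else(|| format!("timestamp {v} overflows when scaled by {factor}"))
            })
            .collect()
    });
    Ok(Some(cast))
}
```

The metadata check runs once at bind time, so the returned function only has to multiply by a factor it already captured.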

Implementation Steps

  1. Introduce CastFn and Cast::bind_dtype_cast.
  2. Move existing canonical cast dispatch behind bind_builtin_cast.
  3. Update Cast::execute to bind from input.dtype() to target dtype, then call the returned CastFn with the input array and execution context.
  4. Add cast_to and cast_from hooks to ExtVTable and erased dispatch.
  5. Implement timestamp extension casts for compatible unit conversions.
  6. Keep CastReduce and CastExecuteAdaptor unchanged for pushdown and fused execution paths.
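
Step 3 could look roughly like the following. The types are again simplified stand-ins, `execute_cast` is a hypothetical free function standing in for Cast::execute, and the final dtype check is an extra assumption borrowed from the test plan rather than a stated requirement of execution.

```rust
use std::sync::Arc;

// Simplified stand-ins (assumptions), not the real Vortex definitions.
#[derive(Clone, Debug, PartialEq)]
enum DType {
    I32,
    I64,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
enum ArrayRef {
    I32(Vec<i32>),
    I64(Vec<i64>),
    Utf8(Vec<String>),
}

impl ArrayRef {
    fn dtype(&self) -> DType {
        match self {
            ArrayRef::I32(_) => DType::I32,
            ArrayRef::I64(_) => DType::I64,
            ArrayRef::Utf8(_) => DType::Utf8,
        }
    }
}

struct ExecutionCtx;
type VortexResult<T> = Result<T, String>;
type CastFn =
    Arc<dyn Fn(ArrayRef, &mut ExecutionCtx) -> VortexResult<ArrayRef> + Send + Sync>;

// Toy binder: identity for equal dtypes, one built-in widening cast, else None.
fn bind_dtype_cast(source: &DType, target: &DType) -> VortexResult<Option<CastFn>> {
    if source == target {
        let noop: CastFn = Arc::new(
            |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> { Ok(array) },
        );
        return Ok(Some(noop));
    }
    match (source, target) {
        (DType::I32, DType::I64) => {
            let widen: CastFn = Arc::new(
                |array: ArrayRef, _ctx: &mut ExecutionCtx| -> VortexResult<ArrayRef> {
                    match array {
                        ArrayRef::I32(v) => {
                            Ok(ArrayRef::I64(v.into_iter().map(i64::from).collect()))
                        }
                        other => Err(format!("unexpected input dtype {:?}", other.dtype())),
                    }
                },
            );
            Ok(Some(widen))
        }
        _ => Ok(None),
    }
}

// Bind from the input dtype to the target, then run the bound function.
fn execute_cast(
    input: ArrayRef,
    target: &DType,
    ctx: &mut ExecutionCtx,
) -> VortexResult<ArrayRef> {
    let source = input.dtype();
    let cast = bind_dtype_cast(&source, target)?
        .ok_or_else(|| format!("no cast from {source:?} to {target:?}"))?;
    let output = cast(input, ctx)?;
    // Assumed sanity check: the result must carry the requested dtype.
    if output.dtype() != *target {
        return Err(format!("cast produced {:?}, expected {target:?}", output.dtype()));
    }
    Ok(output)
}
```

Here an unbound pair (`None`) is converted into an error at the execution boundary, which keeps the binder itself free to signal "not supported by this rule" without failing.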

Tests

  • Existing cast conformance continues to pass.
  • timestamp(ms, tz=UTC) -> timestamp(ns, tz=UTC) succeeds and scales values.
  • timestamp(ms, tz=UTC) -> timestamp(ns, tz=Other) fails.
  • extension -> storage still works.
  • storage -> extension fails unless the extension explicitly supports it.
  • Returned cast functions produce arrays with the requested target dtype.

Non-Goals

  • No standalone cast registry yet.
  • No new CastArray; keep using ScalarFnArray(Cast).
  • No public exhaustive list of all supported extension casts, since support can depend on extension metadata.
