feat: add array_normalize scalar function #22013

Open
crm26 wants to merge 1 commit into apache:main from crm26:feat/array-normalize

Contributor

crm26 commented May 5, 2026

Which issue does this PR close?

Part of #21536 — split of #21371 into one-function-per-PR. Third in the series after #21542 (cosine_distance) and #21861 (inner_product).

Rationale for this change

Adds array_normalize(array), which returns the L2-normalized version of a numeric input vector. Each output element is array[i] / sqrt(sum over j of array[j]^2), so every element is divided by the same norm. Returns the same shape as the input (List<Float64> or LargeList<Float64>).

Aliased as list_normalize to match the array_X/list_X convention used across the crate.

What changes are included in this PR?

Coercion shell mirrors the merged cosine_distance/inner_product pattern:

  • coerce_types accepts List/LargeList/FixedSizeList of any numeric inner type, plus bare NULL. After coercion the inner function only sees List(Float64) or LargeList(Float64).
  • Per-row L2 norm computed inline (no shared module), using a single as_float64_array(list_array.values()) downcast plus value_offsets() slicing — no per-row downcasts.
  • Manual list builder: Vec<f64> for values, Vec<O> for offsets, NullBuffer for row validity.
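
The flat-buffer approach in the bullets above can be sketched without Arrow at all. This is only an illustration under stated assumptions: `normalize_flat` is a hypothetical name, plain `Vec`s and slices stand in for the Arrow values, offsets, and validity buffers, offsets are fixed to `i32`, and element-level NULLs are left out to keep the layout in focus. It is not the code in this PR.

```rust
/// Hypothetical stand-in for a list array: flat values plus row offsets.
/// Row `i` spans `values[offsets[i]..offsets[i + 1]]`.
/// Assumes `offsets` is non-empty and `row_valid.len() == offsets.len() - 1`.
fn normalize_flat(
    values: &[f64],
    offsets: &[i32],
    row_valid: &[bool],
) -> (Vec<f64>, Vec<i32>, Vec<bool>) {
    let num_rows = offsets.len() - 1;
    let mut new_values: Vec<f64> = Vec::with_capacity(values.len());
    let mut new_offsets: Vec<i32> = Vec::with_capacity(num_rows + 1);
    new_offsets.push(0);
    let mut validity: Vec<bool> = Vec::with_capacity(num_rows);

    for (row, window) in offsets.windows(2).enumerate() {
        // Slice the shared values buffer once per row; no per-row allocation.
        let slice = &values[window[0] as usize..window[1] as usize];
        let norm = slice.iter().map(|v| v * v).sum::<f64>().sqrt();
        // An empty row stays valid (and empty); a zero-magnitude row becomes NULL.
        let keep = row_valid[row] && (slice.is_empty() || norm != 0.0);
        if keep {
            new_values.extend(slice.iter().map(|v| v / norm));
        }
        validity.push(keep);
        // NULL rows contribute no values, so their offset repeats the previous one.
        new_offsets.push(new_values.len() as i32);
    }
    (new_values, new_offsets, validity)
}
```

For example, values `[3, 4, 0, 0, 1]` with offsets `[0, 2, 2, 4, 5]` describe the rows `[3, 4]`, `[]`, `[0, 0]`, and `[1]`. The sketch yields offsets `[0, 2, 2, 2, 3]` and validity `[true, true, false, true]`, with the zero vector marked NULL.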

Per-row semantics:

  • NULL row → NULL output
  • NULL element in list → NULL row
  • Empty list → empty list (no division-by-zero hazard)
  • Zero magnitude → NULL row (consistent with cosine_distance's zero-magnitude → NULL)
  • Otherwise → divide each element by sqrt(sum-of-squares)
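
As a rough sketch of these per-row rules, the function below applies them to a single row. The name `normalize_row` and the use of `Option` to model SQL NULL are assumptions made for illustration; the PR itself works on Arrow list arrays rather than on `Option` slices.

```rust
/// Normalizes one row. `None` models SQL NULL at both the row and element level.
fn normalize_row(row: Option<&[Option<f64>]>) -> Option<Vec<f64>> {
    // NULL row -> NULL output.
    let row = row?;
    // Any NULL element -> NULL row (collecting into Option stops at the first None).
    let elems: Vec<f64> = row.iter().copied().collect::<Option<Vec<f64>>>()?;
    // Empty list -> empty list, returned before any division happens.
    if elems.is_empty() {
        return Some(Vec::new());
    }
    let norm = elems.iter().map(|v| v * v).sum::<f64>().sqrt();
    // Zero magnitude -> NULL row.
    if norm == 0.0 {
        return None;
    }
    // Otherwise divide each element by sqrt(sum-of-squares).
    Some(elems.iter().map(|v| v / norm).collect())
}
```

Checking the empty list before computing the norm keeps it distinct from the zero-magnitude case: both have a norm of zero, but only the zero vector maps to NULL.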

Are these changes tested?

Yes. SLT covers:

  • 3-4-5 right triangle, 3D vector, already-unit-axis, single non-zero component, negative components
  • Bare NULL input, NULL element in list, zero vector, empty array
  • LargeList, FixedSizeList (via coercion), Float32 and Int64 inner types, integer literals
  • Multi-row query mixing normal / NULL row / zero-vector row / null-element row
  • Plan error for non-list input
  • No-args error
  • Return-type assertion (List(Float64))
  • list_normalize alias coverage (constant + multi-row with NULL)

Are there any user-facing changes?

New scalar function array_normalize (alias list_normalize), documented in docs/source/user-guide/sql/scalar_functions.md.

github-actions bot added the documentation, sqllogictest, and functions labels May 5, 2026
let mut new_values: Vec<f64> = Vec::with_capacity(values.len());
let mut new_offsets: Vec<O> = Vec::with_capacity(list_array.len() + 1);
new_offsets.push(O::usize_as(0));
let mut validity: Vec<bool> = Vec::with_capacity(list_array.len());
Contributor

Use NullBufferBuilder here instead. One benefit is that finishing it may output None if there are no nulls (currently we always provide a null buffer, even if there are no nulls).

let offsets = list_array.value_offsets();

let mut new_values: Vec<f64> = Vec::with_capacity(values.len());
let mut new_offsets: Vec<O> = Vec::with_capacity(list_array.len() + 1);
Contributor

I think it might be simpler to use OffsetBufferBuilder here
