feat: add schema parameter to tableFromArrays and new recordBatchFromArrays factory#385
rustyconover wants to merge 1 commit into `apache:main`
Conversation
…Arrays factory

Allow callers to pass an explicit `Schema` to `tableFromArrays()` and a new `recordBatchFromArrays()` function, giving control over column types, ordering, nullability, and metadata instead of relying solely on type inference. Also adds a fast path in `vectorFromArray` for TypedArray-to-typed-vector coercion with `BigInt` boundary validation.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
```ts
if (dtypes.DataType.isInt(type) || dtypes.DataType.isFloat(type)) {
    const data = init.constructor === type.ArrayType
        ? init // zero-copy, same TypedArray type
        : new (type.ArrayType as any)(init); // standard JS TypedArray conversion
    return makeVector({ type, data, offset: 0, length: data.length, nullCount: 0 } as any);
}
```
The fast path for the `Float16` type is incorrect when the input TypedArray is not already a `Uint16Array`. `Float16` uses `Uint16Array` as its `ArrayType`, but the stored values are IEEE 754 half-precision bit patterns, not plain integers. Doing `new Uint16Array(new Float32Array([1.5]))` yields `Uint16Array([1])`, not the correct half-precision encoding (`0x3E00`).

Consider excluding `Float16` (i.e., `Precision.HALF`) from this fast path so it falls through to the builder, which correctly handles the float16 encoding. For example, the condition could additionally check `!(dtypes.DataType.isFloat(type) && (type as dtypes.Float).precision === Precision.HALF)`.
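To make the failure mode concrete, here is a standalone sketch contrasting the naive integer cast with a real half-precision encoding. `float32ToFloat16Bits` is a hypothetical helper written for illustration, not an Arrow API, and it skips rounding and subnormal handling for brevity:

```typescript
// Float16's ArrayType is Uint16Array, but the stored values are IEEE 754
// half-precision bit patterns. A plain TypedArray cast does integer
// conversion instead, destroying the encoding.

// Minimal float32 -> float16 bit encoder (hypothetical sketch; no rounding,
// subnormals collapse to signed zero).
function float32ToFloat16Bits(value: number): number {
    const f32 = new Float32Array(1);
    const u32 = new Uint32Array(f32.buffer);
    f32[0] = value;
    const bits = u32[0];
    const sign = (bits >>> 16) & 0x8000;
    const exp = ((bits >>> 23) & 0xff) - 127 + 15; // rebias exponent 8 -> 5 bits
    const mantissa = (bits >>> 13) & 0x3ff;        // truncate mantissa 23 -> 10 bits
    if (exp <= 0) return sign;                     // underflow -> signed zero
    if (exp >= 0x1f) return sign | 0x7c00;         // overflow -> infinity
    return sign | (exp << 10) | mantissa;
}

const naive = new Uint16Array(new Float32Array([1.5]));
console.log(naive[0]);                  // 1 -- plain integer truncation, wrong
console.log(float32ToFloat16Bits(1.5)); // 15872 (0x3E00), the actual half-precision bits
```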
I love this idea, we should totally do something like this. However, I wonder if we shouldn't accept an `IterableBuilderOptions` instead of `Schema`, or accept both and convert the schema into an `IterableBuilderOptions`.

The reason I ask is Unions.

If someone has a JS Array of a distinct number of types, they might want to encode that more efficiently as a `DenseUnion`. But to do that, they need to give us a function to map each value to its typeId, which is what the `valueToChildTypeId` function in `UnionBuilderOptions` enables:
```ts
let num = new Field("num", new Float64);
let str = new Field("str", new Utf8);
let struct = new Field("struct", new Struct([str]));
let nullValues = [null, undefined];

vectorFromArray(
    [123, "a", "b", "c", { str: "hello" }, { str: "goodbye" }],
    {
        type: new DenseUnion([0, 1, 2], [num, str, struct]),
        children: {
            "num": { type: num.type, nullValues },
            "str": { type: str.type, nullValues },
            "struct": { type: struct.type, nullValues },
        },
        valueToChildTypeId(_builder, value, _offset) {
            switch (typeof value) {
                case "number": return 0;
                case "string": return 1;
                case "object": return 2;
            }
        },
        nullValues,
    }
)
```

`vectorFromArray` uses the vector `Builder` machinery under the hood, so this seems like something worth enabling. As a bonus, users can pass in their own `queueingStrategy` and `highWaterMark` to control chunking, or use a custom hash function (e.g. node-metrohash) when dictionary encoding.
What're your thoughts?
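For readers unfamiliar with the dense-union layout being discussed: physically, a dense union stores a typeIds buffer plus an offsets buffer pointing into per-type child arrays, which is exactly the mapping `valueToChildTypeId` drives. The sketch below is a standalone, hypothetical model of that encoding (`encodeDenseUnion` is not an Arrow function):

```typescript
// Standalone model of dense-union encoding: each value gets a typeId and an
// offset into that type's child array. Illustrative only, not Arrow internals.
function encodeDenseUnion<T>(
    values: T[],
    valueToTypeId: (value: T) => number,
    numChildren: number
) {
    const typeIds = new Int8Array(values.length);
    const offsets = new Int32Array(values.length);
    const children: T[][] = Array.from({ length: numChildren }, () => []);
    values.forEach((value, i) => {
        const id = valueToTypeId(value);
        typeIds[i] = id;
        offsets[i] = children[id].length; // position within that child's array
        children[id].push(value);
    });
    return { typeIds, offsets, children };
}

// e.g. mapping numbers to child 0 and strings to child 1:
const { typeIds, offsets, children } = encodeDenseUnion<number | string>(
    [123, "a", "b", 456],
    (v) => (typeof v === "number" ? 0 : 1),
    2
);
// typeIds -> [0, 1, 1, 0]; offsets -> [0, 0, 1, 1]
// children -> [[123, 456], ["a", "b"]]
```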
What's Changed

Allow callers to pass an explicit `Schema` to `tableFromArrays()` and a new `recordBatchFromArrays()` function, giving control over column types, ordering, nullability, and metadata instead of relying solely on type inference.

Also adds a fast path in `vectorFromArray` for TypedArray-to-typed-vector coercion with `BigInt` boundary validation.
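The `BigInt` boundary validation mentioned above matters because a `BigInt64Array` silently wraps values outside the signed 64-bit range. The sketch below illustrates the kind of range check such a fast path needs; `toBigInt64Array` is a hypothetical helper for illustration, not the PR's actual implementation:

```typescript
// Sketch of BigInt boundary validation before coercing plain JS bigints into
// a BigInt64Array-backed vector. Hypothetical helper, not Arrow internals.
const INT64_MIN = -(2n ** 63n);
const INT64_MAX = 2n ** 63n - 1n;

function toBigInt64Array(values: bigint[]): BigInt64Array {
    for (const v of values) {
        if (v < INT64_MIN || v > INT64_MAX) {
            // Without this check, BigInt64Array would wrap the value silently.
            throw new RangeError(`value ${v} is out of int64 range`);
        }
    }
    return new BigInt64Array(values);
}

const ok = toBigInt64Array([1n, -2n]); // BigInt64Array [1n, -2n]
// toBigInt64Array([2n ** 63n])        // throws RangeError
```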