Skip to content

fix: tableFromJSON and tableFromArrays crash on arrays of strings#405

Open
kentkwu wants to merge 1 commit intoapache:mainfrom
kentkwu:gh-86
Open

fix: tableFromJSON and tableFromArrays crash on arrays of strings#405
kentkwu wants to merge 1 commit intoapache:mainfrom
kentkwu:gh-86

Conversation

@kentkwu
Copy link
Contributor

@kentkwu kentkwu commented Mar 4, 2026

What's Changed

Applies a fix for type inference crashing on encountering an array of strings.

Added a compareTypesForInference helper that compares Dictionary types structurally, ignoring IDs. It recurses through List and Struct for nested cases. The public compareTypes is unchanged. ID equality still matters for IPC comparisons.

Root cause:

Strings infer to Dictionary<Utf8, Int32>, and every Dictionary construction auto-increments a global ID. The array homogeneity check in inferType calls compareTypes on separately inferred types, which requires matching IDs — so it always fails for string arrays.

Closes #86, #50

@kou
Copy link
Member

kou commented Mar 5, 2026

@matheuzinoficial @hemkumarrao-ni @knight556 Could you try this fix?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes tableFromJSON() / tableFromArrays() type inference crashes when encountering arrays of strings by comparing inferred types structurally (ignoring Dictionary IDs that auto-increment per construction), including nested List and Struct cases.

Changes:

  • Added an internal compareTypesForInference() helper to compare Dictionary types without requiring matching IDs, with recursion for List and Struct.
  • Updated list homogeneity checking in inferType() to use the new inference comparator.
  • Added unit tests covering arrays of strings and nested objects containing strings for both tableFromArrays() and tableFromJSON().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/factories.ts Introduces ID-agnostic type comparison for inference and applies it to list inference checks to prevent false mismatches for string dictionaries.
test/unit/table/table-test.ts Adds regression tests demonstrating correct behavior for string arrays and nested string-containing structures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

const array = value as Array<unknown>[];
const childType = inferType(array[array.findIndex((ary) => ary != null)]);
if (array.every((ary) => ary == null || compareTypes(childType, inferType(ary)))) {
if (array.every((ary) => ary == null || compareTypesForInference(childType, inferType(ary)))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just passed the dictionaryId down through inferType? e.g.

Suggested change
if (array.every((ary) => ary == null || compareTypesForInference(childType, inferType(ary)))) {
if (array.every((ary) => ary == null || compareTypes(childType, inferType(ary, dtypes.DataType.isDictionary(childType) ? childType.id : null)))) {

Then we could pass the dictionary id into the Dictionary constructor on L142?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably need to maintain a Map of key -> dictionary id in order to do the same thing for dictionary-encoded struct children.

Copy link
Contributor

@trxcllnt trxcllnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment with a suggestion, curious to hear your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[JS] tableFromJSON cannot handle nested objects containing strings

4 participants