
Python SDK: read_parquet with union_by_name fails after version 1.2.2 #257

@DaiZack

What happens?

duckdbtest.zip

Title

Regression in 1.3.0+: union_by_name fails with "Can't change source type (NULL) to target type (VARCHAR[])" when reading parquet files with mixed NULL/LIST types

DuckDB Version

  • Working version: 1.2.2
  • Broken versions: 1.3.0, 1.3.1 (and later)

Environment

  • OS: Linux
  • Python: 3.12.9
  • pandas: (latest)

Description

Starting with DuckDB 1.3.0, reading multiple parquet files with union_by_name=True fails when:

  1. Some parquet files have a column stored as NULL type (because all values are null in that file)
  2. Other parquet files have the same column properly typed as VARCHAR[] (array/list of strings)

This worked correctly in DuckDB 1.2.2 but now throws:

BinderException: Binder Error: Can't change source type ("NULL") to target type (VARCHAR[]), type conversion not allowed

Expected Behavior

When union_by_name=True is set, DuckDB should merge schemas gracefully, treating NULL-typed columns as compatible with any target type (similar to how pandas handles this).

Actual Behavior

DuckDB 1.3.0+ throws a BinderException and refuses to read the files, even though union_by_name=True is explicitly designed to handle schema variations across multiple files.

Root Cause Analysis

Investigation shows:

  • When a parquet file has only NULL values for a column, the column is stored with the NULL type (e.g., physical type INT32 with the NullType() logical type)
  • Other files with actual data store the same column as BYTE_ARRAY with StringType() or complex types like ListType()
  • The error specifically mentions VARCHAR[] (a list type), suggesting the problem occurs with nested/complex target types
  • This regression appeared between versions 1.2.2 and 1.3.0

To Reproduce

The test files are attached as duckdbtest.zip.

import duckdb
print(f"DuckDB version: {duckdb.__version__}")

# Fails with 1.3.0+
try:
    result = duckdb.read_parquet(
        "duckdb_bug_test_files/*.parquet",
        union_by_name=True
    ).df()
    print(f"SUCCESS: Read {len(result)} rows")
except Exception as e:
    print(f"FAILED: {type(e).__name__}: {e}")

OS:

Linux x86

DuckDB Package Version:

v1.2.2, v1.3.0 and later

Python Version:

3.10

Full Name:

Zack Dai

Affiliation:

Zack Dai

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
