Python SDK `read_parquet` with `union_by_name` fails after version 1.2.2 (#257)

### What happens?

[duckdbtest.zip](https://github.com/user-attachments/files/24500181/duckdbtest.zip)

## Title
Regression in 1.3.0+: `union_by_name` fails with "Can't change source type (NULL) to target type (VARCHAR[])" when reading parquet files with mixed NULL/LIST types

## DuckDB Version
- **Working version**: 1.2.2
- **Broken versions**: 1.3.0, 1.3.1 (and later)

## Environment
- OS: Linux
- Python: 3.12.9
- pandas: (latest)

## Description

Starting with DuckDB 1.3.0, reading multiple parquet files with `union_by_name=True` fails when:
1. Some parquet files have a column stored as NULL type (because all values are null in that file)
2. Other parquet files have the same column properly typed as VARCHAR[] (array/list of strings)

This worked correctly in DuckDB 1.2.2 but now throws:
```
BinderException: Binder Error: Can't change source type ("NULL") to target type (VARCHAR[]), type conversion not allowed
```

### Expected Behavior
When `union_by_name=True` is set, DuckDB should merge schemas gracefully, treating NULL-typed columns as compatible with any target type (similar to how pandas handles this).
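The expected merge rule can be illustrated with a small, self-contained sketch. This is not DuckDB's implementation: `merge_by_name`, the `NULL` placeholder, and the type names as plain strings are hypothetical, and the sketch deliberately ignores the type promotion a real engine would perform between two different concrete types.

```python
from typing import Dict, List

NULL = "NULL"  # hypothetical marker for a column whose file contains only nulls


def merge_by_name(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Union column names across files; a NULL-typed column adopts any concrete type."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, col_type in schema.items():
            current = merged.get(name, NULL)
            if current == NULL:
                # Nothing concrete seen yet: take whatever this file declares.
                merged[name] = col_type
            elif col_type != NULL and col_type != current:
                # Simplification: two different concrete types are rejected here.
                raise TypeError(
                    f"column {name!r}: cannot unify {current} with {col_type}"
                )
    return merged


file_a = {"id": "BIGINT", "tags": "NULL"}       # all values null in this file
file_b = {"id": "BIGINT", "tags": "VARCHAR[]"}  # properly typed list of strings
print(merge_by_name([file_a, file_b]))
```

Under this rule the result does not depend on which file is read first: the `tags` column resolves to `VARCHAR[]` either way.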

### Actual Behavior
DuckDB 1.3.0+ throws a `BinderException` and refuses to read the files, even though `union_by_name=True` is explicitly designed to handle schema variations across multiple files.

## Root Cause Analysis

Investigation shows:
- When every value in a column is null, the parquet file stores that column with a NULL type (e.g., physical type `INT32` with a `NullType()` logical type)
- Other files that contain actual data store the same column as `BYTE_ARRAY` with `StringType()`, or as a complex type such as `ListType()`
- The error specifically mentions `VARCHAR[]` (a list type), which suggests the failure is tied to nested/complex types
- The regression appeared between versions 1.2.2 and 1.3.0




### To Reproduce


Test files are attached; see [duckdbtest.zip](https://github.com/user-attachments/files/24500181/duckdbtest.zip).

```python
import duckdb
print(f"DuckDB version: {duckdb.__version__}")

# Fails with 1.3.0+
try:
    result = duckdb.read_parquet(
        "duckdb_bug_test_files/*.parquet",
        union_by_name=True
    ).df()
    print(f"SUCCESS: Read {len(result)} rows")
except Exception as e:
    print(f"FAILED: {type(e).__name__}: {e}")
```



### OS:

Linux x86

### DuckDB Package Version:

v1.2.2, v1.3.0 and later

### Python Version:

3.10

### Full Name:

Zack Dai

### Affiliation:

Zack Dai

### What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

### Did you include all relevant data sets for reproducing the issue?

Yes

### Did you include all code required to reproduce the issue?

- [x] Yes, I have

### Did you include all relevant configuration to reproduce the issue?

- [x] Yes, I have
