Upsert fails after update_schema().union_by_name() due to schema mismatch #3105

@Saamu192

Description

Apache Iceberg version

0.10.0

Please describe the bug 🐞

When performing an upsert after adding a new column via update_schema().union_by_name(), the operation fails with a ValueError indicating that the schema field names don't match.

To reproduce:

from pyiceberg.catalog import load_catalog
import polars as pl

catalog = load_catalog("default", **{"type": "in-memory"})

df = pl.DataFrame(
    [
        {"id": 1, "name": "Alice", "age": 30, "city": "São Paulo"},
        {"id": 2, "name": "Bob", "age": 25, "city": "Rio de Janeiro"},
        {"id": 3, "name": "Carol", "age": 35, "city": "Belo Horizonte"},
        {"id": 4, "name": "David", "age": 28, "city": "Curitiba"},
    ]
)

arrow = df.to_arrow()

catalog.create_namespace_if_not_exists("default")
catalog.create_table_if_not_exists("default.my_table", arrow.schema)
table = catalog.load_table("default.my_table")

try:
    table.append(arrow)
    
    # Add a new column
    arrow = df.with_columns(ping=pl.lit("pong")).to_arrow()
    
    # Update schema to include the new column
    with table.update_schema() as update_schema:
        update_schema.union_by_name(arrow.schema)
        table = table.refresh()
    
    # This fails with ValueError
    table.upsert(arrow, ["id"])
finally:
    catalog.drop_table("default.my_table")

Error:
ValueError: Target schema's field names are not matching the table's field names: ['id', 'name', 'age', 'city', 'ping'], ['id', 'name', 'age', 'city']

Stack trace:

  File "pyiceberg/table/__init__.py", line 1343, in upsert
    return tx.upsert(
  File "pyiceberg/table/__init__.py", line 825, in upsert
    rows_to_update = upsert_util.get_rows_to_update(df, rows, join_cols)
  File "pyiceberg/table/upsert_util.py", line 92, in get_rows_to_update
    source_table.cast(target_table.schema)
  File "pyarrow/table.pxi", line 4721, in pyarrow.lib.Table.cast
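
Reading the error against the trace: the cast is invoked as `source_table.cast(target_table.schema)`, so the first list in the message appears to be the column names of the incoming Arrow data (with `ping`) and the second the names on the target side (without `ping`). The sketch below is a plain-Python model of that name comparison, not PyArrow's actual code; `check_cast_names` and `missing_columns` are hypothetical helpers, and the ordering of the two lists is inferred from the message above.

```python
def check_cast_names(source_names, target_names):
    """Hypothetical stand-in for a field-name check that precedes a table cast."""
    source, target = list(source_names), list(target_names)
    if source != target:
        raise ValueError(
            "Target schema's field names are not matching the table's field names: "
            f"{source!r}, {target!r}"
        )


def missing_columns(source_names, target_names):
    """Columns present in the incoming data but absent on the target side."""
    target = set(target_names)
    return [name for name in source_names if name not in target]


# Column names taken from the reproduction above.
incoming = ["id", "name", "age", "city", "ping"]  # arrow.schema.names after with_columns
existing = ["id", "name", "age", "city"]          # names reported for the target side

try:
    check_cast_names(incoming, existing)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(missing_columns(incoming, existing))
```

If this reading is right, the only difference between the two sides is the newly added `ping` column, which suggests the rows being compared during the upsert do not carry the updated schema.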

Expected:
The upsert operation should succeed after the schema has been updated to include the new column.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
