Skip to content

Fix TypeError in create_match_filter for Composite Keys with Single Unique Condition#1693

Merged
Fokko merged 18 commits into
apache:mainfrom
omkenge:upsert_issue
Feb 21, 2025
Merged

Fix TypeError in create_match_filter for Composite Keys with Single Unique Condition#1693
Fokko merged 18 commits into
apache:mainfrom
omkenge:upsert_issue

Conversation

@omkenge

@omkenge omkenge commented Feb 20, 2025

Copy link
Copy Markdown
Contributor

Old Code Behavior:

Even if there's only one such condition, the code wraps it in an Or() operator.
But Or() is meant to combine two or more conditions (like “condition A OR condition B”).
If you give it only one condition, it complains because it expects a second condition.

New Code Behavior:

The new change checks how many conditions you have.
If there's only one condition, it simply returns that condition.
If there are more than one, it uses Or() to combine them.

@omkenge omkenge mentioned this pull request Feb 20, 2025
@omkenge omkenge changed the title Upsert Issue Fix TypeError in create_match_filter for Composite Keys with Single Unique Condition Feb 20, 2025

@kevinjqliu kevinjqliu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding this! can you also add a test case too?

@omkenge

omkenge commented Feb 20, 2025

Copy link
Copy Markdown
Contributor Author

@kevinjqliu
The existing test already covers the composite key scenario. With the fix, it exercises the code path that previously caused the error, so no new test case is needed.

@omkenge

omkenge commented Feb 20, 2025

Copy link
Copy Markdown
Contributor Author

Old Senario
Problem: The Or() function expects two or more conditions (one on the left, one on the right). But here it is given only one condition.
Result: This produces a TypeError because Or() is missing its "right" argument.

@corleyma

Copy link
Copy Markdown

@omkenge I think the ask is to add a test covering the case where there is a single condition. This test would have failed prior to your change, and should pass with your change.

@omkenge

omkenge commented Feb 20, 2025

Copy link
Copy Markdown
Contributor Author

@corleyma @kevinjqliu
I tried , But not sure I need help from you

@omkenge

omkenge commented Feb 21, 2025

Copy link
Copy Markdown
Contributor Author

Issue:
When no rows are updated (i.e., none of the source composite keys match the target), the code calls pa.Table.from_arrays([], names=...). This passes 0 arrays for a schema that expects 6 arrays(origianl schema of table), causing the error.
ERROR

    rows_to_update = upsert_util.get_rows_to_update(df, matched_iceberg_table, join_cols)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/pyiceberg/pyiceberg/table/upsert_util.py", line 89, in get_rows_to_update
    rows_to_update_table = pa.Table.from_arrays([], names=source_table.column_names)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4851, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1593, in pyarrow.lib._sanitize_arrays
  File "pyarrow/table.pxi", line 1557, in pyarrow.lib._schema_from_arrays
ValueError: Length of names (6) does not match length of arrays (0)

Solution:
Create an empty array for each column in the source schema:

empty_arrays = [pa.array([], type=field.type) for field in source_table.schema]
rows_to_update_table = pa.Table.from_arrays(empty_arrays, schema=source_table.schema)

Comment thread pyiceberg/table/upsert_util.py Outdated
@Fokko Fokko added this to the PyIceberg 0.9.0 milestone Feb 21, 2025
Comment thread pyiceberg/table/upsert_util.py Outdated
@Fokko

Fokko commented Feb 21, 2025

Copy link
Copy Markdown
Contributor

@omkenge I ran into the same issue, and saw that you already fixed this 🚀 Thanks for jumping on this right away! I left some suggestions, but it looks like it is heading into the right direction 👍

omkenge and others added 2 commits February 21, 2025 14:43
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@omkenge

omkenge commented Feb 21, 2025

Copy link
Copy Markdown
Contributor Author

@Fokko What is your opinion on Test Cases? my point of view is we do not need test cases for this change

Comment thread tests/table/test_upsert.py Outdated
omkenge and others added 2 commits February 21, 2025 15:15
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Fokko

Fokko commented Feb 21, 2025

Copy link
Copy Markdown
Contributor

@omkenge can fix the import?

from pyiceberg.expressions import (
    And,
    BooleanExpression,
    EqualTo,
    In,
    Or,
)

Should be:

from pyiceberg.expressions import (
    AlwaysFalse,
    And,
    BooleanExpression,
    EqualTo,
    In,
    Or,
)

Comment thread tests/table/test_upsert.py Outdated
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Comment thread tests/table/test_upsert.py Outdated
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Fokko Fokko merged commit 4e9c66d into apache:main Feb 21, 2025
@Fokko

Fokko commented Feb 21, 2025

Copy link
Copy Markdown
Contributor

Thanks @omkenge for fixing this! Thanks @corleyma and @kevinjqliu for the review, I'll be merging this now since I'm also blocked by it :)

@omkenge omkenge deleted the upsert_issue branch February 22, 2025 18:22
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
**Old Code Behavior:**

Even if there's only one such condition, the code wraps it in an Or()
operator.
But Or() is meant to combine two or more conditions (like “condition A
OR condition B”).
If you give it only one condition, it complains because it expects a
second condition.

**New Code Behavior:**

The new change checks how many conditions you have.
If there's only one condition, it simply returns that condition.
If there are more than one, it uses Or() to combine them.

---------

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants