Search before asking
Motivation
Paimon Java already supports the VARIANT type, but pypaimon has no equivalent implementation. This proposal adds VARIANT read/write support to pypaimon so that Python-based compute engines can integrate with it. For example, Daft is planning native VARIANT support at the engine level.
The ideal long-term solution is to wait for PyArrow's official Variant support (apache/arrow#45937), which would guarantee both correctness and performance from the upstream ecosystem. We should continue tracking that progress.
That said, shipping a Python-based implementation now is a reasonable short-term step: users and compute engines can begin integrating VARIANT support early, and the underlying implementation can later be swapped for PyArrow's native one without breaking existing users.
Moreover, some aspects of VARIANT integration are the responsibility of pypaimon itself and cannot be delegated to PyArrow, for example shredded column pruning and shredded column predicate pushdown. These will remain part of pypaimon regardless of upstream changes.
Solution
This work is broken into three parts:
- VARIANT read/write support
- Shredded column pruning
- Shredded column predicate pushdown
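
To make the second item concrete, here is a minimal sketch of what shredded column pruning could decide at planning time. Everything in it is hypothetical: `plan_shredded_read`, the path syntax, and the schema dictionary are illustrative names, not pypaimon APIs. It also simplifies the storage model by ignoring the per-field fallback value that real shredding layouts may keep for type mismatches.

```python
# Hypothetical sketch: none of these names are pypaimon APIs.
from typing import Dict, List, Tuple


def plan_shredded_read(
    shredded_fields: Dict[str, str],
    requested_paths: List[str],
) -> Tuple[List[str], bool]:
    """Decide which shredded sub-columns to read for the requested variant paths.

    Returns the shredded columns to read and a flag telling whether the
    residual (unshredded) binary value must also be read.
    """
    columns: List[str] = []
    need_residual = False
    for path in requested_paths:
        if path in shredded_fields:
            # The path was shredded into its own typed column.
            columns.append(path)
        else:
            # The path lives only in the residual binary value.
            need_residual = True
    return columns, need_residual


# Assumed example layout: two paths were shredded at write time.
shredded = {"$.user.id": "BIGINT", "$.event": "STRING"}
cols, residual = plan_shredded_read(shredded, ["$.event", "$.user.name"])
```

In this example only the `$.event` column would be read from the shredded part, and the residual value is still needed because `$.user.name` was never shredded. A query touching only shredded paths could skip the residual value entirely, which is the case where pruning pays off.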
Anything else?
No response
Are you willing to submit a PR?