@geruh geruh commented Dec 25, 2025

related to #2775

Rationale for this change

Adds synchronous client-side support for REST server-side scan planning, so that scans can be planned by the server when the REST catalog supports it.

This PR cherry-picks and builds on two WIP PRs:

Currently, server-side scan planning is enabled by setting rest-scan-planning-enabled=true in the catalog properties.
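As an illustration of the opt-in, here is a minimal sketch of how such a flag could be read from catalog properties. Only the property name rest-scan-planning-enabled comes from this PR; the helper function, the placeholder URI, and the default of "disabled when absent" are assumptions made for the example, not the PR's actual implementation.

```python
from __future__ import annotations

# Property name taken from the PR description.
REST_SCAN_PLANNING_ENABLED = "rest-scan-planning-enabled"


def is_rest_scan_planning_enabled(properties: dict[str, str]) -> bool:
    """Hypothetical helper: treat the flag as opt-in, disabled when absent."""
    raw = properties.get(REST_SCAN_PLANNING_ENABLED, "false")
    return raw.strip().lower() == "true"


# Example catalog properties; the URI is a placeholder.
catalog_properties = {
    "type": "rest",
    "uri": "http://localhost:8181",
    REST_SCAN_PLANNING_ENABLED: "true",
}

enabled = is_rest_scan_planning_enabled(catalog_properties)
disabled_by_default = is_rest_scan_planning_enabled({"type": "rest"})
```

Keeping the flag opt-in means existing REST catalog users see no behavior change unless they set the property.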

TODO: spec handling

Are these changes tested?

Integration tests were added, along with manual testing.

Are there any user-facing changes?

Yes. Users can opt in to REST scan planning through the rest-scan-planning-enabled catalog property.

@geruh geruh changed the title from "Scan wip" to "feat: Add support for rest scan planning" Dec 25, 2025
@geruh geruh marked this pull request as ready for review January 1, 2026 22:41
return FileScanTask(
    data_file=data_file,
    delete_files=resolved_deletes,
    residual=rest_task.residual_filter if rest_task.residual_filter else ALWAYS_TRUE,

Are the residual filters bound in the FileScanTask?

Thanks for the review, @singhpk234! The residual filters from REST are not bound in the usual sense. Currently, the residual is only used for the optimization check in count().

The actual row filtering still uses the full row_filter, not the residual. This works correctly but is slightly inefficient.
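To make the role of the residual concrete, the following sketch mimics the fallback from the diff and a count-style shortcut. Everything here is a stand-in: RestTask, the string-based ALWAYS_TRUE sentinel, and the assumption that the shortcut applies only when the residual is always-true and the task has no delete files are illustrative, not the library's actual types or logic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Stand-in sentinel for an "always true" expression (assumption for this sketch).
ALWAYS_TRUE = "AlwaysTrue()"


@dataclass
class RestTask:
    """Hypothetical, simplified view of a task returned by the REST server."""

    record_count: int
    residual_filter: Optional[str] = None
    has_deletes: bool = False


def resolve_residual(task: RestTask) -> str:
    # Mirrors the fallback in the diff: a missing residual becomes ALWAYS_TRUE.
    return task.residual_filter if task.residual_filter else ALWAYS_TRUE


def can_count_from_metadata(task: RestTask) -> bool:
    # Assumed shortcut: every row matches and no deletes apply,
    # so the file's record count can be used without reading data.
    return resolve_residual(task) == ALWAYS_TRUE and not task.has_deletes


tasks = [
    RestTask(record_count=100),                        # no residual: metadata count
    RestTask(record_count=50, residual_filter="x > 5"),  # residual: must read rows
]
metadata_rows = sum(t.record_count for t in tasks if can_count_from_metadata(t))
```

In this sketch only the first task is counted from metadata; the second would still need its rows read and filtered with the full row filter, which is the inefficiency mentioned above.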

Returns:
    PlanningResponse the result of the scan plan request representing the status
Raises:

Suggested change
Raises:
Raises:

Comment on lines +2054 to +2060
def _should_use_rest_planning(self) -> bool:
    """Check if REST scan planning should be used for this scan."""
    from pyiceberg.catalog.rest import RestCatalog

    if not isinstance(self.catalog, RestCatalog):
        return False
    return self.catalog.is_rest_scan_planning_enabled()

@Fokko Fokko Jan 12, 2026

I would be inclined to create a method on the Catalog, eg:

    @property
    @abstractmethod
    def use_server_side_planning(self, identifier: str | Identifier) -> bool:
        """Support for Server Side Planning"""

Have the MetastoreCatalog implement it and return False, and rename is_rest_scan_planning_enabled to support_server_side_planning. As it stands, we have to go through multiple jumps.

This would also clean up _plan_files_rest below.
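A rough sketch of the shape this suggestion could take, using stand-in classes rather than the real pyiceberg ones. For simplicity it uses a plain abstract method without the identifier argument (a property cannot accept arguments), and the constructor taking a properties dict is an assumption made for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Catalog(ABC):
    """Stand-in base class; not the real pyiceberg Catalog."""

    @abstractmethod
    def support_server_side_planning(self) -> bool:
        """Whether this catalog can plan scans on the server."""


class MetastoreCatalog(Catalog):
    """Stand-in for metastore-backed catalogs, which plan on the client."""

    def support_server_side_planning(self) -> bool:
        return False


class RestCatalog(Catalog):
    """Stand-in REST catalog that reads the opt-in flag from its properties."""

    def __init__(self, properties: dict[str, str]) -> None:
        self.properties = properties

    def support_server_side_planning(self) -> bool:
        return self.properties.get("rest-scan-planning-enabled", "false").lower() == "true"


def should_plan_server_side(catalog: Catalog) -> bool:
    # The scan no longer needs an isinstance check or a local import.
    return catalog.support_server_side_planning()
```

With the capability on the base class, the scan code asks the catalog a single question instead of checking for a concrete catalog type first.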

Comment on lines +31 to +36
# REST content-type to DataFileContent
CONTENT_TYPE_MAP: dict[str, DataFileContent] = {
    "data": DataFileContent.DATA,
    "position-deletes": DataFileContent.POSITION_DELETES,
    "equality-deletes": DataFileContent.EQUALITY_DELETES,
}

Should we move this to a static method on DataFileContent?
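One way this could look, sketched with a stand-in enum rather than the real DataFileContent. The method name from_rest_type, the enum values, and the choice to raise ValueError for unknown types are assumptions for illustration.

```python
from enum import Enum


class DataFileContent(Enum):
    """Stand-in enum for this sketch; member values are illustrative."""

    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2

    @staticmethod
    def from_rest_type(content_type: str) -> "DataFileContent":
        """Hypothetical helper: map a REST content-type string to a member."""
        # Built inside the method so the dict does not become an enum member.
        mapping = {
            "data": DataFileContent.DATA,
            "position-deletes": DataFileContent.POSITION_DELETES,
            "equality-deletes": DataFileContent.EQUALITY_DELETES,
        }
        try:
            return mapping[content_type]
        except KeyError:
            raise ValueError(f"Unknown REST content type: {content_type!r}") from None


content = DataFileContent.from_rest_type("position-deletes")
```

This keeps the string-to-enum mapping next to the enum itself, so callers do not need to import a separate module-level dictionary.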

        return False
    return self.catalog.is_rest_scan_planning_enabled()

def _plan_files_rest(self) -> Iterable[FileScanTask]:

Nit: to keep the naming consistent:

Suggested change
def _plan_files_rest(self) -> Iterable[FileScanTask]:
def _plan_files_server_side(self) -> Iterable[FileScanTask]:

@Fokko Fokko left a comment

Left a few comments, would be good to get those cleaned up. Apart from that, this looks great to me! Thanks @geruh for working on this, very exciting to see this being added 👍
