[flink] Expose scan.bucket for single-bucket manifest pruning by wwj6591812 · Pull Request #8117 · apache/paimon

wwj6591812 · 2026-06-04T07:39:30Z

Background

ReadBuilder.withBucket(int) and manifest scanning already support reading a single bucket, but Flink SQL had no connector option to expose it. Operators often need to debug or scan one bucket of a fixed-bucket primary-key table without reading all buckets.

Why this PR

Expose scan.bucket in Flink so users can run:

SELECT * FROM t /*+ OPTIONS('scan.bucket' = '0') */

and plan splits only for that bucket.

What changes

Add FlinkConnectorOptions.SCAN_BUCKET (scan.bucket).
ScanBucketUtils.applyScanBucket() reads the option and calls ReadBuilder.withBucket().
Wire into FlinkSourceBuilder and FlinkTableSource (batch and split inference).
Validate in ReadBuilderImpl.withBucket() (canonical read path): non-negative bucket id, FileStoreTable only, not postpone-bucket mode, bucket < table.bucket when table bucket > 0.
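
The four checks listed for ReadBuilderImpl.withBucket() can be sketched as standalone Java. This is a minimal illustration, not the code in this PR: the class name, the `Mode` enum, and the `validate` signature are all hypothetical, and the real implementation works against Paimon's table and option types rather than plain ints and booleans.

```java
/** Illustrative stand-in for the checks listed above; not Paimon's actual code. */
final class ScanBucketCheckSketch {

    /** Simplified bucket modes, modeled only for this example. */
    enum Mode { FIXED, DYNAMIC_OR_UNAWARE, POSTPONE }

    /**
     * Throws IllegalArgumentException when the requested bucket violates one of
     * the four rules from the PR description.
     */
    static void validate(int scanBucket, boolean isFileStoreTable, Mode mode, int tableBuckets) {
        if (scanBucket < 0) {
            throw new IllegalArgumentException("scan.bucket must be non-negative: " + scanBucket);
        }
        if (!isFileStoreTable) {
            throw new IllegalArgumentException("scan.bucket requires a FileStoreTable");
        }
        if (mode == Mode.POSTPONE) {
            throw new IllegalArgumentException("scan.bucket is not supported in postpone-bucket mode");
        }
        // The upper bound is only checked when the table declares a positive bucket count.
        if (tableBuckets > 0 && scanBucket >= tableBuckets) {
            throw new IllegalArgumentException(
                    "scan.bucket " + scanBucket + " must be less than table bucket " + tableBuckets);
        }
    }

    public static void main(String[] args) {
        validate(0, true, Mode.FIXED, 4); // accepted: bucket 0 of 4
        try {
            validate(4, true, Mode.FIXED, 4); // rejected: valid ids are 0..3
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these rules, a table whose configured bucket count is not positive skips the upper-bound check entirely, so any non-negative bucket id is accepted for it.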

Stage optimized: scan / manifest planning. With scan.bucket set, fewer manifest entries and splits are produced before the read starts. No change to merge or per-record logic.
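
To illustrate why pruning at the planning stage reduces work, here is a rough sketch of filtering manifest entries by bucket before any splits are built. The `Entry` class and the `prune` method are hypothetical simplifications made up for this example; Paimon's real manifest entries carry far more metadata and the actual filter lives inside its scan implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of single-bucket pruning during scan planning. */
final class BucketPruningSketch {

    /** Hypothetical, simplified manifest entry: one data file in one bucket. */
    static final class Entry {
        final int bucket;
        final String fileName;

        Entry(int bucket, String fileName) {
            this.bucket = bucket;
            this.fileName = fileName;
        }
    }

    /** Keeps only the entries of the requested bucket; later planning never sees the rest. */
    static List<Entry> prune(List<Entry> entries, int scanBucket) {
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.bucket == scanBucket) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> all = Arrays.asList(
                new Entry(0, "data-a"),
                new Entry(1, "data-b"),
                new Entry(0, "data-c"),
                new Entry(2, "data-d"));
        // Only the two files of bucket 0 would go on to split generation.
        System.out.println(prune(all, 0).size() + " of " + all.size() + " entries kept");
    }
}
```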

Tests

ScanBucketUtilsTest: an invalid bucket id fails fast.
ScanBucketITCase: SQL with scan.bucket returns the same rows as reading that bucket via the table API.

Test plan

mvn test -pl paimon-flink/paimon-flink-common -am -Dtest=ScanBucketUtilsTest,ScanBucketITCase

wwj6591812 · 2026-06-04T11:55:38Z

The failing test is unrelated to my changes.

JingsongLi · 2026-06-06T08:32:45Z

The validation here still allows scan.bucket on non-fixed-bucket tables.

validateSpecifiedBucket rejects postpone bucket tables, but it does not require a fixed bucket mode or bucket > 0. For bucket-unaware/dynamic bucket tables, CoreOptions.bucket() can be <= 0, so the upper-bound check is skipped and the scan proceeds with physical bucket pruning. That can turn an invalid configuration into an empty/incorrect result instead of a clear error. The generated doc says this option is only supported for fixed-bucket primary-key tables (bucket > 0).

Could we enforce that here, e.g. require the fixed bucket mode and configured bucket count > 0, and also check primary-key-ness if the option is intended only for primary-key tables?
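
One possible shape for the stricter check requested here, sketched as standalone Java. Everything in it is an assumption made for illustration: the class name, the `Mode` enum, and the parameters are hypothetical and do not reflect how validateSpecifiedBucket is actually written. The primary-key check is included only because the comment raises it as a possibility.

```java
/** Illustrative sketch of the stricter validation suggested in the review. */
final class StrictScanBucketCheckSketch {

    /** Simplified bucket modes, modeled only for this example. */
    enum Mode { FIXED, DYNAMIC, UNAWARE, POSTPONE }

    static void validate(int scanBucket, Mode mode, int tableBuckets, boolean hasPrimaryKey) {
        // Reject every non-fixed mode up front instead of silently skipping the bound check.
        if (mode != Mode.FIXED || tableBuckets <= 0) {
            throw new IllegalArgumentException(
                    "scan.bucket requires a fixed-bucket table with bucket > 0");
        }
        if (!hasPrimaryKey) {
            throw new IllegalArgumentException("scan.bucket requires a primary-key table");
        }
        // With a positive bucket count guaranteed, both bounds are always enforced.
        if (scanBucket < 0 || scanBucket >= tableBuckets) {
            throw new IllegalArgumentException(
                    "scan.bucket " + scanBucket + " must be in [0, " + tableBuckets + ")");
        }
    }

    public static void main(String[] args) {
        validate(0, Mode.FIXED, 4, true); // accepted
        try {
            validate(0, Mode.DYNAMIC, -1, true); // now a clear error, not an empty result
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this ordering, a dynamic or bucket-unaware table fails with an explicit message before any bucket id comparison is attempted.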

[flink] Expose scan.bucket for single-bucket manifest pruning

7e8c5d8

wwj6591812 force-pushed the add_scan_bucket_0604 branch from 17c2722 to 7e8c5d8 · June 4, 2026 09:45

wwj6591812 wants to merge 1 commit into apache:master from wwj6591812:add_scan_bucket_0604
