forked from ClickHouse/ClickHouse
-
Notifications
You must be signed in to change notification settings - Fork 18
Fix Iceberg read optimization returning NULLs for stats-less manifests #1991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
arthurpassos
merged 2 commits into
antalya-26.3
from
fix/antalya-26.3/stateless-manifests
Jul 1, 2026
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
3 changes: 3 additions & 0 deletions
3
tests/queries/0_stateless/04302_iceberg_read_optimization_no_column_stats.reference
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| 1 alice 1.5 | ||
| 2 bob 2.5 | ||
| 3 carol 3.5 |
7 changes: 7 additions & 0 deletions
7
tests/queries/0_stateless/04302_iceberg_read_optimization_no_column_stats.sql
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| -- Tags: no-fasttest | ||
| -- Tag no-fasttest: Depends on S3/MinIO | ||
|
|
||
| -- A stats-less Iceberg manifest (no per-column statistics) must return real | ||
| -- values, not NULLs: empty stats were misread as "all columns absent" (#1545). | ||
| SELECT * FROM icebergS3(s3_conn, filename = 'iceberg_no_column_stats') ORDER BY ALL | ||
| SETTINGS allow_experimental_iceberg_read_optimization = 1; |
29 changes: 29 additions & 0 deletions
29
tests/queries/0_stateless/data_minio/iceberg_no_column_stats/README.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| ## How this data is generated? | ||
|
|
||
| A tiny Iceberg v2 table with three nullable columns and three rows: | ||
|
|
||
| | id | name | value | | ||
| |----|-------|-------| | ||
| | 1 | alice | 1.5 | | ||
| | 2 | bob | 2.5 | | ||
| | 3 | carol | 3.5 | | ||
|
|
||
| The point of this fixture is that its manifest is **stats-less**: the `data_file` | ||
| entry carries no per-column statistics at all (`column_sizes`, `value_counts`, | ||
| `null_value_counts`, `lower_bounds`, `upper_bounds` are all empty). Such manifests | ||
| are produced by writers that do not collect metrics. They left ClickHouse's | ||
| `DataFileMetaInfo::columns_info` empty, which the Iceberg read optimization used | ||
| to misread as "every column is absent", returning `NULL` for all of them. | ||
|
|
||
| It is generated by `generate.py` (needs `pyiceberg`, `pyarrow`, `fastavro`): | ||
|
|
||
| ```bash | ||
| python3 generate.py <output_dir> | ||
| ``` | ||
|
|
||
| The script creates the table via `pyiceberg` with | ||
| `write.metadata.metrics.default = none`, then post-processes the manifest avro to | ||
| drop every per-column statistic and rewrites all internal paths to the stable | ||
| `s3a://test/iceberg_no_column_stats` prefix used by the test bucket. | ||
|
|
||
| Used by `tests/queries/0_stateless/04302_iceberg_read_optimization_no_column_stats.sql`. |
Binary file added
BIN
+1.31 KB
...a_minio/iceberg_no_column_stats/data/00000-0-39d4e713-69c8-49f9-ab8e-f887af4bcecb.parquet
Binary file not shown.
145 changes: 145 additions & 0 deletions
145
tests/queries/0_stateless/data_minio/iceberg_no_column_stats/generate.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,145 @@ | ||
| #!/usr/bin/env python3 | ||
| """Generate the stats-less Iceberg fixture for 04302_iceberg_read_optimization_no_column_stats. | ||
|
|
||
| Creates a 3-row table, strips every per-column statistic from the manifest so | ||
| ClickHouse's `DataFileMetaInfo::columns_info` is empty, and rewrites internal | ||
| paths to a stable `s3a://test/<name>` prefix. See README.md. Usage: generate.py <out_dir>. | ||
| """ | ||
| import json | ||
| import shutil | ||
| import sys | ||
| import tempfile | ||
| from pathlib import Path | ||
|
|
||
| import fastavro | ||
| import pyarrow as pa | ||
| from pyiceberg.catalog.sql import SqlCatalog | ||
| from pyiceberg.schema import Schema | ||
| from pyiceberg.types import NestedField, LongType, StringType, DoubleType | ||
|
|
||
| AVRO_RESERVED = {"avro.schema", "avro.codec"} | ||
|
|
||
| # Per-column statistics on a manifest data_file entry (all optional); clearing | ||
| # them all is what makes the manifest stats-less. | ||
| STAT_FIELDS = ( | ||
| "column_sizes", | ||
| "value_counts", | ||
| "null_value_counts", | ||
| "nan_value_counts", | ||
| "lower_bounds", | ||
| "upper_bounds", | ||
| ) | ||
|
|
||
|
|
||
| def deep_replace(obj, old, new): | ||
| if isinstance(obj, str): | ||
| return obj.replace(old, new) | ||
| if isinstance(obj, dict): | ||
| return {k: deep_replace(v, old, new) for k, v in obj.items()} | ||
| if isinstance(obj, list): | ||
| return [deep_replace(v, old, new) for v in obj] | ||
| return obj | ||
|
|
||
|
|
||
| def clear_stats(record): | ||
| df = record.get("data_file") | ||
| if isinstance(df, dict): | ||
| for field in STAT_FIELDS: | ||
| if field in df and df[field]: | ||
| df[field] = [] | ||
| return record | ||
|
|
||
|
|
||
| def rewrite_avro(src: Path, dst: Path, old: str, new: str, strip_stats: bool): | ||
| with open(src, "rb") as f: | ||
| reader = fastavro.reader(f) | ||
| schema = reader.writer_schema | ||
| meta = {k: v for k, v in reader.metadata.items() if k not in AVRO_RESERVED} | ||
| records = [deep_replace(r, old, new) for r in reader] | ||
| if strip_stats: | ||
| records = [clear_stats(r) for r in records] | ||
| with open(dst, "wb") as f: | ||
| fastavro.writer(f, schema, records, metadata=meta) | ||
|
|
||
|
|
||
| def main(out_dir: str): | ||
| work = Path(tempfile.mkdtemp(prefix="iceberg_gen_")) | ||
| warehouse = work / "warehouse" | ||
| warehouse.mkdir(parents=True) | ||
|
|
||
| catalog = SqlCatalog( | ||
| "gen", | ||
| uri=f"sqlite:///{work}/catalog.db", | ||
| warehouse=f"file://{warehouse}", | ||
| ) | ||
| catalog.create_namespace("ns") | ||
|
|
||
| schema = Schema( | ||
| NestedField(1, "id", LongType(), required=False), | ||
| NestedField(2, "name", StringType(), required=False), | ||
| NestedField(3, "value", DoubleType(), required=False), | ||
| ) | ||
|
|
||
| # Reduces metrics, but pyiceberg still writes column_sizes (stripped below). | ||
| table = catalog.create_table( | ||
| "ns.no_stats", | ||
| schema=schema, | ||
| properties={"write.metadata.metrics.default": "none"}, | ||
| ) | ||
|
|
||
| data = pa.table( | ||
| { | ||
| "id": pa.array([1, 2, 3], type=pa.int64()), | ||
| "name": pa.array(["alice", "bob", "carol"], type=pa.string()), | ||
| "value": pa.array([1.5, 2.5, 3.5], type=pa.float64()), | ||
| } | ||
| ) | ||
| table.append(data) | ||
|
|
||
| table_location = Path(table.location().replace("file://", "")) | ||
| old_prefix = table.location() # file:///tmp/.../ns.db/no_stats | ||
| out = Path(out_dir) | ||
| new_prefix = f"s3a://test/{out.name}" # s3a://test/iceberg_no_column_stats | ||
|
|
||
| if out.exists(): | ||
| shutil.rmtree(out) | ||
| (out / "metadata").mkdir(parents=True) | ||
| (out / "data").mkdir(parents=True) | ||
|
|
||
| for f in (table_location / "data").rglob("*"): | ||
| if f.is_file(): | ||
| rel = f.relative_to(table_location / "data") | ||
| target = out / "data" / rel | ||
| target.parent.mkdir(parents=True, exist_ok=True) | ||
| shutil.copy2(f, target) | ||
|
|
||
| meta_dir = table_location / "metadata" | ||
|
|
||
| # Keep only the latest metadata.json (the post-append snapshot). The empty | ||
| # create-time version and the history logs that reference it aren't read. | ||
| latest_json = max( | ||
| (f for f in meta_dir.iterdir() if f.name.endswith(".metadata.json")), | ||
| key=lambda f: int(f.name.split("-", 1)[0]), | ||
| ) | ||
| meta = json.loads(latest_json.read_text().replace(old_prefix, new_prefix)) | ||
| meta["metadata-log"] = [] | ||
| meta["snapshot-log"] = [] | ||
| (out / "metadata" / latest_json.name).write_text(json.dumps(meta, separators=(",", ":"))) | ||
|
|
||
| for f in meta_dir.iterdir(): | ||
| if f.name.endswith(".avro"): | ||
| # Only manifests carry data_file stats; the manifest list (snap-*) does not. | ||
| strip = not f.name.startswith("snap-") | ||
| rewrite_avro(f, out / "metadata" / f.name, old_prefix, new_prefix, strip) | ||
|
|
||
| print(f"old prefix: {old_prefix}") | ||
| print(f"new prefix: {new_prefix}") | ||
| print(f"table uuid: {table.metadata.table_uuid}") | ||
| print(f"copied to: {out}") | ||
| shutil.rmtree(work) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| if len(sys.argv) != 2: | ||
| sys.exit(f"usage: {sys.argv[0]} <output_dir>") | ||
| main(sys.argv[1]) |
1 change: 1 addition & 0 deletions
1
...iceberg_no_column_stats/metadata/00001-1347065c-1de2-40e5-8774-255dfdff698c.metadata.json
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| {"location":"s3a://test/iceberg_no_column_stats","table-uuid":"d0e63068-996d-41ee-9b7d-31f1fb17f1b1","last-updated-ms":1782851862557,"last-column-id":3,"schemas":[{"type":"struct","fields":[{"id":1,"name":"id","type":"long","required":false},{"id":2,"name":"name","type":"string","required":false},{"id":3,"name":"value","type":"double","required":false}],"schema-id":0,"identifier-field-ids":[]}],"current-schema-id":0,"partition-specs":[{"spec-id":0,"fields":[]}],"default-spec-id":0,"last-partition-id":999,"properties":{"write.metadata.metrics.default":"none"},"current-snapshot-id":7564025723254944482,"snapshots":[{"snapshot-id":7564025723254944482,"sequence-number":1,"timestamp-ms":1782851862557,"manifest-list":"s3a://test/iceberg_no_column_stats/metadata/snap-7564025723254944482-0-39d4e713-69c8-49f9-ab8e-f887af4bcecb.avro","summary":{"operation":"append","added-files-size":"1346","added-data-files":"1","added-records":"3","total-data-files":"1","total-delete-files":"0","total-records":"3","total-files-size":"1346","total-position-deletes":"0","total-equality-deletes":"0"},"schema-id":0}],"snapshot-log":[],"metadata-log":[],"sort-orders":[{"order-id":0,"fields":[]}],"default-sort-order-id":0,"refs":{"main":{"snapshot-id":7564025723254944482,"type":"branch"}},"statistics":[],"partition-statistics":[],"format-version":2,"last-sequence-number":1} |
Binary file added
BIN
+4.2 KB
.../data_minio/iceberg_no_column_stats/metadata/39d4e713-69c8-49f9-ab8e-f887af4bcecb-m0.avro
Binary file not shown.
Binary file added
BIN
+1.73 KB
...olumn_stats/metadata/snap-7564025723254944482-0-39d4e713-69c8-49f9-ab8e-f887af4bcecb.avro
Binary file not shown.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small question on dropping the has_value() check.
My version guarded the loop with file_meta_data.has_value() before calling .value(). This one calls file_meta_data.value()->columns_info.empty() directly, which assumes file_meta_data is always set at this point.
Is this guaranteed on every path that reaches this loop? I couldn't fully convince myself it can't be empty here. Ff there's a case where it is, .value() would throw instead of just skipping the block.
Just want to make sure an empty optional can't slip through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a guard up the stream already:
ClickHouse/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
Line 839 in 1eccfc1