@mkmkme mkmkme commented Jan 16, 2026

Google Cloud Storage support for data lake catalogs (fixes #1199)

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Google Cloud Storage support for data lakes (ClickHouse#93866 by @scanhex12)

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

Review comment on the diff context below (the hunk adds `oss` and `gs` to the storage-type dispatch and introduces a GCS failpoint):

```diff
-if (capitalize_first_letter(storage_type_str) == "S3a")
+else if (capitalize_first_letter(storage_type_str) == "S3a" || storage_type_str == "oss" || storage_type_str == "gs")
 {
     fiu_do_on(DB::FailPoints::database_iceberg_gcs,
```
mkmkme (Collaborator, Author):

I don't really like this particular bit: it's there only for the sake of the integration test. But I've left it intact to stay aligned with upstream.

Better & easier to keep it consistent 👍


github-actions bot commented Jan 16, 2026

Workflow [PR], commit [f920a81]

@arthurpassos arthurpassos left a comment


LGTM

@CarlosFelipeOR

QA Verification

This PR was verified manually using a real Iceberg setup backed by Google Cloud Storage (GCS) and a local Nessie REST catalog.

The verification followed the reproduction steps documented in issue #1199.


Test environment

  • ClickHouse binary (not Docker)
  • Nessie REST catalog running locally
  • Iceberg tables stored on GCS
  • Tested with:
    • Private GCS bucket (with credentials)
    • Public GCS bucket (no credentials)

Validation steps

Before running the ClickHouse queries:

  • GCS access was configured locally using a GCP Service Account (GOOGLE_APPLICATION_CREDENTIALS pointing to the QA service account JSON)
  • The Nessie REST catalog was started via Docker and exposed at http://localhost:19120/iceberg, with the warehouse explicitly configured as:

gs://altinity-qa-test/carlos_test/

This ensures the gs:// storage path is exercised.

  • The Iceberg table was populated by running the write_to_iceberg.py script provided in the issue, which creates the namespace and table via Nessie and writes both metadata and data files to GCS
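The environment steps above can be sketched as a shell session. This is a hedged sketch, not the exact commands used in verification: the container image tag, the credentials path, and the Nessie warehouse configuration property names are assumptions based on Nessie's documented defaults.

```shell
# Point GCS client libraries at the QA service account
# (the path is a placeholder, not the one used in verification).
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/qa-service-account.json

# Start a local Nessie REST catalog; 19120 is Nessie's default port,
# and its Iceberg REST API is served under /iceberg.
# The warehouse properties below are assumed from Nessie's catalog docs.
docker run -d --name nessie -p 19120:19120 \
  -e nessie.catalog.default-warehouse=warehouse \
  -e nessie.catalog.warehouses.warehouse.location=gs://altinity-qa-test/carlos_test/ \
  ghcr.io/projectnessie/nessie:latest

# Populate the namespace and table (script provided in issue #1199).
python3 write_to_iceberg.py
```

With Nessie up, the `CREATE DATABASE` statement below can resolve tables through `http://localhost:19120/iceberg`.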

ClickHouse validation

CREATE DATABASE demo_local
ENGINE = DataLakeCatalog('http://localhost:19120/iceberg')
SETTINGS
  catalog_type = 'rest',
  storage_endpoint = 'https://storage.googleapis.com',
  warehouse = 'warehouse';
USE demo_local;
SHOW TABLES;

Result:

┌─name───────────────────────┐
│ local_dataset.taxi_dataset │
└────────────────────────────┘

SELECT count() FROM `local_dataset.taxi_dataset`;

Result:

┌─count()─┐
│ 3475226 │
└─────────┘

Results

  • Iceberg tables created via Nessie are correctly discovered by ClickHouse
  • Metadata and data stored on GCS are read successfully
  • The gs:// warehouse path is handled correctly
  • Data is accessible and queryable
  • Verified with both private and public GCS buckets

✅ Confirmed working when using Iceberg tables stored on GCS and accessed via the Nessie REST catalog.

@CarlosFelipeOR CarlosFelipeOR added the verified Verified by QA label Jan 20, 2026
