diff --git a/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc b/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
index 898c1ddf80..8de7b797f0 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
@@ -333,6 +333,10 @@ Your query results should look like the following:
 +-----------------------------------------------------+----------------+
 ----

+=== Manage access for query engine users
+
+Redpanda manages the permissions between Redpanda and the AWS Glue Data Catalog. To grant your end users and query engines (such as Amazon Athena or Apache Spark) read access to the Iceberg tables, use https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html[AWS Lake Formation^] to assign table-level and column-level permissions.
+
 include::shared:partial$suggested-reading.adoc[]

 - xref:manage:iceberg/query-iceberg-topics.adoc[]
diff --git a/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc b/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc
index 5aec5f34e8..844f4b1f90 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc
@@ -238,7 +238,11 @@ You should see the topic as a table with data in Unity Catalog. The data may tak

 == Query Iceberg table using Databricks SQL

-You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table. Review the Databricks documentation on https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore[granting permissions to objects^] and https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges[Unity Catalog privileges^] for details.
+You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table.
+
+=== Manage access for query engine users
+
+Redpanda manages the permissions between Redpanda and Unity Catalog. To grant your end users or query engines read access to the Iceberg tables, use Unity Catalog to assign the appropriate privileges. Review the Databricks documentation on https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore[granting permissions to objects^] and https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges[Unity Catalog privileges^] for details.

 The following example shows how to query the Iceberg table using SQL in Databricks SQL.

diff --git a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
index 553d37b820..970715b3b1 100644
--- a/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
+++ b/modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
@@ -364,6 +364,10 @@ gcloud iam service-accounts delete @$(gcloud config get-va

 NOTE: Manually delete the BigLake catalog using the https://docs.cloud.google.com/bigquery/docs/reference/biglake/rest/v1/projects.locations.catalogs/delete[REST API^].

+=== Manage access for query engine users
+
+Redpanda manages the permissions between Redpanda and the BigLake catalog. To grant your end users and query engines read access to the Iceberg tables in BigQuery, see https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions[Grant permissions for BigLake tables^] in the Google Cloud documentation.
+
 include::shared:partial$suggested-reading.adoc[]

 - xref:manage:iceberg/use-iceberg-catalogs.adoc[]
diff --git a/modules/manage/pages/iceberg/query-iceberg-topics.adoc b/modules/manage/pages/iceberg/query-iceberg-topics.adoc
index c7915c5309..0521c91047 100644
--- a/modules/manage/pages/iceberg/query-iceberg-topics.adoc
+++ b/modules/manage/pages/iceberg/query-iceberg-topics.adoc
@@ -3,6 +3,10 @@
 :page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration

 // tag::single-source[]
+:page-topic-type: how-to
+:personas: data_engineer
+:learning-objective-1: Query Redpanda topic data from an Iceberg-compatible engine
+:learning-objective-2: Grant end-user and query engine access to Iceberg data

 ifndef::env-cloud[]
 [NOTE]
@@ -13,6 +17,11 @@ endif::[]

 When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic xref:manage:iceberg/choose-iceberg-mode.adoc[Iceberg mode] and whether you've registered a schema for the topic in the xref:manage:schema-reg/schema-reg-overview.adoc[Redpanda Schema Registry]. You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.

+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+
 == Access Iceberg tables

 ifndef::env-cloud[]
@@ -53,6 +62,31 @@ EOF

 endif::[]

+=== Grant access to query engine users
+
+Redpanda manages the service-to-service permissions between Redpanda and the catalog (see xref:manage:iceberg/use-iceberg-catalogs.adoc[]). However, you are responsible for granting your end users and query engines (such as Amazon Athena, Apache Spark, Trino, or Snowflake) read access to the Iceberg data.
+
+Use either or both of the following approaches to control access:
+
+==== Cloud storage prefix-level access
+
+Grant query engine roles or users read access to the Iceberg data prefix in the cluster's storage bucket. This controls who can read the underlying data and metadata files. Scope permissions to specific prefixes to restrict access to individual tables.
+
+* AWS (S3): Use IAM policies to grant `s3:GetObject` and `s3:ListBucket` on the Iceberg prefix (for example, `/redpanda-iceberg-catalog/*`). See https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-iam-policies.html[Using IAM policies with Amazon S3^].
+* GCP (GCS): Use IAM conditions or bucket-level policies to grant `storage.objects.get` and `storage.objects.list` on the Iceberg prefix. See https://cloud.google.com/storage/docs/access-control/iam[GCS IAM permissions^].
+* Azure (Blob Storage): Use Azure RBAC roles such as Storage Blob Data Reader scoped to the container or prefix. See https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory[Authorize access to blob data^].
+
+==== Catalog-level table access
+
+If you use a REST catalog, you can control access at the table level through the catalog's own access control layer. Use this approach when query engines access tables through the catalog rather than reading files directly.
+
+* AWS Glue: Use https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html[AWS Lake Formation^] to grant table-level and column-level permissions.
+* Databricks Unity Catalog: See the https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html[Unity Catalog privileges documentation^].
+* Snowflake Open Catalog: See https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog access control^].
+* GCP BigLake: See https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions[BigLake table permissions^].
+
+=== Refresh table data
+
 Some query engines may require you to manually refresh the Iceberg table snapshot (for example, by running a command like `ALTER TABLE REFRESH;`) to see the latest data.

 If your engine needs the full JSON metadata path, use the following:
diff --git a/modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc b/modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc
index ce53d8b633..121af1df19 100644
--- a/modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc
+++ b/modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc
@@ -25,15 +25,84 @@ endif::[]
 * An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account^], you require ORGADMIN access in Snowflake.
 * An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.
 +
-Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog^] with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS:
+Follow the https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[Open Catalog documentation^] to create a catalog with the S3 bucket configured as external storage. For the required IAM permissions, see <>.
 +
-. If you don't already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket.
-. Create an IAM role and attach the IAM policy to the role.
-. After creating a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket.
+NOTE: Your Open Catalog account must be in the same AWS region as your S3 bucket.
 +
-* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume^] set up using the Tiered Storage bucket.
+* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume^] set up using the Tiered Storage bucket.
 +
-Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3^]. You can use the same IAM policy as the catalog for the external volume's IAM role and user.
+Follow the https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[Snowflake documentation^] to configure the external volume with S3. You can use the same IAM policy and role as the catalog.
+
+[[authorize-access-to-open-catalog]]
+== Authorize access to Open Catalog
+
+You must create an AWS IAM policy and role that grant Open Catalog read and write access to the S3 bucket where your Iceberg data is stored. Redpanda writes Iceberg data and metadata files to the bucket using your cluster's existing object storage credentials, so no additional IAM configuration is needed for Redpanda's own S3 access.
+
+=== Create an IAM policy
+
+Create an IAM policy with the following S3 permissions, scoped to your cluster's storage bucket:
+
+[,json]
+----
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "s3:PutObject",
+        "s3:GetObject",
+        "s3:GetObjectVersion",
+        "s3:DeleteObject",
+        "s3:DeleteObjectVersion"
+      ],
+      "Resource": "arn:aws:s3:::/*"
+    },
+    {
+      "Effect": "Allow",
+      "Action": [
+        "s3:ListBucket",
+        "s3:GetBucketLocation"
+      ],
+      "Resource": "arn:aws:s3:::"
+    }
+  ]
+}
+----
+
+Replace `` with the name of your cluster's object storage bucket. You can use the same IAM policy for both the catalog and the Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[external volume^].
+
+=== Create an IAM role and configure the trust policy
+
+Create an IAM role and attach the IAM policy you created. To configure the trust relationship, you need the IAM user ARN and external ID provided by Open Catalog:
+
+. In Open Catalog, navigate to your catalog.
+. Under *Configuration*, find the *IAM user ARN* and *External ID*.
+
+Use these values in the trust policy for the IAM role:
+
+[,json]
+----
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Principal": {
+        "AWS": ""
+      },
+      "Action": "sts:AssumeRole",
+      "Condition": {
+        "StringEquals": {
+          "sts:ExternalId": ""
+        }
+      }
+    }
+  ]
+}
+----
+
+After creating the IAM role, provide the role ARN to Open Catalog to complete the catalog configuration.

 == Set up catalog integration using Open Catalog

@@ -271,4 +340,12 @@ Your query results should look like the following:

 ----

+=== Manage access for query engine users
+
+Redpanda manages the permissions between Redpanda and Open Catalog. To grant your Snowflake users or other query engines read access to the Iceberg tables, use https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog access control^] to assign catalog privileges. For example, you can grant `TABLE_READ_DATA` to a read-only role rather than the `CATALOG_MANAGE_CONTENT` privilege used by the Redpanda service principal.
+
+include::shared:partial$suggested-reading.adoc[]
+
+- xref:manage:iceberg/query-iceberg-topics.adoc[]
+
 // end::single-source[]
\ No newline at end of file