4 changes: 4 additions & 0 deletions modules/manage/pages/iceberg/iceberg-topics-aws-glue.adoc
@@ -333,6 +333,10 @@ Your query results should look like the following:
+-----------------------------------------------------+----------------+
----

=== Manage access for query engine users

Redpanda manages the permissions between Redpanda and the AWS Glue Data Catalog. To grant your end users and query engines (such as Amazon Athena or Apache Spark) read access to the Iceberg tables, use https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html[AWS Lake Formation^] to assign table-level and column-level permissions.
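
As a minimal sketch of a table-level grant, the following JSON could serve as the request body for the Lake Formation `GrantPermissions` API. It assumes the table is already governed by Lake Formation. The account ID, role name, database name, and table name are hypothetical placeholders, not values that Redpanda creates for you.

[,json]
----
{
  "Principal": {
    "DataLakePrincipalIdentifier": "arn:aws:iam::<aws-account-id>:role/<query-engine-role>"
  },
  "Resource": {
    "Table": {
      "DatabaseName": "<glue-database-name>",
      "Name": "<iceberg-table-name>"
    }
  },
  "Permissions": ["SELECT", "DESCRIBE"]
}
----

If you save the request to a file such as `grant.json` (an assumed filename), you can pass it to the AWS CLI with `aws lakeformation grant-permissions --cli-input-json file://grant.json`. For column-level access, the API accepts a `TableWithColumns` resource with a `ColumnNames` list in place of `Table`.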

include::shared:partial$suggested-reading.adoc[]

- xref:manage:iceberg/query-iceberg-topics.adoc[]
@@ -238,7 +238,11 @@ You should see the topic as a table with data in Unity Catalog. The data may tak

== Query Iceberg table using Databricks SQL

You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table. Review the Databricks documentation on https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore[granting permissions to objects^] and https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges[Unity Catalog privileges^] for details.
You can query the Iceberg table using different engines, such as Databricks SQL, PyIceberg, or Apache Spark. To query the table or view the table data in Catalog Explorer, ensure that your account has the necessary permissions to read the table.

=== Manage access for query engine users

Redpanda manages the permissions between Redpanda and Unity Catalog. To grant your end users or query engines read access to the Iceberg tables, use Unity Catalog to assign the appropriate privileges. Review the Databricks documentation on https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/?language=SQL#grant-permissions-on-objects-in-a-unity-catalog-metastore[granting permissions to objects^] and https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/privileges[Unity Catalog privileges^] for details.

The following example shows how to query the Iceberg table using SQL in Databricks SQL.

@@ -364,6 +364,10 @@ gcloud iam service-accounts delete <service-account-name>@$(gcloud config get-va

NOTE: Manually delete the BigLake catalog using the https://docs.cloud.google.com/bigquery/docs/reference/biglake/rest/v1/projects.locations.catalogs/delete[REST API^].

=== Manage access for query engine users

Redpanda manages the permissions between Redpanda and the BigLake catalog. To grant your end users and query engines read access to the Iceberg tables in BigQuery, see https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions[Grant permissions for BigLake tables^] in the Google Cloud documentation.

include::shared:partial$suggested-reading.adoc[]

- xref:manage:iceberg/use-iceberg-catalogs.adoc[]
34 changes: 34 additions & 0 deletions modules/manage/pages/iceberg/query-iceberg-topics.adoc
@@ -3,6 +3,10 @@
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration

// tag::single-source[]
:page-topic-type: how-to
:personas: data_engineer
:learning-objective-1: Query Redpanda topic data from an Iceberg-compatible engine
:learning-objective-2: Grant end-user and query engine access to Iceberg data

ifndef::env-cloud[]
[NOTE]
@@ -13,6 +17,11 @@ endif::[]

When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic xref:manage:iceberg/choose-iceberg-mode.adoc[Iceberg mode] and whether you've registered a schema for the topic in the xref:manage:schema-reg/schema-reg-overview.adoc[Redpanda Schema Registry]. You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

== Access Iceberg tables

ifndef::env-cloud[]
@@ -53,6 +62,31 @@ EOF

endif::[]

=== Grant access to query engine users

Redpanda manages the service-to-service permissions between Redpanda and the catalog (see xref:manage:iceberg/use-iceberg-catalogs.adoc[]). However, you are responsible for granting your end users and query engines (such as Amazon Athena, Apache Spark, Trino, or Snowflake) read access to the Iceberg data.

Use either or both of the following approaches to control access:

==== Cloud storage prefix-level access

Grant query engine roles or users read access to the Iceberg data prefix in the cluster's storage bucket. This controls who can read the underlying data and metadata files. Scope permissions to specific prefixes to restrict access to individual tables.

* AWS (S3): Use IAM policies to grant `s3:GetObject` and `s3:ListBucket` on the Iceberg prefix (for example, `<cluster-storage-bucket-name>/redpanda-iceberg-catalog/*`). See https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-iam-policies.html[Using IAM policies with Amazon S3^].
* GCP (GCS): Use IAM conditions or bucket-level policies to grant `storage.objects.get` and `storage.objects.list` on the Iceberg prefix. See https://cloud.google.com/storage/docs/access-control/iam[GCS IAM permissions^].
* Azure (Blob Storage): Use Azure RBAC roles such as Storage Blob Data Reader scoped to the container or prefix. See https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory[Authorize access to blob data^].
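
For the AWS case, a read-only IAM policy scoped to the Iceberg prefix might look like the following sketch. It reuses the example prefix `redpanda-iceberg-catalog/` from the list above and assumes that your query engine role needs nothing beyond `s3:GetObject` and `s3:ListBucket`. Adjust the bucket name and prefix to match your cluster.

[,json]
----
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<cluster-storage-bucket-name>/redpanda-iceberg-catalog/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<cluster-storage-bucket-name>",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["redpanda-iceberg-catalog/*"]
        }
      }
    }
  ]
}
----

`s3:ListBucket` applies to the bucket rather than to individual objects, so the sketch limits listing with the `s3:prefix` condition key instead of putting the prefix in the resource ARN. To restrict a role to a single table, narrow both the object ARN and the `s3:prefix` value to that table's path.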

==== Catalog-level table access

If you use a REST catalog, you can control access at the table level through the catalog's own access control layer. Use this approach when query engines access tables through the catalog rather than reading files directly.

* AWS Glue: Use https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html[AWS Lake Formation^] to grant table-level and column-level permissions.
* Databricks Unity Catalog: See the https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html[Unity Catalog privileges documentation^].
* Snowflake Open Catalog: See https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog access control^].
* GCP BigLake: See https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions[BigLake table permissions^].
Comment on lines +83 to +86

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

`Are the following catalog provider documentation links valid and accessible?

  1. AWS Lake Formation: https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html
  2. Databricks Unity Catalog privileges: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html
  3. Snowflake Open Catalog access control: https://other-docs.snowflake.com/en/opencatalog/access-control
  4. GCP BigLake permissions: https://cloud.google.com/bigquery/docs/manage-open-source-metadata#grant_permissions`

💡 Result:

  1. AWS Lake Formation: Valid and accessible.
  2. Databricks Unity Catalog privileges: Valid and accessible (note: exact path is /aws/en/data-governance/unity-catalog/manage-privileges/, but content matches).
  3. Snowflake Open Catalog access control: Valid and accessible.
  4. GCP BigLake permissions: Valid and accessible (note: full page is https://cloud.google.com/bigquery/docs/manage-open-source-metadata, which covers permissions including grant_permissions section).

Fix incorrect Snowflake Open Catalog documentation link.

The Snowflake Open Catalog documentation link uses the incorrect domain other-docs.snowflake.com. The correct URL is: https://docs.snowflake.com/en/user-guide/opencatalog/access-control

The other three catalog documentation links (AWS Lake Formation, Databricks Unity Catalog, and GCP BigLake) are valid and accessible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/manage/pages/iceberg/query-iceberg-topics.adoc` around lines 91 - 94,
Replace the incorrect Snowflake Open Catalog link used in the "Snowflake Open
Catalog: See https://other-docs.snowflake.com/en/opencatalog/access-control[Open
Catalog access control^]." bullet by updating the URL to the correct one
(https://docs.snowflake.com/en/user-guide/opencatalog/access-control) so the
"Snowflake Open Catalog" link points to the proper documentation; locate the
string "Snowflake Open Catalog" or the existing incorrect URL and substitute it
with the corrected URL.


=== Refresh table data

Some query engines may require you to manually refresh the Iceberg table snapshot (for example, by running a command like `ALTER TABLE <table-name> REFRESH;`) to see the latest data.

If your engine needs the full JSON metadata path, use the following:
@@ -25,15 +25,84 @@ endif::[]
* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account^], you require ORGADMIN access in Snowflake.
* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.
+
Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog^] with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS:
Follow the https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[Open Catalog documentation^] to create a catalog with the S3 bucket configured as external storage. For the required IAM permissions, see <<authorize-access-to-open-catalog>>.
+
. If you don't already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket.
. Create an IAM role and attach the IAM policy to the role.
. After creating a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket.
NOTE: Your Open Catalog account must be in the same AWS region as your S3 bucket.
+
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume^] set up using the Tiered Storage bucket.
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume^] set up using the Tiered Storage bucket.
+
Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3^]. You can use the same IAM policy as the catalog for the external volume's IAM role and user.
Follow the https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[Snowflake documentation^] to configure the external volume with S3. You can use the same IAM policy and role as the catalog.

[[authorize-access-to-open-catalog]]
== Authorize access to Open Catalog

You must create an AWS IAM policy and role that grant Open Catalog read and write access to the S3 bucket where your Iceberg data is stored. Redpanda writes Iceberg data and metadata files to the bucket using your cluster's existing object storage credentials, so no additional IAM configuration is needed for Redpanda's own S3 access.

=== Create an IAM policy

Create an IAM policy with the following S3 permissions, scoped to your cluster's storage bucket:

[,json]
----
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>"
    }
  ]
}
----

Replace `<bucket-name>` with the name of your cluster's object storage bucket. You can use the same IAM policy for both the catalog and the Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[external volume^].

=== Create an IAM role and configure the trust policy

Create an IAM role and attach the IAM policy you created. To configure the trust relationship, you need the IAM user ARN and external ID provided by Open Catalog:

. In Open Catalog, navigate to your catalog.
. Under *Configuration*, find the *IAM user ARN* and *External ID*.

Use these values in the trust policy for the IAM role:

[,json]
----
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<open-catalog-iam-user-arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<open-catalog-external-id>"
        }
      }
    }
  ]
}
----

After creating the IAM role, provide the role ARN to Open Catalog to complete the catalog configuration.

== Set up catalog integration using Open Catalog

@@ -271,4 +340,12 @@ Your query results should look like the following:

----

=== Manage access for query engine users

Redpanda manages the permissions between Redpanda and Open Catalog. To grant your Snowflake users or other query engines read access to the Iceberg tables, use https://other-docs.snowflake.com/en/opencatalog/access-control[Open Catalog access control^] to assign catalog privileges. For example, you can grant `TABLE_READ_DATA` to a read-only role rather than the `CATALOG_MANAGE_CONTENT` privilege used by the Redpanda service principal.

include::shared:partial$suggested-reading.adoc[]

- xref:manage:iceberg/query-iceberg-topics.adoc[]

// end::single-source[]