Pangolin provides mechanisms to securely vend temporary credentials to clients for S3, Azure ADLS Gen2, and Google Cloud Storage, enabling direct data access while maintaining security.
Instead of sharing long-term cloud credentials with clients (e.g., Spark jobs, Dremio, Trino), Pangolin acts as a trusted intermediary. It authenticates the client and then issues temporary, scoped credentials for specific storage resources.
Benefits:
- ✅ No long-term credentials in client configurations
- ✅ Automatic credential rotation (STS for S3, OAuth2 for Azure/GCP)
- ✅ Scoped access to specific table locations
- ✅ Centralized audit trail of data access
- ✅ Support for cross-account/cross-cloud access
- ✅ Multi-cloud support: S3, Azure ADLS Gen2, Google Cloud Storage
- ⚠️ Local Filesystem: supported for dev/test (no credential vending involved)
AWS Credentials: Pangolin needs AWS credentials with permissions to:
- Call `sts:AssumeRole` (for STS vending)
- Access S3 buckets (for static credential vending)
- Generate presigned URLs
IAM Role (for STS vending): Create an IAM role that Pangolin can assume with S3 access permissions.
```bash
# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...

# For STS Credential Vending
PANGOLIN_STS_ROLE_ARN=arn:aws:iam::123456789012:role/PangolinDataAccess
PANGOLIN_STS_SESSION_DURATION=3600  # 1 hour (default)

# For MinIO or S3-compatible storage
AWS_ENDPOINT_URL=http://minio:9000
AWS_ALLOW_HTTP=true
```

To enable credential vending, configure the `vending_strategy` in your warehouse definition.
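STS only accepts session durations within a fixed range (900 to 43200 seconds), so values outside it will be rejected. A minimal sketch of reading and clamping the setting, using only the standard library (`session_duration` is a hypothetical helper for illustration, not part of Pangolin):

```python
import os

# STS AssumeRole accepts DurationSeconds between 900 and 43200 seconds.
STS_MIN_SECONDS = 900
STS_MAX_SECONDS = 43200

def session_duration(default: int = 3600) -> int:
    """Read PANGOLIN_STS_SESSION_DURATION and clamp it to STS limits.

    Illustrative only; Pangolin's actual validation may differ.
    """
    raw = os.environ.get("PANGOLIN_STS_SESSION_DURATION", str(default))
    try:
        value = int(raw)
    except ValueError:
        value = default
    return max(STS_MIN_SECONDS, min(STS_MAX_SECONDS, value))

os.environ["PANGOLIN_STS_SESSION_DURATION"] = "7200"
print(session_duration())  # 7200
```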
AWS S3 (STS Mode):

```bash
curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production_warehouse",
    "storage_config": {
      "type": "s3",
      "bucket": "my-data-bucket",
      "region": "us-east-1",
      "s3.role-arn": "arn:aws:iam::123456789012:role/PangolinDataAccess"
    },
    "vending_strategy": {
      "AwsSts": {
        "role_arn": "arn:aws:iam::123456789012:role/PangolinDataAccess"
      }
    }
  }'
```

AWS S3 (Static Mode):
```bash
curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "dev_warehouse",
    "storage_config": {
      "type": "s3",
      "bucket": "my-dev-bucket",
      "region": "us-east-1",
      "s3.access-key-id": "AKIA...",
      "s3.secret-access-key": "..."
    },
    "vending_strategy": "AwsStatic"
  }'
```

Azure ADLS Gen2 (OAuth2 Mode, Recommended):
```bash
curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "azure_warehouse",
    "storage_config": {
      "type": "azure",
      "azure.account-name": "mystorageaccount",
      "azure.container": "data",
      "azure.client-id": "azure-client-id",
      "azure.client-secret": "azure-client-secret",
      "azure.tenant-id": "azure-tenant-id"
    },
    "vending_strategy": {
      "AzureOAuth": {
        "client_id": "azure-client-id",
        "client_secret": "azure-client-secret",
        "tenant_id": "azure-tenant-id"
      }
    }
  }'
```

Azure ADLS Gen2 (Account Key Mode):
```bash
curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "azure_warehouse",
    "storage_config": {
      "type": "azure",
      "azure.account-name": "mystorageaccount",
      "azure.container": "data",
      "azure.account-key": "your-account-key"
    },
    "vending_strategy": "AzureSas"
  }'
```

Google Cloud Storage:

```bash
curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gcp_warehouse",
    "storage_config": {
      "type": "gcs",
      "gcp.project-id": "my-gcp-project",
      "gcp.bucket": "my-data-bucket",
      "gcp.service-account-key": "{...json key...}"
    },
    "vending_strategy": "GcpDownscoped"
  }'
```

You can also get a presigned URL to download a specific file (e.g., a metadata file or data file) without needing AWS credentials.
Endpoint: `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/presign?location=s3://bucket/key`
Request:

```bash
curl "http://localhost:8080/v1/analytics/namespaces/sales/tables/transactions/presign?location=s3://my-bucket/data/file.parquet" \
  -H "Authorization: Bearer <token>" \
  -H "X-Pangolin-Tenant: <tenant-id>"
```

Response:
```json
{
  "url": "https://bucket.s3.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=..."
}
```

Use Case: Web applications, data preview tools, or clients that can't handle AWS credentials.
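Presigned URLs are self-describing: the standard SigV4 query parameters `X-Amz-Date` (signing time, UTC) and `X-Amz-Expires` (validity window in seconds) tell you exactly when the link stops working. A small standard-library sketch (the helper name is ours, not part of Pangolin):

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

def presigned_url_expiry(url: str) -> datetime:
    """Compute when a SigV4 presigned URL expires.

    X-Amz-Date and X-Amz-Expires are standard SigV4 query parameters.
    """
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))

url = ("https://bucket.s3.amazonaws.com/key"
       "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
       "&X-Amz-Date=20240101T120000Z&X-Amz-Expires=3600")
print(presigned_url_expiry(url))  # 2024-01-01 13:00:00+00:00
```

This is useful in web apps that cache presigned URLs: re-request a URL once it is close to expiry instead of handing users a dead link.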
When configured correctly, PyIceberg automatically uses Pangolin's credential vending for all supported cloud providers.
S3:

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "prefix": "analytics",
        "token": "your-jwt-token",
        # No S3 credentials needed - Pangolin vends them automatically!
    }
)

# PyIceberg will request credentials from Pangolin for each table access
table = catalog.load_table("sales.transactions")
df = table.scan().to_pandas()  # Pangolin vends S3 credentials automatically
```

Azure:

```python
catalog = load_catalog(
    "pangolin_azure",
    **{
        "uri": "http://localhost:8080",
        "prefix": "azure_catalog",
        "token": "your-jwt-token",
        # No Azure credentials needed - Pangolin vends them automatically!
        # Pangolin provides: adls.token, adls.account-name, adls.container
    }
)

table = catalog.load_table("sales.transactions")
df = table.scan().to_pandas()  # Pangolin vends Azure credentials automatically
```

GCP:

```python
catalog = load_catalog(
    "pangolin_gcp",
    **{
        "uri": "http://localhost:8080",
        "prefix": "gcp_catalog",
        "token": "your-jwt-token",
        # No GCP credentials needed - Pangolin vends them automatically!
        # Pangolin provides: gcp-oauth-token, gcp-project-id
    }
)

table = catalog.load_table("sales.transactions")
df = table.scan().to_pandas()  # Pangolin vends GCP credentials automatically
```

How it works:
- PyIceberg requests table metadata from Pangolin
- Pangolin includes temporary cloud credentials in the response (based on warehouse type)
- PyIceberg uses these credentials to read data files from cloud storage
- Credentials expire after the configured duration (default: 1 hour)
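The refresh cycle in those steps can be sketched as a small cache that re-fetches credentials shortly before they expire. This mirrors what Iceberg clients do internally; the class and the injected `fetch` callable are illustrative, not Pangolin API:

```python
import time

class RefreshingCredentials:
    """Cache vended credentials and re-fetch shortly before expiry.

    `fetch` stands in for the catalog call that returns a tuple of
    (credentials_dict, expiry_as_epoch_seconds).
    """
    def __init__(self, fetch, refresh_margin: float = 300.0):
        self._fetch = fetch
        self._margin = refresh_margin  # refresh 5 minutes before expiry
        self._creds = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        # Re-fetch on first use or when inside the refresh margin
        if self._creds is None or now >= self._expires_at - self._margin:
            self._creds, self._expires_at = self._fetch()
        return self._creds

# Stub fetcher standing in for a credential-vending call
def fetch():
    return {"s3.access-key-id": "ASIA..."}, time.time() + 3600.0

creds = RefreshingCredentials(fetch)
print(creds.get())
```

Long-running queries that outlive a single session duration are exactly the case this pattern covers; see the troubleshooting notes below on expired tokens.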
If you prefer to manage credentials yourself:
S3:
```python
catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "prefix": "analytics",
        "token": "your-jwt-token",
        "s3.access-key-id": "AKIA...",
        "s3.secret-access-key": "...",
    }
)
```

Azure:

```python
catalog = load_catalog(
    "pangolin_azure",
    **{
        "uri": "http://localhost:8080",
        "prefix": "azure_catalog",
        "token": "your-jwt-token",
        "adls.account-name": "mystorageaccount",
        "adls.account-key": "...",
    }
)
```

GCP:

```python
catalog = load_catalog(
    "pangolin_gcp",
    **{
        "uri": "http://localhost:8080",
        "prefix": "gcp_catalog",
        "token": "your-jwt-token",
        "gcp-project-id": "my-project",
        "gcs.service-account-key": "/path/to/key.json",
    }
)
```

This is the policy for the IAM role that Pangolin assumes to vend credentials:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-data-bucket/*",
        "arn:aws:s3:::my-data-bucket"
      ]
    }
  ]
}
```

Allow Pangolin's AWS account/role to assume the data access role:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-PANGOLIN-ACCOUNT:role/PangolinService"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

Always use the `AwsSts` strategy in production to vend temporary credentials instead of sharing static credentials.
Pangolin vends credentials scoped to specific table locations, limiting blast radius if credentials are compromised.
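To see what "scoped" means in practice, the sketch below tests whether an object path falls under a policy's `Resource` ARNs, like the permissions policy shown earlier. It is illustrative only: IAM wildcard matching is approximated with `fnmatch`, and real IAM evaluation also involves conditions and explicit denies.

```python
from fnmatch import fnmatch

def arn_covers(resource_arns, bucket: str, key: str) -> bool:
    """Check whether an S3 object ARN matches any Resource pattern.

    IAM resource patterns support * and ? wildcards; fnmatch gives the
    same effect for this simplified check.
    """
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    return any(fnmatch(object_arn, pattern) for pattern in resource_arns)

resources = ["arn:aws:s3:::my-data-bucket/*", "arn:aws:s3:::my-data-bucket"]
print(arn_covers(resources, "my-data-bucket", "tables/sales/file.parquet"))  # True
print(arn_covers(resources, "other-bucket", "tables/sales/file.parquet"))    # False
```

Credentials scoped this way can read and write the warehouse bucket but are useless against any other bucket in the account.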
Configure `PANGOLIN_STS_SESSION_DURATION` to the minimum time needed (default: 3600 seconds / 1 hour).
Add IAM policy conditions to restrict access by IP, time, or other factors:

```json
{
  "Condition": {
    "IpAddress": {
      "aws:SourceIp": "10.0.0.0/8"
    }
  }
}
```

Review CloudTrail logs for STS AssumeRole calls to detect unusual patterns.
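The `aws:SourceIp` condition above matches the caller's IP against a CIDR range. A quick way to check which of your clients would pass it, using the standard library (the helper is ours, for illustration):

```python
import ipaddress

def ip_allowed(source_ip: str, cidr: str) -> bool:
    """Check a source IP against an aws:SourceIp-style CIDR range."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)

print(ip_allowed("10.1.2.3", "10.0.0.0/8"))     # True
print(ip_allowed("192.168.1.1", "10.0.0.0/8"))  # False
```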
If using static credentials (`AwsStatic`), rotate them regularly.
Cause: Vended credentials don't have permissions for the S3 location.
Solution:
- Verify the IAM role has S3 permissions for the table location
- Check the role ARN in warehouse configuration
- Verify Pangolin can assume the role: `aws sts assume-role --role-arn <arn> --role-session-name test`
Cause: STS credentials expired during a long-running query.
Solution:
- Increase `PANGOLIN_STS_SESSION_DURATION` (max: 43200 seconds / 12 hours)
- Configure your client to refresh credentials automatically
- For very long queries, consider using static credentials
Cause: PyIceberg may be using client-provided credentials instead.
Solution:
- Remove `s3.access-key-id` and `s3.secret-access-key` from PyIceberg config
- Verify the warehouse has `vending_strategy` configured correctly
- Check Pangolin logs for credential vending requests
Cause: Clock skew between Pangolin server and AWS.
Solution:
- Sync server time with NTP: `sudo ntpdate -s time.nist.gov`
- Verify server timezone is set correctly
- Check CloudTrail for timestamp-related errors
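As a quick local check, compare the server clock against a trusted reference: SigV4-signed requests fail once the drift exceeds AWS's tolerance (S3 reports `RequestTimeTooSkewed` at roughly 15 minutes). A sketch with an injected reference time:

```python
from datetime import datetime, timedelta, timezone

# S3's documented skew tolerance for signed requests is about 15 minutes.
MAX_SKEW = timedelta(minutes=15)

def skew_ok(local_time: datetime, reference_time: datetime) -> bool:
    """Return True if the drift between two clocks is within tolerance."""
    return abs(local_time - reference_time) < MAX_SKEW

ref = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(skew_ok(ref + timedelta(minutes=3), ref))   # True
print(skew_ok(ref + timedelta(minutes=20), ref))  # False
```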
Cause: Trust policy or permissions issue.
Solution:
- Verify trust policy allows Pangolin's role to assume the target role
- Check both the trust policy and the permissions policy
- Test with AWS CLI: `aws sts assume-role --role-arn <target-role> --role-session-name test`
```bash
# Set custom session duration (in seconds)
PANGOLIN_STS_SESSION_DURATION=7200  # 2 hours
```

For enhanced security in cross-account scenarios:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::PANGOLIN-ACCOUNT:role/PangolinService"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345"
        }
      }
    }
  ]
}
```

Set in Pangolin:
```bash
PANGOLIN_STS_EXTERNAL_ID=unique-external-id-12345
```

For better performance, use regional STS endpoints:

```bash
AWS_STS_REGIONAL_ENDPOINTS=regional
```

- Warehouse Management - Creating and configuring warehouses
- Authentication - User authentication and tokens
- Client Configuration - PyIceberg, Spark, Trino setup
- AWS S3 Storage - S3 storage backend configuration