HDDS-15273. Add OIDC AssumeRoleWithWebIdentity support to Ozone STS #10266

Open: paf91 wants to merge 7 commits into apache:HDDS-13323-sts from paf91:HDDS-15273-oidc-webidentity-sts

paf91 commented May 14, 2026

What changes were proposed in this pull request?

This PR adds OIDC/WebIdentity support to Apache Ozone STS by implementing AssumeRoleWithWebIdentity on top of the existing STS runtime.

The feature is disabled by default through ozone.sts.web.identity.enabled=false, so there is no behavior change on upgrade unless explicitly enabled.

This change allows clients to exchange a Keycloak/OIDC JWT for temporary S3 credentials. The returned credentials can then be used with normal AWS SigV4 requests and x-amz-security-token.

Architecture

The change extends the existing Ozone STS temporary credential path instead of introducing a parallel S3 authentication model:

  • Keycloak/OIDC authenticates the caller by issuing a signed JWT.
  • Ozone STS validates the JWT using issuer, audience, expiry, and JWKS signature checks.
  • OM, not S3G, is the authoritative JWT validator.
  • OM authorizes role assumption through the configured Ozone authorizer / Ranger path.
  • OM issues temporary S3 credentials using the existing STS token infrastructure.
  • Subsequent S3 requests continue to use normal SigV4 plus x-amz-security-token.
  • Subsequent S3 authorization continues through the existing session-policy / Ranger authorization path.
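
As a rough illustration of the first step in this flow, the sketch below builds the form-encoded body a client could send, unsigned, to the S3 Gateway /sts endpoint. The class name, role ARN, and session name are hypothetical, and the `Version` value is the AWS STS query API version; whether Ozone STS requires it is an assumption, not something this PR states.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of this PR. */
public final class WebIdentityStsForm {
  private WebIdentityStsForm() { }

  /** Builds the form-encoded body of an AssumeRoleWithWebIdentity call. */
  public static String build(String roleArn, String sessionName, String jwt) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("Action", "AssumeRoleWithWebIdentity");
    params.put("Version", "2011-06-15"); // AWS STS API version (assumed to be accepted)
    params.put("RoleArn", roleArn);
    params.put("RoleSessionName", sessionName);
    params.put("WebIdentityToken", jwt);
    return params.entrySet().stream()
        .map(e -> e.getKey() + "=" + encode(e.getValue()))
        .collect(Collectors.joining("&"));
  }

  private static String encode(String value) {
    try {
      // Java 8 compatible overload; UTF-8 is always available.
      return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("arn:aws:iam::123456789012:role/analyst",
        "demo-session", "header.payload.signature"));
  }
}
```

The temporary AccessKeyId, SecretAccessKey, and SessionToken returned in the XML response would then be used for ordinary SigV4 requests with x-amz-security-token.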

Keycloak groups and roles are treated only as identity attributes. Ranger or the configured Ozone authorizer remains the policy decision point.

This PR does not replace Kerberos daemon authentication, does not add OFS OIDC login, does not add CLI device-code login, does not add daemon-to-daemon OIDC authentication, and does not use Keycloak Authorization Services as the Ozone policy engine.

Backward compatibility

  • Existing AssumeRole flow is unchanged.
  • Existing permanent S3 secret flow is unchanged.
  • Existing S3 SigV4 behavior is unchanged.
  • Previously-issued AssumeRole tokens remain valid.
  • Tokens serialized without AuthType deserialize as ASSUME_ROLE, preserving compatibility with previously-issued tokens.
  • The feature is disabled by default.
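
The defaulting rule for tokens without an AuthType can be pictured with a small sketch. This is not the PR's serialization code (which presumably relies on an optional protobuf field); the `WEB_IDENTITY` constant and the string-based encoding are assumptions made only for illustration.

```java
/** Hypothetical illustration of the backward-compatible default. */
public final class AuthTypeCompat {
  /** ASSUME_ROLE is named in the PR; WEB_IDENTITY is an assumed name. */
  public enum AuthType { ASSUME_ROLE, WEB_IDENTITY }

  private AuthTypeCompat() { }

  /** Tokens written before this change carry no auth type at all. */
  public static AuthType decode(String serialized) {
    if (serialized == null || serialized.isEmpty()) {
      return AuthType.ASSUME_ROLE; // legacy token: keep the old behavior
    }
    return AuthType.valueOf(serialized);
  }

  public static void main(String[] args) {
    System.out.println(decode(null));
    System.out.println(decode("WEB_IDENTITY"));
  }
}
```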

Main implementation points

  • Adds an STS-focused OIDC/JWT validation module with JWKS caching and refresh.
  • Adds ozone.sts.web.identity.* configuration keys.
  • Adds AssumeRoleWithWebIdentity request/response models and S3G STS XML response handling.
  • Adds a tightly scoped unauthenticated bootstrap bypass only for /sts Action=AssumeRoleWithWebIdentity when explicitly enabled.
  • Validates and strips raw WebIdentityToken in OM preExecute() before the Ratis-applied request is created.
  • The Ratis-applied request contains only sanitized identity/session fields, token expiry metadata, and token fingerprint.
  • Extends STSTokenIdentifier with a backward-compatible auth type for WebIdentity-backed credentials.
  • Keeps existing AssumeRole token compatibility.
  • Adds an authorizer hook for WebIdentity role assumption:
    IAccessAuthorizer.generateAssumeRoleWithWebIdentitySessionPolicy(...).
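
To make the "token fingerprint" idea concrete: the raw JWT is validated and dropped in preExecute(), and only a digest travels further. The PR text does not say which digest is used, so the SHA-256 hex encoding and the class name below are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: derive a fingerprint so the raw JWT never needs to be stored. */
public final class TokenFingerprint {
  private TokenFingerprint() { }

  /** Returns the lowercase hex SHA-256 digest of the raw token. */
  public static String sha256Hex(String rawToken) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(rawToken.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is mandatory on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(sha256Hex("header.payload.signature"));
  }
}
```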

Security notes

  • S3G does not become the source of truth for JWT identity.
  • Raw OIDC JWTs are not persisted in the Ratis-applied request, OM metadata, STS tokens, or logs.
  • WebIdentity temporary credentials require the existing STS session token validation path.
  • SecretAccessKey and temporary credential material follow the existing STS credential handling model.
  • Operators must still protect OM/Ratis logs and metadata files.
  • alg=none is rejected.
  • Issuer, audience, exp, nbf, and iat are validated.
  • JWKS fetch is bounded by connect timeout, read timeout, and size limit.
  • Unknown-kid refresh is debounced and fails closed.
  • Insecure HTTP issuer/JWKS usage is test-only and emits an OM startup warning when explicitly enabled.
  • Deployments without a WebIdentity-capable Ranger/Ozone authorizer override fail closed with NOT_SUPPORTED_OPERATION.
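
The exp, nbf, and iat checks combined with ozone.sts.web.identity.clock.skew could look roughly like the sketch below. It is JDK-only and does not reproduce the Nimbus-based validator in the PR; treating a missing exp as an error and the method and class names are assumptions.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of time-claim validation with a clock-skew allowance. */
public final class TimeClaimCheck {
  private TimeClaimCheck() { }

  /** Returns null when the claims are acceptable, otherwise a short reason. */
  public static String validate(Instant exp, Instant nbf, Instant iat,
      Instant now, Duration skew) {
    if (exp == null) {
      return "missing exp"; // assumption: exp is mandatory
    }
    if (!now.minus(skew).isBefore(exp)) {
      return "token expired";
    }
    if (nbf != null && now.plus(skew).isBefore(nbf)) {
      return "token not yet valid";
    }
    if (iat != null && now.plus(skew).isBefore(iat)) {
      return "token issued in the future";
    }
    return null;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    System.out.println(validate(now.plusSeconds(300), null, now, now,
        Duration.ofSeconds(30)));
  }
}
```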

Configuration keys

  • ozone.sts.web.identity.enabled
  • ozone.sts.web.identity.issuer.uri
  • ozone.sts.web.identity.jwks.uri
  • ozone.sts.web.identity.audience
  • ozone.sts.web.identity.username.claim
  • ozone.sts.web.identity.subject.claim
  • ozone.sts.web.identity.groups.claim
  • ozone.sts.web.identity.roles.claim
  • ozone.sts.web.identity.clock.skew
  • ozone.sts.web.identity.jwks.refresh.interval
  • ozone.sts.web.identity.jwks.connect.timeout
  • ozone.sts.web.identity.jwks.read.timeout
  • ozone.sts.web.identity.jwks.size.limit
  • ozone.sts.web.identity.require.https
  • ozone.sts.web.identity.allow.insecure.http.for.tests

Dependency

  • Uses Nimbus JOSE + JWT for OIDC/JWT/JWKS validation.

Design / user docs added

  • hadoop-hdds/docs/content/security/OzoneSTSWebIdentityKeycloakRanger.md

Design document was split into a separate PR against master:

#10338

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15273

How was this patch tested?

The patch was tested with focused unit, S3 Gateway, OM, mini-cluster, and Keycloak Testcontainers coverage.

Unit and component tests

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/common \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestOidcJwtIdentityProvider,TestAssumeRoleWithWebIdentityRequest,TestAssumeRoleResponseInfo' \
  test

Result:

Tests run: 36, Failures: 0, Errors: 0, Skipped: 0

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/s3gateway \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestS3STSEndpoint,TestS3STSWebIdentityAuthBypassFilter,TestAuthorizationFilter' \
  test

Result:

Tests run: 44, Failures: 0, Errors: 0, Skipped: 0

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/ozone-manager \
  -am \
  -DskipITs \
  -DskipShade \
  -Dtest='TestSTSTokenSecretManager,TestSTSSecurityUtil,TestS3AssumeRoleWithWebIdentityRequest,TestS3AssumeRoleRequest,TestS3AssumeRoleResponse,TestSTSTokenIdentifier' \
  test

Result:

Tests run: 77, Failures: 0, Errors: 0, Skipped: 0

Mini-cluster E2E test

The mini-cluster E2E test verifies:

  • generated JWT + local JWKS;
  • AssumeRoleWithWebIdentity;
  • temporary AccessKeyId, SecretAccessKey, and SessionToken;
  • real AWS SDK v2 S3 SigV4 request with session token;
  • allowed bucket succeeds;
  • denied bucket fails;
  • wrong/missing session credentials fail.

mvn -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityEndToEnd \
  test

Result:

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

Keycloak Testcontainers integration test

The Keycloak IT starts a real Keycloak container, imports a test realm, obtains a real Keycloak JWT, exchanges it through Ozone STS, and uses the returned temporary credentials for S3 operations.

env DOCKER_HOST=unix:///var/run/docker.sock \
  TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock \
  mvn -Dapi.version=1.44 \
  -Dmaven.repo.local=/tmp/m2-ozone \
  -pl hadoop-ozone/integration-test-s3 \
  -am \
  -DskipShade \
  -Dtest=TestAssumeRoleWithWebIdentityKeycloakIT \
  test

Result:

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

The -Dapi.version=1.44 parameter is used for Docker API compatibility in the local Docker environment.

Static check

git diff --check origin/HDDS-13323-sts..HEAD

Result:

No output / clean

paf91 added 4 commits May 14, 2026 09:14
Includes:
- STS-focused OIDC/JWKS validator
- OIDC config keys
- AssumeRoleWithWebIdentity authorizer request shape
- fail-closed default authorizer hook
- S3G STS bootstrap auth bypass only for Action=AssumeRoleWithWebIdentity
- design doc
- unit tests

Includes:
- OM protocol/client path for AssumeRoleWithWebIdentity
- OM-side JWT validation in preExecute()
- sanitized Ratis request without raw WebIdentityToken
- WebIdentity-backed STS token identity model
- backward-compatible STSTokenIdentifier authType
- reuse of existing STS validation path for subsequent S3 requests
- S3G STS XML response/routing for WebIdentity
- tests proving no raw JWT persistence and replay determinism

peterxcli self-requested a review May 14, 2026 07:17
adoroszlai (Contributor) commented:

Thanks @paf91 for the patch. As quick initial feedback: there is a compile error that prevents further test runs.

paf91 (Author) commented May 14, 2026:

> Thanks @paf91 for the patch. As quick initial feedback: there is a compile error that prevents further test runs.

My bad: the earlier validation focused on selected unit and E2E tests and did not run the exact CI compile lane for the PR from a clean state.

adoroszlai (Contributor) commented:

Thanks for the update. Please make sure to check results, there are checkstyle/pmd errors.

https://github.com/paf91/ozone/actions/runs/25856843411

paf91 (Author) commented May 14, 2026:

> Thanks for the update. Please make sure to check results, there are checkstyle/pmd errors.
>
> https://github.com/paf91/ozone/actions/runs/25856843411

Thanks, I reran the relevant checks locally and fixed the errors. CI now looks green.

peterxcli (Member) commented:

Do you plan to add smoke tests that use the new Docker Compose cluster, including Keycloak or another IdP container?

paf91 (Author) commented May 14, 2026:

> Do you plan to add smoke tests that use the new Docker Compose cluster, including Keycloak or another IdP container?

Not in this PR. I kept this focused on STS runtime + Java integration coverage.

There is already a Keycloak Testcontainers IT with a real Keycloak container and real JWT exchange.

I agree a Docker Compose smoke test would be useful for packaged config/deployment coverage. If you think it is a blocker, I can add a small one here; otherwise I’d prefer a follow-up JIRA/PR.

peterxcli (Member) commented:

Or maybe you can split this into smaller PRs and leave this one open for design review discussion and as a reference for the basic implementation/MVP.

Btw, personally I really like this proposal because it makes Ozone more usable in modern cloud environments. Actually, I was trying to design this myself this morning, haha.

paf91 (Author) commented May 14, 2026:

> Btw, personally I really like this proposal because it makes Ozone more usable in modern cloud environments. Actually, I was trying to design this myself this morning, haha.

Thanks, that is exactly the motivation: make Ozone STS usable in OIDC/cloud-native environments while keeping Ranger/Ozone authorizer as the PDP.

I can split it if you think that would make review easier. The split would look like this:

  1. OIDC/JWKS + config + design doc
  2. AssumeRoleWithWebIdentity runtime
  3. E2E + Keycloak IT + docs + compose smoke test

My preference is to keep this PR together for now (I am lazy haha), since the pieces are connected already and the current PR already shows the full MVP flow end-to-end.

paf91 (Author) commented May 16, 2026:

I can resolve conflicts here if needed @peterxcli

peterxcli (Member) commented:

Up to you, or we can just leave it as a discussion.

adoroszlai (Contributor) left a comment:

> maybe you can split this into smaller prs and leave this for the discussion of design review and the reference of basic implementation/MVP

> My preference is to keep this PR together for now (I am lazy haha),

I think the former approach is better, but we can experiment with reviewing as a whole. (I'm guessing it will increase the number of review rounds; I hope you can accept that.)

</description>
</property>
<property>
<name>ozone.sts.web.identity.enabled</name>
Contributor

Please add these new config properties as @Config annotations in OidcConfig instead of XML + constants split between two files.

Comment on lines +54 to +58
@SuppressWarnings("checkstyle:ParameterNumber")
public AssumeRoleWithWebIdentityRequest(String host, InetAddress ip,
    String user, Set<String> groups, Set<String> roles, String roleArn,
    String roleSessionName, String issuer, String subject, String audience,
    String providerId, Set<AssumeRoleRequest.OzoneGrant> grants) {
Contributor

Please add a Builder class, newBuilder method, change the constructor to accept only the Builder instead of all properties, and make the constructor private.
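
A minimal sketch of the requested shape, using a hypothetical class with only four of the twelve constructor parameters; field names follow the quoted constructor, everything else (class name, setter naming, null checks) is an assumption rather than the PR's code.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical, trimmed-down stand-in for AssumeRoleWithWebIdentityRequest. */
public final class WebIdentityRequestSketch {
  private final String user;
  private final Set<String> groups;
  private final String roleArn;
  private final String roleSessionName;

  // Private: instances are created only through the Builder.
  private WebIdentityRequestSketch(Builder b) {
    this.user = Objects.requireNonNull(b.user, "user");
    this.groups = Collections.unmodifiableSet(new LinkedHashSet<>(b.groups));
    this.roleArn = Objects.requireNonNull(b.roleArn, "roleArn");
    this.roleSessionName =
        Objects.requireNonNull(b.roleSessionName, "roleSessionName");
  }

  public static Builder newBuilder() {
    return new Builder();
  }

  public String getUser() { return user; }
  public Set<String> getGroups() { return groups; }
  public String getRoleArn() { return roleArn; }
  public String getRoleSessionName() { return roleSessionName; }

  /** Collects the properties and hands itself to the private constructor. */
  public static final class Builder {
    private String user;
    private Set<String> groups = Collections.emptySet();
    private String roleArn;
    private String roleSessionName;

    private Builder() { }

    public Builder setUser(String value) { this.user = value; return this; }
    public Builder setGroups(Set<String> value) { this.groups = value; return this; }
    public Builder setRoleArn(String value) { this.roleArn = value; return this; }
    public Builder setRoleSessionName(String value) {
      this.roleSessionName = value;
      return this;
    }

    public WebIdentityRequestSketch build() {
      return new WebIdentityRequestSketch(this);
    }
  }
}
```

With this shape the @SuppressWarnings("checkstyle:ParameterNumber") annotation on the constructor is no longer needed.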

import org.apache.commons.lang3.StringUtils;

/**
* Thread-safe JWKS cache with refresh-on-unknown-kid semantics.
Contributor

common is for code shared between client and server components. These classes in org.apache.hadoop.ozone.security.oidc are not needed in clients, please move them to ozone-manager (along with additional dependency nimbus-jose-jwt).

requireNonBlank(issuerUri, OZONE_STS_WEB_IDENTITY_ISSUER_URI);
requireNonBlank(audience, OZONE_STS_WEB_IDENTITY_AUDIENCE);

if (requireHttps && !allowInsecureHttpForTests) {
Contributor

Why do we need two config properties/flags? allowInsecureHttpForTests seems to be the test-specific version of requireHttps (inverted).
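
One possible reading of this question is that a single flag would be enough, with tests simply setting it to false. The sketch below shows that single-flag check; the class and method names are hypothetical and this is not the validation code in the PR.

```java
import java.net.URI;

/** Hypothetical single-flag variant of the issuer/JWKS scheme check. */
public final class HttpsRequirement {
  private HttpsRequirement() { }

  /** Returns true when the URI is acceptable under the given requireHttps setting. */
  public static boolean isAllowed(String uriValue, boolean requireHttps) {
    String scheme = URI.create(uriValue).getScheme();
    if ("https".equalsIgnoreCase(scheme)) {
      return true;
    }
    // Plain http is tolerated only when HTTPS enforcement is switched off.
    return !requireHttps && "http".equalsIgnoreCase(scheme);
  }

  public static void main(String[] args) {
    System.out.println(isAllowed("http://keycloak.example/realms/ozone", true));
  }
}
```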

Comment thread pom.xml
<swagger-annotations-version>1.5.4</swagger-annotations-version>
<test.build.data>${test.build.dir}</test.build.data>
<test.build.dir>${project.build.directory}/test-dir</test.build.dir>
<testcontainers.version>1.21.3</testcontainers.version>
Contributor

Can we use latest version (2.0.5)?

Comment on lines +88 to +96
private static byte[] readFully(InputStream stream) throws IOException {
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  byte[] buffer = new byte[4096];
  int read;
  while ((read = stream.read(buffer)) != -1) {
    out.write(buffer, 0, read);
  }
  return out.toByteArray();
}
Contributor

Can reuse org.apache.commons.io.IOUtils.readFully?
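
Whichever helper ends up being used, the security notes say the JWKS fetch is bounded by a size limit, so the read should stop once that limit is exceeded. A JDK-only sketch of such a bounded read follows; the class name, method name, and error message are hypothetical, and it is not a substitute for the Commons IO call suggested above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: read a stream fully, but refuse oversized responses. */
public final class BoundedRead {
  private BoundedRead() { }

  /** Reads to end of stream; fails as soon as more than limit bytes were seen. */
  public static byte[] readAtMost(InputStream in, int limit) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      total += read;
      if (total > limit) {
        throw new IOException("response exceeds " + limit + " bytes");
      }
      out.write(buffer, 0, read);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = readAtMost(new ByteArrayInputStream(new byte[128]), 1024);
    System.out.println(body.length);
  }
}
```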

Comment on lines +61 to +73
private boolean isWebIdentityEnabled() {
  OzoneConfiguration conf = ozoneConfiguration;
  if (conf == null) {
    conf = OzoneConfigurationHolder.configuration();
  }
  return conf != null && conf.getBoolean(OZONE_STS_WEB_IDENTITY_ENABLED,
      OZONE_STS_WEB_IDENTITY_ENABLED_DEFAULT);
}

@VisibleForTesting
void setOzoneConfiguration(OzoneConfiguration conf) {
  this.ozoneConfiguration = conf;
}
Contributor

I think field injection can be replaced with constructor injection, making test-specific mutator unnecessary.
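
A sketch of what constructor injection could look like here. To stay self-contained it uses a `BooleanSupplier` as a stand-in for the injected OzoneConfiguration lookup; the class name, the `shouldBypass` method, and the exact "/sts" path match are assumptions, not the filter's real code.

```java
import java.util.Objects;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the bypass filter, with its dependency passed in. */
public final class WebIdentityBypassSketch {
  private final BooleanSupplier webIdentityEnabled;

  // The dependency arrives through the constructor, so tests need no setter.
  public WebIdentityBypassSketch(BooleanSupplier webIdentityEnabled) {
    this.webIdentityEnabled = Objects.requireNonNull(webIdentityEnabled);
  }

  /** Bypass applies only to /sts with Action=AssumeRoleWithWebIdentity when enabled. */
  public boolean shouldBypass(String path, String action) {
    return webIdentityEnabled.getAsBoolean()
        && "/sts".equals(path)
        && "AssumeRoleWithWebIdentity".equals(action);
  }

  public static void main(String[] args) {
    WebIdentityBypassSketch filter = new WebIdentityBypassSketch(() -> true);
    System.out.println(filter.shouldBypass("/sts", "AssumeRoleWithWebIdentity"));
  }
}
```

A test can then construct the filter with `() -> false` or `() -> true` directly, without a @VisibleForTesting mutator.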

Comment on lines +174 to +182
private static String read(InputStream stream) throws Exception {
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  byte[] buffer = new byte[256];
  int read;
  while ((read = stream.read(buffer)) != -1) {
    out.write(buffer, 0, read);
  }
  return new String(out.toByteArray(), StandardCharsets.UTF_8);
}
Contributor

Please use IOUtils.

Comment on lines +2456 to +2457
required string roleArn = 1;
required string roleSessionName = 2;
Contributor

adoroszlai (Contributor) commented:

> I can split it if you think that would make review easier.. The split would be like:
>
>   1. OIDC/JWKS + config + design doc
>   2. AssumeRoleWithWebIdentity runtime
>   3. E2E + Keycloak IT + docs + compose smoke test

Please split design doc to its own PR against master:

  • we don't want to run all tests for any doc changes
  • separate approval of design and code

paf91 (Author) commented May 23, 2026:

> Please split design doc to its own PR against master:
>
>   • we don't want to run all tests for any doc changes
>   • separate approval of design and code

Done

paf91 (Author) commented May 23, 2026:

Following review feedback, I split the design document into a separate PR against master:

#10338

I removed the design doc from this implementation PR to keep #10266 focused on runtime/code changes. The operator/runtime Keycloak/Ranger guide remains in this PR because it is tied to the configuration and behavior introduced here.
