Skip to content

fix: Harden webhook, vault, session, and base client paths#654

Open
gjtorikian wants to merge 12 commits intomainfrom
updates
Open

fix: Harden webhook, vault, session, and base client paths#654
gjtorikian wants to merge 12 commits intomainfrom
updates

Conversation

@gjtorikian
Copy link
Copy Markdown
Contributor

Summary

  • Webhook/action signature verification: reject future-skewed timestamps in webhooks/_resource, webhooks/_verification, and actions._verify_signature by comparing abs(seconds_since_issued); honor an explicit tolerance=0 (was silently coerced to the 180s default).
  • Vault crypto: preserve empty associated_data/aad (was dropped, causing encrypt/decrypt AAD divergence) and bound _decode_u32_leb128 to 32 bits so a malformed payload can't yield a >32-bit key_len. Empty-AAD ciphertexts written by older SDKs need migration — noted for release notes.
  • Public client isolation: create_public_client now forces api_key=None and sets is_public=True, so a PKCE/public client cannot pick up WORKOS_API_KEY from the process env.
  • Session refresh: map auth-flow/network exceptions to structured AuthenticateWithSessionCookieFailureReason members (MFA_CHALLENGE_REQUIRED, SSO_REQUIRED, EMAIL_VERIFICATION_REQUIRED, ORGANIZATION_SELECTION_REQUIRED, REFRESH_DENIED, REFRESH_NETWORK_ERROR) instead of stringifying. Additive — string consumers keep working.
  • Base client URL safety: percent-encode every path segment via a single _encode_path helper so user-supplied identifiers can't escape their segment with /, ?, #, or %. Covers all ~115 generated call sites.

Test plan

  • uv run ruff format --check .
  • uv run ruff check .
  • uv run pyright
  • uv run pytest (in particular tests/test_actions.py, tests/test_webhook_verification.py, vault, session, and base-client suites)
  • Manually verify a public client built via create_public_client does not include client_secret even when WORKOS_API_KEY is set in the environment
  • Confirm release notes call out the empty-AAD wire-format change for locally-persisted vault ciphertexts

gjtorikian and others added 9 commits May 7, 2026 14:35
Centralize path-segment encoding so user-supplied identifiers cannot
escape their intended URL segment via reserved characters such as
'/', '?', '#', or '%'. A new helper `_encode_path` splits on '/'
and applies `urllib.parse.quote(seg, safe='')` to each segment, and
is invoked from both the sync and async `request` methods. This
covers all ~115 generated call sites without per-site changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds new members to `AuthenticateWithSessionCookieFailureReason`
(`MFA_CHALLENGE_REQUIRED`, `SSO_REQUIRED`,
`EMAIL_VERIFICATION_REQUIRED`, `ORGANIZATION_SELECTION_REQUIRED`,
`REFRESH_DENIED`, `REFRESH_NETWORK_ERROR`) and maps the relevant
auth-flow / network exceptions raised during refresh to those
reasons instead of stringifying the exception. The `reason` field
is already typed `Union[..., str]`, so existing string-based
consumers keep working — additive only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`create_public_client` now passes `api_key=None` and `is_public=True`
to `WorkOSClient`. The base client honors `is_public` by forcing
`_api_key` to `None` and ignoring the `WORKOS_API_KEY` environment
variable, so a public/PKCE client cannot accidentally pick up an
API key from the process environment.

The existing `if self._client._api_key` guards in PKCE token-
exchange paths (`sso/_resource.py`, `user_management/_resource.py`)
keep `client_secret` out of the request body for public clients
without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the freshness window only rejected timestamps that were
older than the tolerance — clock skew that put the issued timestamp
in the future was silently accepted, opening the verifier to
replay-style attacks once the attacker's clock drifted forward.

Compare `abs(seconds_since_issued)` against `max_seconds_since_issued`
in both the sync and async `verify_header` implementations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_decode_u32_leb128` previously kept reading continuation bytes
after the 4th byte (5+ continuations could yield a 35+ bit value
that would later be used as `key_len` for slicing the payload).
Tighten the loop to reject a 5th continuation byte (`i >= 4 and
b & 0x80`) and validate the decoded result against the 32-bit
ceiling (`> 0xFFFFFFFF`) before returning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the truthy `if aad:` / `if associated_data` guards with
explicit `is not None` checks in `_aes_gcm_encrypt`,
`_aes_gcm_decrypt`, and the four `aad_buffer` construction sites.
Previously an empty string passed as `associated_data` (or an empty
`bytes` aad) was silently dropped, so encrypt and decrypt paths
disagreed on the AAD bound to the GCM tag.

Wire-format risk: any locally-persisted vault ciphertext that was
encrypted by an earlier version of this SDK with `associated_data=""`
(or `aad=b""`) was actually written without AAD. After this fix the
SDK now binds an empty AAD, so re-encrypting with the same empty
string will produce a tag the old SDK cannot verify, and decrypting
old empty-AAD ciphertexts must continue to use no AAD. WorkOS
Vault ciphertexts live server-side, so production impact is bounded;
applications that have stashed ciphertexts locally must migrate.
Document in the next release migration note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tolerance or DEFAULT_TOLERANCE` silently coerced an explicit
`tolerance=0` (caller wants no tolerance) to the default 180-second
window. Use `tolerance if tolerance is not None else DEFAULT_TOLERANCE`
so callers that ask for a strict zero window get one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ruff format --check flagged the ORGANIZATION_SELECTION_REQUIRED return as
exceeding line length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 7f58e9f closed the future-timestamp replay gap in
`webhooks/_resource.verify_header`, but the same vulnerable
comparison still lived in `actions._verify_signature` and the
standalone `webhooks/_verification.verify_header` helper — both
rejected only past-skewed timestamps, silently accepting issued
timestamps in the future.

Compare `abs(seconds_since_issued)` in both helpers and add
regression tests covering the future-timestamp case for actions,
the webhook resource, and the standalone verifier.
@gjtorikian gjtorikian requested review from a team as code owners May 8, 2026 19:13
@gjtorikian gjtorikian requested a review from awolfden May 8, 2026 19:13
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR hardens five distinct security and correctness surface areas across the WorkOS Python SDK: webhook/action timestamp validation, vault LEB128 payload parsing, public-client credential isolation, session refresh exception mapping, and URL path encoding.

  • Timestamp validation: All three signature-verification paths now use abs(seconds_since_issued) to reject both old and future-skewed timestamps; _verification.py also fixes the tolerance or DEFAULT pattern so an explicit tolerance=0 is honoured rather than silently overridden.
  • Vault LEB128 hardening: _decode_u32_leb128 now raises on the 5th byte when a continuation bit is set (i >= 4 and b & 0x80 != 0) and adds a res > 0xFFFFFFFF guard, closing a window where a malformed payload could yield a key-length larger than 32 bits.
  • Public client isolation + path encoding: create_public_client forces is_public=True to prevent server credentials from leaking into PKCE flows; a new _encode_path helper percent-encodes every user-supplied path segment across all request call sites to block path-traversal via reserved characters.

Confidence Score: 5/5

Safe to merge — all five hardening areas are narrowly scoped, well-tested, and additive; the only behavioural break is the empty-AAD vault wire format, which the PR description already flags for release notes.

Each change is a tightly contained correctness fix (abs-wrap, None-guard, LEB128 bit-boundary check, public-client credential null-out, path segment encoding). The MfaEnrollmentError gap flagged in a prior review thread is now mapped. Tests cover future-timestamp rejection, tolerance=0, all eleven exception-to-reason mappings, and the unknown-exception fallback. No regressions introduced on the reviewed paths.

No files require special attention. The vault _aes_gcm_encrypt/_aes_gcm_decrypt if-aad guards are unchanged (the empty-AAD wire-format note in the PR description is documentation/release-notes scope, not a code defect in this diff).

Important Files Changed

Filename Overview
src/workos/_base_client.py Adds _encode_path static helper to percent-encode path segments, wiring it into both sync and async request methods; adds is_public constructor flag that forces the auth credential to None so public PKCE clients cannot inherit server credentials from the process environment.
src/workos/actions.py Single-line fix: wraps seconds_since_issued in abs() to reject future-skewed timestamps. Tolerance is a mandatory int default, so no tolerance=0 hole exists here.
src/workos/public_client.py Sets is_public=True on the underlying client constructor and nulls out the auth credential at construction time, ensuring public PKCE clients never carry server-side credentials.
src/workos/session.py Adds _map_refresh_exception_to_reason helper mapping all AuthenticationFlowError/AuthenticationError/network errors to structured enum members (including the previously-missing MfaEnrollmentError); replaces bare str(e) in refresh paths.
src/workos/vault.py Tightens _decode_u32_leb128: raises on a 5th continuation byte (i >= 4 and b & 0x80 != 0) and adds an explicit res > 0xFFFFFFFF guard so a valid 5-byte encoding that encodes a >32-bit value is still rejected.
src/workos/webhooks/_verification.py Fixes two bugs: (1) replaces tolerance or DEFAULT_TOLERANCE with tolerance if tolerance is not None else DEFAULT_TOLERANCE to honour an explicit tolerance=0; (2) wraps seconds_since_issued in abs() to catch future-dated timestamps.
src/workos/webhooks/_resource.py Applies the same abs(seconds_since_issued) fix in both the sync Webhooks and async AsyncWebhooks class implementations.
tests/test_session.py Adds TestMapRefreshExceptionToReason parametrized suite covering all 11 exception-to-reason mappings plus the unknown-exception fallback string.
tests/test_webhook_verification.py Adds future-timestamp rejection tests for both the class-based and standalone webhook verifier, and adds a tolerance=0 test confirming the explicit-zero fix in _verification.py.
tests/test_actions.py Adds test_verify_header_future_timestamp to verify the abs() fix rejects a timestamp 60 seconds in the future within a 30-second tolerance window.

Reviews (3): Last reviewed commit: "Revert "fix: preserve empty AAD in vault..." | Re-trigger Greptile

Comment thread src/workos/session.py
@gjtorikian gjtorikian changed the title Harden webhook, vault, session, and base client paths fix: Harden webhook, vault, session, and base client paths May 8, 2026
gjtorikian and others added 3 commits May 8, 2026 12:33
…tured reasons

Refresh failures from MfaEnrollmentError, OrganizationAuthMethodsRequiredError,
AuthenticationMethodNotAllowedError, and RadarChallengeError previously fell
through the AuthenticationError check (they inherit from AuthorizationError)
and surfaced as bare strings. Add enum members and isinstance branches so all
explicit AuthKit flow outcomes resolve to structured reasons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression guard for the standalone _verification.verify_header tolerance
handling: a 1-second-old signature with tolerance=0 must be rejected, so
reverting to the prior `tolerance or DEFAULT_TOLERANCE` form would fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 0c12462.

The original truthy `if aad:` guard was self-consistent within the SDK
(encrypt and decrypt agreed: empty string and None both meant "no AAD"),
and matches the convention used by sibling WorkOS SDKs. Switching to
`is not None` made `aad=""` semantically distinct from `aad=None`, which
is cryptographically more precise but breaks decryption of every vault
ciphertext previously encrypted with `associated_data=""` once a caller
upgrades — the new SDK calls `authenticate_additional_data(b"")` and the
GCM tag (computed without AAD on the encrypt side) will not verify.

The threat being closed (a caller passing "" expecting an empty AAD to
be authenticated, but silently getting none) is a narrow integrity
misperception, not an exploitable hole, and there is no signal anyone
in the wild relies on it. The wire-format break is the larger harm.
A coordinated cross-SDK change with a versioned ciphertext header is
the right path if this ever needs to be revisited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant