Skip to content

Security hardening + correctness fixes: WS authz, OAuth2 OIDC, file-service IDOR, macro feature-gating#20

Open
JedimEmO wants to merge 12 commits into
masterfrom
security/websocket-authz
Open

Security hardening + correctness fixes: WS authz, OAuth2 OIDC, file-service IDOR, macro feature-gating#20
JedimEmO wants to merge 12 commits into
masterfrom
security/websocket-authz

Conversation

@JedimEmO

@JedimEmO JedimEmO commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Fixes the correctness bugs, security gaps, and design issues tracked in #19, across the WebSocket RPC stack, the service macros, the auth/identity crates, and the example templates.

Correctness (WebSocket RPC)

  • Client connect(): replaced the busy-wait/hang with a Notify + connection_timeout; only reports Connected after the handshake; call() now removes its pending entry on every failure path (no more permanent "Too many pending requests").
  • Connection manager: senders are cloned out before any .await (no shard-lock-across-await deadlock), broadcasts fan out concurrently, and the subscribe/remove_connection/cleanup races are closed with remove_if + a live-connection check.
  • Protocol: unparseable frames are answered with JSON-RPC -32700 and the connection stays open.

Security / authz

  • WS hardening: default-deny authorize_subscribe, require_auth defaults to true, and periodic credential re-validation (with_auth_revalidation, 30s default) bounds revoked/expired tokens to one interval.
  • REST macro: request bodies are read and deserialized only after auth succeeds; new body_limit option (413 on excess).
  • OAuth2: OIDC nonce on every request; id_token iss/aud/exp/nonce validated on callback (signature skipped per OIDC Core §3.1.3.7, token comes from the token endpoint over TLS); login-CSRF guard via start_flow_bound + callback binding.
  • File-service example: exact full-UUID id match (no prefix IDOR), per-owner storage scopes with ownership-enforced secure download, and extension/content-type allowlisting (no reflected Content-Type → XSS).
  • Bounded resources: OAuth2 InMemoryStateStore self-prunes + caps pending flows; SessionService::start_cleanup_task background sweeper; WS max_connections cap.

Design / maintainability

  • Shared auth pipeline in ras-auth-core (authorize_request, check_permission_groups, user_satisfies_permission_groups) replaces ~6 inlined codegen copies — and fixes the WITH_PERMISSIONS([]) divergence (file macro was deny-all vs authenticated-only elsewhere).
  • OAuth2 start_flow is now a typed method instead of returning its success value inside Err.
  • Macro feature-gating: opt-in feature_gated: true emits #[cfg(feature = ...)]-wrapped code resolved against the consumer crate, fixing the workspace feature-unification flaw.

Verification

  • cargo test --workspace — 102 suites green (regression tests added for each fix)
  • cargo clippy --workspace --all-targets — clean
  • cargo audit — 0 vulnerabilities (2 informational warnings from ratatui in the TUI example only)

42 files changed, +2246/−637.

Closes #19

🤖 Generated with Claude Code

JedimEmO and others added 12 commits June 9, 2026 22:49
- Add default-deny authorize_subscribe hook to MessageHandler; the
  default handle_subscribe now only subscribes topics the service
  explicitly authorizes (was: any attacker-supplied topic string).
- Flip require_auth default to true in WebSocketServiceBuilder and the
  macro-generated service builder; anonymous connections now require an
  explicit opt-out.
- Periodically re-validate connection credentials: the token is captured
  pre-upgrade and re-run through the AuthProvider every
  auth_revalidation_interval (default 30s, configurable). Failure closes
  the connection; success refreshes the cached user/permissions, so
  revoked or expired tokens are bounded to one interval on long-lived
  connections.
- bidirectional-chat-server: depend on ras-rest-macro with
  default-features = false so the reqwest/client feature is not forced
  into the server build via proc-macro feature unification.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- connect() no longer spins in a hot loop waiting for the server's
  ConnectionEstablished message. It now parks on a Notify signaled by
  the message handler, bounded by connection_timeout, and tears down the
  half-open connection on timeout instead of hanging forever.
- The client only reports Connected (and only starts the heartbeat)
  after the handshake completes; previously concurrent calls could pass
  the state check with no connection id, and the heartbeat task could
  race the state write and exit immediately.
- call() removes its pending-request entry on send failure, response
  timeout, and channel closure. Previously timed-out entries leaked
  until the map hit max_pending_requests, after which every call failed
  permanently with 'Too many pending requests'.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…bscription races

- send_to_connection and all broadcast paths clone senders out of the
  map before awaiting. Previously a slow consumer with a full bounded
  channel parked the send while holding the shard guard, blocking (or
  deadlocking) every other access to that shard, including the
  consumer's own remove_connection.
- Broadcasts now fan out concurrently via join_all; wall-clock is
  bounded by the slowest recipient instead of the sum of all sends.
- add_subscription only indexes live connections, dedups, and undoes
  its insert if the connection vanished mid-call, so a subscribe racing
  remove_connection cannot leave zombie ids in the topic index.
- remove_subscription / remove_pending_request use remove_if for the
  empty-entry cleanup so a concurrent insert between retain and remove
  is not thrown away.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ping the connection

A text frame that parses as neither a BidirectionalMessage nor a
JSON-RPC request previously bubbled an InvalidRequest error into the
handler loop, which closed the whole connection without any
protocol-level response. Per JSON-RPC 2.0 the server now replies with a
Parse Error (-32700, id null) and keeps serving the connection; only
transport failures terminate the loop.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… option

Generated REST handlers previously used an axum Json extractor, so the
full body was read and deserialized before any auth/CSRF/permission
check ran — free pre-auth CPU and allocation for unauthenticated
clients. Handlers now take the raw request and read + deserialize the
body inside the handler, after authorization succeeds (matching what
the file-service macro already did). Unauthorized endpoints keep
parsing up front.

rest_service! also gains an optional body_limit field (bytes, default
2 MiB to match axum); oversized bodies are answered with 413.

Regression tests: malformed JSON without credentials now yields 401
(not 400), and the body_limit option is enforced end-to-end.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The credential → CSRF → authenticate → OR-of-AND permission sequence was
inlined by codegen in six places (REST canonical + versioned handlers,
JSON-RPC dispatch, file-service handlers, bidirectional method dispatch)
with already-diverging copies. It now lives in ras-auth-core:

- authorize_request(): the full pipeline for header-authenticated HTTP
  services; generated REST handlers shrink to one call plus a per-service
  error-shape mapper, with byte-identical responses.
- check_permission_groups(): provider-aware OR-of-AND group check used by
  the REST, JSON-RPC and file macros.
- user_satisfies_permission_groups(): set-membership variant for the
  bidirectional handler, which authorizes against the cached connection
  user.

This also fixes a real divergence: file_service! treated
WITH_PERMISSIONS([]) as deny-all while the REST and JSON-RPC macros
treated it as any-authenticated-user. All macros now share the
authenticated-only semantics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
OAuth2Provider::verify() previously answered StartFlow payloads by
JSON-encoding the authorization URL into
Err(IdentityError::ProviderError(...)) — the success value travelled
inside an error, every caller had to pattern-match an error string and
parse JSON out of it, and generic error logging reported successful
flow starts as provider failures.

start_flow(provider_id, additional_params) is now a public method
returning OAuth2Result<OAuth2Response>. verify() handles only Callback
payloads and answers StartFlow with UnsupportedMethod. The demo server,
the google_oauth2 example, the README, and the tests now use the typed
API (the provider is Clone, so callers keep a handle for flow initiation
alongside the one registered with SessionService).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rue)

The four service macros decided whether to emit server/client/reqwest
code via cfg!(feature = ...), which evaluates the PROC-MACRO crate's
feature set. Cargo unifies those features across the whole workspace, so
any crate enabling e.g. ras-rest-macro/reqwest forced client codegen
(and its transport dependencies) into every other consumer's expansion —
the root cause of the earlier bidirectional-chat-server build break.

Proc macros cannot observe the consumer's features (Cargo passes them as
--cfg flags, not env), so the fix is to emit the generated blocks
wrapped in #[cfg(feature = "server")] / #[cfg(feature = "client")] /
#[cfg(feature = "reqwest")] attributes, which rustc resolves against
the CONSUMER crate. This is opt-in via a new service-level field:

    rest_service!({ ..., feature_gated: true, ... })

supported by rest_service!, jsonrpc_service!, file_service! and
jsonrpc_bidirectional_service!. Consumers that opt in declare their own
server/client (and optionally reqwest) features, as bidirectional-chat-
api already did; feature-less consumers keep the old macro-crate-feature
behavior. chat-api now uses feature_gated: true for both its services,
pinning the regression. Permission-manifest gating still follows the
macro crate's permissions feature (additive; unchanged scope).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ackend

The wasm file-service backend is a template users copy, and it had three
real vulnerabilities (audit HIGH-4):

- get_file matched stored files by starts_with(file_id): a truncated or
  single-character id matched arbitrary files (IDOR/enumeration). Ids
  are now validated as full UUIDs and matched exactly against the file
  stem.
- download_secure had no object-ownership check (the handler even said
  so in a comment). Files are now stored per-scope (public/ vs
  users/{owner}/), the authenticated uploader is recorded as owner, and
  the secure download only resolves files in the caller's own scope;
  anything else is indistinguishable from missing. Owner ids are
  charset-validated before becoming a path segment.
- The served Content-Type derived from the attacker-controlled upload
  filename extension. Extensions are now allowlisted at save time
  (everything else stored as .bin) and only known-safe content types are
  echoed on download; everything else is application/octet-stream.

Regression tests: prefix ids rejected, cross-user and public access to
owned files yields 404, unauthenticated secure download yields 401,
.html upload served as opaque octet-stream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…n binding

Closes audit HIGH-3. Previously the identity was taken solely from the
userinfo endpoint, id_tokens were never inspected, no nonce was sent,
and the state parameter was not bound to the browser session that
started the flow.

- Every authorization request now carries a fresh OIDC nonce, stored in
  the flow state.
- When the token endpoint returns an id_token, its claims are validated
  on callback: iss must equal the new OAuth2ProviderConfig::issuer field
  (when configured), aud must contain the client_id (string or array
  form), exp must be in the future, and nonce must echo the one we sent.
  The signature is intentionally not verified: the token arrives
  directly from the token endpoint over TLS, which OIDC Core 3.1.3.7
  permits for the authorization-code flow.
- Login-CSRF guard: start_flow_bound / generate_authorization_url_bound
  accept an unguessable per-browser-session binding value (e.g. a
  random cookie); the callback payload must present the identical
  binding or the flow is rejected before any token exchange. A failed
  attempt also burns the one-use state.
- Google configs in the examples/README now set the issuer.
- oauth2-demo-api adopts feature_gated: true, fixing its pre-existing
  package-scoped build break (cargo check -p oauth2-demo-server failed
  on master with ReqwestTransport missing from ras-transport-core).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Closes the audit MEDIUM findings on unbounded resources:

- OAuth2 InMemoryStateStore now prunes expired flows opportunistically
  on every store and caps pending flows (default 10k, configurable via
  with_capacity); the cap returns TooManyPendingFlows instead of growing
  without bound. cleanup_expired no longer needs external scheduling.
- SessionService gains active_session_count() and start_cleanup_task(),
  a background sweeper holding only a Weak reference (stops when the
  service is dropped), so expired sessions are pruned even through
  traffic lulls instead of only on begin/verify calls.
- WebSocket services accept an optional max_connections cap (builder
  field + WebSocketService::max_connections). Upgrades beyond the cap
  are refused with 503 before the socket upgrade; non-axum transports
  are refused with a close frame before the connection is registered.
  ConnectionManager gains active_connection_count() (cheap override in
  DefaultConnectionManager).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rustdoc runs with -D warnings in CI; the bare https URL tripped
rustdoc::bare-urls. Wrap it in a code span.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security hardening + correctness fixes: WebSocket authz, OAuth2 OIDC, file-service IDOR, macro feature-gating

1 participant