
Feature: clickhouse-client --login auto-discovery of OAuth parameters over the native port #1983

Description

@BorisTyshkevich

Summary

Let clickhouse-client --login auto-discover the OAuth 2.0 server parameters (issuer/oauth-url, client_id, audience, supported flows) instead of requiring the user to pass --oauth-url/--oauth-client-id/--oauth-audience or to ship a credentials JSON file.

The catch is that clickhouse-client is configured with only the native port (9000/9440) and speaks the binary protocol there, while OAuth discovery is an HTTP concept whose natural home is the HTTP port (8123/8443) — a different port the client was never told about, and one that is often not exposed at all in hardened deployments. So the design has to bridge "client only knows the native port" to "discovery is HTTP-shaped".

The good news: the native TCP port already answers HTTP requests with a fallback message (the "Port 9000 is for clickhouse-client" text), and the server already holds every value the client needs in <token_processors>. This proposal wires those together.

Motivation

Today, to log in with OAuth a user must already know — and pass on the command line — the very things the server could tell them:

clickhouse-client --host ch.example.com --secure \
  --login=browser \
  --oauth-url=https://issuer.example.com \
  --oauth-client-id=abcd...apps \
  --oauth-audience=https://ch.example.com

These values are not secrets and the server already knows them (they live in <token_processors> for token verification). Forcing every user to copy them into flags (or distribute an oauth_client.json) is friction and a source of drift: the client-side values and the server-side expected_issuer/introspection_client_id/expected_audience can disagree. The goal is:

clickhouse-client --host ch.example.com --secure --login

…and the client discovers the rest from the server it is already connecting to.

The core constraint

  • clickhouse-client knows only the native port and speaks the binary protocol there.
  • Discovery is HTTP. The HTTP port (8123/8443) is a different port the client was not given, and is frequently firewalled off — many secure deployments expose only 9440.

⇒ A design that says "just fetch http://host:8123/.well-known/…" has two holes: the client doesn't know the HTTP port, and it may be unreachable. This pushes the discovery channel onto the native port the client already has.

What already exists (so this is small)

| Building block | Where | Note |
| --- | --- | --- |
| HTTP-on-native-port fallback | src/Server/TCPHandler.cpp:1829 formatHTTPErrorResponseWhenUserIsConnectedToWrongPort, fired in receiveHello (src/Server/TCPHandler.cpp:1886) when the first byte is 'G'/'P' | Takes config + is_secure; already reads tcp_port/http_port and tells the user the HTTP port. Can also see <token_processors>. Works on 9440 (TLS terminates, bytes reach receiveHello). |
| Client OAuth flags + flows | programs/client/Client.cpp:760 (--login, --oauth-url, --oauth-client-id, --oauth-audience, --oauth-credentials); flows in src/Client/OAuthFlowRunner.cpp (device + auth-code/PKCE) | Discovery only needs to populate these three values before the flow starts. Endpoints (auth_uri/token_uri/device_auth_uri) come from the IdP's own OIDC discovery against issuer. |
| Server OAuth config (source of truth) | <token_processors>, src/Access/TokenProcessorsParse.cpp | Holds expected_issuer, introspection_client_id, expected_audience, discovery endpoint. No /.well-known is served today. |
| Public-config GET endpoint (sibling) | Companion proposal in #1930 | Emits the public subset of token_processors for a browser SPA, which is essentially the same discovery document the CLI needs. Same registry, same security model. |

Proposed design

One discovery document (public subset of a token_processors entry — never secrets)

{
  "issuer": "https://issuer.example.com",
  "client_id": "abcd...apps",
  "audience": "https://ch.example.com",
  "scopes": ["openid", "profile"],
  "flows": ["browser", "device"]
}

The client then runs standard OIDC discovery against issuer (/.well-known/openid-configuration) to resolve authorization_endpoint/token_endpoint/device_authorization_endpoint. This keeps the ClickHouse document minimal and reuses the IdP's own well-known, mapping cleanly onto the existing client flags (--oauth-url → issuer, --oauth-client-id → client_id, --oauth-audience → audience).
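To make the second hop concrete, here is a minimal Python sketch of resolving the endpoints from the issuer. It is illustrative only (the real client code is C++), and the function names are hypothetical. The endpoint field names are the standard OIDC/RFC 8628 metadata keys.

```python
import json
import urllib.request


def openid_configuration_url(issuer: str) -> str:
    """Build the OIDC discovery URL for an issuer (a trailing slash is tolerated)."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def extract_endpoints(metadata: dict) -> dict:
    """Pick the endpoints the login flows need; absent ones become None."""
    keys = (
        "authorization_endpoint",
        "token_endpoint",
        "device_authorization_endpoint",
    )
    return {key: metadata.get(key) for key in keys}


def fetch_endpoints(issuer: str, timeout: float = 5.0) -> dict:
    """Fetch the IdP metadata and return the endpoints (performs a network call)."""
    url = openid_configuration_url(issuer)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_endpoints(json.load(resp))
```

A missing device_authorization_endpoint would simply mean the device flow is unavailable at that IdP, whatever the ClickHouse document lists under flows.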

Transport: serve it on the native port (primary), HTTP port (optional)

Primary — native-port HTTP fallback (works everywhere the client can already reach). Extend receiveHello's HTTP path so that, when at least one token_processors entry is advertised for login, a request for a well-known path returns 200 OK + the JSON above instead of the fixed 400. The client, given --login with no explicit oauth flags, opens a socket to the same native port it is already configured for and sends a one-line HTTP GET:

GET /.well-known/clickhouse-oauth HTTP/1.0\r\n\r\n
  • Reuses the existing hook; the function already has config and is_secure.
  • Survives 9440-only deployments (TLS terminates first, then the HTTP bytes reach receiveHello — the same path that produces today's message over https://…:9440).
  • Keep today's human-readable 400 text for any non-discovery path so a mistaken curl still gets the helpful message.
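The client-side probe could look roughly like the Python sketch below. It assumes the well-known path proposed above and that the server closes the connection after responding (HTTP/1.0 style). The helper names are hypothetical, and the real implementation would live in the C++ client.

```python
import json
import socket
import ssl

WELL_KNOWN_PATH = "/.well-known/clickhouse-oauth"  # path proposed in this issue


def build_probe_request(path: str = WELL_KNOWN_PATH) -> bytes:
    """The one-line HTTP/1.0 GET sent to the native port."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def parse_probe_response(raw: bytes):
    """Return the discovery document as a dict, or None for anything but 200 + JSON object."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return None
    status_line = head.split(b"\r\n", 1)[0].split()
    if len(status_line) < 2 or status_line[1] != b"200":
        return None  # e.g. today's 400 fallback message
    try:
        doc = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    return doc if isinstance(doc, dict) else None


def probe_native_port(host: str, port: int, secure: bool, timeout: float = 5.0):
    """Connect to the native port (TLS if secure), send the GET, read until EOF."""
    sock = socket.create_connection((host, port), timeout=timeout)
    if secure:
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_probe_request())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_probe_response(b"".join(chunks))
```

Returning None for every non-200 case means an older server, which still answers with the fixed 400, is indistinguishable from "OAuth not advertised", so the client can fall back the same way in both cases.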

Optional — HTTP-port handler. The same generator can also answer on the HTTP port as a normal handler (this is the GET companion in #1930) for SPAs and standards alignment. One source of truth, two transports.

Client side

When --login is given and --oauth-url/--oauth-client-id are absent, probe the native port for the discovery document before starting the flow. Explicit flags and --oauth-credentials always override discovery. If discovery fails or OAuth isn't advertised, fall back to today's behavior (Cloud auto-login path / require flags) with a clear message.
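The precedence rule can be sketched as a per-field merge. Two points in this Python illustration are assumptions, not part of the proposal: the probe is triggered when either issuer or client_id is missing, and audience is treated as optional. All names are hypothetical.

```python
FIELDS = ("issuer", "client_id", "audience")


def needs_discovery(flags: dict) -> bool:
    """Probe only when the flags alone cannot start a flow (assumed rule)."""
    return not flags.get("issuer") or not flags.get("client_id")


def resolve_oauth_params(flags: dict, discovered):
    """Merge explicit flags over the discovered document, field by field.

    flags: values from --oauth-url / --oauth-client-id / --oauth-audience,
           keyed as issuer / client_id / audience (missing or None if unset).
    discovered: the discovery document, or None if the probe failed.
    Returns the merged parameters, or None when the caller should fall back
    to today's behavior.
    """
    discovered = discovered or {}
    merged = {name: flags.get(name) or discovered.get(name) for name in FIELDS}
    if not merged["issuer"] or not merged["client_id"]:
        return None
    return merged
```

For example, a user who passes only --oauth-url keeps that issuer, while client_id and audience are filled in from the server's document.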

Implementation steps

  1. Discovery generator: a small function that maps advertised token_processors entries to the public JSON subset (whitelist below). Reused by both transports.
  2. Native-port transport: in TCPHandler.cpp, when the first byte is 'G'/'P', read the request line; if the path matches the well-known path and OAuth is advertised, write HTTP/1.0 200 OK\r\nContent-Type: application/json\r\n\r\n + the document; otherwise emit today's 400 text unchanged.
  3. HTTP-port transport (optional): register the same generator as an <http_handlers> handler / well-known route (converges with #1930).
  4. Client discovery: in the --login path (programs/client/Client.cpp / src/Client/OAuth*), if oauth flags are unset, fetch the document from the native port, run IdP OIDC discovery against issuer, then proceed with the existing browser/device flow.
  5. Docs: document --login auto-discovery and the per-processor opt-in.
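The routing decision in step 2 is small enough to sketch. The Python below only illustrates the logic; the actual change would be C++ in TCPHandler.cpp, and the function name is hypothetical. Today's 400 response is passed in as an opaque string so that it is emitted unchanged.

```python
import json

WELL_KNOWN_PATH = "/.well-known/clickhouse-oauth"  # path proposed in this issue


def respond(request_line: str, document, fallback_400: str) -> str:
    """Choose the response for an HTTP request that arrived on the native port.

    request_line: first line of the request, e.g. "GET /path HTTP/1.0".
    document: the public discovery document, or None if OAuth is not advertised.
    fallback_400: today's complete 400 response, returned as-is.
    """
    parts = request_line.split()
    is_discovery = (
        len(parts) >= 2 and parts[0] == "GET" and parts[1] == WELL_KNOWN_PATH
    )
    if is_discovery and document is not None:
        return (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: application/json\r\n"
            "\r\n" + json.dumps(document)
        )
    return fallback_400
```

An exact path match keeps the new 200 branch as narrow as possible: a mistaken curl against any other path, or any POST, still gets the existing message.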

Security considerations

  • The document is served pre-auth and is public by nature (issuer/client_id/audience already appear in every authorize URL).
  • Strict field whitelist, never a dump. Emit only issuer/client_id/audience/scopes/flows. It must never emit introspection_client_secret, static_key, private_key, or any JWKS private material. The whitelist is the security boundary.
  • Per-processor opt-in (e.g. <advertise_for_login>true</advertise_for_login>) so only intended IdPs are advertised over this pre-auth channel.
  • Do not alter the existing human-readable 400 for ordinary mistaken connections — only add a 200 branch for the explicit well-known path.
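A sketch of the whitelist as code, in Python for illustration only. The mapping from server-side keys to public fields (expected_issuer to issuer, introspection_client_id to client_id, expected_audience to audience) is an assumption inferred from the motivation section. The scopes and flows source keys and the advertise_for_login flag are likewise hypothetical.

```python
# Public field -> server-side key in a token_processors entry (assumed mapping).
PUBLIC_FIELD_SOURCES = {
    "issuer": "expected_issuer",
    "client_id": "introspection_client_id",
    "audience": "expected_audience",
    "scopes": "scopes",  # hypothetical server-side key
    "flows": "flows",    # hypothetical server-side key
}


def public_document(processor: dict):
    """Build the discovery document from one token_processors entry.

    Returns None unless the entry opted in. The output is constructed field by
    field from the whitelist, so keys such as introspection_client_secret can
    never leak even if new settings are added to the entry later.
    """
    if not processor.get("advertise_for_login"):
        return None
    document = {}
    for public_name, source_key in PUBLIC_FIELD_SOURCES.items():
        if source_key in processor:
            document[public_name] = processor[source_key]
    return document
```

Building the output from the whitelist, rather than copying the entry and deleting known secrets, is what makes the whitelist the security boundary: a secret added to the config in a future release is excluded by default.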

Alternatives considered

  • Extend the binary server Hello with a protocol-versioned OAuth-metadata field (client does a pre-auth probe handshake). The most principled, fully port-agnostic, machine-readable option — but it needs a protocol revision + negotiation and more code on both sides. Good long-term direction; out of scope for v1.
  • Two-hop via the HTTP port (parse the fallback message to learn http_port, then fetch /.well-known/clickhouse-oauth there). Standards-shaped and independently useful, but breaks when the HTTP port isn't exposed (common), relies on parsing a human-readable message, and adds a round trip. The optional HTTP-port handler above covers the SPA case without making the CLI depend on it.
  • Status quo — require --oauth-url/--oauth-client-id/--oauth-audience or --oauth-credentials, or the hardcoded Cloud auto-login path. No discovery.

Relationship to #1930

#1930 proposes server-side OAuth handlers for a browser SPA: a POST /oauth/token code-exchange (secret stays server-side) and a GET public-config endpoint sourced from token_processors. This issue is the CLI counterpart: the same public-config document, but reachable over the native port so clickhouse-client --login can discover it without knowing or reaching the HTTP port. Shared registry (<token_processors>), shared whitelist/security model — likely the same generator behind both transports.


Drafted with Claude Code against the Antalya tree; file/line references from a read-through of src/Server/TCPHandler.cpp, programs/client/Client.cpp, src/Client/OAuthFlowRunner.cpp, and src/Access/TokenProcessorsParse.cpp.
