Add read-only introspection endpoints: /health, /info, /config #326

Open
shagghiesuperstar wants to merge 1 commit into antirez:main from shagghiesuperstar:shag/feat-introspection-endpoints

@shagghiesuperstar
What

Three small read-only HTTP endpoints on the ds4-server for operators, dashboards, and load balancers that need server state without paying for a model call:

GET /health   status, uptime, in-flight clients, context, model, kv cache
GET /info     engine identity snapshot
GET /config   read-only reflection of server_config

Sample response (live ds4-server):

{"status":"ok","uptime_s":9,"clients_in_flight":1,"context_length":131072,"default_max_tokens":393216,"model":"DeepSeek V4 Flash","kv_cache":{"enabled":true,"dir":"/Volumes/OWC_MODELS_TB5/DS4/cache","budget_bytes":53687091200}}
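As a rough illustration of how a response of this shape might be assembled into a fixed buffer without heap allocations, here is a minimal sketch. The PR text does not show the helper itself, so `struct health_snapshot`, `json_escape`, and `build_health_json` are hypothetical names invented for this example; the real handler reads its values from struct server and the ds4 engine API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical snapshot of the values /health reports. In the server these
 * would come from struct server and the engine API, not from this struct. */
struct health_snapshot {
    time_t start_time;
    int clients_in_flight;
    int context_length;
    int default_max_tokens;
    const char *model;
    bool kv_enabled;
    const char *kv_dir;
    unsigned long long kv_budget_bytes;
};

/* Copy src into dst as JSON string content, escaping quotes, backslashes and
 * control characters. Returns the length written, or -1 if dst is too small. */
static int json_escape(char *dst, size_t cap, const char *src) {
    size_t n = 0;
    if (cap == 0) return -1;
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;
        char tmp[8];
        int len;
        if (c == '"' || c == '\\') {
            len = snprintf(tmp, sizeof tmp, "\\%c", c);
        } else if (c < 0x20) {
            len = snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)c);
        } else {
            tmp[0] = (char)c;
            len = 1;
        }
        if (n + (size_t)len >= cap) return -1; /* keep room for the NUL */
        memcpy(dst + n, tmp, (size_t)len);
        n += (size_t)len;
    }
    dst[n] = '\0';
    return (int)n;
}

/* Format the /health body into buf. Returns the body length, or -1 if any
 * field or the final body does not fit. No heap allocation is performed. */
static int build_health_json(char *buf, size_t cap,
                             const struct health_snapshot *h, time_t now) {
    char model[256], dir[1024];
    assert(buf != NULL && h != NULL);
    if (json_escape(model, sizeof model, h->model ? h->model : "") < 0) return -1;
    if (json_escape(dir, sizeof dir, h->kv_dir ? h->kv_dir : "") < 0) return -1;
    long uptime = (long)difftime(now, h->start_time);
    int n = snprintf(buf, cap,
        "{\"status\":\"ok\",\"uptime_s\":%ld,\"clients_in_flight\":%d,"
        "\"context_length\":%d,\"default_max_tokens\":%d,\"model\":\"%s\","
        "\"kv_cache\":{\"enabled\":%s,\"dir\":\"%s\",\"budget_bytes\":%llu}}",
        uptime, h->clients_in_flight, h->context_length,
        h->default_max_tokens, model, h->kv_enabled ? "true" : "false",
        dir, h->kv_budget_bytes);
    if (n < 0 || (size_t)n >= cap) return -1; /* truncated or failed */
    return n;
}
```

A caller would fill the snapshot, call `build_health_json(buf, sizeof buf, &snap, time(NULL))`, and hand the buffer to the response writer only when the return value is non-negative, so a truncated body is never sent.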

Why

The server currently exposes only the OpenAI/Anthropic endpoints and /v1/models. There is no way for an operator to ask "is this server healthy?" or "what is the active context length?" without either issuing a chat call (expensive) or scraping /v1/models (wrong shape, and it counts as a model registry hit).

Downstream consumers (for example a small open-source web dashboard I maintain, shagghiesuperstar/ds4-dashboard) currently fall back on speculative probes (/telem, /metrics) that always return 404. /health makes that path deterministic.

How

  • start_time (time_t) added to struct server, set in main() next to the other init lines.
  • Three send_* helpers added right after send_models/send_model, mirroring their style: buf assembly, http_response, no allocations past buf, no model access.
  • Three new dispatch branches in client_main, placed before the /v1/models/ prefix branch so /health, /info, /config cannot be shadowed.
  • All fields used are already exposed through public ds4 engine API or directly on struct server. No new headers, no new dependencies.
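The dispatch ordering described above could be sketched roughly as follows: exact matches for the new paths are tested first, and the /v1/models/ prefix match comes afterwards. This is only an illustration under assumptions; `enum route` and `route_get` are hypothetical names, and the actual branches in client_main are not shown in this PR text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical route identifiers, used only for this sketch. */
enum route {
    ROUTE_HEALTH,
    ROUTE_INFO,
    ROUTE_CONFIG,
    ROUTE_MODELS,    /* GET /v1/models      */
    ROUTE_MODEL,     /* GET /v1/models/<id> */
    ROUTE_NOT_FOUND
};

/* Map a GET path to a route. Exact matches are checked before the prefix
 * match, so the introspection paths are decided before any prefix branch.
 * Query strings are not handled here. */
static enum route route_get(const char *path) {
    static const char model_prefix[] = "/v1/models/";
    assert(path != NULL);
    if (strcmp(path, "/health") == 0) return ROUTE_HEALTH;
    if (strcmp(path, "/info") == 0) return ROUTE_INFO;
    if (strcmp(path, "/config") == 0) return ROUTE_CONFIG;
    if (strcmp(path, "/v1/models") == 0) return ROUTE_MODELS;
    if (strncmp(path, model_prefix, sizeof model_prefix - 1) == 0)
        return ROUTE_MODEL;
    return ROUTE_NOT_FOUND;
}
```

Because the new paths are matched exactly, near misses such as /healthz or /health/ fall through to the 404 branch instead of being served as /health.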

Tested

  • make ds4-server builds cleanly (no warnings under -Wall -Wextra -std=c99).
  • ./ds4_test --server passes (added two new unit tests covering dispatch disjointness and JSON shape).
  • Live-tested against a running ds4-server:
    • GET /health, GET /info, GET /config return valid JSON.
    • GET /v1/models and POST /v1/chat/completions unchanged.
    • Unknown paths still return 404.

Diff size

151 lines added (150 in ds4_server.c, 1 in .gitignore for the test_q4k_dot test binary that was missing from the ignore list).

Happy to split into separate PRs per endpoint, or rename them (/v1/health, etc.) if you'd prefer them grouped under a versioned prefix.

The HTTP server previously exposed only the OpenAI/Anthropic-shaped
endpoints (/v1/models, /v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) with no way to inspect server state
without a model call. Operators and dashboards polling the server
had to fall back on /v1/models, which is the wrong shape for
healthchecks and overloads the model registry.

Three small read-only GET endpoints are added:

  GET /health
      status=ok, uptime_s, clients_in_flight, context_length,
      default_max_tokens, model, kv_cache{enabled,dir,budget_bytes}.
      Designed for load balancers and liveness/readiness probes.

  GET /info
      engine, model, context_length, default_max_tokens.
      Stable identity snapshot suitable for dashboards and CLIs.

  GET /config
      context_length, default_max_tokens, enable_cors,
      disable_exact_dsml_tool_replay, kv_disk_cache details.
      Read-only reflection of server_config for operators.

Implementation notes:
  * start_time (time_t) is added to struct server and set in main().
  * All three handlers reuse the existing engine API and struct server
    fields: ds4_session_ctx, ds4_engine_model_name,
    s->kv.{enabled,dir,budget_bytes,reject_different_quant},
    s->enable_cors, s->default_tokens.
  * No new dependencies, no model access, no allocations past buf.
  * Live-tested against ds4-server: returns valid JSON; existing
    GET /v1/models and POST /v1/chat/completions unchanged.
  * make test --server passes; no other tests affected.

This is motivated by an external dashboard that today speculatively
probes /telem and /metrics (both 404) before falling back. With
/health, that probe becomes deterministic and the dashboard can be
updated to consume the new endpoint.