Add read-only introspection endpoints: /health, /info, /config #326
Open
shagghiesuperstar wants to merge 1 commit into
The HTTP server previously exposed only the OpenAI/Anthropic-shaped
endpoints (/v1/models, /v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) with no way to inspect server state
without a model call. Operators and dashboards polling the server
had to fall back on /v1/models, which is the wrong shape for
healthchecks and overloads the model registry.
Three small read-only GET endpoints are added:
GET /health
status=ok, uptime_s, clients_in_flight, context_length,
default_max_tokens, model, kv_cache{enabled,dir,budget_bytes}.
Designed for load balancers and liveness/readiness probes.
GET /info
engine, model, context_length, default_max_tokens.
Stable identity snapshot suitable for dashboards and CLIs.
GET /config
context_length, default_max_tokens, enable_cors,
disable_exact_dsml_tool_replay, kv_disk_cache details.
Read-only reflection of server_config for operators.
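As a rough illustration of the /health payload listed above, the body can be assembled with a single snprintf into a fixed buffer. This is a minimal sketch, not the code from this PR: the struct health_snapshot type and the format_health_json function are hypothetical names, and the sketch assumes the model name and cache directory contain no characters that need JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the values /health reports. The real handler
 * reads them from struct server, whose layout is not shown in this PR text. */
struct health_snapshot {
    long uptime_s;
    int clients_in_flight;
    int context_length;
    int default_max_tokens;
    const char *model;                  /* assumed to need no JSON escaping */
    int kv_enabled;
    const char *kv_dir;                 /* assumed to need no JSON escaping */
    unsigned long long kv_budget_bytes;
};

/* Writes the JSON body into buf and returns its length, or -1 if the body
 * would not fit in cap bytes (snprintf would have truncated it). */
static int format_health_json(char *buf, size_t cap,
                              const struct health_snapshot *h) {
    assert(buf != NULL && h != NULL && cap > 0);
    int n = snprintf(buf, cap,
        "{\"status\":\"ok\",\"uptime_s\":%ld,\"clients_in_flight\":%d,"
        "\"context_length\":%d,\"default_max_tokens\":%d,\"model\":\"%s\","
        "\"kv_cache\":{\"enabled\":%s,\"dir\":\"%s\",\"budget_bytes\":%llu}}",
        h->uptime_s, h->clients_in_flight, h->context_length,
        h->default_max_tokens, h->model,
        h->kv_enabled ? "true" : "false", h->kv_dir, h->kv_budget_bytes);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Checking the snprintf return value against the buffer size keeps a truncated (and therefore invalid) JSON body from being sent; a caller could answer with an error status in that case.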
Implementation notes:
* start_time (time_t) is added to struct server and set in main().
* All three handlers reuse existing fields: ds4_session_ctx,
ds4_engine_model_name, s->kv.{enabled,dir,budget_bytes,
reject_different_quant}, s->enable_cors, s->default_tokens.
* No new dependencies, no model access, no allocations past buf.
* Live-tested against ds4-server: returns valid JSON; existing
GET /v1/models and POST /v1/chat/completions unchanged.
* make test --server passes; no other tests affected.
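The start_time note above implies that uptime_s is derived from the difference between the current time and the recorded start. A minimal sketch of that calculation follows; the uptime_seconds helper is a hypothetical name and not necessarily how the PR computes it.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: whole seconds elapsed between start_time and now.
 * time() reports wall-clock time, which can step backwards, so a negative
 * difference is clamped to 0 instead of being reported as-is. */
static long uptime_seconds(time_t start_time, time_t now) {
    double elapsed = difftime(now, start_time);
    return elapsed > 0.0 ? (long)elapsed : 0;
}

/* Usage: nine seconds after startup the helper reports 9, which is the
 * kind of value a /health response would carry in uptime_s. */
static void uptime_example(void) {
    time_t start = (time_t)1000;
    assert(uptime_seconds(start, start + 9) == 9);
}
```

In a server, the call site would look like uptime_seconds(s->start_time, time(NULL)), assuming s points at the struct server that holds the new field.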
This is motivated by an external dashboard that today speculatively
probes /telem and /metrics (both 404) before falling back. With
/health, that probe becomes deterministic and the dashboard can be
updated to consume the new endpoint.
What
Three small read-only HTTP endpoints on the ds4-server for operators, dashboards, and load balancers that need server state without paying for a model call: GET /health, GET /info, and GET /config.
Sample response from GET /health (live ds4-server):
{"status":"ok","uptime_s":9,"clients_in_flight":1,"context_length":131072,"default_max_tokens":393216,"model":"DeepSeek V4 Flash","kv_cache":{"enabled":true,"dir":"/Volumes/OWC_MODELS_TB5/DS4/cache","budget_bytes":53687091200}}
Why
The server previously exposed only the OpenAI/Anthropic endpoints and /v1/models. There was no way for an operator to ask "is this server healthy?" or "what is the active context length?" without either issuing a chat call (expensive) or scraping /v1/models (wrong shape, counts as a model registry hit).
Downstream consumers, such as a small open-source web dashboard I maintain (shagghiesuperstar/ds4-dashboard), currently fall back on speculative probes (/telem, /metrics) that always 404. /health makes that path deterministic.
How
* start_time (time_t) added to struct server, set in main() next to the other init lines.
* send_* helpers added right after send_models/send_model, mirroring their style: buf assembly, http_response, no allocations past buf, no model access.
* Dispatch in client_main, placed before the /v1/models/ prefix branch so /health, /info, /config cannot be shadowed.
* All other values come from existing fields of struct server. No new headers, no new dependencies.
Tested
* make ds4-server builds cleanly (no warnings under -Wall -Wextra -std=c99).
* ./ds4_test --server passes (added two new unit tests covering dispatch disjointness and JSON shape).
* GET /health, GET /info, GET /config return valid JSON.
* GET /v1/models and POST /v1/chat/completions unchanged.
Diff size
151 lines added (150 in ds4_server.c, 1 in .gitignore for the test_q4k_dot test binary that was missing from the ignore list).
Happy to split into separate PRs per endpoint, or rename them (/v1/health, etc.) if you'd prefer them grouped under a versioned prefix.
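For reviewers, the dispatch ordering described under How (exact-match introspection routes checked before the /v1/models/ prefix branch) can be sketched roughly as follows. The enum route type and the match_get_route function are hypothetical names for illustration only; the actual branch structure in client_main may differ, and the sketch assumes the path has already been stripped of any query string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical route identifiers for GET requests. */
enum route {
    ROUTE_HEALTH,
    ROUTE_INFO,
    ROUTE_CONFIG,
    ROUTE_MODELS_LIST,
    ROUTE_MODEL_ONE,
    ROUTE_NOT_FOUND
};

/* Exact-match routes are tested first, so a later prefix branch can never
 * claim them. Only the /v1/models/ branch uses a prefix comparison. */
static enum route match_get_route(const char *path) {
    static const char models_prefix[] = "/v1/models/";

    if (strcmp(path, "/health") == 0) return ROUTE_HEALTH;
    if (strcmp(path, "/info") == 0) return ROUTE_INFO;
    if (strcmp(path, "/config") == 0) return ROUTE_CONFIG;
    if (strcmp(path, "/v1/models") == 0) return ROUTE_MODELS_LIST;
    if (strncmp(path, models_prefix, sizeof models_prefix - 1) == 0)
        return ROUTE_MODEL_ONE;
    return ROUTE_NOT_FOUND;
}

/* Usage: the new routes and the existing prefix branch stay disjoint. */
static void route_example(void) {
    assert(match_get_route("/health") == ROUTE_HEALTH);
    assert(match_get_route("/v1/models/some-model") == ROUTE_MODEL_ONE);
}
```

Using strcmp for the new routes means near-misses such as /healthz fall through to the not-found case, which is the "dispatch disjointness" property the new unit tests are described as covering.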