DOC-2197: Add Redpanda SQL metrics reference section #1721
Appends a cloud-only section (ifdef::env-cloud[]) documenting 67 Oxla Prometheus metrics across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
This PR adds a new "Redpanda SQL metrics" documentation section to the public metrics reference guide. The section is conditioned on the cloud environment and introduces `oxla_`-prefixed metric entries covering SQL engine monitoring areas including admissions, object storage requests, catalog transactions, cluster and node state, data-task execution, executor/scheduler/thread-pool operations, query execution statistics, PostgreSQL wire-protocol connections, and S3 lifecycle events. The documentation is inserted before the existing "Related topics" section.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
✅ Deploy Preview for redpanda-docs-preview ready!
🧹 Nitpick comments (1)
modules/reference/pages/public-metrics-reference.adoc (1)
3541-4142: ⚡ Quick win. Consider adding "Available in Serverless" indicators for consistency.
The SQL metrics section is cloud-only (wrapped in `ifdef::env-cloud[]`), but unlike other cloud metrics in this file (e.g., Serverless metrics at lines 1434-1502), individual metrics don't include `*Available in Serverless*: Yes/No` tags. Since the section intro (line 3539) specifies "BYOC clusters," these metrics appear to be unavailable in Serverless.
For consistency with the established pattern throughout this file, consider adding the following after each metric's `*Type*:` line (and before any `*Labels*:` or `---` separator):
`*Available in Serverless*: No`
This would match the documentation style used for other cloud metrics and improve clarity for readers.
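The per-metric change suggested above could be scripted rather than applied by hand. The following is a minimal sketch, not the tooling used on this PR. It assumes each metric entry starts with a `=== <metric_name>` heading and that fields are separated by blank lines; both are assumptions about the file layout, and `add_serverless_tag` is a hypothetical helper name.

```python
TAG = "*Available in Serverless*: No"


def add_serverless_tag(lines, prefix="oxla_"):
    """Return a copy of `lines` with TAG inserted after each '*Type*:' line
    that belongs to a metric whose heading starts with `prefix`.

    Assumption: metric headings look like '=== <metric_name>'.
    Entries that already carry an '*Available in Serverless*:' line are
    left alone, so running the function twice changes nothing.
    """
    out = []
    in_target = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("=== "):
            # Track whether the current entry is one of the targeted metrics.
            in_target = stripped[4:].startswith(prefix)
        out.append(line)
        if in_target and stripped.startswith("*Type*:"):
            # Look past blank lines to see whether the tag is already present.
            following = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
            if not following.startswith("*Available in Serverless*:"):
                out.append("")   # blank separator (assumed file convention)
                out.append(TAG)
    return out
```

Restricting the edit to headings with the `oxla_` prefix keeps the 221 existing entries untouched, and the look-ahead check makes the script safe to re-run after a partial manual edit.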
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modules/reference/pages/public-metrics-reference.adoc` around lines 3541 - 4142, This SQL metrics block (metrics named like oxla_admission_active_queries through oxla_writers_opened_total) is cloud-only but lacks the "Available in Serverless" tag; add a line "*Available in Serverless*: No" immediately after each metric's "*Type*:" line (and before any "*Labels*:" or the "---" separator) for every metric in this section so it matches the established pattern used elsewhere.
Co-authored-by: Grzegorz Dudek <38960244+Greketrotny@users.noreply.github.com>
Feediver1 left a comment
Docs standards review
Files reviewed: 1 .adoc file (modules/reference/pages/public-metrics-reference.adoc)
Net diff: 612+/1- (611-line addition + 1-line clarity edit applied 2026-06-03)
Overall assessment: Solid reference-doc addition that fills a real BYOC monitoring gap. Single substantive finding (file-wide consistency with *Available in Serverless*: pattern) plus minor PR-description hygiene.
What this PR does
Appends a new H2 section == Redpanda SQL metrics (anchor [[redpanda-sql-metrics]]) to the end of the cloud-rendered portion of public-metrics-reference.adoc, wrapped in a single ifdef::env-cloud[] block. The section documents 67 Prometheus metrics emitted by the SQL engine (all oxla_* prefix) across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems. Audience is BYOC operators who already have Redpanda SQL enabled and want to monitor it from Prometheus/Grafana.
Jira ticket alignment
Ticket: DOC-2197 (per branch name DOC-2197-sql-monitoring; the PR body's "Resolves" line still literally reads <jira-ticket>).
Status: Addressed. The 67 metrics cover the subsystems a customer would reasonably want visibility into. Cross-link from oxla_node_is_degraded_bool to the SQL troubleshooting guide closes one loop nicely.
Critical issues
None.
The cross-component xref `xref:sql:troubleshoot/degraded-state-handling.adoc[]` looked suspicious at first because `sql` isn't a module in the streaming component, but verified:
- The section is wrapped in `ifdef::env-cloud[]`, so it only renders in cloud-docs.
- cloud-docs single-sources this page via `include::streaming:reference:public-metrics-reference.adoc[tag=single-source]`, and the new section falls inside the `tag::single-source` range (lines 3–4154).
- In cloud-docs context, the `sql:` module exists (`modules/sql/pages/troubleshoot/degraded-state-handling.adoc` confirmed present).
- So the xref resolves correctly in the only build that renders it. ✓
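The two placement checks above (inside the `single-source` tag region, and inside an `ifdef::env-cloud[]` block) can be approximated with a line scan. This is a rough sketch under stated assumptions: it treats any line containing `tag::single-source[]` or `end::single-source[]` as a region boundary, and handles only block-form conditionals closed by `endif::`. The helper name `region_context` is hypothetical, and this is not a substitute for a real Antora build.

```python
import re

# Block-form conditionals: ifdef::attr[], ifndef::attr[], ifeval::[...]
COND = re.compile(r"^(ifdef|ifndef|ifeval)::([^\[]*)\[(.*)\]$")


def region_context(lines, anchor, tag="single-source", attr="env-cloud"):
    """Return (inside_tag_region, inside_ifdef_attr) for the first line that
    contains `anchor`, or None if the anchor is not found."""
    in_tag = False
    open_conds = []  # stack of (directive, attribute) for open conditionals
    for line in lines:
        s = line.strip()
        if f"tag::{tag}[]" in s:
            in_tag = True
        elif f"end::{tag}[]" in s:
            in_tag = False
        elif s.startswith("endif::"):
            if open_conds:
                open_conds.pop()
        else:
            m = COND.match(s)
            # ifdef::attr[text] with non-empty brackets is the single-line
            # form and opens no block; ifeval always opens a block.
            if m and (m.group(1) == "ifeval" or m.group(3) == ""):
                open_conds.append((m.group(1), m.group(2)))
        if anchor in s:
            return in_tag, ("ifdef", attr) in open_conds
    return None
```

Called with `anchor="[[redpanda-sql-metrics]]"` on the page's lines, a result of `(True, True)` would match the reasoning above. It does not confirm that the `sql:` module exists in cloud-docs; that still needs a check against the cloud-docs repository.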
Suggestions
1. CodeRabbit's "Available in Serverless" finding has merit and isn't pedantic: this file uses `*Available in Serverless*: Yes/No` on 221 existing metrics. The 67 new ones omit it. A Serverless reader scanning the file expects every metric to declare its availability; absence reads ambiguously rather than "not available." The intro paragraph says "BYOC clusters where Redpanda SQL is enabled," which implies "not Serverless," but readers commonly scan per-metric and skip section intros. Two acceptable resolutions:
   - (a) Per-metric: Add `*Available in Serverless*: No` after each `*Type*:` line. Matches the file's established pattern exactly. ~67 line additions.
   - (b) Section-level: Add an `IMPORTANT` or `NOTE` block right under the section heading saying "None of these metrics are available on Serverless clusters." Less churn, but breaks the per-metric scan pattern.

   Either is defensible; (a) is the safer call because future maintainers won't have to remember the section-level exception.
2. PR description placeholder `<jira-ticket>`. Branch is `DOC-2197-sql-monitoring`; just replace `<jira-ticket>` with `DOC-2197`.
3. Empty "Page previews" section. The actual deploy preview will be https://deploy-preview-1721--redpanda-docs-preview.netlify.app/redpanda-cloud/reference/public-metrics-reference/#redpanda-sql-metrics, which is worth dropping in so reviewers can click straight to the new content.
4. None of the "Checks" boxes ticked. This is documentation of a recently-GA feature, which lands closest to "New feature" or "Content gap"; pick one.
5. Repeated `Use for diagnostic purposes.` phrasing appears on four metrics (`oxla_jemalloc_mallctl_stats`, `oxla_mallinfo`, `oxla_net_callback_handling_time_us`, `oxla_receipts_received_total`). Optional refactor: hoist a single-paragraph NOTE at the top of the section listing the diagnostic-only metrics, so readers immediately know which ones they shouldn't build dashboards on. Current per-metric approach is fine if you'd rather leave it.
6. Section intro could mention subsystem grouping. The commit message says "across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems"; that one sentence would help readers navigate 67 alphabetical entries. Currently the intro is just two sentences about BYOC + the `oxla_` prefix. Adding "Metrics are grouped by subsystem prefix: admission, catalog, …" would orient a first-time reader.
Impact on other files
- No `nav.adoc` change needed. The metrics reference page is already in nav at `modules/ROOT/nav.adoc:935`; only adding a section.
- No `whats-new-cloud.adoc` entry strictly required. Redpanda SQL itself is already announced (cloud-docs `whats-new-cloud.adoc:17`). A small "you can now monitor Redpanda SQL via Prometheus" line is optional; not blocking.
- Single-source path verified. The new content lands inside the `tag::single-source[]` region (lines 3–4154), so cloud-docs's `include::streaming:reference:public-metrics-reference.adoc[tag=single-source]` will pick it up. No cloud-docs companion PR needed.
- `monitor-cloud.adoc` in cloud-docs could optionally add a sentence pointing BYOC SQL users to the new section, but the existing umbrella xref to the metrics reference covers it.
- No xref breakage. Only additive; no headings renamed or moved.
CodeRabbit findings worth considering
- The "Available in Serverless" consistency finding. See Suggestion 1.
What works well
- Real customer need addressed: 67 metrics with concrete types, labels, and label-value enumerations means a customer can build a Grafana dashboard from this page alone.
- Clean single-`ifdef` wrap rather than per-metric ifdef gating. Less visual noise, easier to maintain. The single-source tag range correctly contains it.
- Stable anchor `[[redpanda-sql-metrics]]` that survives heading rewrites.
- Sentence-case H2 with proper-noun "Redpanda SQL" preserved, consistent with sibling H2s ("Redpanda Connect metrics", "Iceberg metrics", "TLS metrics").
- Branding stays "Redpanda SQL" in prose, while metric names retain the `oxla_*` prefix the engine actually emits, so readers get the product name they recognize AND the metric strings they can copy/paste into PromQL. Right balance given the underlying engine origin.
- `oxla_memory_usage_bytes` clarity edit materially improves accuracy: readers now know it's tracked-memory, not process RSS, with an explicit pointer to `oxla_process_memory_total` for the full picture.
- Cross-link from `oxla_node_is_degraded_bool` to the troubleshooting guide is exactly the right xref to add: degraded state is the kind of thing customers hit and then need to interpret. Resolves correctly in cloud-docs.
- Labels are enumerated where they're constrained (`type` for `oxla_aws_requests`, `action` for `oxla_catalog_transactions_total`, `error_type` for `oxla_query_errors_total`, etc.), which saves customers from guessing.
Process note
PR has been open since 2026-05-29 (~5 days). Only one human inline comment so far (already applied). If you want SME review beyond CodeRabbit, the Redpanda SQL eng team is the natural ping — they can confirm the 67 metrics enumerated actually match what the engine emits in current cloud builds.
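Ahead of an SME pass, the documented names could be compared mechanically against a scrape of the engine's metrics endpoint. The sketch below is one way to do that, assuming the `.adoc` file uses `=== <metric_name>` headings (an assumption about the file layout) and that the scrape is in the Prometheus text exposition format, where each metric family is announced with a `# TYPE <name> <type>` line. The function names are hypothetical.

```python
import re


def documented_metrics(adoc_text, prefix="oxla_"):
    """Metric names taken from '=== <name>' headings (assumed layout)."""
    names = re.findall(r"^=== (\S+)\s*$", adoc_text, flags=re.MULTILINE)
    return {n for n in names if n.startswith(prefix)}


def emitted_metrics(exposition_text, prefix="oxla_"):
    """Metric family names taken from '# TYPE <name> <type>' lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if (len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE"
                and parts[2].startswith(prefix)):
            names.add(parts[2])
    return names


def diff_metrics(adoc_text, exposition_text, prefix="oxla_"):
    """Return (documented_but_not_emitted, emitted_but_not_documented)."""
    doc = documented_metrics(adoc_text, prefix)
    live = emitted_metrics(exposition_text, prefix)
    return sorted(doc - live), sorted(live - doc)
```

Two empty lists would suggest the page and the build agree on names. Families exposed without a `# TYPE` line would be missed by this scan, and the comparison says nothing about types or labels, so it narrows the SME review rather than replacing it.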
Description
Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Checks