DOC-2197: Add Redpanda SQL metrics reference section #1721

Open
kbatuigas wants to merge 2 commits into main from DOC-2197-sql-monitoring

Conversation

@kbatuigas
Contributor

Appends a cloud-only section (ifdef::env-cloud[]) documenting 67 Oxla Prometheus metrics across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems.

Description

Resolves https://redpandadata.atlassian.net/browse/
Review deadline:

Page previews

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

Appends a cloud-only section (ifdef::env-cloud[]) documenting 67 Oxla
Prometheus metrics across admission, catalog, cluster, executor,
kafka, memory, network, query, scheduler, and storage subsystems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d9ce0f7-6997-4d07-aead-e2b6bebebd62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds a new "Redpanda SQL metrics" documentation section to the public metrics reference guide. The section is conditioned on the cloud environment and introduces oxla_-prefixed metric entries covering SQL engine monitoring areas including admissions, object storage requests, catalog transactions, cluster and node state, data-task execution, executor/scheduler/thread-pool operations, query execution statistics, PostgreSQL wire-protocol connections, and S3 lifecycle events. The documentation is inserted before the existing "Related topics" section.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • redpanda-data/docs#1509: Both PRs expand modules/reference/pages/public-metrics-reference.adoc with new metric documentation entries under the same env-cloud cloud metrics sections.

Suggested reviewers

  • micheleRP
  • nicolaferraro
  • r-vasquez
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The description is largely incomplete. While it mentions the change details, it has unfilled template placeholders and unchecked checklist items. | Fill in the Jira ticket number in the Resolves link, add a review deadline if needed, provide actual page preview links, and check the appropriate checkbox(s) for the type of change. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a Redpanda SQL metrics reference section to documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@netlify

netlify Bot commented May 29, 2026

Deploy Preview for redpanda-docs-preview ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | e9d8073 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/redpanda-docs-preview/deploys/6a206beb4b7e680008a1585e |
| 😎 Deploy Preview | https://deploy-preview-1721--redpanda-docs-preview.netlify.app |

@kbatuigas kbatuigas marked this pull request as ready for review May 29, 2026 15:39
@kbatuigas kbatuigas requested a review from a team as a code owner May 29, 2026 15:39
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
modules/reference/pages/public-metrics-reference.adoc (1)

3541-4142: ⚡ Quick win

Consider adding "Available in Serverless" indicators for consistency.

The SQL metrics section is cloud-only (wrapped in ifdef::env-cloud[]), but unlike other cloud metrics in this file (e.g., Serverless metrics at lines 1434-1502), individual metrics don't include *Available in Serverless*: Yes/No tags. Since the section intro (line 3539) specifies "BYOC clusters," these metrics appear to be unavailable in Serverless.

For consistency with the established pattern throughout this file, consider adding the following after each metric's *Type*: line (and before any *Labels*: or --- separator):

*Available in Serverless*: No

This would match the documentation style used for other cloud metrics and improve clarity for readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/reference/pages/public-metrics-reference.adoc` around lines 3541 -
4142, This SQL metrics block (metrics named like oxla_admission_active_queries
through oxla_writers_opened_total) is cloud-only but lacks the "Available in
Serverless" tag; add a line "*Available in Serverless*: No" immediately after
each metric's "*Type*:" line (and before any "*Labels*:" or the "---" separator)
for every metric in this section so it matches the established pattern used
elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@modules/reference/pages/public-metrics-reference.adoc`:
- Around line 3541-4142: This SQL metrics block (metrics named like
oxla_admission_active_queries through oxla_writers_opened_total) is cloud-only
but lacks the "Available in Serverless" tag; add a line "*Available in
Serverless*: No" immediately after each metric's "*Type*:" line (and before any
"*Labels*:" or the "---" separator) for every metric in this section so it
matches the established pattern used elsewhere.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62a4e0e2-a179-4713-8175-457794e88858

📥 Commits

Reviewing files that changed from the base of the PR and between 5d49714 and 87aa10d.

📒 Files selected for processing (1)
  • modules/reference/pages/public-metrics-reference.adoc

Comment thread modules/reference/pages/public-metrics-reference.adoc Outdated
Co-authored-by: Grzegorz Dudek <38960244+Greketrotny@users.noreply.github.com>
Contributor

@Feediver1 Feediver1 left a comment


Docs standards review

Files reviewed: 1 .adoc file (modules/reference/pages/public-metrics-reference.adoc)
Net diff: 612+/1- (611-line addition + 1-line clarity edit applied 2026-06-03)
Overall assessment: Solid reference-doc addition that fills a real BYOC monitoring gap. Single substantive finding (file-wide consistency with *Available in Serverless*: pattern) plus minor PR-description hygiene.

What this PR does

Appends a new H2 section == Redpanda SQL metrics (anchor [[redpanda-sql-metrics]]) to the end of the cloud-rendered portion of public-metrics-reference.adoc, wrapped in a single ifdef::env-cloud[] block. The section documents 67 Prometheus metrics emitted by the SQL engine (all oxla_* prefix) across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems. Audience is BYOC operators who already have Redpanda SQL enabled and want to monitor it from Prometheus/Grafana.

Jira ticket alignment

Ticket: DOC-2197 (per branch name DOC-2197-sql-monitoring; the PR body's "Resolves" line still literally reads <jira-ticket>).
Status: Addressed. The 67 metrics cover the subsystems a customer would reasonably want visibility into. Cross-link from oxla_node_is_degraded_bool to the SQL troubleshooting guide closes one loop nicely.

Critical issues

None.

The cross-component xref xref:sql:troubleshoot/degraded-state-handling.adoc[] looked suspicious at first because sql isn't a module in the streaming component, but verified:

  • The section is wrapped in ifdef::env-cloud[], so it only renders in cloud-docs.
  • cloud-docs single-sources this page via include::streaming:reference:public-metrics-reference.adoc[tag=single-source], and the new section falls inside the tag::single-source range (lines 3–4154).
  • In cloud-docs context, the sql: module exists (modules/sql/pages/troubleshoot/degraded-state-handling.adoc confirmed present).
  • So the xref resolves correctly in the only build that renders it. ✓
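The tag-range part of this verification can be re-run mechanically if the file changes. Below is a minimal Python sketch, assuming the region is delimited by the standard AsciiDoc comment directives `// tag::single-source[]` and `// end::single-source[]` and that the section carries the `[[redpanda-sql-metrics]]` anchor. The helper names and the sample text are hypothetical and stand in for the real file.

```python
def tag_region(lines, tag):
    """Return (start, end) 1-based line numbers of the tag directives, or None."""
    start = end = None
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped == f"// tag::{tag}[]":
            start = number
        elif stripped == f"// end::{tag}[]":
            end = number
    if start is None or end is None:
        return None
    return start, end


def anchor_inside_tag(text, anchor, tag="single-source"):
    """True if the [[anchor]] line sits strictly between the tag directives."""
    lines = text.splitlines()
    region = tag_region(lines, tag)
    if region is None:
        return False
    start, end = region
    marker = f"[[{anchor}]]"
    for number, line in enumerate(lines, start=1):
        if marker in line:
            return start < number < end
    return False


# Hypothetical miniature of the page layout, not the real file contents.
sample = """= Public metrics reference
// tag::single-source[]
== Cluster metrics
ifdef::env-cloud[]
[[redpanda-sql-metrics]]
== Redpanda SQL metrics
endif::[]
// end::single-source[]
== Section outside the tag
"""

print(anchor_inside_tag(sample, "redpanda-sql-metrics"))
```

This only checks that the anchor falls inside the tagged region. It does not resolve the `xref:sql:` target, which still depends on the cloud-docs build.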

Suggestions

  1. CodeRabbit's "Available in Serverless" finding has merit and isn't pedantic — this file uses *Available in Serverless*: Yes/No on 221 existing metrics. The 67 new ones omit it. A Serverless reader scanning the file expects every metric to declare its availability; absence reads ambiguously rather than "not available." The intro paragraph says "BYOC clusters where Redpanda SQL is enabled," which implies "not Serverless," but readers commonly scan per-metric and skip section intros. Two acceptable resolutions:

    • (a) Per-metric: Add *Available in Serverless*: No after each *Type*: line. Matches the file's established pattern exactly. ~67 line additions.
    • (b) Section-level: Add an IMPORTANT or NOTE block right under the section heading saying "None of these metrics are available on Serverless clusters." Less churn, but breaks the per-metric scan pattern.

    Either is defensible; (a) is the safer call because future maintainers won't have to remember the section-level exception.

  2. PR description placeholder <jira-ticket>. Branch is DOC-2197-sql-monitoring; just replace <jira-ticket> with DOC-2197.

  3. Empty "Page previews" section. The actual deploy preview will be https://deploy-preview-1721--redpanda-docs-preview.netlify.app/redpanda-cloud/reference/public-metrics-reference/#redpanda-sql-metrics — worth dropping in so reviewers can click straight to the new content.

  4. None of the "Checks" boxes ticked. This is documentation of a recently-GA feature, which lands closest to "New feature" or "Content gap" — pick one.

  5. Repeated "Use for diagnostic purposes." phrasing appears on four metrics (oxla_jemalloc_mallctl_stats, oxla_mallinfo, oxla_net_callback_handling_time_us, oxla_receipts_received_total). Optional refactor: hoist a single-paragraph NOTE at the top of the section listing the diagnostic-only metrics, so readers immediately know which ones they shouldn't build dashboards on. Current per-metric approach is fine if you'd rather leave it.

  6. Section intro could mention subsystem grouping. The commit message says "across admission, catalog, cluster, executor, kafka, memory, network, query, scheduler, and storage subsystems" — that one sentence would help readers navigate 67 alphabetical entries. Currently the intro is just two sentences about BYOC + the oxla_ prefix. Adding "Metrics are grouped by subsystem prefix: admission, catalog, …" would orient a first-time reader.
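If option (a) from Suggestion 1 is chosen, the roughly 67 insertions could be scripted rather than typed by hand. The following is a minimal Python sketch, not a tested tool for this repository. It assumes the section starts at a heading line reading exactly `== Redpanda SQL metrics`, that each metric has a line starting with `*Type*:`, and that entries are separated by blank lines. The function name and the sample entries (including their types) are placeholders.

```python
SERVERLESS_LINE = "*Available in Serverless*: No"


def add_serverless_tags(text, heading="== Redpanda SQL metrics"):
    """Insert the Serverless tag after each *Type*: line inside one H2 section."""
    lines = text.splitlines()
    out = []
    in_section = False
    for index, line in enumerate(lines):
        out.append(line)
        # Any H2 heading either opens the target section or closes it.
        if line.startswith("== "):
            in_section = line.strip() == heading
            continue
        if not (in_section and line.startswith("*Type*:")):
            continue
        # Skip metrics that already carry the tag, so reruns change nothing.
        following = next((l for l in lines[index + 1:] if l.strip()), "")
        if following.startswith("*Available in Serverless*:"):
            continue
        out.extend(["", SERVERLESS_LINE])
    return "\n".join(out) + "\n"


# Hypothetical excerpt; metric types here are placeholders, not the real ones.
sample = "\n".join([
    "== Serverless metrics",
    "",
    "=== example_existing_metric",
    "",
    "*Type*: gauge",
    "",
    "*Available in Serverless*: Yes",
    "",
    "== Redpanda SQL metrics",
    "",
    "=== oxla_admission_active_queries",
    "",
    "*Type*: gauge",
    "",
    "---",
    "",
    "=== oxla_query_errors_total",
    "",
    "*Type*: counter",
    "",
    "*Labels*:",
    "",
])

print(add_serverless_tags(sample))
```

Limiting the change to one H2 section leaves the existing tagged metrics untouched, and the look-ahead check makes the script safe to run twice.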

Impact on other files

  • No nav.adoc change needed. The metrics reference page is already in nav at modules/ROOT/nav.adoc:935; only adding a section.
  • No whats-new-cloud.adoc entry strictly required — Redpanda SQL itself is already announced (cloud-docs whats-new-cloud.adoc:17). A small "you can now monitor Redpanda SQL via Prometheus" line is optional; not blocking.
  • Single-source path verified. The new content lands inside the tag::single-source[] region (lines 3–4154), so cloud-docs's include::streaming:reference:public-metrics-reference.adoc[tag=single-source] will pick it up. No cloud-docs companion PR needed.
  • monitor-cloud.adoc in cloud-docs could optionally add a sentence pointing BYOC SQL users to the new section, but the existing umbrella xref to the metrics reference covers it.
  • No xref breakage. Only additive; no headings renamed or moved.

CodeRabbit findings worth considering

  1. The "Available in Serverless" consistency finding. See Suggestion 1.

What works well

  • Real customer need addressed — 67 metrics with concrete types, labels, and label-value enumerations means a customer can build a Grafana dashboard from this page alone.
  • Clean single-ifdef wrap rather than per-metric ifdef gating. Less visual noise, easier to maintain. The single-source tag range correctly contains it.
  • Stable anchor [[redpanda-sql-metrics]] — survives heading rewrites.
  • Sentence-case H2 with proper-noun "Redpanda SQL" preserved, consistent with sibling H2s ("Redpanda Connect metrics", "Iceberg metrics", "TLS metrics").
  • Branding stays "Redpanda SQL" in prose, while metric names retain the oxla_* prefix the engine actually emits — readers get the product name they recognize AND the metric strings they can copy/paste into PromQL. Right balance given the underlying engine origin.
  • oxla_memory_usage_bytes clarity edit materially improves accuracy — readers now know it's tracked-memory, not process RSS, with an explicit pointer to oxla_process_memory_total for the full picture.
  • Cross-link from oxla_node_is_degraded_bool to the troubleshooting guide is exactly the right xref to add — degraded state is the kind of thing customers hit and then need to interpret. Resolves correctly in cloud-docs.
  • Labels are enumerated where they're constrained (type for oxla_aws_requests, action for oxla_catalog_transactions_total, error_type for oxla_query_errors_total, etc.) — saves customers from guessing.

Process note

PR has been open since 2026-05-29 (~5 days). Only one human inline comment so far (already applied). If you want SME review beyond CodeRabbit, the Redpanda SQL eng team is the natural ping — they can confirm the 67 metrics enumerated actually match what the engine emits in current cloud builds.
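One way to support that SME check is to diff the documented names against a scrape of the engine's metrics endpoint. Here is a minimal Python sketch, assuming each metric is documented under a level-3 heading such as `=== oxla_...` and that the scrape uses the Prometheus text exposition format with `# TYPE <name> <type>` lines. The function names, the sample inputs, and `oxla_hypothetical_new_metric` are illustrative only.

```python
import re


def documented_metrics(adoc_text, prefix="oxla_"):
    """Collect metric names from headings such as '=== oxla_foo'."""
    pattern = re.compile(r"^=== (" + re.escape(prefix) + r"\w+)\s*$", re.MULTILINE)
    return set(pattern.findall(adoc_text))


def emitted_metrics(exposition_text, prefix="oxla_"):
    """Collect metric family names from '# TYPE <name> <type>' lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if (len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE"
                and parts[2].startswith(prefix)):
            names.add(parts[2])
    return names


def compare(adoc_text, exposition_text):
    """Return (documented but not emitted, emitted but not documented)."""
    documented = documented_metrics(adoc_text)
    emitted = emitted_metrics(exposition_text)
    return sorted(documented - emitted), sorted(emitted - documented)


# Hypothetical inputs standing in for the .adoc file and a real scrape.
docs_sample = """== Redpanda SQL metrics

=== oxla_admission_active_queries

=== oxla_query_errors_total
"""

scrape_sample = """# HELP oxla_admission_active_queries Placeholder help text.
# TYPE oxla_admission_active_queries gauge
oxla_admission_active_queries 3
# TYPE oxla_hypothetical_new_metric counter
oxla_hypothetical_new_metric 7
"""

missing, undocumented = compare(docs_sample, scrape_sample)
print("documented but not emitted:", missing)
print("emitted but not documented:", undocumented)
```

A metric that only appears under certain workloads may be absent from a single scrape, so a "documented but not emitted" result is a prompt for a closer look rather than proof of a docs error.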

3 participants