feat: ROX-35431: add Go memory lifecycle panels to Central dashboard #344
Add 6 new panels to the Go Metrics row in rhacs-central.json:
- Allocation & Free Rate (Panel 200)
- Live Heap Objects (Panel 201)
- Heap Memory Breakdown - stacked area (Panel 202)
- GC Pressure with NextGC threshold (Panel 203)
- Go Heap vs Container Memory (Panel 204)
- Scavenger Effectiveness (Panel 205)

These panels visualize the full Go heap memory lifecycle from allocation through GC to OS return, mapping each stage to its Prometheus metrics. Enables engineers to diagnose memory leaks, GC pressure, and OOM proximity at per-pod granularity with 10s resolution.

Ticket: ROX-35431
Estimated code review effort: 3 (Moderate) | ~20 minutes
Pre-merge checks: 5 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
resources/grafana/sources/rhacs-central.json (1)
2251-2391: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
Regenerate `resources/grafana/generated/dashboards/rhacs-central.yaml`
The committed dashboard artifact is missing the new Go metrics panels, so Grafana will keep using the stale dashboard until the generated file is rebuilt and committed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@resources/grafana/sources/rhacs-central.json` around lines 2251 - 2391, The committed Grafana artifact is stale and missing the new Go metrics panel defined in the rhacs-central dashboard JSON. Regenerate and commit the matching generated dashboard output from the same source panel changes so the generated artifact stays in sync with the dashboard definition; use the rhacs-central dashboard source and its generated dashboard file as the key symbols to locate the update.
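One way to spot this kind of drift before review is a plain text check that every panel title in the source dashboard also appears in the generated file. The sketch below is illustrative only: the helper names `panel_titles` and `missing_titles` are hypothetical, and it assumes the generated YAML embeds panel titles verbatim. If the generator escapes characters such as `&`, titles containing them would be reported as missing even when they are present.

```python
import json


def panel_titles(node):
    """Collect panel titles recursively, including panels nested inside rows."""
    titles = []
    for panel in node.get("panels", []):
        if panel.get("title"):
            titles.append(panel["title"])
        # Collapsed rows keep their child panels under their own "panels" key.
        titles.extend(panel_titles(panel))
    return titles


def missing_titles(source_json_text, generated_text):
    """Return source panel titles that do not occur anywhere in the generated text."""
    dashboard = json.loads(source_json_text)
    return [title for title in panel_titles(dashboard) if title not in generated_text]
```

An empty result does not prove the two files are in sync; it only rules out the case where whole panels are absent from the generated artifact.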
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b9e7d8fc-5293-491a-ba60-3d7c20bb6ec6
📒 Files selected for processing (1)
resources/grafana/sources/rhacs-central.json
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
resources/grafana/sources/rhacs-central.json (1)
1610-1641: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Preserve pod identity in the new Go panels.
These targets use fixed legends like `"mallocs/sec"` and `"live objects"`, so multi-pod Central series become indistinguishable in Grafana. That undercuts the PR goal of per-pod leak/GC/OOM diagnosis. Aggregate by the pod-identifying label and include it in `legendFormat`.

Example pattern
```diff
- "expr": "rate(go_memstats_mallocs_total{namespace=\"rhacs-$instance_id\",job=\"central\"}[1m])",
+ "expr": "sum by (pod) (rate(go_memstats_mallocs_total{namespace=\"rhacs-$instance_id\",job=\"central\"}[1m]))",
...
- "legendFormat": "mallocs/sec",
+ "legendFormat": "{{pod}} mallocs/sec",
```
```diff
- "expr": "go_memstats_heap_alloc_bytes{namespace=\"rhacs-$instance_id\",job=\"central\"} / go_memstats_heap_objects{namespace=\"rhacs-$instance_id\",job=\"central\"}",
+ "expr": "sum by (pod) (go_memstats_heap_alloc_bytes{namespace=\"rhacs-$instance_id\",job=\"central\"}) / sum by (pod) (go_memstats_heap_objects{namespace=\"rhacs-$instance_id\",job=\"central\"})",
...
- "legendFormat": "avg object size",
+ "legendFormat": "{{pod}} avg object size",
```
Also applies to: 1751-1768, 1907-1938, 2063-2094, 2187-2232, 2342-2373
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@resources/grafana/sources/rhacs-central.json` around lines 1610 - 1641, The new Go panels in the Grafana dashboard are using fixed legend strings, which collapses multiple Central pods into indistinguishable series. Update the affected query definitions in the dashboard JSON so the PromQL aggregates include the pod-identifying label and the `legendFormat` references that label, using the existing panel/query blocks around the Go memory/GC metrics to preserve per-pod identity across all listed sections.
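To find every affected target rather than checking the listed line ranges by hand, a small script can walk the dashboard JSON and report targets whose `legendFormat` does not reference the pod label. This is a minimal sketch, not part of the PR: `fixed_legend_targets` is a hypothetical helper, and it assumes the standard Grafana layout where each panel carries a `targets` list and rows may nest further `panels`.

```python
import re


def fixed_legend_targets(dashboard, label="pod"):
    """Return (panel title, legendFormat) pairs whose legend omits {{label}}."""
    # Grafana accepts both "{{pod}}" and "{{ pod }}", so allow optional spaces.
    token = re.compile(r"\{\{\s*" + re.escape(label) + r"\s*\}\}")
    findings = []

    def walk(node):
        for panel in node.get("panels", []):
            for target in panel.get("targets", []):
                legend = target.get("legendFormat", "")
                if not token.search(legend):
                    findings.append((panel.get("title", ""), legend))
            walk(panel)  # descend into panels nested under rows

    walk(dashboard)
    return findings
```

The check only covers the legend. Whether the `expr` itself keeps the `pod` label (for example through `sum by (pod)`) still needs to be confirmed per query.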
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 80502a01-e618-46a0-8506-64dc45c5a552
⛔ Files ignored due to path filters (1)
`resources/grafana/generated/dashboards/rhacs-central.yaml` is excluded by `!**/generated/**`
📒 Files selected for processing (1)
resources/grafana/sources/rhacs-central.json
Created with Claude Code
Screenshot of the dashboard loaded into prod Grafana on an IBM instance:
