Add TABLE_TENANT_INFO controller gauge for table-to-tenant mapping #18823
Open
arunkumarucet wants to merge 1 commit into
…ing via JMX

Emit a per-table `tableTenantInfo` gauge from `SegmentStatusChecker` with the server tenant name embedded as an extra key segment in the metric name:

`pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1`

This lets Prometheus scrape the metric via the JMX exporter and use a `group_left(tenant)` join to attach the tenant label to any existing table-scoped metric without modifying the core metrics pipeline.

Implementation details:
- The gauge is registered only on first encounter or when the tenant changes, avoiding redundant writes on every 5-minute `SegmentStatusChecker` cycle.
- Stale gauges are cleaned up on tenant change, null config, and table removal, tracked via an internal `_tableTenantMap`.
- A dedicated JMX exporter rule in `controller.yml` extracts `table`, `tableType`, `tenant`, and `database` labels. The rule is placed before the generic `tableNameWithType` rules to ensure the tenant segment is captured.
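What the `group_left(tenant)` join provides can be illustrated with a small in-memory Java sketch that mimics `sum by (tenant) (metric * on(table, tableType) group_left(tenant) tableTenantInfo)`. The record types, the choice of `table` and `tableType` as matching labels, and the sample values are assumptions for illustration; this is neither Prometheus nor Pinot code, and the queries actually used with this metric may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TenantJoinSketch {
  // A table-scoped sample, e.g. a segment count for one table (illustrative).
  record Sample(String table, String tableType, double value) {}

  // One tableTenantInfo series; its value is always 1, so only its labels matter.
  record TenantInfo(String table, String tableType, String tenant) {}

  // Join key, playing the role of on(table, tableType) in PromQL (assumed labels).
  record TableKey(String table, String tableType) {}

  /**
   * Mimics: sum by (tenant) (metric * on(table, tableType) group_left(tenant) tableTenantInfo).
   * Samples with no matching info series are dropped, as in a PromQL vector-to-vector operation.
   */
  static Map<String, Double> sumByTenant(List<Sample> samples, List<TenantInfo> infos) {
    Map<TableKey, String> tenantByTable = new HashMap<>();
    for (TenantInfo info : infos) {
      tenantByTable.put(new TableKey(info.table(), info.tableType()), info.tenant());
    }
    Map<String, Double> totals = new TreeMap<>();
    for (Sample s : samples) {
      String tenant = tenantByTable.get(new TableKey(s.table(), s.tableType()));
      if (tenant == null) {
        continue; // no tenant mapping for this table: dropped from the result
      }
      // Multiplying by the info value (1) leaves the sample value unchanged.
      totals.merge(tenant, s.value() * 1.0, Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<TenantInfo> infos = List.of(
        new TenantInfo("orders", "OFFLINE", "DefaultTenant"),
        new TenantInfo("orders", "REALTIME", "DefaultTenant"),
        new TenantInfo("clicks", "REALTIME", "analytics"));
    List<Sample> segmentCounts = List.of(
        new Sample("orders", "OFFLINE", 10),
        new Sample("orders", "REALTIME", 4),
        new Sample("clicks", "REALTIME", 7),
        new Sample("unmapped", "OFFLINE", 3));
    System.out.println(sumByTenant(segmentCounts, infos)); // {DefaultTenant=14.0, analytics=7.0}
  }
}
```

Because every `tableTenantInfo` sample has value `1`, the multiplication only copies the `tenant` label onto the left-hand series; restricting the result to one tenant then corresponds to a `tenant="DefaultTenant"` matcher.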
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

| | master | #18823 | +/- |
|---|---|---|---|
| Coverage | 64.76% | 64.77% | +0.01% |
| Complexity | 1319 | 1319 | |
| Files | 3392 | 3392 | |
| Lines | 210949 | 210968 | +19 |
| Branches | 33119 | 33124 | +5 |
| Hits | 136611 | 136653 | +42 |
| Misses | 63323 | 63297 | -26 |
| Partials | 11015 | 11018 | +3 |
Flags with carried forward coverage won't be shown.
Summary
- A new `TABLE_TENANT_INFO` controller gauge, emitted by `SegmentStatusChecker`, that encodes the server tenant name as a key segment in the JMX metric name: `pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1`
- A JMX exporter rule in `controller.yml` that extracts `table`, `tableType`, `tenant`, and `database` as Prometheus labels from this metric
- Any table-scoped metric can then be aggregated by tenant through a PromQL `group_left(tenant)` join; no changes to the broker/server metric pipelines are required

Motivation
Previously there was no way to aggregate table-scoped metrics (e.g. `numDocsScanned`, segment counts) by tenant in Prometheus/Grafana without scattered, disruptive changes to add a `tenant` tag throughout the metrics pipeline. This approach exposes the table→tenant mapping as a standalone info metric that Prometheus can join against.

Aggregate across all tenants:
Filter to a specific tenant (e.g. `DefaultTenant`):

The `tenant` label can be used in any label matcher (`=`, `!=`, `=~`, `!~`) wherever PromQL label selectors are supported, including dashboards, alerts, and recording rules.

Implementation
Emission strategy:
- The gauge is emitted once per `(table, tenant)` pair, on first registration or when the tenant changes. It is not re-emitted on every 5-minute `SegmentStatusChecker` cycle (early return when the tenant is unchanged).
- `_tableTenantMap` tracks the current tenant per table, so stale gauges are removed on tenant change, null table config, and table removal (`nonLeaderCleanup`).

JMX metric name: `pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1`
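The emission and cleanup rules above can be sketched as follows. This is a minimal Java sketch, not the `SegmentStatusChecker` code: a plain map stands in for the controller's gauge registry, and `onTableConfig` and `onTableRemoved` are hypothetical hook names. Only `_tableTenantMap`, the `DefaultTenant` fallback, and the metric-name layout are taken from this description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TableTenantGaugeSketch {
  private static final String PREFIX = "pinot.controller.tableTenantInfo.";
  private static final String DEFAULT_TENANT = "DefaultTenant";

  // Hypothetical stand-in for the metrics registry: gauge name -> value.
  final Map<String, Long> gauges = new ConcurrentHashMap<>();
  // Current tenant per table, used to skip redundant writes and to find stale gauges.
  private final Map<String, String> _tableTenantMap = new ConcurrentHashMap<>();

  static String gaugeName(String tableNameWithType, String tenant) {
    return PREFIX + tableNameWithType + "." + tenant;
  }

  /** Hypothetical hook, called once per table per checker cycle; serverTenant may be null. */
  void onTableConfig(String tableNameWithType, boolean configPresent, String serverTenant) {
    if (!configPresent) {
      // Null table config: drop any gauge previously emitted for this table.
      onTableRemoved(tableNameWithType);
      return;
    }
    String tenant = serverTenant != null ? serverTenant : DEFAULT_TENANT;
    String previous = _tableTenantMap.put(tableNameWithType, tenant);
    if (tenant.equals(previous)) {
      return; // tenant unchanged: early return, no redundant gauge write
    }
    if (previous != null) {
      gauges.remove(gaugeName(tableNameWithType, previous)); // tenant changed: remove stale gauge
    }
    gauges.put(gaugeName(tableNameWithType, tenant), 1L);
  }

  /** Hypothetical hook mirroring the cleanup done on table removal (e.g. in nonLeaderCleanup). */
  void onTableRemoved(String tableNameWithType) {
    String previous = _tableTenantMap.remove(tableNameWithType);
    if (previous != null) {
      gauges.remove(gaugeName(tableNameWithType, previous));
    }
  }

  public static void main(String[] args) {
    TableTenantGaugeSketch s = new TableTenantGaugeSketch();
    s.onTableConfig("orders_OFFLINE", true, null);      // falls back to DefaultTenant
    s.onTableConfig("orders_OFFLINE", true, "tenantA"); // stale DefaultTenant gauge removed
    s.onTableConfig("clicks_REALTIME", true, "tenantB");
    s.onTableRemoved("clicks_REALTIME");                // gauge cleaned up on removal
    System.out.println(s.gauges); // {pinot.controller.tableTenantInfo.orders_OFFLINE.tenantA=1}
  }
}
```

Storing the previous tenant in one map serves both goals at once: it makes the unchanged-tenant case a cheap early return, and it records exactly which gauge name has to be removed when the tenant changes or the table disappears.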
Prometheus output (via JMX exporter):
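The label extraction that the `controller.yml` rule performs on this metric name can be sketched with a Java regex. The pattern below is hypothetical (the actual rule may be written differently) and assumes table names ending in `_OFFLINE` or `_REALTIME`, an optional `<database>.` prefix, and tenant names without dots; giving tables without a database prefix an empty `database` label is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TenantInfoLabelSketch {
  // Hypothetical pattern: [<database>.]<table>_<TYPE>.<tenant> after the metric prefix.
  private static final Pattern TENANT_INFO = Pattern.compile(
      "pinot\\.controller\\.tableTenantInfo\\.(?:([^.]+)\\.)?([^.]+)_(OFFLINE|REALTIME)\\.([^.]+)");

  /** Returns the extracted labels, or empty if the name is not a tableTenantInfo metric. */
  static Optional<Map<String, String>> labels(String metricName) {
    Matcher m = TENANT_INFO.matcher(metricName);
    if (!m.matches()) {
      return Optional.empty();
    }
    Map<String, String> labels = new LinkedHashMap<>();
    labels.put("table", m.group(2));
    labels.put("tableType", m.group(3));
    labels.put("tenant", m.group(4));
    // Assumption: no database prefix -> empty database label.
    labels.put("database", m.group(1) == null ? "" : m.group(1));
    return Optional.of(labels);
  }

  public static void main(String[] args) {
    System.out.println(labels("pinot.controller.tableTenantInfo.orders_OFFLINE.DefaultTenant"));
    // Optional[{table=orders, tableType=OFFLINE, tenant=DefaultTenant, database=}]
    System.out.println(labels("pinot.controller.tableTenantInfo.mydb.orders_REALTIME.tenantA"));
    // Optional[{table=orders, tableType=REALTIME, tenant=tenantA, database=mydb}]
  }
}
```

The JMX exporter applies rules in order and stops at the first match, which is why the tenant rule has to precede the generic `tableNameWithType` rules: a generic rule listed earlier could claim these names first and emit them without a `tenant` label.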
Test plan
- `SegmentStatusCheckerTest#tableTenantInfoGaugeNamedTenantTest`: named server tenant is registered
- `SegmentStatusCheckerTest#tableTenantInfoGaugeDefaultTenantFallbackTest`: falls back to `DefaultTenant` when no tenant is configured
- `SegmentStatusCheckerTest#tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest`: stale gauge is removed when the tenant changes
- `SegmentStatusCheckerTest#tableTenantInfoGaugeTableRemovedCleansUpTest`: gauge is cleaned up via `nonLeaderCleanup`
- `SegmentStatusCheckerTest#tableTenantInfoGaugeRealtimeTableTest`: REALTIME table type is covered