Add TABLE_TENANT_INFO controller gauge for table-to-tenant mapping #18823

Open
arunkumarucet wants to merge 1 commit into apache:master from arunkumarucet:feature/table-tenant-info-metric

arunkumarucet (Contributor) commented Jun 22, 2026

Summary

  • Adds a new TABLE_TENANT_INFO controller gauge emitted by SegmentStatusChecker that encodes the server tenant name as a key segment in the JMX metric name: pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1
  • Adds a dedicated JMX exporter rule in controller.yml that extracts table, tableType, tenant, and database as Prometheus labels from this metric
  • Enables tenant-scoped aggregation of any existing table-level metric in Prometheus via a group_left(tenant) join — no changes to broker/server metric pipelines required

Motivation

Previously there was no way to aggregate table-scoped metrics (e.g. numDocsScanned, segment counts) by tenant in Prometheus/Grafana without scattered, disruptive changes to add a tenant tag throughout the metrics pipeline. This approach exposes the table→tenant mapping as a standalone info metric that Prometheus can join against.

Aggregate across all tenants:

sum by (tenant) (
  sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
  pinot_controller_tableTenantInfo_Value
)

Filter to a specific tenant (e.g. DefaultTenant):

sum by (tenant) (
  sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
  pinot_controller_tableTenantInfo_Value{tenant="DefaultTenant"}
)

The tenant label can be used in any label matcher (=, !=, =~, !~) wherever PromQL label selectors are supported — in dashboards, alerts, and recording rules.
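
As a rough illustration of what the tenant-aggregation queries above compute, the following Java sketch mimics the `on(table) group_left(tenant)` join followed by `sum by (tenant)`. It is not part of this PR: the class and method names are hypothetical, and the maps stand in for Prometheus series keyed by the `table` label. Because the info gauge always has value 1, the multiplication leaves the left-hand value unchanged and only attaches the tenant.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the PromQL join; not Pinot or Prometheus code.
final class TenantJoinSketch {

  /**
   * valueByTable models the left side (one value per table label).
   * tenantByTable models the info gauge (table -> tenant, value always 1).
   */
  static Map<String, Double> sumByTenant(
      Map<String, Double> valueByTable, Map<String, String> tenantByTable) {
    Map<String, Double> totals = new TreeMap<>();
    for (Map.Entry<String, Double> entry : valueByTable.entrySet()) {
      String tenant = tenantByTable.get(entry.getKey());
      if (tenant == null) {
        // No matching info series: vector matching drops the sample.
        continue;
      }
      // value * 1, then accumulated under the attached tenant label.
      totals.merge(tenant, entry.getValue() * 1.0, Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    Map<String, Double> rates = Map.of("airlineStats", 2.0, "events", 3.0, "orders", 5.0);
    Map<String, String> tenants =
        Map.of("airlineStats", "DefaultTenant", "events", "DefaultTenant", "orders", "TenantB");
    System.out.println(sumByTenant(rates, tenants));
  }
}
```

Tables that have no info series simply fall out of the result, so a missing mapping shows up as a lower tenant total rather than as an error.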

Implementation

Emission strategy:

  • The gauge is written only once per (table, tenant) pair — on first registration or when the tenant changes. It is not re-emitted on every 5-minute SegmentStatusChecker cycle (early-return when tenant is unchanged).
  • _tableTenantMap tracks the current tenant per table so stale gauges are removed on: tenant change, null table config, and table removal (nonLeaderCleanup).
  • The new gauge is registered before removing the old tenant's gauge on a tenant change, to avoid a scrape-window gap.
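
The emission rules above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `GaugeRegistrySketch` is a hypothetical stand-in for the controller's metrics registry, and the method names `update` and `remove` are assumptions made for illustration. Only the map-based tracking, the early return, and the register-before-remove ordering come from the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a gauge registry; this is not Pinot's ControllerMetrics API.
final class GaugeRegistrySketch {
  final Set<String> gauges = new HashSet<>();
  int writes = 0; // counts registrations so redundant writes are visible

  void register(String name) {
    gauges.add(name);
    writes++;
  }

  void remove(String name) {
    gauges.remove(name);
  }
}

final class TableTenantGaugeSketch {
  // Current tenant per table, analogous to the _tableTenantMap described above.
  private final Map<String, String> tableTenantMap = new HashMap<>();
  private final GaugeRegistrySketch registry;

  TableTenantGaugeSketch(GaugeRegistrySketch registry) {
    this.registry = registry;
  }

  static String gaugeName(String tableNameWithType, String tenant) {
    return "pinot.controller.tableTenantInfo." + tableNameWithType + "." + tenant;
  }

  // Called on every checker cycle; writes only on first sight or tenant change.
  void update(String tableNameWithType, String tenant) {
    String previous = tableTenantMap.get(tableNameWithType);
    if (tenant.equals(previous)) {
      return; // early return: tenant unchanged, nothing to emit
    }
    // Register the new gauge first so a scrape never sees the table without a tenant.
    registry.register(gaugeName(tableNameWithType, tenant));
    tableTenantMap.put(tableNameWithType, tenant);
    if (previous != null) {
      registry.remove(gaugeName(tableNameWithType, previous));
    }
  }

  // Called when the table config is null or the table is removed.
  void remove(String tableNameWithType) {
    String previous = tableTenantMap.remove(tableNameWithType);
    if (previous != null) {
      registry.remove(gaugeName(tableNameWithType, previous));
    }
  }

  public static void main(String[] args) {
    GaugeRegistrySketch registry = new GaugeRegistrySketch();
    TableTenantGaugeSketch tracker = new TableTenantGaugeSketch(registry);
    tracker.update("airlineStats_OFFLINE", "DefaultTenant");
    tracker.update("airlineStats_OFFLINE", "DefaultTenant"); // no second write
    System.out.println(registry.gauges + " writes=" + registry.writes);
  }
}
```

Registering the new name before removing the old one means that, for a brief moment, both gauges may exist; that overlap is the trade-off accepted to avoid a scrape seeing neither.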

JMX metric name:

"org.apache.pinot.common.metrics":type="ControllerMetrics",
  name="pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant>"

Prometheus output (via JMX exporter):

pinot_controller_tableTenantInfo_Value{table="airlineStats", tableType="OFFLINE", tenant="DefaultTenant"} 1
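
The label extraction can be illustrated with a small Java sketch. The regular expression below is an assumption written for this example, not the rule shipped in `controller.yml`; it also assumes that an optional database prefix is separated from the table name by a dot and that database, table, and tenant names contain no dots. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of splitting the metric name into labels.
final class TenantInfoLabelSketch {

  // Assumed layout: pinot.controller.tableTenantInfo.[<database>.]<table>_<TYPE>.<tenant>
  private static final Pattern NAME = Pattern.compile(
      "^pinot\\.controller\\.tableTenantInfo\\."
          + "(?:(?<database>[^.]+)\\.)?"
          + "(?<table>[^.]+)_(?<tableType>OFFLINE|REALTIME)\\."
          + "(?<tenant>[^.]+)$");

  /** Returns the extracted labels, or an empty map if the name does not match. */
  static Map<String, String> labels(String metricName) {
    Map<String, String> labels = new LinkedHashMap<>();
    Matcher matcher = NAME.matcher(metricName);
    if (!matcher.matches()) {
      return labels;
    }
    String database = matcher.group("database"); // null when no database prefix
    if (database != null) {
      labels.put("database", database);
    }
    labels.put("table", matcher.group("table"));
    labels.put("tableType", matcher.group("tableType"));
    labels.put("tenant", matcher.group("tenant"));
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(
        labels("pinot.controller.tableTenantInfo.airlineStats_OFFLINE.DefaultTenant"));
  }
}
```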

Test plan

  • SegmentStatusCheckerTest#tableTenantInfoGaugeNamedTenantTest — named server tenant is registered
  • SegmentStatusCheckerTest#tableTenantInfoGaugeDefaultTenantFallbackTest — falls back to DefaultTenant when no tenant configured
  • SegmentStatusCheckerTest#tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest — stale gauge removed when tenant changes
  • SegmentStatusCheckerTest#tableTenantInfoGaugeTableRemovedCleansUpTest — gauge cleaned up via nonLeaderCleanup
  • SegmentStatusCheckerTest#tableTenantInfoGaugeRealtimeTableTest — REALTIME table type covered
  • Verified locally via batch quickstart: 10 MBeans registered, all value=1, JMX exporter regex validated against no-database, with-database, and REALTIME patterns

…ing via JMX

Emit a per-table `tableTenantInfo` gauge from `SegmentStatusChecker` with the
server tenant name embedded as an extra key segment in the metric name:

  pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1

This lets Prometheus scrape the metric via the JMX exporter and use a
`group_left(tenant)` join to attach the tenant label to any existing
table-scoped metric without modifying the core metrics pipeline.

Implementation details:
- The gauge is registered only on first encounter or when the tenant changes,
  avoiding redundant writes on every 5-minute SegmentStatusChecker cycle.
- Stale gauges are cleaned up on tenant change, null config, and table removal,
  tracked via an internal `_tableTenantMap`.
- A dedicated JMX exporter rule in `controller.yml` extracts `table`,
  `tableType`, `tenant`, and `database` labels. The rule is placed before the
  generic tableNameWithType rules to ensure the tenant segment is captured.

codecov-commenter commented Jun 22, 2026

Codecov Report

❌ Patch coverage is 73.68421% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.77%. Comparing base (14bc147) to head (8c6923d).

Files with missing lines | Patch % | Lines
...e/pinot/controller/helix/SegmentStatusChecker.java | 72.22% | 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18823      +/-   ##
============================================
+ Coverage     64.76%   64.77%   +0.01%     
  Complexity     1319     1319              
============================================
  Files          3392     3392              
  Lines        210949   210968      +19     
  Branches      33119    33124       +5     
============================================
+ Hits         136611   136653      +42     
+ Misses        63323    63297      -26     
- Partials      11015    11018       +3     
Flag | Coverage Δ
custom-integration1 | 100.00% <ø> (ø)
integration | 100.00% <ø> (ø)
integration1 | 100.00% <ø> (ø)
integration2 | 0.00% <ø> (ø)
java-21 | 64.77% <73.68%> (+0.01%) ⬆️
temurin | 64.77% <73.68%> (+0.01%) ⬆️
unittests | 64.77% <73.68%> (+0.01%) ⬆️
unittests1 | 56.97% <100.00%> (+0.01%) ⬆️
unittests2 | 37.19% <73.68%> (+0.01%) ⬆️
