Ingester: Add owned series tracking to exclude stale data from per-ingester limit calculations #7509

@danielblando

Is your feature request related to a problem? Please describe.
When ingesters scale up or the ring reshards, the per-ingester local series limit is recalculated based on the new total ingester count. However, ingesters that haven't compacted yet still hold series data they no longer own (because those series now hash to a different ingester). This causes two problems:

  1. Incorrect throttling: The ingester appears over its new lower limit and incorrectly throttles tenants. For example, if an ingester holds 1M series and a scale-up causes its local limit to drop from 1.2M to 800K, the ingester will reject new writes for that tenant — even though many of those 1M series are stale data that will be cleaned up at the next head compaction (every 2 hours). This creates a window of up to 2 hours after any ring change where tenants can be incorrectly throttled.
  2. Inaccurate tenant usage reporting: The cortex_ingester_active_series metric counts all series in the ingester's TSDB, including series that have been resharded away. This inflates the reported per-tenant usage, making it unreliable for capacity planning or limit tuning. During resharding, the same series can be counted by both the old and the new ingester, so it is effectively double-counted until compaction cleans up the stale data.
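
To make the throttling scenario concrete, here is a minimal sketch of the arithmetic. The conversion formula (global limit times replication factor, divided by the ingester count) and the specific numbers (a 2.4M global limit, replication factor 3, scaling from 6 to 9 ingesters, 700K owned series) are assumptions chosen to reproduce the 1.2M to 800K drop described above; they are not taken from the Cortex code.

```go
package main

import "fmt"

// localLimit converts a global per-tenant series limit into a per-ingester
// limit. The formula is an assumption for illustration only.
func localLimit(globalLimit, replicationFactor, ingesters int) int {
	if ingesters <= 0 {
		return 0
	}
	return globalLimit * replicationFactor / ingesters
}

// wouldReject reports whether a new series would be rejected when the
// ingester compares seriesCount against limit.
func wouldReject(seriesCount, limit int) bool {
	return seriesCount >= limit
}

func main() {
	const global, rf = 2_400_000, 3

	before := localLimit(global, rf, 6) // limit with 6 ingesters
	after := localLimit(global, rf, 9)  // limit after scaling to 9 ingesters

	inHead := 1_000_000 // every series still held in the TSDB head
	owned := 700_000    // hypothetical: series still owned after resharding

	fmt.Println("limit before/after:", before, after)
	fmt.Println("reject using head count: ", wouldReject(inHead, after))
	fmt.Println("reject using owned count:", wouldReject(owned, after))
}
```

With these assumed numbers, counting every series in the head rejects new writes, while counting only owned series does not.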

Describe the solution you'd like
Introduce an "owned series" concept in the ingester's active series tracker:

  1. Track the ring token for each active series: When a series is added to the active series tracker, store its hash ring token (computed the same way the distributor computes it for routing, via TokenForLabels).
  2. Periodically reconcile ownership: On each active series update cycle, compare each tracked series' token against the current ring state. If the token no longer maps to one of this ingester's tokens in the ring, exclude the series from the owned count.
  3. Expose a cortex_ingester_owned_series metric: A new per-user gauge that reports only the series this ingester currently owns according to the ring.
  4. Use the owned count for local limit calculation: When evaluating whether a tenant exceeds its per-ingester series limit, use the owned count rather than the total active count.
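
The four steps above can be sketched roughly as follows. This is an illustrative outline rather than the PoC code: the `tracker` and `seriesEntry` types, the fingerprint key, and the `owns` callback are hypothetical names, and locking and striping are omitted.

```go
package main

import "fmt"

// seriesEntry is a hypothetical tracker entry that remembers the ring token
// computed when the series was first seen.
type seriesEntry struct {
	token uint32
	owned bool
}

// tracker is a simplified, single-tenant stand-in for the active series tracker.
type tracker struct {
	entries map[uint64]*seriesEntry // keyed by series fingerprint
	owned   int
}

func newTracker() *tracker {
	return &tracker{entries: make(map[uint64]*seriesEntry)}
}

// add stores the series with its token (step 1). A new series is assumed to
// be owned, since the distributor has just routed it to this ingester.
func (t *tracker) add(fingerprint uint64, token uint32) {
	if _, ok := t.entries[fingerprint]; ok {
		return
	}
	t.entries[fingerprint] = &seriesEntry{token: token, owned: true}
	t.owned++
}

// reconcile recomputes the owned count against the current ring (step 2).
// Entries are never deleted here; only the owned flag and counter change.
func (t *tracker) reconcile(owns func(token uint32) bool) {
	owned := 0
	for _, e := range t.entries {
		e.owned = owns(e.token)
		if e.owned {
			owned++
		}
	}
	t.owned = owned
}

// underLimit checks the limit against the owned count (step 4). The owned
// field is also the value a per-user gauge would report (step 3).
func (t *tracker) underLimit(limit int) bool {
	return t.owned < limit
}

func main() {
	t := newTracker()
	for i, tok := range []uint32{10, 20, 30, 40} {
		t.add(uint64(i), tok)
	}

	// Hypothetical ring change: this ingester now only owns tokens <= 20.
	t.reconcile(func(tok uint32) bool { return tok <= 20 })

	fmt.Println("active:", len(t.entries), "owned:", t.owned)
	fmt.Println("under limit of 3:", t.underLimit(3))
}
```

Keeping the entries in place during reconciliation means that a later ring change which hands a series back to this ingester only requires flipping the flag again.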

Ownership is determined by looking up the series' token in the sorted ring token list and checking if the responsible token belongs to this ingester instance. The Purge() cycle (on head compaction) removes entries whose TSDB data has been compacted, while ownership reconciliation only adjusts the owned count without deleting entries from the tracker (since the data still exists until compaction).
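
As a rough illustration of the lookup described above, the sketch below binary-searches a sorted token list and wraps around at the end of the ring. The `ringToken` type and `ownerOf` function are hypothetical, the "first token strictly greater than the key" convention is an assumption, and replication (where several instances hold the same series) is deliberately ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// ringToken pairs a token with the instance that registered it.
type ringToken struct {
	token    uint32
	instance string
}

// ownerOf returns the instance responsible for key. The ring must be sorted
// by token. The responsible token is taken to be the first one strictly
// greater than key, wrapping to the first token past the end of the ring.
func ownerOf(ring []ringToken, key uint32) string {
	if len(ring) == 0 {
		return ""
	}
	i := sort.Search(len(ring), func(i int) bool { return ring[i].token > key })
	if i == len(ring) {
		i = 0 // wrap around
	}
	return ring[i].instance
}

func main() {
	before := []ringToken{
		{100, "ingester-a"},
		{300, "ingester-a"},
	}
	// After a scale-up, ingester-b registers token 200.
	after := []ringToken{
		{100, "ingester-a"},
		{200, "ingester-b"},
		{300, "ingester-a"},
	}

	const seriesToken = 150
	fmt.Println("before:", ownerOf(before, seriesToken))
	fmt.Println("after: ", ownerOf(after, seriesToken))
}
```

In this example the series with token 150 moves from ingester-a to ingester-b, so ingester-a would stop counting it as owned while still holding its data until the next head compaction.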

Describe alternatives you've considered

  • Waiting for head compaction: This is the current behavior, and it causes up to 2 hours of potential throttling after any ring change.
  • Temporarily increasing limits during scale-up: Operational burden and risk of actual over-limit tenants going undetected.
  • Faster compaction: Reduces the window but doesn't eliminate it, and has performance/resource implications.

Additional context
We have a proof-of-concept implementation demonstrating this approach: danielblando@37eea74

The PoC moves the token hashing functions (TokenForLabels, ShardByMetricName, etc.) to the ring package so they can be reused by the ingester, adds an owned counter to activeSeriesStripe, and introduces UpdateMetrics() which reconciles ownership when ring tokens change.
