1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@
* [FEATURE] Memberlist: Add `-memberlist.cluster-label` and `-memberlist.cluster-label-verification-disabled` to prevent accidental cross-cluster gossip joins and support rolling label rollout. #7385
* [FEATURE] Querier: Add timeout classification to classify query timeouts as 4XX (user error) or 5XX (system error) based on phase timing. When enabled, queries that spend most of their time in PromQL evaluation return `422 Unprocessable Entity` instead of `503 Service Unavailable`. #7374
* [FEATURE] Querier: Implement resource-based throttling. #7442
* [FEATURE] Querier: Add resource-based query eviction that automatically cancels the heaviest running query when CPU or heap utilization exceeds configured thresholds. #7488
* [ENHANCEMENT] Tenant Federation: Avoid purging the regex resolver LRU cache on user-sync ticks when the set of known users has not changed. #7489
* [ENHANCEMENT] Parquet Converter: Add a ring status page. #7455
* [ENHANCEMENT] Ingester: Add WAL record metrics to help evaluate the effectiveness of WAL compression type (e.g. snappy, zstd): `cortex_ingester_tsdb_wal_record_part_writes_total`, `cortex_ingester_tsdb_wal_record_parts_bytes_written_total`, and `cortex_ingester_tsdb_wal_record_bytes_saved_total`. #7420

42 changes: 42 additions & 0 deletions docs/blocks-storage/querier.md
@@ -330,6 +330,48 @@ querier:
        # type. 0 to disable.
        # CLI flag: -querier.query-protection.rejection.threshold.heap-utilization
        [heap_utilization: <float> | default = 0]

    eviction:
      threshold:
        # EXPERIMENTAL: Max CPU utilization that this instance can reach before
        # evicting the heaviest running query (across all tenants), as a ratio
        # between 0 and 1. The monitored_resources config must include this
        # resource type. 0 to disable.
        # CLI flag: -querier.query-protection.eviction.threshold.cpu-utilization
        [cpu_utilization: <float> | default = 0]

        # EXPERIMENTAL: Max heap utilization that this instance can reach before
        # evicting the heaviest running query (across all tenants), as a ratio
        # between 0 and 1. The monitored_resources config must include this
        # resource type. 0 to disable.
        # CLI flag: -querier.query-protection.eviction.threshold.heap-utilization
        [heap_utilization: <float> | default = 0]

      # EXPERIMENTAL: How frequently the evictor checks system resource
      # utilization.
      # CLI flag: -querier.query-protection.eviction.check-interval
      [check_interval: <duration> | default = 1s]

      # EXPERIMENTAL: Number of check intervals to wait after an eviction before
      # evicting again.
      # CLI flag: -querier.query-protection.eviction.cooldown-period
      [cooldown_period: <int> | default = 3]

      # EXPERIMENTAL: The query metric used to determine the heaviest query for
      # eviction. Supported values: fetched_samples, fetched_series,
      # fetched_chunks, fetched_chunk_bytes.
      # CLI flag: -querier.query-protection.eviction.eviction-metric
      [eviction_metric: <string> | default = "fetched_samples"]

      # EXPERIMENTAL: Minimum time a query must be running before it becomes
      # eligible for eviction. Queries younger than this are ignored.
      # CLI flag: -querier.query-protection.eviction.min-query-age
      [min_query_age: <duration> | default = 10s]

      # EXPERIMENTAL: Maximum number of queries to evict in a single check cycle
      # when resource thresholds are breached.
      # CLI flag: -querier.query-protection.eviction.max-evictions-per-cycle
      [max_evictions_per_cycle: <int> | default = 1]
```

### `blocks_storage_config`

42 changes: 42 additions & 0 deletions docs/blocks-storage/store-gateway.md
@@ -372,6 +372,48 @@ store_gateway:
        # CLI flag: -store-gateway.query-protection.rejection.threshold.heap-utilization
        [heap_utilization: <float> | default = 0]

    eviction:
      threshold:
        # EXPERIMENTAL: Max CPU utilization that this instance can reach before
        # evicting the heaviest running query (across all tenants), as a ratio
        # between 0 and 1. The monitored_resources config must include this
        # resource type. 0 to disable.
        # CLI flag: -store-gateway.query-protection.eviction.threshold.cpu-utilization
        [cpu_utilization: <float> | default = 0]

        # EXPERIMENTAL: Max heap utilization that this instance can reach before
        # evicting the heaviest running query (across all tenants), as a ratio
        # between 0 and 1. The monitored_resources config must include this
        # resource type. 0 to disable.
        # CLI flag: -store-gateway.query-protection.eviction.threshold.heap-utilization
        [heap_utilization: <float> | default = 0]

      # EXPERIMENTAL: How frequently the evictor checks system resource
      # utilization.
      # CLI flag: -store-gateway.query-protection.eviction.check-interval
      [check_interval: <duration> | default = 1s]

      # EXPERIMENTAL: Number of check intervals to wait after an eviction before
      # evicting again.
      # CLI flag: -store-gateway.query-protection.eviction.cooldown-period
      [cooldown_period: <int> | default = 3]

      # EXPERIMENTAL: The query metric used to determine the heaviest query for
      # eviction. Supported values: fetched_samples, fetched_series,
      # fetched_chunks, fetched_chunk_bytes.
      # CLI flag: -store-gateway.query-protection.eviction.eviction-metric
      [eviction_metric: <string> | default = "fetched_samples"]

      # EXPERIMENTAL: Minimum time a query must be running before it becomes
      # eligible for eviction. Queries younger than this are ignored.
      # CLI flag: -store-gateway.query-protection.eviction.min-query-age
      [min_query_age: <duration> | default = 10s]

      # EXPERIMENTAL: Maximum number of queries to evict in a single check cycle
      # when resource thresholds are breached.
      # CLI flag: -store-gateway.query-protection.eviction.max-evictions-per-cycle
      [max_evictions_per_cycle: <int> | default = 1]

  hedged_request:
    # If true, hedged requests are applied to object store calls. It can help
    # with reducing tail latency.

126 changes: 126 additions & 0 deletions docs/configuration/config-file-reference.md
@@ -3877,6 +3877,48 @@ query_protection:
      # disable.
      # CLI flag: -ingester.query-protection.rejection.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

  eviction:
    threshold:
      # EXPERIMENTAL: Max CPU utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -ingester.query-protection.eviction.threshold.cpu-utilization
      [cpu_utilization: <float> | default = 0]

      # EXPERIMENTAL: Max heap utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -ingester.query-protection.eviction.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

    # EXPERIMENTAL: How frequently the evictor checks system resource
    # utilization.
    # CLI flag: -ingester.query-protection.eviction.check-interval
    [check_interval: <duration> | default = 1s]

    # EXPERIMENTAL: Number of check intervals to wait after an eviction before
    # evicting again.
    # CLI flag: -ingester.query-protection.eviction.cooldown-period
    [cooldown_period: <int> | default = 3]

    # EXPERIMENTAL: The query metric used to determine the heaviest query for
    # eviction. Supported values: fetched_samples, fetched_series,
    # fetched_chunks, fetched_chunk_bytes.
    # CLI flag: -ingester.query-protection.eviction.eviction-metric
    [eviction_metric: <string> | default = "fetched_samples"]

    # EXPERIMENTAL: Minimum time a query must be running before it becomes
    # eligible for eviction. Queries younger than this are ignored.
    # CLI flag: -ingester.query-protection.eviction.min-query-age
    [min_query_age: <duration> | default = 10s]

    # EXPERIMENTAL: Maximum number of queries to evict in a single check cycle
    # when resource thresholds are breached.
    # CLI flag: -ingester.query-protection.eviction.max-evictions-per-cycle
    [max_evictions_per_cycle: <int> | default = 1]
```
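Several of these options constrain one another: a threshold of 0 disables that check, non-zero thresholds are ratios between 0 and 1 rather than percentages, the matching resource must be listed in `monitored_resources`, and `eviction_metric` accepts only four values. The Go sketch below gathers those constraints into a hypothetical validation function. `EvictionConfig`, `DefaultEvictionConfig`, `Validate`, and the resource names `cpu` and `heap` used for `monitored_resources` are assumptions; Cortex's own validation may check more, less, or differently.

```go
package main

import (
	"fmt"
	"time"
)

// EvictionConfig is a hypothetical Go mirror of the documented
// query_protection.eviction block; Cortex's real struct may differ.
type EvictionConfig struct {
	CPUUtilization       float64       // threshold.cpu_utilization, 0 disables
	HeapUtilization      float64       // threshold.heap_utilization, 0 disables
	CheckInterval        time.Duration // check_interval
	CooldownPeriod       int           // cooldown_period, counted in check intervals
	EvictionMetric       string        // eviction_metric
	MinQueryAge          time.Duration // min_query_age
	MaxEvictionsPerCycle int           // max_evictions_per_cycle
}

// DefaultEvictionConfig returns the documented default values.
func DefaultEvictionConfig() EvictionConfig {
	return EvictionConfig{
		CheckInterval:        time.Second,
		CooldownPeriod:       3,
		EvictionMetric:       "fetched_samples",
		MinQueryAge:          10 * time.Second,
		MaxEvictionsPerCycle: 1,
	}
}

// Enabled reports whether at least one eviction threshold is non-zero.
func (c EvictionConfig) Enabled() bool {
	return c.CPUUtilization > 0 || c.HeapUtilization > 0
}

// Validate applies checks implied by the option descriptions. monitored is a
// stand-in for monitored_resources; the names "cpu" and "heap" are assumed.
func (c EvictionConfig) Validate(monitored map[string]bool) error {
	thresholds := []struct {
		name     string
		resource string
		value    float64
	}{
		{"cpu_utilization", "cpu", c.CPUUtilization},
		{"heap_utilization", "heap", c.HeapUtilization},
	}
	for _, t := range thresholds {
		if t.value < 0 || t.value > 1 {
			return fmt.Errorf("%s must be between 0 and 1, got %v", t.name, t.value)
		}
		if t.value > 0 && !monitored[t.resource] {
			return fmt.Errorf("%s is set but %q is not in monitored_resources", t.name, t.resource)
		}
	}
	switch c.EvictionMetric {
	case "fetched_samples", "fetched_series", "fetched_chunks", "fetched_chunk_bytes":
	default:
		return fmt.Errorf("unsupported eviction_metric %q", c.EvictionMetric)
	}
	if c.Enabled() && c.CheckInterval <= 0 {
		return fmt.Errorf("check_interval must be positive when eviction is enabled")
	}
	return nil
}

func main() {
	cfg := DefaultEvictionConfig()
	fmt.Println(cfg.Enabled(), cfg.Validate(nil)) // false <nil>

	// A likely mistake: a percentage instead of a ratio.
	cfg.HeapUtilization = 85
	fmt.Println(cfg.Validate(map[string]bool{"heap": true}))
}
```

The defaults leave eviction disabled because both thresholds are 0; the second call reports `heap_utilization must be between 0 and 1, got 85`.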

### `ingester_client_config`

@@ -5031,6 +5073,48 @@ query_protection:
      # disable.
      # CLI flag: -querier.query-protection.rejection.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

  eviction:
    threshold:
      # EXPERIMENTAL: Max CPU utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -querier.query-protection.eviction.threshold.cpu-utilization
      [cpu_utilization: <float> | default = 0]

      # EXPERIMENTAL: Max heap utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -querier.query-protection.eviction.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

    # EXPERIMENTAL: How frequently the evictor checks system resource
    # utilization.
    # CLI flag: -querier.query-protection.eviction.check-interval
    [check_interval: <duration> | default = 1s]

    # EXPERIMENTAL: Number of check intervals to wait after an eviction before
    # evicting again.
    # CLI flag: -querier.query-protection.eviction.cooldown-period
    [cooldown_period: <int> | default = 3]

    # EXPERIMENTAL: The query metric used to determine the heaviest query for
    # eviction. Supported values: fetched_samples, fetched_series,
    # fetched_chunks, fetched_chunk_bytes.
    # CLI flag: -querier.query-protection.eviction.eviction-metric
    [eviction_metric: <string> | default = "fetched_samples"]

    # EXPERIMENTAL: Minimum time a query must be running before it becomes
    # eligible for eviction. Queries younger than this are ignored.
    # CLI flag: -querier.query-protection.eviction.min-query-age
    [min_query_age: <duration> | default = 10s]

    # EXPERIMENTAL: Maximum number of queries to evict in a single check cycle
    # when resource thresholds are breached.
    # CLI flag: -querier.query-protection.eviction.max-evictions-per-cycle
    [max_evictions_per_cycle: <int> | default = 1]
```
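`cooldown_period` is counted in check intervals, so the wall-clock pause between eviction rounds depends on `check_interval`. The Go sketch below models one plausible reading, in which the evictor skips the next `cooldown_period` checks after evicting. The `cooldown` type and its `tick` method are hypothetical, and the exact off-by-one behavior of Cortex's evictor is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// cooldown is a hypothetical counter that suppresses evictions for
// cooldown_period check intervals after each eviction round.
type cooldown struct {
	period    int // cooldown_period
	remaining int // checks still to skip
}

// tick runs once per check_interval and reports whether this check may
// evict, given whether a threshold is currently breached.
func (c *cooldown) tick(breached bool) bool {
	if c.remaining > 0 {
		c.remaining-- // still cooling down: skip this check
		return false
	}
	if !breached {
		return false
	}
	c.remaining = c.period // evict now, then skip the next `period` checks
	return true
}

func main() {
	checkInterval := time.Second // documented default check_interval
	c := &cooldown{period: 3}    // documented default cooldown_period
	// Resource pressure stays above the threshold for 10 consecutive checks.
	for i := 0; i < 10; i++ {
		if c.tick(true) {
			fmt.Printf("evict at t=%v\n", time.Duration(i)*checkInterval)
		}
	}
}
```

Under this reading, sustained pressure with the defaults produces eviction rounds at t=0s, 4s, and 8s, that is one round every (cooldown_period + 1) × check_interval. If the real evictor counts the evicting check as part of the cooldown, the spacing would be 3s instead.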

### `query_frontend_config`

@@ -6800,6 +6884,48 @@ query_protection:
      # CLI flag: -store-gateway.query-protection.rejection.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

  eviction:
    threshold:
      # EXPERIMENTAL: Max CPU utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -store-gateway.query-protection.eviction.threshold.cpu-utilization
      [cpu_utilization: <float> | default = 0]

      # EXPERIMENTAL: Max heap utilization that this instance can reach before
      # evicting the heaviest running query (across all tenants), as a ratio
      # between 0 and 1. The monitored_resources config must include this
      # resource type. 0 to disable.
      # CLI flag: -store-gateway.query-protection.eviction.threshold.heap-utilization
      [heap_utilization: <float> | default = 0]

    # EXPERIMENTAL: How frequently the evictor checks system resource
    # utilization.
    # CLI flag: -store-gateway.query-protection.eviction.check-interval
    [check_interval: <duration> | default = 1s]

    # EXPERIMENTAL: Number of check intervals to wait after an eviction before
    # evicting again.
    # CLI flag: -store-gateway.query-protection.eviction.cooldown-period
    [cooldown_period: <int> | default = 3]

    # EXPERIMENTAL: The query metric used to determine the heaviest query for
    # eviction. Supported values: fetched_samples, fetched_series,
    # fetched_chunks, fetched_chunk_bytes.
    # CLI flag: -store-gateway.query-protection.eviction.eviction-metric
    [eviction_metric: <string> | default = "fetched_samples"]

    # EXPERIMENTAL: Minimum time a query must be running before it becomes
    # eligible for eviction. Queries younger than this are ignored.
    # CLI flag: -store-gateway.query-protection.eviction.min-query-age
    [min_query_age: <duration> | default = 10s]

    # EXPERIMENTAL: Maximum number of queries to evict in a single check cycle
    # when resource thresholds are breached.
    # CLI flag: -store-gateway.query-protection.eviction.max-evictions-per-cycle
    [max_evictions_per_cycle: <int> | default = 1]

hedged_request:
  # If true, hedged requests are applied to object store calls. It can help with
  # reducing tail latency.

8 changes: 8 additions & 0 deletions docs/configuration/v1-guarantees.md
@@ -130,6 +130,14 @@ Currently experimental features are:
  - `-validation.max-label-cardinality-for-unoptimized-regex` (int) - maximum label cardinality
  - `-validation.max-total-label-value-length-for-unoptimized-regex` (int) - maximum total length of all label values in bytes
- HATracker: `-distributor.ha-tracker.enable-startup-sync` (bool) - If enabled, fetches all tracked keys on startup to populate the local cache.
- Querier: Resource-based query eviction
  - `-querier.query-protection.eviction.threshold.cpu-utilization` (float)
  - `-querier.query-protection.eviction.threshold.heap-utilization` (float)
  - `-querier.query-protection.eviction.check-interval` (duration)
  - `-querier.query-protection.eviction.cooldown-period` (int)
  - `-querier.query-protection.eviction.eviction-metric` (string)
  - `-querier.query-protection.eviction.min-query-age` (duration)
  - `-querier.query-protection.eviction.max-evictions-per-cycle` (int)
- Ingester: Active Series Tracker
  - Per-tenant `active_series_trackers` configuration in runtime config overrides
  - Counts active series matching PromQL label matchers and exposes `cortex_ingester_active_series_per_tracker` metric