Commit 67997cb

Merge branch 'master' into Allow-partial-data-in-federation
Signed-off-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
2 parents eea3e30 + c3d066c commit 67997cb

351 files changed

Lines changed: 17687 additions & 40192 deletions


.github/workflows/scripts/install-docker.sh

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 #!/bin/bash

 set -x
-VER="28.0.4"
+VER="29.2.1"

 # Detect architecture
 ARCH=$(uname -m)
@@ -25,7 +25,7 @@ echo "Installing Docker $VER for architecture: $ARCH (docker: $DOCKER_ARCH, buil
 curl -L -o /tmp/docker-$VER.tgz https://download.docker.com/linux/static/stable/$DOCKER_ARCH/docker-$VER.tgz
 tar -xz -C /tmp -f /tmp/docker-$VER.tgz
 mkdir -vp ~/.docker/cli-plugins/
-curl --silent -L "https://github.com/docker/buildx/releases/download/v0.3.0/buildx-v0.3.0.linux-$BUILDX_ARCH" > ~/.docker/cli-plugins/docker-buildx
+curl --silent -L "https://github.com/docker/buildx/releases/download/v0.31.1/buildx-v0.31.1.linux-$BUILDX_ARCH" > ~/.docker/cli-plugins/docker-buildx
 chmod a+x ~/.docker/cli-plugins/docker-buildx
 mv /tmp/docker/* /usr/bin
 docker run --privileged --rm tonistiigi/binfmt --install all

.github/workflows/test-build-deploy.yml

Lines changed: 6 additions & 0 deletions
@@ -192,6 +192,9 @@ jobs:
 - runner: ubuntu-24.04
 arch: amd64
 tags: integration_querier
+- runner: ubuntu-24.04
+arch: amd64
+tags: integration_querier_microservices_mode
 - runner: ubuntu-24.04
 arch: amd64
 tags: integration_ruler
@@ -225,6 +228,9 @@ jobs:
 - runner: ubuntu-24.04-arm
 arch: arm64
 tags: integration_querier
+- runner: ubuntu-24.04
+arch: arm64
+tags: integration_querier_microservices_mode
 steps:
 - name: Upgrade golang
 uses: actions/setup-go@4dc6199c7b1a012772edbd06daecab0f50c9053c # v6.1.0

AGENTS.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+# AGENTS.md
+
+This file provides guidance to AI coding agents when working with code in this repository.
+
+## Project Overview
+
+Cortex is a horizontally scalable, highly available, multi-tenant, long-term storage solution for Prometheus metrics. It uses a microservices architecture with components that can run as separate processes or as a single binary.
+
+## Build Commands
+
+```bash
+make # Build all (runs in Docker container by default)
+make BUILD_IN_CONTAINER=false # Build locally without Docker
+make exes # Build binaries only
+make protos # Generate protobuf files
+make lint # Run all linters (golangci-lint, misspell, etc.)
+make doc # Generate config documentation (run after changing flags/config)
+make ./cmd/cortex/.uptodate # Build Cortex Docker image for integration tests
+```
+
+## Testing
+
+### Unit Tests
+
+```bash
+go test -timeout 2400s -tags "netgo slicelabels" ./... # Run tests with CI configuration
+```
+
+### Integration Tests
+
+Integration tests require Docker and the Cortex image to be built first:
+
+```bash
+make ./cmd/cortex/.uptodate # Build Cortex Docker image first
+
+# Run all integration tests
+go test -v -tags=integration,requires_docker,integration_alertmanager,integration_memberlist,integration_querier,integration_ruler,integration_query_fuzz ./integration/...
+
+# Run a specific integration test
+go test -v -tags=integration,integration_ruler -timeout 2400s -count=1 ./integration/... -run "^TestRulerAPISharding$"
+```
+
+Environment variables for integration tests:
+
+- `CORTEX_IMAGE` - Docker image to test (default: `quay.io/cortexproject/cortex:latest`)
+- `E2E_TEMP_DIR` - Directory for temporary test files
+
+## Code Formatting
+
+Use goimports with Cortex-specific import grouping:
+
+```bash
+goimports -local github.com/cortexproject/cortex -w ./path/to/file.go
+```
+
+Import order: stdlib, third-party packages, internal Cortex packages (separated by blank lines).
+
+## Architecture
+
+### Write Path
+
+- **Distributor** (stateless) - Receives samples via remote write, validates, distributes to ingesters using consistent hashing
+- **Ingester** (semi-stateful) - Stores samples in memory, periodically flushes to long-term storage (TSDB blocks)
+
+### Read Path
+
+- **Querier** (stateless) - Executes PromQL queries across ingesters and long-term storage
+- **Query Frontend** (optional, stateless) - Query caching, splitting, and queueing
+- **Query Scheduler** (optional, stateless) - Moves queue from frontend for independent scaling
+
+### Storage
+
+- **Compactor** (stateless) - Compacts TSDB blocks in object storage
+- **Store Gateway** (semi-stateful) - Queries blocks from object storage
+
+### Optional Services
+
+- **Ruler** - Executes recording rules and alerts
+- **Alertmanager** - Multi-tenant alert routing
+- **Configs API** - Configuration management
+
+### Key Patterns
+
+- **Hash Ring** - Consistent hashing via Consul, Etcd, or memberlist gossip for data distribution
+- **Multi-tenancy** - Tenant isolation via `X-Scope-OrgID` header
+- **Blocks Storage** - TSDB-based storage with 2-hour block ranges, stored in S3/GCS/Azure/Swift
+
+### Main Entry Points
+
+- `cmd/cortex/main.go` - Main Cortex binary
+- `pkg/cortex/cortex.go` - Service orchestration and configuration
+
+## Code Conventions
+
+- **No global variables** - Use dependency injection
+- **Metrics**: Register with `promauto.With(reg)`, never use global prometheus registerer
+- **Config naming**: YAML uses `snake_case`, CLI flags use `kebab-case`
+- **Logging**: Use `github.com/go-kit/log` (not `github.com/go-kit/kit/log`)
+
+## PR Requirements
+
+- Sign commits with DCO: `git commit -s -m "message"`
+- Run `make doc` if config/flags changed
+- Include CHANGELOG entry for user-facing changes
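
The config naming and no-globals conventions listed under Code Conventions can be illustrated with a minimal Go sketch. It assumes a hypothetical, trimmed-down `HATrackerConfig` struct and a `parseConfig` helper that are not the real Cortex types; only the flag name `-distributor.ha-tracker.enable-startup-sync` and the YAML key `enable_startup_sync` are taken from this commit. The struct tag is shown for the naming convention only, since no YAML library is used here.

```go
package main

import (
	"flag"
	"fmt"
)

// HATrackerConfig is a hypothetical config struct used only for illustration;
// it is not the real Cortex type.
type HATrackerConfig struct {
	// YAML keys use snake_case (the tag is illustrative; nothing parses it here).
	EnableStartupSync bool `yaml:"enable_startup_sync"`
}

// RegisterFlags binds the field to a kebab-case CLI flag on the FlagSet that
// the caller passes in, rather than on the global flag.CommandLine.
func (cfg *HATrackerConfig) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&cfg.EnableStartupSync, "distributor.ha-tracker.enable-startup-sync", false,
		"Fetch all tracked keys on startup to populate the local cache.")
}

// parseConfig (hypothetical helper) builds a fresh FlagSet on every call, so
// no state is shared between callers.
func parseConfig(args []string) (HATrackerConfig, error) {
	var cfg HATrackerConfig
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]string{"-distributor.ha-tracker.enable-startup-sync=true"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("enable_startup_sync:", cfg.EnableStartupSync)
}
```

Passing the `FlagSet` in as a parameter is what keeps the sketch free of package-level state: each test or component can construct its own set without touching a shared registry.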

CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -1,6 +1,7 @@
 # Changelog

 ## master / unreleased
+* [CHANGE] Blocks storage: Bucket index is now enabled by default. Disabling the bucket index (`-blocks-storage.bucket-store.bucket-index.enabled=false`) is not recommended for production. #7259
 * [CHANGE] Users Scanner: Rename user index update configuration. #7180
   * Flag: Renamed `-*.users-scanner.user-index.cleanup-interval` to `-*.users-scanner.user-index.update-interval`.
   * Config: Renamed `clean_up_interval` to `update_interval` within the `users_scanner` configuration block.
@@ -14,7 +15,10 @@
 * [FEATURE] Querier: Add experimental projection pushdown support in Parquet Queryable. #7152
 * [FEATURE] Ingester: Add experimental active series queried metric. #7173
 * [FEATURE] Tenant Federation: Add experimental support for partial responses using the `-tenant-federation.allow-partial-data` flag. When enabled, failures from individual tenants during a federated query are treated as warnings, allowing results from successful tenants to be returned. #7232
+* [ENHANCEMENT] Distributor: Add `cortex_distributor_push_requests_total` metric to track the number of push requests by type. #7239
 * [ENHANCEMENT] Querier: Add `-querier.store-gateway-series-batch-size` flag to configure the maximum number of series to be batched in a single gRPC response message from Store Gateways. #7203
+* [ENHANCEMENT] HATracker: Add `-distributor.ha-tracker.enable-startup-sync` flag. If enabled, the ha-tracker fetches all tracked keys on startup to populate the local cache. #7213
+* [ENHANCEMENT] Distributor: Add validation to ensure remote write v2 requests contain at least one sample or histogram. #7201
 * [ENHANCEMENT] Ingester: Add support for ingesting Native Histogram with Custom Buckets. #7191
 * [ENHANCEMENT] Ingester: Optimize labels out-of-order (ooo) check by allowing the iteration to terminate immediately upon finding the first unsorted label. #7186
 * [ENHANCEMENT] Distributor: Skip attaching `__unit__` and `__type__` labels when `-distributor.enable-type-and-unit-labels` is enabled, as these are appended from metadata. #7145
@@ -28,15 +32,18 @@
 * [ENHANCEMENT] Alertmanager/Ruler: Introduce a user scanner to reduce the number of list calls to object storage. #6999
 * [ENHANCEMENT] Ruler: Add DecodingConcurrency config flag for Thanos Engine. #7118
 * [ENHANCEMENT] Query Frontend: Add query priority based on operation. #7128
-* [ENHANCEMENT] Compactor: Avoid double compaction by cleaning partition files in 2 cycles. #7130 #7209
+* [ENHANCEMENT] Compactor: Avoid double compaction by cleaning partition files in 2 cycles. #7130 #7209 #7257
 * [ENHANCEMENT] Distributor: Optimize memory usage by recycling v2 requests. #7131
 * [ENHANCEMENT] Compactor: Avoid double compaction by not filtering delete blocks on real time when using bucketIndex lister. #7156
 * [ENHANCEMENT] Upgrade to go 1.25. #7164
 * [ENHANCEMENT] Upgraded container base images to `alpine:3.23`. #7163
 * [ENHANCEMENT] Ingester: Instrument Ingester CPU profile with userID for read APIs. #7184
 * [ENHANCEMENT] Ingester: Add fetch timeout for Ingester expanded postings cache. #7185
 * [ENHANCEMENT] Ingester: Add feature flag to collect metrics of how expensive an unoptimized regex matcher is and new limits to protect Ingester query path against expensive unoptimized regex matchers. #7194 #7210
+* [ENHANCEMENT] Querier: Add active API requests tracker logging to help with OOMKill troubleshooting. #7216
 * [ENHANCEMENT] Compactor: Add partition group creation time to visit marker. #7217
+* [ENHANCEMENT] Compactor: Add concurrency for partition cleanup and mark block for deletion #7246
+* [BUGFIX] Distributor: If remote write v2 is disabled, explicitly return HTTP 415 (Unsupported Media Type) for Remote Write V2 requests instead of attempting to parse them as V1. #7238
 * [BUGFIX] Ring: Change DynamoDB KV to retry indefinitely for WatchKey. #7088
 * [BUGFIX] Ruler: Add XFunctions validation support. #7111
 * [BUGFIX] Querier: propagate Prometheus info annotations in protobuf responses. #7132
@@ -45,6 +52,7 @@
 * [BUGFIX] Query Frontend: Add Native Histogram extraction logic in results cache #7167
 * [BUGFIX] Alertmanager: Fix alertmanager reloading bug that removes user template files #7196
 * [BUGFIX] Query Scheduler: If max_outstanding_requests_per_tenant value is updated to lesser value than the current number of requests in the queue, the excess requests (newest ones) will be dropped to prevent deadlocks. #7188
+* [BUGFIX] Distributor: Return remote write V2 stats headers properly when the request is HA deduplicated. #7240

 ## 1.20.1 2025-12-03


CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+AGENTS.md

docs/blocks-storage/bucket-index.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ slug: bucket-index

 The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor, and used by queriers, store-gateways and rulers to discover blocks in the storage.

-The bucket index usage is **optional** and can be enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true` (or its respective YAML config option).
+The bucket index is **enabled by default**. Disabling it via `-blocks-storage.bucket-store.bucket-index.enabled=false` is not recommended for production environments.

 ## Benefits

@@ -34,7 +34,7 @@ The `bucket-index.json.gz` contains:

 The [compactor](./compactor.md) periodically scans the bucket and uploads an updated bucket index to the storage. The frequency at which the bucket index is updated can be configured via `-compactor.cleanup-interval`.

-Despite using the bucket index is optional, the index itself is built and updated by the compactor even if `-blocks-storage.bucket-store.bucket-index.enabled` has **not** been enabled. This is intentional, so that once a Cortex cluster operator decides to enable the bucket index in a live cluster, the bucket index for any tenant is already existing and query results consistency is guaranteed. The overhead introduced by keeping the bucket index updated is expected to be non significative.
+The index itself is built and updated by the compactor even if `-blocks-storage.bucket-store.bucket-index.enabled` has been **disabled**. This is intentional: once a Cortex cluster operator decides to enable the bucket index in a live cluster, the bucket index for every tenant already exists and query result consistency is guaranteed. The overhead of keeping the bucket index updated is expected to be negligible.

 ## How it's used by the querier

docs/blocks-storage/querier.md

Lines changed: 3 additions & 2 deletions
@@ -1716,9 +1716,10 @@ blocks_storage:

 bucket_index:
 # True to enable querier and store-gateway to discover blocks in the
-# storage via bucket index instead of bucket scanning.
+# storage via bucket index instead of bucket scanning. Disabling the
+# bucket index is not recommended for production.
 # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
-[enabled: <boolean> | default = false]
+[enabled: <boolean> | default = true]

 # How frequently a bucket index, which previously failed to load, should
 # be tried to load again. This option is used only by querier.

docs/blocks-storage/store-gateway.md

Lines changed: 3 additions & 2 deletions
@@ -1782,9 +1782,10 @@ blocks_storage:

 bucket_index:
 # True to enable querier and store-gateway to discover blocks in the
-# storage via bucket index instead of bucket scanning.
+# storage via bucket index instead of bucket scanning. Disabling the
+# bucket index is not recommended for production.
 # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
-[enabled: <boolean> | default = false]
+[enabled: <boolean> | default = true]

 # How frequently a bucket index, which previously failed to load, should
 # be tried to load again. This option is used only by querier.

docs/configuration/config-file-reference.md

Lines changed: 10 additions & 2 deletions
@@ -2399,9 +2399,10 @@ bucket_store:

 bucket_index:
 # True to enable querier and store-gateway to discover blocks in the storage
-# via bucket index instead of bucket scanning.
+# via bucket index instead of bucket scanning. Disabling the bucket index is
+# not recommended for production.
 # CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
-[enabled: <boolean> | default = false]
+[enabled: <boolean> | default = true]

 # How frequently a bucket index, which previously failed to load, should be
 # tried to load again. This option is used only by querier.
@@ -3100,6 +3101,13 @@ ha_tracker:
 # CLI flag: -distributor.ha-tracker.failover-timeout
 [ha_tracker_failover_timeout: <duration> | default = 30s]

+# [Experimental] If enabled, fetches all tracked keys on startup to populate
+# the local cache. This prevents duplicate GET calls for the same key while
+# the cache is cold, but could cause a spike in GET requests during
+# initialization if the number of tracked keys is large.
+# CLI flag: -distributor.ha-tracker.enable-startup-sync
+[enable_startup_sync: <boolean> | default = false]
+
 # Backend storage to use for the ring. Please be aware that memberlist is not
 # supported by the HA tracker since gossip propagation is too slow for HA
 # purposes.

docs/configuration/single-process-config-blocks-gossip-1.yaml

Lines changed: 2 additions & 0 deletions
@@ -105,3 +105,5 @@ alertmanager_storage:
 local:
 # Make sure file exists
 path: /tmp/cortex/alerts
+compactor:
+cleanup_interval: 10s
