42 changes: 42 additions & 0 deletions docs/practices/labels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
---
title: Labels
sort_rank: 2
---

The label conventions presented in this document are not required
for using Prometheus, but can serve as both a style-guide and a collection of
best practices. Individual organizations may want to approach some of these
practices, e.g. naming conventions, differently.

## Labels

Prometheus labels can come from the target itself and from
[relabeling in discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

By default, Prometheus attaches two primary labels to every discovered target:

- `job`
  - The `job` label is one of the few ubiquitous labels. It is set at scrape time and is
    used to identify metrics scraped from the same target/exporter.
  - If `job` is not specified in a PromQL expression, the expression will match unrelated series with the same metric name. This is especially likely in a multi-system or multi-tenant installation.

WARNING: When using `without`, be careful not to strip out the `job` label accidentally.

- `instance`
  - The `instance` label contains the `ip:port` that was scraped.
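
As a minimal sketch of the `without` warning, assume a hypothetical counter `http_requests_total` that is exposed by two unrelated jobs. The metric name and the `job` value `api` are illustrative only and do not come from any particular exporter.

```promql
# Aggregates `instance` away but keeps `job`, so each job gets its own result series.
sum without (instance) (rate(http_requests_total[5m]))

# Strips `job` by accident: series from unrelated jobs are summed together.
sum without (instance, job) (rate(http_requests_total[5m]))

# Scoped to a single job, with `job` preserved on the result.
sum without (instance) (rate(http_requests_total{job="api"}[5m]))
```

The second query is the case to watch for: once `job` is gone from the result, nothing downstream can tell which system the value came from.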

### General Labelling Advice

Use labels to differentiate the characteristics of the thing that is being measured:

- `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
- `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`

Do not put the label names in the metric name, as this introduces redundancy
and will cause confusion if the respective labels are aggregated away.
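
A short sketch of why this matters, using the `api_http_requests_total` example from above. The name `api_http_requests_create_total` is a hypothetical counter-example, not a real metric.

```promql
# Discouraged: the operation is baked into the metric name.
api_http_requests_create_total

# Preferred: one metric name, with the operation carried as a label.
api_http_requests_total{operation="create"}

# Aggregating the label away still leaves a name that describes the result.
sum without (operation) (rate(api_http_requests_total[5m]))
```

With the discouraged form there is no single expression that sums all operations without listing every metric name by hand.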

CAUTION: Remember that every unique combination of key-value label
pairs represents a new time series, which can dramatically increase the amount
of data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values.
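
To make the growth concrete: the number of series for one metric is bounded by the product of the number of values of each label. Under the assumed figures of 3 operations, 5 status codes, and 20 instances, a single metric can produce up to 3 × 5 × 20 = 300 series. The queries below are one possible way to inspect this on a running server; they reuse the `api_http_requests_total` example, and the `operation` label is assumed to exist on it.

```promql
# Number of time series currently present for this metric.
count(api_http_requests_total)

# Number of distinct values of the `operation` label on this metric.
count(count by (operation) (api_http_requests_total))
```

If the second query returns a number that keeps growing over time, the label is probably carrying an unbounded dimension.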
20 changes: 2 additions & 18 deletions docs/practices/naming.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
---
title: Metric and label naming
title: Metric naming
sort_rank: 1
---

The metric and label conventions presented in this document are not required
The metric conventions presented in this document are not required
for using Prometheus, but can serve as both a style-guide and a collection of
best practices. Individual organizations may want to approach some of these
practices, e.g. naming conventions, differently.
@@ -80,22 +80,6 @@ the underlying metric type and unit you work with.
* **Metric collisions**: With growing adoption and metric changes over time, there are cases where lack
of unit and type information in the metric name will cause certain series to collide (e.g. `process_cpu` for seconds and milliseconds).

## Labels

Use labels to differentiate the characteristics of the thing that is being measured:

* `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
* `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`

Do not put the label names in the metric name, as this introduces redundancy
and will cause confusion if the respective labels are aggregated away.

CAUTION: Remember that every unique combination of key-value label
pairs represents a new time series, which can dramatically increase the amount
of data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values.

## Base Units

Prometheus does not have any units hard coded. For better compatibility, base
18 changes: 14 additions & 4 deletions docs/practices/rules.md
Original file line number Diff line number Diff line change
@@ -19,6 +19,8 @@ This page documents proper naming conventions and aggregation for recording rule
Keeping the metric name unchanged makes it easy to know what a metric is and
easy to find in the codebase.

IMPORTANT: `job` label acts as a primary key. It is **strongly** recommended that you use it to scope your PromQL expressions to the system you are monitoring.
Member

This is misleading. Prometheus doesn't have the concept of "primary key". Not even metric names are a "primary key".

Author

Fair, especially since folks used to SQL DBs will jump to the conclusion that it's a SQL DB, which it isn't.

Iterated on the language to avoid creating ambiguity


To keep the operations clean, `_sum` is omitted if there are other operations,
as `sum()`. Associative operations can be merged (for example `min_min` is the
same as `min`).
@@ -27,6 +29,18 @@ If there is no obvious operation to use, use `sum`. When taking a ratio by
doing division, separate the metrics using `_per_` and call the operation
`ratio`.

## Labels

NOTE: Omitting a label matcher in a PromQL expression is functionally equivalent to specifying `label=~".*"`, which matches every value of that label.

* In both recording rules and alerting expressions, always specify a `job` label to prevent expression mismatches from occurring.
This is especially important in multi-tenant systems where the same metric names may be exported by different jobs or the
same job (e.g. `node_exporter`) in multiple, distinct deployments.

* Always specify a `without` clause with the labels you are aggregating away.
This is to preserve all the other labels such as `job`, which will avoid
conflicts and give you more useful metrics and alerts.
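
The two recommendations above can be sketched as rule expressions. The metric `requests_total` and the job name `myjob` are placeholders chosen for illustration; substitute whatever your own targets expose.

```promql
# Scoped to one job; only `instance` is aggregated away, so `job` and all
# other labels survive on the result.
sum without (instance) (rate(requests_total{job="myjob"}[5m]))

# Leaving the matcher out selects the same series as an explicit match-all
# regex, so every job exporting `requests_total` feeds into the result.
sum without (instance) (rate(requests_total[5m]))
sum without (instance) (rate(requests_total{job=~".*"}[5m]))
```

Using `without` rather than `by` means that labels added to the metric later are preserved automatically instead of being silently dropped.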

## Aggregation

* When aggregating up ratios, aggregate up the numerator and denominator
@@ -40,10 +54,6 @@ Instead keep the metric name without the `_count` or `_sum` suffix and replace
the `rate` in the operation with `mean`. This represents the average
observation size over that time period.

* Always specify a `without` clause with the labels you are aggregating away.
This is to preserve all the other labels such as `job`, which will avoid
conflicts and give you more useful metrics and alerts.

## Examples

_Note the indentation style with outdented operators on their own line between