12 changes: 7 additions & 5 deletions docs/Alerts & Notifications/Alert Configuration Ordering.mdx
Original file line number Diff line number Diff line change
@@ -88,18 +88,20 @@ Both have the **same name** (`disk_space_usage`), so precedence applies:

Netdata loads alert configurations from two directories:

1. **User config** (loaded first): `/etc/netdata/health.d/` (default)
2. **Stock config** (loaded second): `/usr/lib/netdata/conf.d/health.d/` (default)
1. **User config** (loaded first): your `health.d/` directory under the [Netdata config directory](/docs/netdata-agent/configuration)
2. **Stock config** (loaded second): the stock `health.d/` directory

These paths can vary by installation. Check your `netdata.conf` `[directories]` section for exact paths.
:::note
Config paths vary by install prefix. Run `sudo ./edit-config health.d/<file>` to resolve the correct user path automatically, or check the `[directories]` section of `netdata.conf` (keys `health config` and `stock health config`) for exact locations.
:::

### File Shadowing

If a file with the **same name** exists in both directories, only the user file is loaded. The stock file is completely ignored.

**Example:**
- Stock: `/usr/lib/netdata/conf.d/health.d/cpu.conf`
- User: `/etc/netdata/health.d/cpu.conf`
- Stock: `cpu.conf` in the stock config directory
- User: `cpu.conf` in your user config directory
- Result: Only the user file is loaded

This means if you copy a stock file to override it, you must include **all** alerts you want from that file, not just the ones you're modifying.
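
As a rough way to spot shadowing on a given host, the sketch below lists file names that exist in both directories. The function name `shadowed_files` is hypothetical, both directories are supplied by the caller rather than assumed, and restricting the check to `*.conf` files is an assumption about what you keep in `health.d/`.

```bash
# Hypothetical helper: print the names of *.conf files that exist in both
# the user and the stock health.d directories. For each name printed, only
# the user copy is loaded.
shadowed_files() {
  local user_dir="$1" stock_dir="$2" path name
  for path in "$user_dir"/*.conf; do
    [ -e "$path" ] || continue            # the glob matched nothing
    name=${path##*/}                      # strip the directory part
    if [ -e "$stock_dir/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `shadowed_files /etc/netdata/health.d /usr/lib/netdata/conf.d/health.d` uses the default locations; substitute the paths from your own `netdata.conf` if your install prefix differs.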
189 changes: 150 additions & 39 deletions docs/Alerts & Notifications/Alerts & Notifications.mdx

Large diffs are not rendered by default.

93 changes: 63 additions & 30 deletions docs/Alerts & Notifications/Overriding Stock Alerts.mdx
Original file line number Diff line number Diff line change
@@ -15,12 +15,12 @@ This guide explains how to customize Netdata's stock alerts. User configurations

## Quick Reference

| Goal | Method |
|------|--------|
| Goal | Method |
|-------------------------------------|--------------------------------------|
| Change thresholds for ALL instances | Create a template with the same name |
| Change thresholds for ONE instance | Create an alarm with the same name |
| Disable an alert completely | Use `enabled alarms` in netdata.conf |
| Silence notifications only | Set `to: silent` |
| Change thresholds for ONE instance | Create an alarm with the same name |
| Disable an alert completely | Use `enabled alarms` in netdata.conf |
| Silence notifications only | Set `to: silent` |

## Understanding Overrides

@@ -32,36 +32,54 @@ See [Alert Configuration Ordering](/docs/alerts-&-notifications/alert-configurat

## Where to Put Your Overrides

**User config directory** (default): `/etc/netdata/health.d/`
Put your overrides in your [Netdata config directory](/docs/netdata-agent/configuration) under `health.d/` — files there survive upgrades. The **stock config directory** holds Netdata's built-in alert definitions and is replaced during updates.

Files here survive upgrades. Stock files in `/usr/lib/netdata/conf.d/health.d/` are replaced during updates.

Check your `netdata.conf` `[directories]` section for exact paths on your system.
:::note
Config paths vary by install prefix. Run `sudo ./edit-config health.d/<file>` from your Netdata config directory to resolve the correct user path automatically, or check the `[directories]` section of `netdata.conf` (keys `health config` and `stock health config`) for exact locations.
:::
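
If you prefer to read the locations straight from the file, here is a minimal sketch that pulls one key out of the `[directories]` section. It assumes `netdata.conf` uses plain `key = value` lines under `[section]` headers; the function name `directories_key` is hypothetical. Commented-out lines are skipped, so a key that is not set explicitly produces no output.

```bash
# Hypothetical helper: print the value of KEY from the [directories] section
# of an INI-style netdata.conf. Commented-out lines are skipped.
directories_key() {
  local conf="$1" key="$2"
  awk -v key="$key" '
    /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[directories\]/); next }
    !in_section || /^[ \t]*#/ { next }
    {
      pos = index($0, "=")
      if (pos == 0) next
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim both sides of the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim both sides of the value
      if (k == key) { print v; exit }
    }
  ' "$conf"
}
```

Calling `directories_key /etc/netdata/netdata.conf "health config"` would print the user directory and `"stock health config"` the stock one, where `/etc/netdata/netdata.conf` is only the default location.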

## Method 1: Override All Instances (Template)

Create a template with the same name to change thresholds for ALL instances.

**Example: Raise CPU steal thresholds globally**

Stock alert in `/usr/lib/netdata/conf.d/health.d/cpu.conf`:
Stock alert from `cpu.conf` (in the stock config directory):

```yaml
template: 20min_steal_cpu
on: system.cpu
lookup: average -20m unaligned of steal
units: %
every: 5m
warn: $this > (($status >= $WARNING) ? (5) : (10))
template: 20min_steal_cpu
on: system.cpu
class: Latency
type: System
component: CPU
host labels: _os=linux
lookup: average -20m unaligned of steal
units: %
every: 5m
warn: $this > (($status >= $WARNING) ? (5) : (10))
delay: down 1h multiplier 1.5 max 2h
summary: System CPU steal time
info: Average CPU steal time over the last 20 minutes
to: silent
```

Your override in `/etc/netdata/health.d/my-overrides.conf`:
Your override (create it with `sudo ./edit-config health.d/my-overrides.conf`):

```yaml
template: 20min_steal_cpu
on: system.cpu
lookup: average -20m unaligned of steal
units: %
every: 5m
warn: $this > (($status >= $WARNING) ? (10) : (20))
template: 20min_steal_cpu
on: system.cpu
class: Latency
type: System
component: CPU
host labels: _os=linux
lookup: average -20m unaligned of steal
units: %
every: 5m
warn: $this > (($status >= $WARNING) ? (10) : (20))
delay: down 1h multiplier 1.5 max 2h
summary: System CPU steal time
info: Average CPU steal time over the last 20 minutes
to: silent
```

**Why it works:** Same name + same context. Your template is processed first, creating the alert. The stock template is then skipped.
@@ -75,6 +93,7 @@ Create an alarm to override thresholds for ONE specific instance while keeping s
**Example: Different disk space threshold for `/mnt/data`**

Stock template (applies to all disks):

```yaml
template: disk_space_usage
on: disk.space
@@ -83,7 +102,8 @@ template: disk_space_usage
crit: $this < 10
```

Your override in `/etc/netdata/health.d/my-overrides.conf`:
Your override (create it with `sudo ./edit-config health.d/my-overrides.conf`):

```yaml
alarm: disk_space_usage
on: disk_space._mnt_data
@@ -93,6 +113,7 @@ alarm: disk_space_usage
```

**Why it works:**

- Both have the **same name** (`disk_space_usage`)
- Your **alarm** targets the specific chart ID `disk_space._mnt_data`
- Alarms are processed before templates (when names match)
@@ -125,6 +146,7 @@ chart labels: mount_point=/mnt/data
```

Use labels when:

- You want to target multiple instances sharing a label
- Chart IDs are dynamic or unpredictable

@@ -133,7 +155,7 @@ Use labels when:
If you want to modify many alerts in one stock file, copy it entirely:

```bash
cd /etc/netdata
# from your Netdata config directory:
sudo ./edit-config health.d/cpu.conf
```

@@ -145,7 +167,7 @@ This means your copy must include ALL alerts you want—not just the ones you're

### Option A: Global Disable

In `/etc/netdata/netdata.conf`:
In your `netdata.conf` (edit it with `sudo ./edit-config netdata.conf`):

```ini
[health]
@@ -195,11 +217,13 @@ If `netdatacli` isn't available, send `SIGUSR2` to the Netdata process.
### Verify Your Override

Check via API:

```bash
curl -s "http://localhost:19999/api/v1/alarms?all" | jq '.alarms | to_entries[] | select(.value.name == "20min_steal_cpu") | .value'
```

Key fields to check:

- `source`: confirms which config file is active (user override vs stock)
- `lookup_*`: data query parameters
- `warn`, `crit`: threshold expressions
@@ -219,6 +243,7 @@ grep -i health /var/log/netdata/error.log | tail -20
```

Common issues:

- Syntax errors in configuration
- Alert name doesn't match exactly (case-sensitive)
- File permissions prevent Netdata from reading your config
@@ -235,6 +260,7 @@ Common issues:
### Both Stock and Override Alerts Appear

This happens when matching criteria don't overlap. For example:

- Your override has `host labels: production`
- Stock alert has no host labels restriction

@@ -274,17 +300,20 @@ Both can coexist because they match different hosts.

### How do I find what stock alerts exist?

List all stock alert files:
List the stock alert files (the command below uses the default stock directory):

```bash
ls /usr/lib/netdata/conf.d/health.d/
```

View a specific stock alert:

```bash
cat /usr/lib/netdata/conf.d/health.d/cpu.conf
```

Or use the API to list all alert names:
Or use the API to list all alert names without dealing with paths:

```bash
curl -s "http://localhost:19999/api/v1/alarms?all" | jq '.alarms | to_entries[].value.name' | sort -u
```
@@ -303,13 +332,14 @@ template: my_custom_disk_alert

### What happens to my overrides after a Netdata upgrade?

User config files in `/etc/netdata/health.d/` are preserved. Stock files in `/usr/lib/netdata/conf.d/health.d/` are replaced.
User config files are preserved across upgrades. Stock files are replaced.

Your overrides continue working. However, if a stock alert is renamed or removed in a new version, your override may become orphaned (still works, but no longer overriding anything).
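
One way to look for orphaned overrides after an upgrade is to compare the alert names defined in the two directories. The sketch below is an approximation: the function names are hypothetical, it only reads `alarm:` and `template:` lines from `*.conf` files, and it also lists purely custom alerts that never overrode anything, so treat its output as a list of candidates to review.

```bash
# Hypothetical helper: print the unique alert names defined in DIR/*.conf,
# taken from lines that start with "alarm:" or "template:".
alert_names() {
  cat "$1"/*.conf 2>/dev/null |
    awk -F: '/^[ \t]*(alarm|template)[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2 }' |
    sort -u
}

# Print names defined in the user directory that no stock file defines.
unmatched_user_alerts() {
  local user_list stock_list
  user_list=$(mktemp) && stock_list=$(mktemp) || return 1
  alert_names "$1" > "$user_list"
  alert_names "$2" > "$stock_list"
  comm -23 "$user_list" "$stock_list"     # lines only in the first list
  rm -f "$user_list" "$stock_list"
}
```

Pass the user directory first and the stock directory second, using the paths that apply to your installation.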

### How do I override for multiple specific instances?

Option 1: Create multiple alarms (one per instance):

```yaml
alarm: disk_space_usage
on: disk_space._mnt_data
@@ -321,6 +351,7 @@ alarm: disk_space_usage
```

Option 2: Use chart labels if instances share a label:

```yaml
template: disk_space_usage
on: disk.space
@@ -331,6 +362,7 @@ chart labels: storage_tier=bulk
### Can I see what overrides are currently active?

Check which config files Netdata loaded:

```bash
# systemd journal:
journalctl --namespace netdata -g "health.*load\|health.*read" --no-pager
@@ -339,7 +371,8 @@ journalctl --namespace netdata -g "health.*load\|health.*read" --no-pager
grep -iE "health.*(load|read)" /var/log/netdata/error.log
```

Compare your active alert config vs stock:
Compare your active alert config vs stock (default locations shown):

```bash
# Your override
cat /etc/netdata/health.d/my-overrides.conf
Original file line number Diff line number Diff line change
@@ -36,7 +36,7 @@ This collector listens for incoming SNMP Trap and INFORM notifications from netw
- **Deduplication**: Optional configurable per-job dedup that suppresses repeated identical traps within a window. The first matching trap is journaled immediately; subsequent matches increment a summary counter and a periodic summary entry is written.
- **Per-OID overrides**: Operators can override the profile-assigned category, severity, and labels for specific OIDs without editing profiles.
- **Profile-defined trap metrics**: Operators can define trap-to-metric rules in custom trap profiles, then enable selected rules per listener job with `profile_metrics`. Profile metrics are emitted per source device, using vnode host scope when enrichment finds an unambiguous vnode and bounded source labels for chart identity and fallback attribution.
- **Direct journal storage**: Enabled by default for explicit jobs. Stores traps under the configured Netdata log directory (`/var/log/netdata/traps/<job>/` by default) and exposes the embedded `snmp:traps` Function. Direct-journal jobs appear as `__logs_sources` options.
- **Direct journal storage**: Enabled by default for explicit jobs. Stores traps under the configured Netdata log directory (`${NETDATA_LOG_DIR}/traps/<job>/`; package installs usually use `/var/log/netdata`, and static installs commonly use `/opt/netdata/var/log/netdata`) and exposes the embedded `snmp:traps` Function. Direct-journal jobs appear as `__logs_sources` options.
- **OTLP/gRPC export**: Optional backend that exports traps as OTLP LogRecords. When `otlp.enabled` is `true`, traps are exported through OTLP regardless of `journal.enabled`; if direct journal storage is also enabled, both backends receive traps.
- **Self-metrics**: Per-job pipeline counters, trap events (by category and severity), processing errors (by type), dedup suppression (when enabled), bounded per-source receiver health, and profile-metric diagnostics.

@@ -102,14 +102,12 @@ Example conversion for a MIB module not shipped in the OOB pack:
```


This collector is only supported on the following platforms:

- linux
This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Binding to the standard SNMP trap port (UDP/162) requires `CAP_NET_BIND_SERVICE` or root.
Netdata packages grant this capability to `go.d.plugin` and allow it in `netdata.service`.
Binding to the standard SNMP trap port (UDP/162) requires elevated bind privileges on many platforms.
On Linux, this means `CAP_NET_BIND_SERVICE` or root. Netdata packages grant this capability to `go.d.plugin` and allow it in `netdata.service`.


### Default Behavior
@@ -162,7 +160,7 @@ Configure the network devices sending traps:

#### Verify Netdata log directory access

Direct-journal jobs write under the configured Netdata log directory (`/var/log/netdata/traps/` by default, or `${NETDATA_LOG_DIR}/traps/` at runtime).
Direct-journal jobs write under the configured Netdata log directory (`${NETDATA_LOG_DIR}/traps/`). Package installs usually use `/var/log/netdata`; static installs commonly use `/opt/netdata/var/log/netdata`.
Job creation fails if the configured Netdata log directory is missing or unusable. For OTLP-only jobs, set `journal.enabled: false` and `otlp.enabled: true`.
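
A quick pre-flight check along these lines can catch a missing or unwritable directory before you create a job. This is only a sketch: `check_log_dir` is a hypothetical helper, and it tests access for whichever user runs it, so run it as the account the Agent runs under for a meaningful answer.

```bash
# Hypothetical helper: report whether DIR exists and is writable by the
# user running this check.
check_log_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

For instance, `check_log_dir "${NETDATA_LOG_DIR:-/var/log/netdata}"` falls back to the usual package-install path when the variable is unset.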


@@ -263,7 +261,7 @@ Each user has:
<a id="option-direct-journal-journal"></a>
##### journal

- `enabled`: Write traps to local direct journal files under the configured Netdata log directory (`/var/log/netdata/traps/<job>/` by default, or `${NETDATA_LOG_DIR}/traps/<job>/` at runtime) and expose the job as a `__logs_sources` option in the embedded `snmp:traps` Function.
- `enabled`: Write traps to local direct journal files under the configured Netdata log directory (`${NETDATA_LOG_DIR}/traps/<job>/`) and expose the job as a `__logs_sources` option in the embedded `snmp:traps` Function. Package installs usually use `/var/log/netdata`; static installs commonly use `/opt/netdata/var/log/netdata`.
- Set `enabled: false` only when another output backend, such as OTLP, is enabled.


@@ -619,7 +617,7 @@ Metrics:
| snmp.trap.pipeline | received, decoded, accepted, committed, dedup_suppressed, dropped, write_failed | events/s |
| snmp.trap.events | state_change, config_change, security, auth, license, mobility, diagnostic, unknown | events/s |
| snmp.trap.severity | emerg, alert, crit, err, warning, notice, info, debug | events/s |
| snmp.trap.errors | unknown_oid, decode_failed, template_unresolved, malformed_pdu, dropped_allowlist, rate_limited, auth_failures, usm_failures, unknown_engine_id, inform_response_failed, binary_encoded, profile_load_failed, journal_write_failed, otlp_export_failed, listener_read_failed | errors/s |
| snmp.trap.errors | unknown_oid, decode_failed, template_unresolved, malformed_pdu, dropped_allowlist, rate_limited, auth_failures, usm_failures, unknown_engine_id, inform_response_failed, binary_encoded, profile_load_failed, journal_write_failed, otlp_export_failed, listener_read_failed, listener_buffer_degraded | errors/s |
| snmp.trap.dedup_suppressed | suppressed | events/s |
| snmp.trap.sources | active | sources |
| snmp.trap.source_attribution | vnode, fallback, ambiguous, failed, overflow_dropped, source_transitions | events/s |
30 changes: 25 additions & 5 deletions docs/Collecting Metrics/StatsD.mdx
Original file line number Diff line number Diff line change
@@ -508,7 +508,13 @@ For example, to monitor the application `myapp` using StatsD and Netdata, create

Using this configuration, `myapp` gets its own dashboard section with one chart containing two [dimensions](https://learn.netdata.cloud/docs/developer-and-contributor-corner/glossary#d).

When you send metrics like `foo:10|g` and `bar:20|g`, you'll see both private charts and your synthetic chart.
When you send metrics like `myapp.metric1:10|g` and `myapp.metric2:20|g`, you'll see both private charts and your synthetic chart. These metric names must match the pattern defined in the `[app]` section (e.g., `myapp.*`) for them to appear in your synthetic charts.

:::note

**Synthetic chart appears empty or is missing?** This happens when the metric names you send don't match the `metrics` pattern in your `[app]` section. StatsD matches incoming metric names against the `metrics` pattern using Netdata's [simple pattern](/docs/developer-and-contributor-corner/libnetdata/simple-patterns) syntax — if a metric name doesn't match, it is never linked to the app's synthetic charts. For example, with `metrics = myapp.*`, sending bare names like `foo:10|g` creates a private chart for `foo` but never feeds the synthetic chart. To fix this, send metric names that include the prefix matching the pattern (e.g., `myapp.foo:10|g`).

:::
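
To check a metric name against a `metrics` pattern before sending it, the following sketch approximates simple-pattern matching with shell globs. It assumes the usual simple-pattern rules (space-separated patterns, `*` as a wildcard, a leading `!` for a negative match, first match wins); `matches_app_pattern` is a hypothetical name. Shell globs also treat `?` and `[...]` specially, so this is not an exact reimplementation.

```bash
# Hypothetical helper: return 0 if NAME matches the space-separated PATTERNS,
# 1 otherwise. Runs in a subshell so "set -f" does not leak to the caller.
matches_app_pattern() (
  name=$1
  set -f                                  # keep "*" from expanding to file names
  for p in $2; do
    case $p in
      '!'*)
        case $name in ${p#'!'}) exit 1 ;; esac ;;   # negative match: reject
      *)
        case $name in $p) exit 0 ;; esac ;;         # positive match: accept
    esac
  done
  exit 1                                  # nothing matched
)
```

With `metrics = myapp.*`, `matches_app_pattern "myapp.foo" "myapp.*"` succeeds, while `matches_app_pattern "foo" "myapp.*"` fails, which mirrors the empty-chart case described above.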

<details>
<summary><strong>Synthetic Chart Example</strong></summary>
@@ -523,14 +529,22 @@ Example of a synthetic chart combining multiple metrics:

The `[app]` section defines the application and has these options:

:::warning

The `[app]` section is a **namespace/container** — it groups metrics and sets defaults, but does **not** create any dashboard charts by itself. To see synthetic charts on the dashboard, you **must** add one or more chart definition sections (e.g., `[mychart]`) below the `[app]` section. If you only define an `[app]` section without chart definitions, the only visible charts will be private charts for individual metrics (if `private charts = yes` or the global default is enabled).

Settings like `private charts`, `gaps when not collected`, and `history` configure how the app's metrics and charts behave — they are not chart-level settings. The `memory mode` setting under `[app]` is currently ignored. See [Chart Definitions](#chart-definitions) below for how to create charts.

:::

:::note

- **name** - Defines the application name
- **metrics** - [Simple pattern](/docs/developer-and-contributor-corner/libnetdata/simple-patterns) matching all metrics for this app
- **private charts** - Enable/disable private charts for matched metrics (yes|no)
- **gaps when not collected** - Show gaps when no metrics are collected (yes|no)
- **memory mode** - Sets memory mode for application charts (optional, default is global Netdata setting)
- **history** - Size of round-robin database (optional, only relevant with `memory mode = save`)
- **memory mode** - Ignored in the `[app]` section; application charts use the host's default memory mode
- **history** - Size of round-robin database for application charts (optional, minimum 5)

:::

@@ -690,6 +704,14 @@ To rename methods automatically:
This adds dimensions named `GET`, `ADD`, and `DELETE`.
</details>

### Scope of StatsD Chart Configuration

All chart and dimension configuration directives in your `statsd.d/*.conf` files (`/etc/netdata/statsd.d/` on default installs) control **local agent behavior only**: they define how the local Netdata agent processes, names, and visualizes the StatsD metrics it receives.

The dimension `TYPE` field (see [Dimension Format](#dimension-format) above) selects which computed value of a metric a single agent displays. It does **not** control how that metric is combined across multiple Netdata instances.

Cross-node aggregation (how metrics from multiple agents are combined in Netdata Cloud dashboards) is governed by the Cloud query engine, not by StatsD configuration files. To choose Sum, Average, or another aggregation for a chart in Netdata Cloud, use the [aggregate function](/docs/dashboards-and-charts/charts#aggregate-functions-dropdown) on that chart. There is no `aggregation = SUM` or `aggregation = AVG` directive in `statsd.d` configuration files.

## Using StatsD with Different Languages

<details>
@@ -925,7 +947,6 @@ Start with this basic configuration:
metrics = k6*
private charts = yes
gaps when not collected = no
memory mode = dbengine
```

</details>
@@ -963,7 +984,6 @@ Here's a complete configuration for k6:
metrics = k6*
private charts = yes
gaps when not collected = no
memory mode = dbengine

[dictionary]
http_req_blocked = Blocked HTTP Requests