diff --git a/docs/Alerts & Notifications/Alerts & Notifications.mdx b/docs/Alerts & Notifications/Alerts & Notifications.mdx index 8bc926f1db..5db6605b16 100644 --- a/docs/Alerts & Notifications/Alerts & Notifications.mdx +++ b/docs/Alerts & Notifications/Alerts & Notifications.mdx @@ -3,7 +3,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/health/READ sidebar_label: "Alerts & Notifications" learn_status: "Published" learn_rel_path: "Alerts & Notifications" -sidebar_position: "130" +sidebar_position: "140" learn_link: "https://learn.netdata.cloud/docs/alerts-&-notifications" slug: "/alerts-&-notifications" --- diff --git a/docs/Collecting Metrics/SNMP Profile Format.mdx b/docs/Collecting Metrics/SNMP Profile Format.mdx index cea29208fb..d4aabf4b7b 100644 --- a/docs/Collecting Metrics/SNMP Profile Format.mdx +++ b/docs/Collecting Metrics/SNMP Profile Format.mdx @@ -28,6 +28,7 @@ It tells the Netdata SNMP collector: - which **OIDs** to query - how to **interpret** the returned values - how to **transform** them into **metrics**, **dimensions**, **tags**, and **metadata** +- which rows are **regular metrics** and which rows are **SNMP topology** observations Profiles make it possible to describe _entire device families_ (switches, routers, UPSes, firewalls, printers, etc.) declaratively — so you don’t need to hard-code logic in Go or manually define metrics for each device. @@ -62,6 +63,8 @@ When Netdata connects to an SNMP device, the collector: ├──────────────────────┤ │ metrics │ → OIDs to collect ├──────────────────────┤ +│ topology │ → OIDs to collect for SNMP topology +├──────────────────────┤ │ metric_tags │ → dynamic tags for all metrics ├──────────────────────┤ │ static_tags │ → fixed tags for all metrics @@ -159,6 +162,7 @@ selector: extends: metadata: metrics: +topology: metric_tags: static_tags: virtual_metrics: @@ -170,6 +174,7 @@ virtual_metrics: | [**extends**](#2-extends) | Inherits and merges other base profiles. 
| | [**metadata**](#3-metadata) | Collects device-level information (host labels). | | [**metrics**](#4-metrics) | Defines which OIDs to collect and how to chart them. | +| [**topology**](#41-topology) | Defines SNMP topology rows and their topology kind. | | [**metric_tags**](#5-metric_tags) | Defines global dynamic tags collected once per device and attached to all metrics. | | [**static_tags**](#6-static_tags) | Defines fixed tags applied to all metrics. | | [**virtual_metrics**](#7-virtual_metrics) | Defines calculated or aggregated metrics based on others. | @@ -283,6 +288,9 @@ metadata: - `model` is collected dynamically. The collector tries the listed OIDs **in order** and uses the **first** one that returns a non-empty value. - These values appear as **device (virtual node) host labels** in the Netdata UI. - They are **not per-metric tags** and are applied to the device itself, not individual charts. +- Metadata fields are available to both regular metrics and topology by default. + Use `consumers: [metrics]` or `consumers: [topology]` only when a field is + intentionally limited to one view. :::tip @@ -384,6 +392,76 @@ virtual_metrics: - { metric: _ifHCOutOctets, table: ifXTable, as: out } ``` +### 4.1 topology + +The `topology` section defines SNMP rows consumed by the SNMP topology collector. +Topology rows are collected through the same scalar and table mechanics as +regular metrics, but they are not exported as charts. Instead, each row is routed +to a topology handler through its closed `kind` value. + +Use top-level `topology:` when the row describes a topology actor, link, VLAN, +bridge, FDB, ARP, LLDP, CDP, STP, VTP, or interface-mapping observation. 
+ +```yaml +topology: + - kind: lldp_rem + MIB: LLDP-MIB + table: + OID: 1.0.8802.1.1.2.1.4.1 + name: lldpRemTable + symbols: + - OID: 1.0.8802.1.1.2.1.4.1.1.6 + name: lldp_rem + metric_tags: + - tag: lldp_loc_port_num + index: 2 + - tag: lldp_rem_index + index: 3 + - tag: lldp_rem_sys_name + symbol: + OID: 1.0.8802.1.1.2.1.4.1.1.9 + name: lldpRemSysName +``` + +**Rules**: + +- `kind` is required and must be one of the closed topology kinds below. +- Topology row symbol names must not start with `_`. +- Topology rows do not use chart/export-only fields such as `chart_meta`, + `metric_type`, `mapping`, `transform`, `scale_factor`, `format`, or + `constant_value_one` on the row value symbol. +- `metric_tags` inside a topology row work like table metric tags and identify + or enrich the topology row. +- `systemUptime` stays under `metrics:` for regular SNMP collection. It is not a + topology kind and should not be declared under `topology:`. + +Valid topology kinds: + +```text +lldp_loc_port +lldp_loc_man_addr +lldp_rem +lldp_rem_man_addr +lldp_rem_man_addr_compat +cdp_cache +if_name +if_status +if_duplex +ip_if_index +bridge_port_if_index +fdb_entry +qbridge_fdb_entry +qbridge_vlan_entry +stp_port +vtp_vlan +arp_entry +arp_legacy_entry +``` + +Topology mixins can be inherited through `extends` just like metric mixins. When +two inherited topology rows collide, the identity is `kind + table identity + +symbol name`, matching regular table metric merge behavior. + #### Scalar symbol fallbacks You can express “try this OID, otherwise try that OID” by declaring **multiple scalar metrics with the same** `symbol.name`, each pointing to a different OID. At runtime the collector **GETs** all declared scalar OIDs, marks missing ones, and **emits** the metric from whichever OID returns data. Missing OIDs are skipped cleanly. @@ -440,6 +518,10 @@ metric_tags: - Each tag is collected once per device, not per metric or per table row. 
- The resulting tag values are attached to **all metrics** collected by the profile. - Tags can be transformed (for example, reformatted or mapped) using the same rules as per-metric tags. +- Top-level `metric_tags` are available to both regular metrics and topology by + default. In topology they become device/profile labels, not per-row dispatch + keys. Use `consumers: [metrics]` or `consumers: [topology]` only when a tag is + intentionally limited to one view. :::tip @@ -1867,6 +1949,8 @@ metrics: - Virtual metrics are **calculated metrics** built from other metrics in your profile (or inherited ones). - They don’t query SNMP; they **reuse existing metric values** to create totals, fallbacks, or per-row aggregations. - Once computed, they behave like normal metrics: charted, tagged, and alertable. +- Virtual metrics are part of the regular metrics view. A virtual metric cannot + depend on both regular metric rows and topology rows. Common use cases: diff --git a/docs/Dashboards and Charts/Dashboards and Charts.mdx b/docs/Dashboards and Charts/Dashboards and Charts.mdx index 556be3cb5e..8075836bb6 100644 --- a/docs/Dashboards and Charts/Dashboards and Charts.mdx +++ b/docs/Dashboards and Charts/Dashboards and Charts.mdx @@ -3,7 +3,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/dashboards sidebar_label: "Dashboards and Charts" learn_status: "Published" learn_rel_path: "Dashboards and Charts" -sidebar_position: "150" +sidebar_position: "160" learn_link: "https://learn.netdata.cloud/docs/dashboards-and-charts" slug: "/dashboards-and-charts" --- diff --git a/docs/Developer and Contributor Corner/Developer and Contributor Corner.mdx b/docs/Developer and Contributor Corner/Developer and Contributor Corner.mdx index 59b1bca22a..342b02d91b 100644 --- a/docs/Developer and Contributor Corner/Developer and Contributor Corner.mdx +++ b/docs/Developer and Contributor Corner/Developer and Contributor Corner.mdx @@ -3,7 +3,7 @@ custom_edit_url: 
"https://github.com/netdata/netdata/edit/master/docs/developer- sidebar_label: "Developer and Contributor Corner" learn_status: "Published" learn_rel_path: "Developer and Contributor Corner" -sidebar_position: "170" +sidebar_position: "180" learn_link: "https://learn.netdata.cloud/docs/developer-and-contributor-corner" slug: "/developer-and-contributor-corner" --- diff --git a/docs/Live View/Live View.mdx b/docs/Live View/Live View.mdx index 7ca0297e1a..3ab9eef764 100644 --- a/docs/Live View/Live View.mdx +++ b/docs/Live View/Live View.mdx @@ -4,7 +4,7 @@ sidebar_label: "Live View" learn_status: "Published" learn_rel_path: "Live View" description: "Present the Netdata Functions what these are and why they should be used." -sidebar_position: "120" +sidebar_position: "130" learn_link: "https://learn.netdata.cloud/docs/live-view" slug: "/live-view" --- diff --git a/docs/Logs/Logs.mdx b/docs/Logs/Logs.mdx index 1fa9e7e422..1ea99af4ef 100644 --- a/docs/Logs/Logs.mdx +++ b/docs/Logs/Logs.mdx @@ -1,6 +1,6 @@ --- sidebar_label: "Logs" -sidebar_position: "110" +sidebar_position: "120" hide_table_of_contents: true learn_status: "AUTOGENERATED" slug: "/logs" diff --git a/docs/Netdata AI/Netdata AI.mdx b/docs/Netdata AI/Netdata AI.mdx index 272310905b..9478231fd5 100644 --- a/docs/Netdata AI/Netdata AI.mdx +++ b/docs/Netdata AI/Netdata AI.mdx @@ -3,7 +3,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/category-o sidebar_label: "Netdata AI" learn_status: "Published" learn_rel_path: "Netdata AI" -sidebar_position: "140" +sidebar_position: "150" learn_link: "https://learn.netdata.cloud/docs/netdata-ai" slug: "/netdata-ai" --- diff --git a/docs/Netdata Agent/Installation/Linux/Linux.mdx b/docs/Netdata Agent/Installation/Linux/Linux.mdx index 9109db2657..9a6c671527 100644 --- a/docs/Netdata Agent/Installation/Linux/Linux.mdx +++ b/docs/Netdata Agent/Installation/Linux/Linux.mdx @@ -121,7 +121,7 @@ The user running the script needs write and execute 
permissions in the temporary Before running the installation script, you can verify its integrity using the following command: ```bash -[ "39321e7a8e05f0054f93df1824189abd" = "$(curl -Ss https://get.netdata.cloud/kickstart.sh | md5sum | cut -d ' ' -f 1)" ] && echo "OK, VALID" || echo "FAILED, INVALID" +[ "1f92a740bd8857893d4d66e5887acd16" = "$(curl -Ss https://get.netdata.cloud/kickstart.sh | md5sum | cut -d ' ' -f 1)" ] && echo "OK, VALID" || echo "FAILED, INVALID" ``` If the script is valid, this command will return `OK, VALID`. We recommend verifying script integrity before installation, especially in production environments. diff --git a/docs/Network Flows/Anti-patterns.mdx b/docs/Network Flows/Anti-patterns.mdx new file mode 100644 index 0000000000..98a12690e7 --- /dev/null +++ b/docs/Network Flows/Anti-patterns.mdx @@ -0,0 +1,160 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/anti-patterns.md" +sidebar_label: "Anti-patterns" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "120" +learn_link: "https://learn.netdata.cloud/docs/network-flows/anti-patterns" +slug: "/network-flows/anti-patterns" +--- + + +# Anti-patterns and pitfalls + +Flow data is powerful but easy to misuse. The mistakes below are the ones that cause the most lost analyst time and the most wrong conclusions in real deployments. Each entry explains how the mistake happens, what it costs, and how to avoid it. + +## 1. Reading aggregate volume without filtering + +**The mistake.** You open the Network Flows tab, see a total bandwidth number, and assume it represents your real traffic. + +**Why it's wrong.** Routers normally export both ingress and egress flow records on every monitored interface. A single packet entering interface A and leaving interface B produces two records — one tagged ingress on A, one tagged egress on B. 
With one router and the standard configuration, summing all flow records gives you roughly **2× the actual traffic**. Add a second router on the same path and you see 4×. + +**What it costs.** You think your link carries 2 Gbps when it really carries 1 Gbps. Capacity decisions based on these numbers are wrong by a factor of 2 or more. + +**How to avoid it.** Always filter by one exporter and one direction (Input Interface OR Output Interface, not both) when reading absolute volume numbers. To validate: compare to SNMP interface counters on the same interface — values should be close. + +## 2. Ignoring the sampling rate + +**The mistake.** Your router is configured to sample 1-in-1000 packets. Nobody documented this. The dashboard shows 5 Mbps. You assume that's your traffic. + +**Why it's wrong.** With sampling, a flow record represents one observed packet out of every N. Netdata multiplies bytes and packets by the sampling rate at ingestion, so the dashboard numbers are estimates of actual traffic — *if* the multiplication is consistent. When sampling rates differ across exporters in the same query, the aggregate becomes a blend of estimates that is hard to interpret correctly. + +**What it costs.** Volume analysis is off by orders of magnitude when the rate isn't documented. Small flows are statistically invisible — at 1-in-1000, a single-packet flow has a 99.9% chance of being missed entirely. Security investigations miss low-volume threats like beaconing and probing. + +**How to avoid it.** + +- Use a uniform sampling rate across your network, or run unsampled where flow rates allow. +- For Internet-edge security work, use 1-in-100 or unsampled. Sampling at 1-in-1000 hides small flows. +- Document sampling rates per exporter and audit them quarterly. +- Cross-check flow-derived bandwidth against SNMP. If they diverge by more than 30%, investigate before trusting the data. + +## 3. Trusting GeoIP for internal IPs + +**The mistake.** You enable GeoIP enrichment. 
Internal IPs (10.x, 172.16-31.x, 192.168.x) appear in random countries on the geographic map. + +**Why it's wrong.** GeoIP databases don't have entries for private IP ranges. Netdata doesn't skip private IPs — it just hands the IP to the database and uses what comes back. With the stock DB-IP database, private ranges are tagged so they render as "AS0 Private IP Address Space" with empty country. With third-party databases, results vary. Some return spurious country data for RFC 1918 addresses. + +**What it costs.** Geographic anomalies look like security incidents. Analysts waste time investigating "traffic from China" that's actually traffic to a server in the 10.x.x.x range. + +**How to avoid it.** Configure your internal IP ranges as static metadata before relying on geographic analysis. Use the [`networks`](/docs/network-flows/enrichment-concepts/static-metadata) block to declare each internal CIDR with a name, role, and (optionally) overridden country. The labels you set there override whatever GeoIP returns. Validate by spot-checking known IPs against the map. + +## 4. Alerting on absolute volume thresholds + +**The mistake.** You configure an alert: "page me if any IP sends more than 10 GB in an hour." + +**Why it's wrong.** That threshold is a guess. Your backup server legitimately sends 500 GB/hour. An attacker exfiltrating 200 MB/hour is invisible. + +**What it costs.** The alert is either constant noise (false positives) or completely silent (false negatives). Either way, alerts get ignored. + +**How to avoid it.** Establish baselines first. Compare current traffic to the same time period in previous weeks (Tuesday 10 AM vs the average of the last four Tuesdays at 10 AM). Alert on deviation from the baseline, not on absolute values. + +(Netdata's alerting on flow data is in development; for now this pattern lives in your monitoring practice, not in the plugin.) + +## 5. 
Collecting flows but never looking at them + +**The mistake.** Flow export is enabled on every router. Storage fills up. Nobody opens the dashboard between incidents. + +**Why it's wrong.** Flow data is only useful when someone actively interprets it. Without baselines, watchlists, and routine review, you have data without insight. + +**What it costs.** When an incident happens, you don't know what "normal" looks like, so you can't recognise abnormal. Storage and CPU are spent without operational value. + +**How to avoid it.** Schedule a weekly 15-minute review. Document what "normal" looks like — top 10 talkers, traffic curve shape, protocol distribution, geographic distribution. Add anything new that appears in the top-10 to a watchlist for investigation. Use [Investigation Playbooks](/docs/network-flows/investigation-playbooks) for the recurring questions. + +## 6. Confusing flows with sessions + +**The mistake.** You see 50 000 flow records in an hour and report it as "we had 50 000 user sessions". + +**Why it's wrong.** A flow record is a network-level artifact, not an application session. A single page load generates dozens of flows: DNS lookups, the TCP handshake, the TLS handshake, HTTP requests for embedded resources, telemetry pings. A long file transfer may be one flow or many, depending on timeout configuration. + +**What it costs.** Wildly inflated user activity numbers. Misinterpretation of usage patterns. + +**How to avoid it.** Aggregate by source IP and time window for a session-like view. Use ports and protocols to classify, not to count transactions. If you need real session data, use application logs or APM, not flow records. + +## 7. NAT blindness + +**The mistake.** You place the collector outside a NAT gateway because mirroring traffic there is easier. + +**Why it's wrong.** Every internal host appears as the same public IP after NAT. You can't identify the actual source of the traffic. + +**What it costs.** Your top talker is "the firewall". 
Security can't find the infected host, capacity can't identify the bandwidth hog. + +**How to avoid it.** Collect inside each NAT boundary, or correlate flow data with NAT translation logs (`iptables NFLOG`, vendor NAT logging) to map external 5-tuples back to internal hosts. + +## 8. Geographic firewall of shame + +**The mistake.** You configure an alert: "page security if traffic goes to any country except the home country." + +**Why it's wrong.** CDNs, cloud providers, and SaaS endpoints serve from edge nodes worldwide. Traffic to the same SaaS provider may resolve to Singapore one day and Frankfurt the next. None of this is suspicious. + +**What it costs.** Constant false positives. Trust in the alerting system collapses. Real anomalies get ignored among the noise. + +**How to avoid it.** Whitelist known cloud and CDN ASNs. Use ASN as the primary signal and country as secondary corroboration. If you must alert on country, alert only on countries you have no business relationship with — and review the whitelist quarterly. + +## 9. Treating flow duration as latency + +**The mistake.** You divide flow bytes by flow duration and present that as "speed", or use duration as a proxy for round-trip time. + +**Why it's wrong.** Flow duration is dominated by the active timeout setting and application think time. A flow with a 60-second active timeout is exported every 60 seconds whether the network is fast or slow. There's no relationship between flow duration and latency. + +**What it costs.** False conclusions about network performance. Misdirected troubleshooting. + +**How to avoid it.** Use SNMP for interface utilisation, ICMP probes for round-trip time, APM tools for application performance. Flow data answers "how much" and "between whom", never "how fast". + +## 10. Trying to detect microbursts + +**The mistake.** Users complain about momentary slowness. You look in flow data for the burst. 
+ +**Why it's wrong.** NetFlow active timeout aggregates traffic into windows of 60 seconds or more. sFlow random sampling misses bursts that occur between sampled packets. Neither protocol can resolve sub-second events. The Netdata time-series view also clamps to 60-second buckets. + +**What it costs.** You spend time looking for something flow data physically cannot show. + +**How to avoid it.** For microburst detection use packet capture, switch microburst counters, or hardware-assisted telemetry. Flow data is for sustained patterns, not millisecond events. + +## 11. Reasoning from raw byte counts when sampling is on + +**The mistake.** You see `RAW_BYTES = 5000` for a flow and assume 5000 bytes was the actual traffic. + +**Why it's wrong.** `RAW_BYTES` is the unscaled byte count from the exporter. With sampling at 1-in-1000, the actual traffic was approximately 5 000 000 bytes. The scaled value is in `BYTES`. + +**How to avoid it.** Use `BYTES` (auto-scaled) for normal analysis. Use `RAW_BYTES` only when sampling is uniform across all exporters and you specifically need exact pre-scaling counts. + +## 12. Comparing flow counts across protocols + +**The mistake.** You report "Arista switches see far more flows than Cisco routers" based on flow counts. + +**Why it's wrong.** NetFlow aggregates millions of packets into one flow record. sFlow exports individual packet samples — each becomes its own "flow" record. Their counts are not comparable. Same goes for sampling-rate differences across exporters. + +**How to avoid it.** Aggregate by IP/port/time window before comparing. Compare bytes (after scaling), not flow counts. Document which protocol each exporter speaks. 
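The `RAW_BYTES` to `BYTES` relationship behind mistakes 2, 11, and 12 is a single multiplication, but it only holds per exporter. A minimal sketch (illustrative only, not the plugin's actual ingestion code):

```python
def scaled_bytes(raw_bytes: int, sampling_rate: int) -> int:
    """Estimate actual traffic from a sampled flow record.

    Netdata applies this multiplication at ingestion: BYTES is the
    scaled estimate, RAW_BYTES is the exporter's unscaled count.
    """
    return raw_bytes * sampling_rate

# 1-in-1000 sampling: a record with RAW_BYTES = 5000 represents ~5 MB.
assert scaled_bytes(5000, 1000) == 5_000_000

# Unsampled exporter: RAW_BYTES and BYTES are identical.
assert scaled_bytes(5000, 1) == 5000
```

The same `RAW_BYTES` value means 1000× more actual traffic on the sampled exporter, which is why summing raw counts across exporters with different rates produces numbers that are not comparable.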
+
+## Summary
+
+| Mistake | One-line fix |
+|---|---|
+| Doubled aggregate | Filter by exporter + interface + direction |
+| Ignored sampling | Document rates, keep them uniform; cross-check SNMP |
+| GeoIP for internal IPs | Configure internal CIDRs in `enrichment.networks` |
+| Absolute thresholds | Baseline first, alert on deviation |
+| Collect-and-ignore | Weekly 15-minute review with documented baselines |
+| Flows ≠ sessions | Aggregate by IP and time window |
+| NAT blindness | Collect inside the NAT boundary |
+| Geographic firewall of shame | Use ASN, whitelist cloud and CDN providers |
+| Duration as latency | Use SNMP/ICMP/APM for latency |
+| Microburst hunting | Use packet capture or hardware telemetry |
+| Raw bytes when sampling | Use `BYTES`, not `RAW_BYTES`, unless rates are uniform |
+| Cross-protocol flow counts | Use bytes (scaled), not flow counts |
+
+## What's next
+
+- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to confirm your data is trustworthy.
+- [Investigation Playbooks](/docs/network-flows/investigation-playbooks) — Step-by-step recipes for common questions.
+- [Sources](/docs/network-flows/sources/netflow) — Per-protocol behaviour that drives many of these gotchas.
diff --git a/docs/Network Flows/BGP Routing/BGP Routing.mdx b/docs/Network Flows/BGP Routing/BGP Routing.mdx new file mode 100644 index 0000000000..9687845645 --- /dev/null +++ b/docs/Network Flows/BGP Routing/BGP Routing.mdx @@ -0,0 +1,25 @@ +--- +sidebar_position: "140" +sidebar_label: "BGP Routing" + +hide_table_of_contents: true +learn_status: "AUTOGENERATED" +slug: "/network-flows/bgp-routing" +learn_link: "https://learn.netdata.cloud/docs/network-flows/bgp-routing" +--- + +# BGP Routing + +import { Grid, Box } from '@site/src/components/Grid_integrations'; + + + + + + + + + + + + diff --git a/docs/Network Flows/BGP Routing/BMP BGP Monitoring Protocol.mdx b/docs/Network Flows/BGP Routing/BMP BGP Monitoring Protocol.mdx new file mode 100644 index 0000000000..39f29a339b --- /dev/null +++ b/docs/Network Flows/BGP Routing/BMP BGP Monitoring Protocol.mdx @@ -0,0 +1,243 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "BMP (BGP Monitoring Protocol)" +learn_status: "Published" +learn_rel_path: "Network Flows/BGP Routing" +keywords: [bmp, bgp, rfc 7854, route monitoring, cisco, juniper, frr] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/bgp-routing/bmp-bgp-monitoring-protocol" +slug: "/network-flows/bgp-routing/bmp-bgp-monitoring-protocol" +--- + + +# BMP (BGP Monitoring Protocol) + + + + + +Plugin: netflow-plugin +Module: bmp + + + +## Overview + +BMP (BGP Monitoring Protocol, RFC 7854) lets a router push its BGP route updates +to a passive collector. With this integration enabled, Netdata acts as that +collector -- it listens for BMP TCP connections from your routers, parses the BGP +UPDATE messages, and builds an in-memory routing table that flow enrichment then +reads from. 
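The enrichment lookup against that routing table is a longest-prefix match. A minimal sketch of the idea -- illustrative only, the plugin uses an in-memory trie, and the prefixes and ASNs below are made up:

```python
import ipaddress

# Hypothetical routes: prefix -> (origin ASN, AS path). The real plugin
# stores a full BGP table in a trie; a dict scan is enough to show the idea.
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): (64500, [64496, 64500]),
    ipaddress.ip_network("203.0.0.0/16"): (64501, [64496, 64501]),
}

def lookup(ip: str):
    """Return routing info for the most specific prefix covering `ip`."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None  # no covering route: nothing to enrich with
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

# Both prefixes cover 203.0.113.10; the /24 wins (longest prefix).
assert lookup("203.0.113.10") == (64500, [64496, 64500])
assert lookup("203.0.200.1") == (64501, [64496, 64501])
assert lookup("198.51.100.1") is None
```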
+ +The result: every flow gets accurate AS numbers, AS paths, communities, and +next-hop information from your real-time BGP table -- not from a stale GeoIP +database or from whatever the exporter happened to send in the flow record. + +For the full BGP-routing concept (shared trie with BioRIS, withdrawal handling, +per-vendor caveats, integration test gap), see +[BGP Routing](https://learn.netdata.cloud/docs/network-flows/enrichment/bgp-routing). + + +The plugin runs a TCP listener on `0.0.0.0:10179` (Akvorado convention -- not the +IANA-registered port 7854). Routers initiate BMP sessions to it. The plugin +processes Initiation, Termination, RouteMonitoring (BGP UPDATE messages), and +PeerDownNotification messages. NLRI types: IPv4/IPv6 unicast, MPLS-labelled, VPNv4, +VPNv6, EVPN IP-prefix. + +BMP and BioRIS share a single in-memory routing trie. Memory grows with the size +of the BGP table; a full IPv4+IPv6 feed is roughly 1.2M prefixes per peer. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Set enrichment.routing_dynamic.bmp.enabled to true and configure your routers. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### BMP-capable routers + +Modern Cisco IOS-XR, Juniper JunOS, Arista EOS, and FRR all support BMP v3. +The plugin parses RFC 7854 BMP v3 specifically. Older versions (v1, v2) are +not supported and will fail to parse. + + +#### TCP reachability between routers and the agent + +Routers initiate the connection -- the plugin is a passive listener. Allow +inbound TCP on the configured port (default 10179) from each BMP-speaking +router to the agent. 
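With nftables, the firewall rule could look like the following -- the addresses are hypothetical, and the rule assumes an existing `inet filter input` chain; adapt both to your environment:

```bash
# Allow BMP (TCP 10179) only from the management network (10.0.0.0/24 is hypothetical).
nft add rule inet filter input ip saddr 10.0.0.0/24 tcp dport 10179 accept

# From a router-side host, confirm the agent's listener is reachable:
nc -vz 10.0.0.10 10179
```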
+ + +#### No TLS, no authentication + +The listener accepts plain TCP only. Restrict access at the firewall and on a +dedicated management network -- do not expose 10179 to the public internet. + + + +### Configuration + +#### Options + +All BMP options live under `enrichment.routing_dynamic.bmp` in `netflow.yaml`. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enabled | Master switch. Set to true to start the listener. | false | no | +| listen | TCP bind address (host:port). | 0.0.0.0:10179 | no | +| keep | Grace window after a BMP disconnect before purging that session's routes. | 5m | no | +| max_consecutive_decode_errors | Close the session after N consecutive decode errors. | 8 | no | +| receive_buffer | Optional SO_RCVBUF per connection in bytes (0 = OS default). | 0 | no | +| collect_asns | When false, AS numbers from BMP are forced to 0. | true | no | +| collect_as_paths | When false, AS paths are dropped before storage. | true | no | +| collect_communities | When false, communities and large communities are dropped. | true | no | +| rds | Whitelist of accepted Route Distinguishers. Empty = accept all. Formats: "0", "ASN:idx", "IPv4:idx", or full text. | [] | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Enable BMP listener + +Start the listener on the default port. + +```yaml +enrichment: + routing_dynamic: + bmp: + enabled: true + listen: "0.0.0.0:10179" + keep: 5m + +``` +###### Cisco IOS-XR router config + +Vendor-side config to send BMP to Netdata. The bmp server block is global, not under router bgp. + +
+Config + +```yaml +bmp server 1 + host 10.0.0.10 port 10179 + description "Netdata BMP collector" + initial-delay 5 + stats-reporting-period 60 + initial-refresh delay 30 spread 2 +! +router bgp 65000 + neighbor 192.0.2.1 + bmp-activate server 1 + +``` +
+ +###### Juniper JunOS router config + +Recommended local-address and statistics-timeout for production. + +
+Config + +```yaml +set routing-options bmp station netdata station-address 10.0.0.10 +set routing-options bmp station netdata station-port 10179 +set routing-options bmp station netdata connection-mode active +set routing-options bmp station netdata local-address 10.0.0.1 +set routing-options bmp station netdata statistics-timeout 60 +set routing-options bmp station netdata route-monitoring pre-policy + +``` +
+ +###### FRR (bgpd) router config + +Critical -- BMP is a runtime module in FRR. Without "-M bmp" in +/etc/frr/daemons (bgpd_options), every BMP command silently fails. + + +
+Config + +```yaml +# /etc/frr/daemons: +# bgpd_options=" -A 127.0.0.1 -M bmp" +router bgp 65000 + bmp targets netdata + bmp connect 10.0.0.10 port 10179 min-retry 5000 max-retry 60000 + bmp stats interval 60000 + bmp monitor ipv4 unicast pre-policy + bmp monitor ipv6 unicast pre-policy + exit + +``` +
+ + + +### Listener not receiving BMP sessions + +Check `show bmp` (Cisco) / `show bmp connections` (Juniper) / `show bmp targets` (FRR) +to confirm the router has dialed in. The plugin does not initiate -- it listens. +Firewall: allow inbound TCP on 10179. + + +### Memory growth + +A full BGP feed adds ~1.2M prefixes per peer permanently. There is no time-based +eviction. Plan capacity accordingly. After a router disconnect, the routes +for that session are kept for `keep` (default 5 min) before purging. + + +### Integration-test gap + +BMP message parsing has unit tests. The TCP listener path, framed decode loop, +trie apply, and per-router cleanup are NOT integration-tested. Validate against +your specific router firmware before depending on this for capacity / security +decisions. + + + diff --git a/docs/Network Flows/BGP Routing/bio-rd RIPE RIS.mdx b/docs/Network Flows/BGP Routing/bio-rd RIPE RIS.mdx new file mode 100644 index 0000000000..de1049db42 --- /dev/null +++ b/docs/Network Flows/BGP Routing/bio-rd RIPE RIS.mdx @@ -0,0 +1,201 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "bio-rd / RIPE RIS" +learn_status: "Published" +learn_rel_path: "Network Flows/BGP Routing" +keywords: [bioris, bio-rd, ripe ris, bgp, grpc, route information service] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/bgp-routing/bio-rd-ripe-ris" +slug: "/network-flows/bgp-routing/bio-rd-ripe-ris" +--- + + +# bio-rd / RIPE RIS + + + + + +Plugin: netflow-plugin +Module: bioris + + + +## Overview + +BioRIS lets Netdata consume BGP routing data from a [bio-rd](https://github.com/bio-routing/bio-rd) +`cmd/ris/` daemon over gRPC. 
bio-rd is a Go-based BGP daemon that can peer with +[RIPE RIS](https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris) +Route Collectors -- or any BGP / BMP source you have access to -- and expose the +resulting RIB through a gRPC interface. Netdata is a client of that interface. + +Use this when you want a third-party view of the BGP routing table (e.g., RIPE +RIS's view) without running a BGP session yourself or deploying BMP across your +network. + +For the full BGP-routing concept and how BMP and BioRIS share the same trie, see +[BGP Routing](https://learn.netdata.cloud/docs/network-flows/enrichment/bgp-routing). + + +The plugin connects to one or more bio-rd `ris` gRPC endpoints. It runs three RPCs: +`GetRouters` to discover what's available, `DumpRIB` to do baseline reconciliation, +and `ObserveRIB` for incremental updates. Multiple instances are additive (not +failover); they all merge into the shared in-memory trie. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Set enrichment.routing_dynamic.bioris.enabled to true and provide at least one ris_instances entry. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### A running bio-rd 'ris' daemon + +bio-rd is a separate project. The plugin only consumes its gRPC interface; it +does not bundle bio-rd. You install it yourself: + +```bash +# Install Go (>=1.20), then: +git clone https://github.com/bio-routing/bio-rd.git +cd bio-rd/cmd/ris +go build -o /usr/local/bin/ris . +``` + +Configure `ris` to peer with one or more BGP / BMP sources (RIPE RIS Route +Collectors, your own peers, etc.). 
Refer to the bio-rd documentation for the +peering setup -- this is bio-rd's configuration, not Netdata's. + +Run the daemon with a gRPC port: +`/usr/local/bin/ris --grpc_port 50051 --config.file /etc/bio-rd.yml` + + +#### Network reachability + no auth + +The gRPC connection is plain HTTP/2 by default (or TLS with system-CA when +`grpc_secure: true`). There is no authentication. Restrict access at the +firewall, or run bio-rd on the same host as the agent and bind it to localhost. + + + +### Configuration + +#### Options + +BioRIS options live under `enrichment.routing_dynamic.bioris`. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enabled | Master switch. | false | no | +| timeout | Connect + per-RPC timeout. Default is aggressive for public RIS over the internet -- raise if you see "deadline exceeded". | 200ms | no | +| refresh | How often to re-dump every router's RIB from scratch. | 30m | no | +| refresh_timeout | Per-DumpRIB request timeout and per-message stream timeout. | 10s | no | +| ris_instances | List of bio-rd endpoints. Each: grpc_addr, grpc_secure, vrf, vrf_id. Multiple instances are additive (not failover) -- routes from all merge. | [] | yes | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Local bio-rd + +bio-rd running on the same host, plain gRPC. + +```yaml +enrichment: + routing_dynamic: + bioris: + enabled: true + timeout: 2s + refresh: 30m + refresh_timeout: 30s + ris_instances: + - grpc_addr: "127.0.0.1:50051" + grpc_secure: false + +``` +###### Remote bio-rd over TLS + +Across a network, system CA bundle. No client cert / mTLS. + +
+Config + +```yaml +enrichment: + routing_dynamic: + bioris: + enabled: true + timeout: 5s + ris_instances: + - grpc_addr: "ris.example.internal:50051" + grpc_secure: true + vrf: "global" + +``` +
+ + + +### Default 200ms timeout too aggressive + +Over the public internet to RIPE RIS, you may need 2-5 seconds. If you see +"deadline exceeded" errors in the journal, raise `timeout`. + + +### Initial dump takes minutes for full feeds + +A full IPv4+IPv6 RIB from a route collector is millions of prefixes. The first +refresh takes time; subsequent observe streams are incremental. + + +### Integration-test gap + +proto and route conversion are unit-tested. The gRPC client path +(connecting, consuming streams, retry/backoff) is NOT integration-tested. +Validate against your specific bio-rd setup before relying on this for +capacity / security decisions. + + + diff --git a/docs/Network Flows/Configuration.mdx b/docs/Network Flows/Configuration.mdx new file mode 100644 index 0000000000..5834b1a316 --- /dev/null +++ b/docs/Network Flows/Configuration.mdx @@ -0,0 +1,312 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/configuration.md" +sidebar_label: "Configuration" +learn_status: "Published" +learn_rel_path: "Network Flows" +description: "Full reference for netflow.yaml configuration options." +sidebar_position: "40" +learn_link: "https://learn.netdata.cloud/docs/network-flows/configuration" +slug: "/network-flows/configuration" +--- + + +# Configuration + +The netflow plugin reads its configuration from `netflow.yaml`. Defaults are sane out of the box; most operators only adjust three things — the listener address, the journal retention, and (rarely) the per-tier overrides. This page documents every option, with its real default and the file that defines it. + +## Where the file lives + +| Path | Purpose | +|---|---| +| `/etc/netdata/netflow.yaml` | Your configuration. Edits here survive package upgrades. | +| `/usr/lib/netdata/conf.d/netflow.yaml` | The stock file shipped with the package. Reference only. | + +The plugin reads the user file when it exists, and the stock file otherwise. 
To start customising, copy the stock file: + +```bash +sudo cp /usr/lib/netdata/conf.d/netflow.yaml /etc/netdata/netflow.yaml +``` + +## Three things to know before you edit + +1. **Restart required.** There is no live-reload for plugin configuration. After saving the file, run `sudo systemctl restart netdata`. Only the GeoIP databases reload on a timer; everything else needs a restart. +2. **Strict YAML.** Every section refuses unknown keys. A misspelled key fails the plugin at startup with an error in the journal. If you see "the plugin won't start after my edit", check for a typo before anything else. +3. **CLI flags vs YAML.** When the plugin runs as a Netdata Agent plugin (the normal case), only the YAML is read — CLI flags do nothing. The CLI flags shown below apply only if you run the binary directly outside of Netdata. + +## Top-level layout + +```yaml +enabled: true # global on/off +listener: { ... } # UDP socket and journal sync +protocols: { ... } # which protocols to accept; decapsulation; timestamps +journal: { ... } # tier directories, retention, query guardrails +enrichment: { ... } # GeoIP, classifiers, ASN, BMP, BioRIS, network sources +``` + +The `listener`, `protocols`, and `journal` sections are flattened — their keys can also appear at the top level (the stock file does this for compatibility). Both forms are accepted. + +## `enabled` + +```yaml +enabled: true +``` + +Set to `false` to turn the entire flow plugin off. The plugin still loads but does nothing. Default: `true`. + +## `listener` + +Controls the UDP socket and the journal write cadence. + +```yaml +listener: + listen: "0.0.0.0:2055" + max_packet_size: 9216 + sync_every_entries: 1024 + sync_interval: "1s" +``` + +| Key | CLI flag | Default | Notes | +|---|---|---|---| +| `listen` | `--netflow-listen` | `0.0.0.0:2055` | Address and port for the UDP socket. Same socket handles NetFlow v5/v7/v9, IPFIX, and sFlow. 
| +| `max_packet_size` | `--netflow-max-packet-size` | `9216` | Maximum UDP datagram in bytes. Increase for jumbo sFlow datagrams or routers that send oversized IPFIX. | +| `sync_every_entries` | `--netflow-sync-every-entries` | `1024` | Flush the raw journal to disk after this many records, regardless of `sync_interval`. | +| `sync_interval` | `--netflow-sync-interval` | `1s` | Maximum time between forced flushes. | + +### UDP buffer tuning is not in this file + +If you receive a high flow rate, the kernel UDP receive buffer matters more than `max_packet_size`. Tune at the kernel level: + +```bash +sudo sysctl -w net.core.rmem_max=33554432 +sudo sysctl -w net.core.rmem_default=8388608 +sudo sysctl -w net.core.netdev_max_backlog=250000 +``` + +Persist these in `/etc/sysctl.d/99-netflow.conf`. The plugin does not call `setsockopt(SO_RCVBUF)` itself; whatever the kernel default is, that's what the listener gets. + +## `protocols` + +```yaml +protocols: + v5: true + v7: true + v9: true + ipfix: true + sflow: true + decapsulation_mode: none + timestamp_source: input +``` + +| Key | CLI flag | Default | Values | +|---|---|---|---| +| `v5` | `--netflow-enable-v5` | `true` | Boolean. NetFlow v5. | +| `v7` | `--netflow-enable-v7` | `true` | Boolean. NetFlow v7 (Catalyst). | +| `v9` | `--netflow-enable-v9` | `true` | Boolean. NetFlow v9. | +| `ipfix` | `--netflow-enable-ipfix` | `true` | Boolean. IPFIX. | +| `sflow` | `--netflow-enable-sflow` | `true` | Boolean. sFlow v5. | +| `decapsulation_mode` | `--netflow-decapsulation-mode` | `none` | `none`, `srv6`, `vxlan`. Strips outer headers from the data-link section, surfaces the inner 5-tuple. | +| `timestamp_source` | `--netflow-timestamp-source` | `input` | Where the dashboard's flow timestamps come from. See below. | + +You must keep at least one protocol enabled or the plugin refuses to start. + +### `timestamp_source` values + +- **`input`** (default) — the time the plugin received the datagram. Charts always look "now". 
This is the safest choice for dashboards.
- **`netflow_packet`** — the timestamp the exporter wrote in the NetFlow/IPFIX header.
- **`netflow_first_switched`** — the time the flow actually started, from the per-record first-switched field. Records arrive with timestamps in the past (up to your active timeout). This gives the most accurate timeline, but charts may show data appearing "behind" real time.

## `journal`

This is the section most operators tune. It controls where flow data lives, how much of it is kept, and the guardrails the query engine applies to its work.

```yaml
journal:
  journal_dir: flows
  size_of_journal_files: 10GB
  duration_of_journal_files: 7d
  query_1m_max_window: 6h
  query_5m_max_window: 24h
  query_max_groups: 50000
  query_facet_max_values_per_field: 5000
  tiers:
    raw: { duration_of_journal_files: 24h }
    minute_1: { duration_of_journal_files: 14d }
    minute_5: { duration_of_journal_files: 30d }
    hour_1: { duration_of_journal_files: 365d }
```

### Top-level retention

| Key | Default | Notes |
|---|---|---|
| `journal_dir` | `flows` | Relative paths resolve under `NETDATA_CACHE_DIR` (typically `/var/cache/netdata/flows`). Absolute paths are used as-is. |
| `size_of_journal_files` | `10GB` | Disk budget per tier (not total). Minimum `100MB`. Set to `null` to disable size-based retention. |
| `duration_of_journal_files` | `7d` | Time budget per tier. Set to `null` to disable time-based retention. |

**Important.** The top-level retention applies to **every tier independently** unless you override it per-tier. So with the defaults, all four tiers (raw, 1m, 5m, 1h) share the same 10GB / 7d budget. **This is rarely what you want.** The whole point of having rollup tiers is to keep them around longer than raw. See per-tier overrides below.

Either limit can trigger expiry. With size = 10GB and duration = 7d, a tier drops its oldest data as soon as one of the two limits is reached, whichever comes first.
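A back-of-envelope way to see which limit governs a tier — the 2 GB/day ingest figure below is purely illustrative; measure your own raw-journal growth before trusting the numbers:

```python
def retention_horizon(bytes_per_day: float,
                      size_budget_bytes: float,
                      duration_days: float) -> float:
    """Days of data a tier keeps: the time budget, capped by
    how long the size budget takes to fill."""
    days_until_full = size_budget_bytes / bytes_per_day
    return min(duration_days, days_until_full)

# With the 10GB / 7d defaults and ~2 GB/day of raw journal (an assumed rate),
# the size budget fills after 5 days, so size wins over the 7-day limit.
print(retention_horizon(2e9, 10e9, 7))  # 5.0
```

If the size budget is what trips first, raising `duration_of_journal_files` alone changes nothing — the tier is disk-bound, not time-bound.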
+ +### Per-tier overrides + +```yaml +tiers: + raw: # name in YAML + size_of_journal_files: 50GB + duration_of_journal_files: 24h + minute_1: + duration_of_journal_files: 14d + minute_5: + duration_of_journal_files: 30d + hour_1: + duration_of_journal_files: 365d + size_of_journal_files: null # time-only retention for the long tail +``` + +| YAML name | Aliases | On-disk directory | +|---|---|---| +| `raw` | — | `flows/raw/` | +| `minute_1` | `1m`, `minute-1`, `minute1` | `flows/1m/` | +| `minute_5` | `5m`, `minute-5`, `minute5` | `flows/5m/` | +| `hour_1` | `1h`, `hour-1`, `hour1` | `flows/1h/` | + +The on-disk directory names are short (`1m`, `5m`, `1h`); the YAML keys are explicit (`minute_1`, `minute_5`, `hour_1`). Mind the difference if you go look at the disk. + +For each per-tier knob (`size_of_journal_files`, `duration_of_journal_files`): + +- **Omit the key** to inherit the top-level default. +- Set to `null` to **disable** that limit on this tier. +- Set to a value to override. + +A typical production profile is the example block above: 24 hours of raw, 2 weeks at 1-minute, 30 days at 5-minute, 1 year at 1-hour. This profile keeps detailed forensics within reach while supporting year-over-year capacity trends. + +### Rotation + +Each tier rotates files at `size_of_journal_files / 20`, clamped between 5 MB and 200 MB. Time-based rotation is fixed at one hour per file. You don't configure these directly. + +### Query guardrails + +| Key | Default | What it limits | +|---|---|---| +| `query_1m_max_window` | `6h` | Above this window, the dashboard skips the 1-minute tier and uses the 5-minute or 1-hour tier. | +| `query_5m_max_window` | `24h` | Above this window, the dashboard skips the 5-minute tier and uses the 1-hour tier. | +| `query_max_groups` | `50000` | Maximum groups returned by a single aggregation query. Past this, results overflow into a single `__overflow__` bucket and the response carries a warning. 
| +| `query_facet_max_values_per_field` | `5000` | Maximum distinct values returned per facet field. | + +The query-window limits are about responsiveness — large windows on fine-grained tiers are slow. The group/value limits are about memory — wide aggregations on high-cardinality fields can blow up. Raise them carefully. + +## `enrichment` + +Enrichment is a large topic and lives in dedicated pages. The top-level enable/disable knobs: + +```yaml +enrichment: + # default_sampling_rate: 1024 # set to override; default is unset (rate=1) + # override_sampling_rate: { 10.1.0.0/16: 1024 } # per-prefix override map + default_sampling_rate: ~ + override_sampling_rate: {} + metadata_static: { exporters: {} } + geoip: { asn_database: [], geo_database: [] } + networks: {} + network_sources: {} + exporter_classifiers: [] + interface_classifiers: [] + classifier_cache_duration: 5m + asn_providers: [flow, routing, geoip] + net_providers: [flow, routing] + routing_static: { prefixes: {} } + routing_dynamic: + bmp: { enabled: false } + bioris: { enabled: false } +``` + +Detailed configuration of each section lives on its own page: + +- [GeoIP](/docs/network-flows/enrichment-concepts/ip-intelligence) +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) +- [Classifiers](/docs/network-flows/enrichment-concepts/classifiers) +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) +- [BMP routing](/docs/network-flows/enrichment-concepts/bgp-routing) +- [BioRIS](/docs/network-flows/enrichment-concepts/bgp-routing) +- [Network sources](/docs/network-flows/enrichment-concepts/network-identity) +- [Decapsulation](/docs/network-flows/enrichment-concepts/decapsulation) + +The enrichment section has no CLI flag — it is YAML-only. 
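As a concrete starting point, here is a hypothetical enrichment block that labels one internal prefix and pins its AS number — the prefix, name, and ASN are invented; the keys are the real ones covered on the linked pages:

```yaml
enrichment:
  networks:
    10.1.0.0/16:
      name: corp-lan   # label applied to flows whose src/dst IP falls in this prefix
      asn: 64500       # private-ASN override; omit to let the provider chain decide
  asn_providers: [flow, routing, geoip]   # the default chain, shown for clarity
```

Everything not listed keeps its default, so a block this small is a valid configuration.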
+ +## Common edits + +### Listen on a different port + +```yaml +listener: + listen: "0.0.0.0:9995" +``` + +### Bind to a specific address + +```yaml +listener: + listen: "10.0.0.10:2055" +``` + +### Disable a protocol you don't use + +```yaml +protocols: + v5: false +``` + +### Move the journal directory + +```yaml +journal: + journal_dir: /var/lib/netflow +``` + +Absolute paths are used as-is. Relative paths resolve under `NETDATA_CACHE_DIR`. + +### Strip VXLAN tunnel headers + +```yaml +protocols: + decapsulation_mode: vxlan +``` + +The plugin reads the inner 5-tuple from `dataLinkFrameSection` records (IPFIX IE 315) when the exporter ships them. + +### Production retention profile + +```yaml +journal: + size_of_journal_files: 100GB + duration_of_journal_files: 7d + tiers: + raw: + size_of_journal_files: 200GB + duration_of_journal_files: 24h + minute_1: + duration_of_journal_files: 14d + minute_5: + duration_of_journal_files: 30d + hour_1: + duration_of_journal_files: 365d + size_of_journal_files: null +``` + +The default 10GB / 7d on every tier is too tight for most production deployments. This profile gives you 24 hours of full-detail forensics, 14 days of 1-minute trends, 30 days of 5-minute snapshots, and a year of hourly aggregates. Storage required scales with your flow rate — see [Sizing and Capacity Planning](/docs/network-flows/sizing-and-capacity-planning). + +## Things that go wrong + +- **The plugin doesn't start.** Check `journalctl -u netdata --since "5 minutes ago" | grep netflow`. The most common cause is a typo in a YAML key (strict mode rejects unknowns). +- **Edits don't take effect.** Restart Netdata. There is no DynCfg integration for the plugin's configuration. +- **CLI flags I added don't do anything.** When running under Netdata, only the YAML is read. +- **Tiers fill up faster than expected.** All tiers share the top-level retention by default. Set explicit per-tier overrides. 
+- **Queries time out at 30 seconds.** Function calls have a hard 30s timeout in the plugin. If your query is too wide, narrow the time range or add filters that let a higher tier serve it. +- **`__overflow__` appears in results.** A group-by exceeded `query_max_groups` (default 50 000). Either narrow the filter, reduce the number of group-by fields, or raise the limit. + +## What's next + +- [Retention and Querying](/docs/network-flows/retention-and-querying) — How the four tiers work and how the dashboard picks one. +- [Sizing and Capacity Planning](/docs/network-flows/sizing-and-capacity-planning) — How much disk and CPU you need. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to confirm the data is right. +- [Troubleshooting](/docs/network-flows/troubleshooting) — When things break. diff --git a/docs/Network Flows/Enrichment Concepts/ASN Resolution.mdx b/docs/Network Flows/Enrichment Concepts/ASN Resolution.mdx new file mode 100644 index 0000000000..c3aa181b8f --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/ASN Resolution.mdx @@ -0,0 +1,124 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/asn-resolution.md" +sidebar_label: "ASN Resolution" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +sidebar_position: "60" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/asn-resolution" +slug: "/network-flows/enrichment-concepts/asn-resolution" +--- + + +# ASN resolution + +ASN resolution is how Netdata fills in the `SRC_AS`, `DST_AS`, `SRC_AS_NAME`, and `DST_AS_NAME` fields on every flow record. It uses a configurable provider chain — your flow data first, then dynamic routing if you have BMP/BioRIS/static prefixes set up, then the GeoIP database as a fallback. The chain runs independently for source and destination IPs. + +## Numbers vs names + +The two are resolved by completely different code paths. 
+ +**AS numbers** (`SRC_AS`, `DST_AS`) come from the **provider chain** (`asn_providers`). The chain walks providers in order; the first one that returns a non-zero AS wins. + +**AS names** (`SRC_AS_NAME`, `DST_AS_NAME`) always come from the **ASN database** lookup, regardless of the chain. Whichever AS number ends up resolved, the name is rendered as `AS{n} {organisation}` from the ASN MMDB. If the MMDB doesn't know the name, the rendering is `AS{n}` with no trailing label. If the resolved AS is `0`, the rendering is `AS0 Unknown ASN` (or `AS0 Private IP Address Space` if the MMDB tagged the IP as private). + +There is **no static configuration option** for AS names. You can set `enrichment.networks..asn` to override the AS *number*, but the name is always looked up. + +## The provider chain + +```yaml +enrichment: + asn_providers: [flow, routing, geoip] +``` + +This is the default. The plugin walks it left-to-right; the first provider returning a non-zero value wins. + +| Provider | What it reads | Notes | +|---|---|---| +| `flow` | `SRC_AS` / `DST_AS` from the flow record itself | What the exporter sent | +| `flow_except_private` | Same, but treats private/reserved AS numbers as zero | Use when your exporters announce private AS that you don't want to surface | +| `flow_except_default_route` | Same, but treats AS 0 with mask 0 as zero | Use when default-route flows pollute your top-N | +| `routing` | `lookup_routing()` — BMP runtime, BioRIS, static prefixes | Requires routing enrichment to be configured | +| `routing_except_private` | Same as `routing`, with the private filter | | +| `geoip` | **Always returns 0** — terminal short-circuit | See below | + +`geoip` is **terminal**: putting it in the chain stops the chain at that position. Once reached, the chain returns 0, and the GeoIP MMDB's AS data is then re-applied separately (without going through the chain). 
Because of that, `geoip` is meaningful only as the last entry in the chain — it acts as "let GeoIP fill in if nothing else did". + +If you put `[geoip, flow, routing]`, you effectively set every AS to 0, and only the GeoIP-derived AS makes it through. That is rarely what you want. + +### Common chain configurations + +| Configuration | Behaviour | +|---|---| +| `[flow, routing, geoip]` (default) | Trust the exporter, fall back to routing, then to GeoIP. | +| `[flow, routing]` | No GeoIP at all. Use when you don't trust GeoIP for your traffic mix. | +| `[routing, flow, geoip]` | Trust your BMP/BGP feed first. Use when your routers report stale AS. | +| `[flow_except_private, routing, geoip]` | Drop AS 64512-65534 from flow data; let routing fill in. | + +### What counts as private/reserved + +`is_private_as` returns true for: + +- `0` (unknown / default route) +- `23456` (RFC 4893 transition-period reserved) +- `64496..=65551` (documentation, RFC 6996/RFC 5398/RFC 6793 private/reserved range) +- `>= 4_200_000_000` (32-bit private range and reserved high values) + +These are filtered out by the `*_except_private` variants. + +## The network-prefix chain + +A second chain controls how `SRC_MASK`, `DST_MASK`, and `NEXT_HOP` get resolved: + +```yaml +enrichment: + net_providers: [flow, routing] +``` + +This is the default. Same logic: first non-empty value wins. Only `flow` and `routing` are valid here — there is no `geoip` provider for network attributes. + +## AS overrides via static configuration + +If a flow's source or destination IP falls inside a CIDR you've declared under `enrichment.networks`, and that entry includes an `asn` field, the configured value **overrides whatever the chain produced**: + +```yaml +enrichment: + networks: + 198.51.100.0/24: + name: customer-acme + asn: 64500 # forces SRC_AS / DST_AS = 64500 for traffic in this prefix +``` + +This override is applied after the chain. 
It only sets the AS number — the name is still resolved from the ASN database (so it'll render as `AS64500` if your MMDB doesn't have a name, or `AS64500 Acme Corp` if it does). + +## What you get out of the box + +With the default `[flow, routing, geoip]` chain, no routing enrichment configured, and the stock ASN MMDB shipped with native packages: + +- `SRC_AS` / `DST_AS` populated whenever the exporter sends them (most NetFlow v9, IPFIX, sFlow exporters do for public IPs) +- `SRC_AS_NAME` / `DST_AS_NAME` populated whenever the IP is in the ASN MMDB +- For internal RFC 1918 addresses: `*_AS = 0`, `*_AS_NAME = AS0 Private IP Address Space` (because the stock MMDB tags private ranges with that flag) +- For unknown public addresses: `*_AS = 0`, `*_AS_NAME = AS0 Unknown ASN` + +If you don't have an ASN MMDB at all, names render as `AS{n}` for non-zero ASNs and `AS0 Unknown ASN` for zero — the dashboard never shows blank cells. + +## Failure modes + +- **ASN MMDB missing.** With `optional: true` (the default for auto-detected files), the plugin starts and AS names render as `AS{n}` or `AS0 Unknown ASN`. With `optional: false` and a configured path, the plugin fails to start. +- **AS not in any provider.** `*_AS = 0`, `*_AS_NAME = AS0 Unknown ASN`. +- **Wrong order of providers.** Putting `geoip` mid-chain truncates everything after it. Putting `routing` before `flow` makes routing data win over what the exporter sent — fine if your BGP feed is more accurate than your exporter's view. +- **Empty `asn_providers`.** No validation rejects this. The plugin starts but every AS number resolves to 0; only `enrichment.networks..asn` overrides can produce non-zero AS. + +## What can go wrong + +- **AS numbers all zero.** Check the chain. If `[geoip, ...]` is the order, `geoip` short-circuits to 0. Reorder to put `geoip` last. +- **Wrong AS for a known prefix.** Likely the exporter's view differs from the BGP table. 
Override per-prefix via `enrichment.networks..asn`, or reorder the chain to `[routing, flow, geoip]`. +- **Names show `AS{n}` without an organisation.** The MMDB doesn't have a name for that AS. Either accept it or use a richer MMDB. +- **Names show wrong organisation.** ASN ownership data is best-effort and lags real-world transfers by weeks. Refresh the MMDB. If that doesn't help, file an issue with the database vendor — Netdata is a passive consumer. + +## What's next + +- [GeoIP](/docs/network-flows/enrichment-concepts/ip-intelligence) — How the ASN MMDB gets installed and refreshed. +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Per-prefix AS overrides and network labels. +- [BMP routing](/docs/network-flows/enrichment-concepts/bgp-routing) — Live BGP feed as an AS source for the `routing` provider. +- [BioRIS](/docs/network-flows/enrichment-concepts/bgp-routing) — RIPE RIS as an AS source for the `routing` provider. +- [Network sources](/docs/network-flows/enrichment-concepts/network-identity) — HTTP-fetched prefix metadata. diff --git a/docs/Network Flows/Enrichment Concepts/BGP Routing.mdx b/docs/Network Flows/Enrichment Concepts/BGP Routing.mdx new file mode 100644 index 0000000000..79ace435a5 --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/BGP Routing.mdx @@ -0,0 +1,119 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/bgp-routing.md" +sidebar_label: "BGP Routing" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +description: "How live BGP routes (BMP, BioRIS) feed AS path, communities, and next-hop into flow records." 
sidebar_position: "20"
learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/bgp-routing"
slug: "/network-flows/enrichment-concepts/bgp-routing"
---


# BGP Routing

BGP-routing enrichment fills `SRC_AS`, `DST_AS`, `SRC_MASK`, `DST_MASK`, `NEXT_HOP`, `DST_AS_PATH`, `DST_COMMUNITIES`, and `DST_LARGE_COMMUNITIES` from a live BGP feed. Two transports are supported, both of which feed the same in-memory routing trie:

- **BMP** (BGP Monitoring Protocol, RFC 7854) — routers push their BGP updates to Netdata over TCP
- **BioRIS** — Netdata pulls BGP data from a [bio-rd](https://github.com/bio-routing/bio-rd) `cmd/ris/` daemon over gRPC

This page covers the **cross-cutting concept**: how the trie works, how the two sources combine, what survives a restart, and what to expect operationally. For per-protocol setup, follow the integration cards on Learn (BMP and bio-rd / RIPE RIS).

## What gets enriched

BMP and BioRIS populate the same fields. When a flow's source or destination IP matches a learned BGP route:

| Field | Side | Notes |
|---|---|---|
| `SRC_AS` / `DST_AS` | both | When the `routing` provider in the `asn_providers` chain reaches BGP data |
| `SRC_MASK` / `DST_MASK` | both | When the `routing` provider in the `net_providers` chain reaches BGP data |
| `NEXT_HOP` | dest only | BGP next-hop from the destination route |
| `DST_AS_PATH` | dest only | Full BGP AS path (CSV of ASNs) |
| `DST_COMMUNITIES` | dest only | Standard BGP communities (CSV of u32) |
| `DST_LARGE_COMMUNITIES` | dest only | RFC 8092 large communities |

Two notes:

- AS *names* (`*_AS_NAME`) come from the [GeoIP/ASN MMDB](/docs/network-flows/enrichment-concepts/ip-intelligence), not BGP. BGP gives you accurate AS *numbers* and path/communities; the names come from the ASN database.
- Source-side AS path and communities are **not** surfaced. BGP path attributes are most meaningful for the destination of the traffic.
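A flow IP is matched against learned routes by longest prefix. A toy sketch of that matching — the routes are invented documentation prefixes and the linear scan is a Python stand-in, not the plugin's actual trie:

```python
import ipaddress

# Hypothetical learned routes: a /24 and a more-specific /25 inside it.
routes = {
    "203.0.113.0/24":   {"dst_as": 64501, "as_path": "64501",       "next_hop": "192.0.2.1"},
    "203.0.113.128/25": {"dst_as": 64502, "as_path": "64500,64502", "next_hop": "192.0.2.2"},
}

def lookup(ip: str):
    """Return the attributes of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, attrs in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, attrs)
    return best[1] if best else None

print(lookup("203.0.113.200")["dst_as"])  # 64502 — the /25 wins over the /24
print(lookup("198.51.100.1"))             # None — no route; enrichment falls back
```

The more-specific `/25` beats the covering `/24` even though both contain the address — that is the behaviour to keep in mind when multiple feeds contribute overlapping prefixes.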
+ +## Shared trie + +Both BMP and BioRIS populate a single in-memory routing trie keyed by IP prefix. Each prefix entry holds a list of routes (one per `(peer, route_key)` tuple), so multipath BGP and multiple BGP peers contributing the same prefix coexist cleanly. + +When both BMP and BioRIS are enabled, they contribute to the same trie. Lookups pick the best-matching route across both sources, preferring routes whose exporter or next-hop matches the flow being enriched, falling back to longest-prefix-match. + +This is intentional: a deployment that runs BMP from internal routers and BioRIS for external (RIPE RIS) views gets unified enrichment without duplicate trie entries. + +## Memory growth + +The trie has **no time-based eviction**. Routes are only removed via: + +- Explicit BGP withdrawal (`MP_UNREACH`, `withdraw_routes`) +- Peer Down notification (BMP) — clears all routes for the affected peer +- TCP disconnect (BMP) followed by the `keep` interval expiring (default 5 minutes) — clears all routes for that session +- bio-rd refresh cycle — explicit removal of routes for routers that have disappeared + +A full IPv4+IPv6 BGP table is roughly 1.2M prefixes per peer (2026 figures). Each entry stores the AS-path `Vec`, communities `Vec`, large communities `Vec<(u32,u32,u32)>`, plus a `route_key` `String` per path. Expect several hundred MB of resident memory per peer for a full feed. + +Plan capacity accordingly. If you run many peers with full feeds, watch the agent's RSS. + +## Restart behaviour + +The trie is **not persisted**. Restarting the netflow plugin wipes BGP-derived data. Routes are re-learned as routers re-send Initiation + Update messages (for BMP) or as bio-rd's next refresh cycle dumps the RIB (BioRIS). 
+ +Convergence times after restart: + +| Source | Typical convergence | +|---|---| +| FRR over BMP | seconds (FRR re-emits everything immediately) | +| Cisco IOS-XR over BMP | minutes (IOS-XR's initial-refresh has a configurable spread) | +| Juniper JunOS over BMP | seconds to minutes (depends on station options) | +| BioRIS over RIPE RIS | minutes (full DumpRIB takes a while for large feeds) | + +Until convergence, BGP-derived enrichment is incomplete. Plan restarts during low-traffic windows if BGP attribution matters for your workflow. + +## Provider chain integration + +BGP-derived routes contribute to flow enrichment via the `routing` entry in the [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) provider chain: + +```yaml +enrichment: + asn_providers: [flow, routing, geoip] # default + net_providers: [flow, routing] # default +``` + +With the defaults, an exporter-supplied AS number wins over BGP. To prefer BGP over the exporter (useful when your BMP/BioRIS feed is more accurate than the exporter's view), reorder: + +```yaml +enrichment: + asn_providers: [routing, flow, geoip] +``` + +`bmp` is accepted as an alias for `routing` in the provider list, for backward compatibility. + +## Integration test gap + +The runtime path of both BMP and BioRIS — TCP listener / gRPC client, framed decode loop, trie apply, per-router cleanup — is **not** integration-tested in this repository. The parsing layers (BMP message parsing, gRPC proto conversion) are well-unit-tested, but end-to-end against real router firmware or real bio-rd daemons is not exercised. + +Implications: + +- The features ship because the parsing is solid and the runtime is built on standard tokio + netgauze + tonic primitives. +- Vendor compatibility (Cisco IOS-XR / IOS-XE, Juniper JunOS, Arista EOS, FRR) is not validated by tests in this repository. +- Treat configuration changes as production-impacting. 
Validate against your specific gear before relying on BGP-derived data for capacity or security decisions. + +## What can go wrong + +- **No connections forming (BMP).** Routers initiate BMP sessions to the plugin. Check the router side (`show bmp` / `show bmp connections` / `show bmp targets`). The plugin doesn't proactively retry; it waits. +- **gRPC deadline exceeded (BioRIS).** Default timeout 200 ms is aggressive over the public internet. Raise to 2-5 s. +- **Memory growth without bound.** A full BGP feed is permanent (no eviction). Plan capacity. +- **Plugin restart wipes the trie.** Re-converge takes seconds (FRR) to minutes (IOS-XR). Schedule restarts off-peak. +- **AS path inconsistent with the exporter's view.** Different vantage points see different paths. This is normal in BGP. If your exporter and your BMP-feeding router are different boxes with different routing tables, expect divergence. +- **Empty BGP data after enabling.** Check the per-provider integration card for the specific protocol's setup gotchas — e.g., FRR requires `-M bmp` in `/etc/frr/daemons` (otherwise every BMP command silently fails). + +## What's next + +- **BMP** integration card — how to enable the listener, configure routers (Cisco, Juniper, Arista, FRR). +- **bio-rd / RIPE RIS** integration card — how to set up bio-rd, configure the gRPC client. +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) — How BGP plugs into the provider chain. +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Per-prefix overrides that win over BGP. 
diff --git a/docs/Network Flows/Enrichment Concepts/Classifiers.mdx b/docs/Network Flows/Enrichment Concepts/Classifiers.mdx new file mode 100644 index 0000000000..670a6f16ba --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/Classifiers.mdx @@ -0,0 +1,185 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/classifiers.md" +sidebar_label: "Classifiers" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +sidebar_position: "50" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/classifiers" +slug: "/network-flows/enrichment-concepts/classifiers" +--- + + +# Classifiers + +Classifiers tag exporters and interfaces using small expression-based rules. Where [static metadata](/docs/network-flows/enrichment-concepts/static-metadata) requires you to enumerate every exporter and every ifIndex, classifiers let you write a few rules that match many cases — by name pattern, by IP, by SNMP description, by speed, and so on. + +The plugin's classifier language is **Akvorado-compatible** for the documented operators and actions. It is implemented as a hand-written expression parser in Rust, not jq/jaq, and supports a subset of Akvorado's full expression language. If you've written Akvorado classifiers before, your rules will likely work; if you've written `expr-lang` rules with arithmetic, ternaries, or lambdas, those features are not available here. + +## Two classifier lists + +| Block | Runs | Sees | +|---|---|---| +| `enrichment.exporter_classifiers` | Once per exporter (cached) | Exporter IP and name, current classification fields | +| `enrichment.interface_classifiers` | Once per (exporter, interface) pair, twice per flow (in + out) | Exporter fields, plus interface index/name/description/speed/VLAN, plus current classification | + +Rules are evaluated in YAML order. The plugin short-circuits the list when all classification slots are filled. 
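
As a minimal sketch of that ordering: two exporter rules where the second acts as a catch-all (hypothetical names; the operators and actions are documented in the sections below):

```yaml
enrichment:
  exporter_classifiers:
    # Evaluated top to bottom; the first rule to fill a slot wins it.
    - 'Exporter.Name startsWith "edge-" && Classify("edge")'
    # Catch-all: only takes effect when EXPORTER_GROUP is still unset.
    - 'Classify("unclassified")'
```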
+ +## What a rule can read + +Identifiers available to **exporter classifiers**: + +- `Exporter.IP` — the exporter's IP, as a string +- `Exporter.Name` — the exporter's friendly name (from static metadata, or falls back to the IP) +- `CurrentClassification.Group`, `.Role`, `.Site`, `.Region`, `.Tenant` — values already set (by static metadata, or by an earlier rule) + +Identifiers available to **interface classifiers**: + +- All of the above +- `Interface.Index` — the SNMP ifIndex (integer) +- `Interface.Name`, `Interface.Description` — from static metadata +- `Interface.Speed` — in bits per second +- `Interface.VLAN` — from the flow record's `SRC_VLAN` / `DST_VLAN` (depending on direction) +- `CurrentClassification.Connectivity`, `.Provider`, `.Boundary`, `.Name`, `.Description` — already set + +The plugin does NOT poll SNMP itself, so `Interface.Name` / `Description` / `Speed` come only from what you've configured under `metadata_static`. If you haven't configured them, those identifiers will be empty. + +## What a rule can do + +### Set classification fields + +| Action | Result | +|---|---| +| `Classify("v")` or `ClassifyGroup("v")` | Set `EXPORTER_GROUP` | +| `ClassifyRole("v")` | Set `EXPORTER_ROLE` | +| `ClassifySite("v")` | Set `EXPORTER_SITE` | +| `ClassifyRegion("v")` | Set `EXPORTER_REGION` | +| `ClassifyTenant("v")` | Set `EXPORTER_TENANT` | +| `ClassifyProvider("v")` | Set `IN_IF_PROVIDER` / `OUT_IF_PROVIDER` | +| `ClassifyConnectivity("v")` | Set `IN_IF_CONNECTIVITY` / `OUT_IF_CONNECTIVITY` | +| `ClassifyExternal()` / `ClassifyInternal()` | Set `IN_IF_BOUNDARY` / `OUT_IF_BOUNDARY` | +| `SetName("v")` | Set `IN_IF_NAME` / `OUT_IF_NAME` (or exporter name when in an exporter rule) | +| `SetDescription("v")` | Set `IN_IF_DESCRIPTION` / `OUT_IF_DESCRIPTION` | + +`Classify*Regex(input, pattern, template)` variants exist for every action above. The pattern is a Rust regex; the template uses `$1`, `$2`, `${name}` capture references. 
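
For instance, assuming a hypothetical exporter naming scheme like `edge-par1-01`, a regex variant can lift the site token straight out of the name:

```yaml
enrichment:
  exporter_classifiers:
    # Capture the middle token of "edge-par1-01" and use it as the site.
    - 'ClassifySiteRegex(Exporter.Name, "^[a-z]+-([a-z0-9]+)-", "$1")'
```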
+ +### Drop the flow + +`Reject()` discards the flow record. Always guard it behind a condition — at top level it drops everything. + +### Format strings + +`Format("...", arg1, arg2)` mimics Go's `fmt.Sprintf` for `%s`, `%v`, `%d`, `%%`. Use it to build values from multiple inputs: + +``` +ClassifyTenant(Format("tenant-%s", Exporter.Name)) +``` + +## What rules can match against + +Operators (highest to lowest precedence): + +| Form | Meaning | +|---|---| +| `value == X`, `value != X` | equality / inequality | +| `value > X`, `value >= X`, `value < X`, `value \<= X` | numeric or lexicographic comparison | +| `value in [a, b, c]` | membership | +| `value contains "x"` | substring (string only) | +| `value startsWith "x"`, `value endsWith "x"` | prefix / suffix (string only) | +| `value matches "pattern"` | regex match (Rust regex) | +| `cond1 && cond2`, `cond1 and cond2` | logical AND | +| `cond1 \|\| cond2`, `cond1 or cond2` | logical OR | +| `!cond`, `not cond` | negation | +| `(cond)` | grouping | + +Whitespace and newlines are ignored, so multi-line rules work. Strings are JSON-quoted. + +## Important behavioural rules + +### First write wins + +Each classification slot is single-write. Once a rule sets `EXPORTER_GROUP`, no subsequent rule can change it. Order rules from most-specific to least-specific. + +### Static metadata overrides classifiers entirely + +If `metadata_static.exporters` set **any** exporter classification field for this exporter, **none of the exporter classifiers run**. Same for interfaces: if static metadata set any of provider, connectivity, or boundary for an interface, the interface classifiers do not run for that interface. + +This is "Akvorado parity" behaviour — operator-provided classification has priority. Don't try to mix them on the same target. + +### `Classify*` value normalisation + +The string passed to `Classify*` actions is **lowercased and stripped to ASCII alphanumerics + `. + -`**. 
So `ClassifyRegion("EU West")` becomes `euwest`. If you want to preserve casing or whitespace, use `SetName` or `SetDescription` instead. + +### Runtime errors stop the rule list + +If a rule throws (e.g., comparing a string with `>`), the plugin stops evaluating further rules in that list and keeps whatever was set so far. Use `matches`, `startsWith`, or `contains` instead of `>`/`<` on string fields to avoid this. + +### Cache key includes resolved values + +The interface classifier cache keys by `(exporter, exporter classification, interface)`. When the exporter's classification changes — for example, after you push new static metadata and restart — interface caches naturally invalidate. + +The cache TTL is `classifier_cache_duration`, default 5 minutes (`enrichment.classifier_cache_duration`). It's a last-access TTL — entries live as long as they're queried. + +## Rule examples + +### Exporter classifiers + +```yaml +enrichment: + exporter_classifiers: + # Group exporters by name pattern. + - 'Exporter.Name matches "^edge-.*" && Classify("edge")' + - 'Exporter.Name matches "^core-.*" && Classify("core")' + + # Site by IP prefix. + - 'Exporter.IP startsWith "10.1." && ClassifySite("ny-dc1")' + - 'Exporter.IP startsWith "10.2." && ClassifySite("par-dc1")' + + # Tenant computed from name. + - 'ClassifyTenant(Format("tenant-%s", Exporter.Name))' + + # Pull a token out of the name with a regex. + - 'ClassifyRegionRegex(Exporter.Name, "-([a-z]{2})-[0-9]+$", "$1")' + + # Drop traffic from a test exporter. + - 'Exporter.IP startsWith "192.0.2." && Reject()' + + classifier_cache_duration: 5m +``` + +### Interface classifiers + +```yaml +enrichment: + interface_classifiers: + # Provider from a description prefix. + - 'Interface.Description startsWith "BACKBONE-LUMEN" && ClassifyProvider("Lumen")' + - 'Interface.Description startsWith "BACKBONE-COGENT" && ClassifyProvider("Cogent")' + + # Mark transit links by description keyword and tag them external. 
+ - 'Interface.Description contains "TRANSIT" && ClassifyConnectivity("transit") && ClassifyExternal()' + + # Anything matching the IX peering pattern. + - 'Interface.Description matches "(?i)^(IX|peering)-.*" && ClassifyConnectivity("peering") && ClassifyExternal()' + + # 100 Gbps interfaces are core uplinks. + - 'Interface.Speed >= 100000000000 && ClassifyConnectivity("core")' + + # Use exporter classification to scope interface rules. + - 'CurrentClassification.Role == "edge" && ClassifyExternal()' +``` + +## What can go wrong + +- **A rule fails to parse and the plugin won't start.** Look at the journal — the error message includes the index and a parser context. +- **Classifiers aren't running on an exporter.** Likely cause: static metadata already set a classification field for that exporter, which suppresses all classifier rules for it. +- **A rule sets a value but it appears differently in the dashboard.** `Classify*` actions normalise (lowercase + strip non-alphanumeric). Use `SetName` for human-readable values. +- **The first rule in the list always wins.** First-write-wins per slot. Order rules from most-specific to least-specific. +- **A rule that worked at startup stops matching later.** Cached results expire after `classifier_cache_duration`. If you change rules, restart the plugin so the cache clears completely. +- **Comparison error stops processing.** Comparing a string with `>` throws — subsequent rules in the list are skipped. Use string-safe operators. +- **`ClassifyExternal` doesn't fire on the egress side.** Interface classifiers run twice — once for the input interface, once for the output. Both invocations see the same classifier list. If your rule sets `ClassifyExternal()` on a specific ifIndex, it applies whether that ifIndex is `IN_IF` or `OUT_IF`. + +## What's next + +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Declarative labelling that runs before classifiers. 
+- [GeoIP](/docs/network-flows/enrichment-concepts/ip-intelligence) — Country / city / AS-name labelling. +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) — How `SRC_AS` / `DST_AS` get filled in. diff --git a/docs/Network Flows/Enrichment Concepts/Decapsulation.mdx b/docs/Network Flows/Enrichment Concepts/Decapsulation.mdx new file mode 100644 index 0000000000..68e979e07d --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/Decapsulation.mdx @@ -0,0 +1,129 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/decapsulation.md" +sidebar_label: "Decapsulation" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +sidebar_position: "70" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/decapsulation" +slug: "/network-flows/enrichment-concepts/decapsulation" +--- + + +# Decapsulation + +Decapsulation extracts the inner packet from tunnelled traffic so the dashboard reflects the actual endpoints, not the tunnel endpoints. Two modes are supported: **SRv6** and **VXLAN**. + +This is useful when your routers are observing overlay traffic — VXLAN-encapsulated VM traffic between hypervisors, SRv6-encapsulated data-centre fabric, etc. Without decap, you see the same "10.0.0.x → 10.0.0.y" flow for every VM-to-VM conversation, which tells you nothing. + +## Modes + +```yaml +protocols: + decapsulation_mode: vxlan # one of: none, srv6, vxlan +``` + +| Mode | What it strips | What it surfaces | +|---|---|---| +| `none` (default) | nothing | the outer-header view | +| `srv6` | IPv6 outer + extension headers + Routing Header type 4 (SRH) | the inner IPv4 (next-header 4) or IPv6 (next-header 41) packet | +| `vxlan` | outer Ethernet/IP + UDP (port 4789) + 8-byte VXLAN header | the inner Ethernet frame, then the inner L3/L4 | + +GRE, IP-in-IP, GENEVE, and other tunnel types are **not** supported. Only SRv6 and VXLAN. 
+ +## How the modes interact with each protocol + +Decap relies on the exporter shipping inner-packet bytes. That happens in three different ways depending on the source protocol: + +| Source | Inner-packet bytes carried as | Required exporter capability | +|---|---|---| +| **NetFlow v9** | Information Element 104 (`Layer2packetSectionData`, RFC 7270) | Exporter must include IE 104 in the template; it carries the captured frame bytes | +| **IPFIX** | Information Element 315 (`dataLinkFrameSection`, RFC 7133) | Same idea, IPFIX-standard IE | +| **sFlow** | Always — `SampledHeader` records carry the truncated raw packet | sFlow agents send `SampledHeader` by default for header-sampling mode | + +For **NetFlow v9 / IPFIX without IE 104 / 315 in the template, decapsulation does not run** — even with `decapsulation_mode: vxlan` set. Standard flow records pass through unchanged. So enabling decap on the plugin is half the work; you also have to configure your exporter to ship the frame bytes. + +For **sFlow, decap always runs** when the mode is set, because every flow sample carries a `SampledHeader`. + +## What gets surfaced + +When decap succeeds, the inner 5-tuple replaces the outer one in the flow record: + +- `SRC_ADDR` / `DST_ADDR` — inner source/destination IPs +- `SRC_PORT` / `DST_PORT` — inner ports +- `PROTOCOL` — inner L4 protocol +- `ETYPE` — inner EtherType +- `IPTOS`, `IPTTL`, `IPV6_FLOW_LABEL`, `TCP_FLAGS` — inner IP/TCP fields +- `IP_FRAGMENT_ID`, `IP_FRAGMENT_OFFSET` — inner fragmentation +- `ICMPV4_TYPE` / `ICMPV4_CODE` / `ICMPV6_TYPE` / `ICMPV6_CODE` — inner ICMP +- `MPLS_LABELS` — inner MPLS label stack (if present) +- `BYTES` — inner L3 length (so byte counts represent inner traffic, not outer overhead) + +For VXLAN, the inner Ethernet frame is parsed, so `SRC_MAC` / `DST_MAC` / `SRC_VLAN` / `DST_VLAN` come from the inner frame. **The outer MACs and VLANs are lost.** + +For SRv6, the outer is an IPv6 packet (no L2 to lose). 
+ +The **VXLAN VNI is dropped**. Netdata does not surface it. If you need to distinguish overlay segments, you need a different mechanism — VLAN-tagged inner frames work, but pure VNI-based segmentation isn't visible. + +## Decapsulation is destructive on non-tunnel traffic + +When `decapsulation_mode` is set and the exporter ships records via the special L2-section path (NetFlow v9 IE 104 / IPFIX IE 315 / sFlow `SampledHeader`), but the inner packet doesn't match the configured tunnel: + +- For VXLAN mode: a non-VXLAN packet (different UDP port, malformed VXLAN header, or not UDP at all) is **dropped**. The flow does NOT fall back to outer-header view. +- For SRv6 mode: an IPv6 packet without the right extension-header chain leading to next-header 4 or 41 is **dropped**. +- For sFlow with decap on, only `SampledHeader` records are processed. `SampledIPv4`, `SampledIPv6`, `SampledEthernet`, `ExtendedSwitch`, `ExtendedRouter`, `ExtendedGateway` records are all skipped. + +Plain NetFlow / IPFIX flow records that don't go through the special L2-section path are **unaffected** — they pass through normally regardless of the decap setting. So enabling `decapsulation_mode: vxlan` doesn't break your normal flow stream; it only filters the L2-section path. + +This means decapsulation is safe to enable when: + +- All your tunnel-bearing exporters use the same encapsulation, AND +- The L2-section / `SampledHeader` data they ship is exclusively (or near-exclusively) tunnel traffic. + +If you mix VXLAN and SRv6 traffic on the same exporter, you cannot decap both — the plugin has one global setting. + +## Configuring exporters to ship inner-packet bytes + +For decap to work, your exporter must include the inner-packet bytes in its export. This is platform-specific. The CLI snippets below are starting points — verify against the vendor's reference manual before deploying. 
+ +### Cisco IOS-XE / IOS-XR (NetFlow v9 with `datalink mac`) + +``` +flow record FNF-WITH-MAC + match ipv4 source address + match ipv4 destination address + match transport source-port + match transport destination-port + match ipv4 protocol + match datalink mac source address input + match datalink mac destination address input + collect counter bytes + collect counter packets + collect timestamp absolute first + collect timestamp absolute last + collect datalink frame-section section header size 128 +``` + +The `collect datalink frame-section` directive is what causes the exporter to include IE 104. Adjust the section size based on your maximum tunnel header size; 128 bytes covers VXLAN over Ethernet over IPv4. SRv6 inner extraction needs more — 256 or higher. + +### Juniper JunOS (IPFIX with frame export) + +JunOS' IPFIX support varies by platform. On platforms that support frame-section export, configure the template to include `dataLinkFrameSection` (IE 315). Refer to your platform's documentation. + +### sFlow (built-in) + +sFlow agents send `SampledHeader` by default. No special configuration needed beyond enabling sFlow. + +## Failure modes + +- **Exporter doesn't ship IE 104 / IE 315.** The plugin can't decap. Records pass through with outer-header view. +- **Inner packet isn't VXLAN/SRv6.** With decap on, the flow is dropped. There is no "fall back to outer view" — this is intentional, but be aware. +- **Truncated frame section.** The inner Ethernet/IP/L4 parsing fails and the flow is dropped. +- **VXLAN on a non-standard port.** The plugin only matches UDP destination port 4789 (RFC 7348). VXLAN-GPE on 4790 and vendor-custom ports are not detected. +- **VNI not visible.** Bytes 4-6 of the VXLAN header are skipped. If you need VNI-based segmentation, see if your exporter can place the VNI in a separate field; otherwise this isn't surfaceable today. 
+ +## What's next + +- [Configuration](/docs/network-flows/configuration) — `protocols.decapsulation_mode` setting reference. +- [Sources / NetFlow](/docs/network-flows/sources/netflow) — IE 104 export configuration. +- [Sources / IPFIX](/docs/network-flows/sources/ipfix) — IE 315 export configuration. +- [Sources / sFlow](/docs/network-flows/sources/sflow) — `SampledHeader` semantics. diff --git a/docs/Network Flows/Enrichment Concepts/IP Intelligence.mdx b/docs/Network Flows/Enrichment Concepts/IP Intelligence.mdx new file mode 100644 index 0000000000..780ba90d7d --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/IP Intelligence.mdx @@ -0,0 +1,11 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/ip-intelligence.md" +sidebar_label: "IP Intelligence" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +description: "How GeoIP and ASN data combine to enrich flow records with country, city, and AS-name labels." +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/ip-intelligence" +slug: "/network-flows/enrichment-concepts/ip-intelligence" +--- + diff --git a/docs/Network Flows/Enrichment Concepts/Network Identity.mdx b/docs/Network Flows/Enrichment Concepts/Network Identity.mdx new file mode 100644 index 0000000000..564006075a --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/Network Identity.mdx @@ -0,0 +1,138 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/network-identity.md" +sidebar_label: "Network Identity" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +description: "How external feeds (cloud IP ranges, IPAM systems) label your network prefixes." 
+sidebar_position: "30" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/network-identity" +slug: "/network-flows/enrichment-concepts/network-identity" +--- + + +# Network Identity + +Network-identity enrichment labels your own network prefixes with names, roles, sites, regions, tenants, and country / city overrides. Where [IP intelligence](/docs/network-flows/enrichment-concepts/ip-intelligence) tells you "this IP is in Germany" from a public database, network-identity tells you "this prefix is our staging environment in Frankfurt" from your authoritative source. + +The data comes from external feeds — cloud-provider published prefix lists (AWS, GCP, Azure), IPAM systems (NetBox, Infoblox, BlueCat, phpIPAM), and custom CMDBs. Each source is configured as a separate integration card. This page covers the **cross-cutting concept**: how the lookups combine, what fields can be set, the operational rules. + +## What it populates + +| Field | Notes | +|---|---| +| `SRC_NET_NAME` / `DST_NET_NAME` | Friendly name | +| `SRC_NET_ROLE` / `DST_NET_ROLE` | Role tag (e.g., `dmz`, `office`, `iot`) | +| `SRC_NET_SITE` / `DST_NET_SITE` | Physical site | +| `SRC_NET_REGION` / `DST_NET_REGION` | Region | +| `SRC_NET_TENANT` / `DST_NET_TENANT` | Tenant | +| `SRC_COUNTRY` / `DST_COUNTRY` | Country override (when set explicitly) | +| `SRC_GEO_STATE` / `DST_GEO_STATE` | State / province override | +| `SRC_GEO_CITY` / `DST_GEO_CITY` | City override | + +What network-identity sources cannot set: `SRC_GEO_LATITUDE` / `DST_GEO_LATITUDE`, `SRC_GEO_LONGITUDE` / `DST_GEO_LONGITUDE`. Coordinates are static-only — use the [`networks` block in static metadata](/docs/network-flows/enrichment-concepts/static-metadata) for those. + +The per-row `asn` field can also override the AS *number* via the resolution chain. 
The AS *name* still comes from the [ASN MMDB](/docs/network-flows/enrichment-concepts/ip-intelligence) — there is no `asn_name` override in network-identity sources. + +## Lookup priority + +In the network-attributes resolution merge order: + +1. **GeoIP** seeds the base layer. +2. **Network-identity sources** (cloud IP ranges, IPAM, generic IPAM) merge on top — at each prefix length (least-specific to most-specific). +3. **Static `networks` config** merges last — at each prefix length, **after** network-identity sources. + +So when a prefix is defined in both a remote source and the static config, the static config wins on any non-empty field. This is intentional: explicit operator configuration overrides imported data. + +## How a fetch works + +For each configured source: + +1. The plugin issues an HTTP request (default GET, or POST if configured) at the `interval` cadence. +2. Headers configured under `headers:` are added (typically for authentication). +3. The response body is parsed as JSON. +4. The configured `transform` (a [jaq](https://github.com/01mf02/jaq) jq-equivalent expression) runs over the parsed JSON. +5. The transform must produce a stream of objects, each with a `prefix` field (a CIDR string) and any of the optional attribute fields. +6. The records are merged into the network-attributes trie. + +Each source runs in its own task. Multiple sources fetch in parallel; within a source, only one fetch is in flight at a time. + +On any failure (HTTP error, JSON parse error, jq runtime error, empty result), the source backs off exponentially (starting at `interval / 10`, doubling up to `interval`) and retries. On success it resets to the configured `interval`. + +## The expected jq output shape + +The `transform` is a jq expression compiled by jaq. It receives the entire parsed JSON body and must produce a **stream of objects**. 
+ +Each output object should look like: + +```json +{ + "prefix": "10.0.0.0/8", + "name": "internal", + "role": "lan", + "site": "fra1", + "region": "eu-central", + "country": "DE", + "state": "HE", + "city": "Frankfurt", + "tenant": "tenant-a", + "asn": 64500, + "asn_name": "Internal AS" +} +``` + +Required: `prefix`. All other fields are optional and default to empty / 0. + +The `asn` field accepts an integer (`64500`), a string (`"64500"`), or the AS notation (`"AS64500"`). + +If the transform produces nothing (empty result), the cycle is treated as a failure and triggers backoff. The same applies to non-object rows — every output element must be an object. + +## TLS verification cannot be disabled + +The configuration accepts the legacy keys `tls.verify` and `tls.skip_verify` for compatibility, but the validation layer **rejects** any attempt to disable verification (`tls.verify: false` or `tls.skip_verify: true`). Self-signed or internal CAs must be supplied via `tls.ca_file`. There is no override. + +This is deliberate. Network-identity data flows directly into enrichment that affects security investigations and capacity decisions — silently accepting MITM-able responses would corrupt every downstream analysis. + +## Single page only + +The fetch is one-shot per cycle. There is no pagination, no cursor handling, no `Link: rel=next` following. If your IPAM exposes paginated endpoints, either: + +- Expose a separate "all prefixes" bulk endpoint (most IPAMs have one). +- Wrap with a server-side script that aggregates all pages and serves the result at one URL. + +## Authentication + +The plugin has no built-in OAuth flow, basic-auth helpers, or token refresh. Set whatever the API needs explicitly: + +```yaml +headers: + Authorization: "Token abc123" +``` + +If your endpoint needs short-lived tokens, refresh them outside Netdata and put the current valid token in the headers config (and reload). 
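
Putting the pieces together, a single source might look like the sketch below. The `network_sources` block layout and the `url` key name are illustrative assumptions; the authoritative key layout is on each integration card, and only `interval`, `headers`, `transform`, and `tls.ca_file` are described on this page.

```yaml
enrichment:
  network_sources:
    my-ipam:                                            # hypothetical source name
      url: https://ipam.example.internal/api/prefixes   # key name assumed for illustration
      interval: 15m
      headers:
        Authorization: "Token abc123"
      tls:
        ca_file: /etc/ssl/certs/internal-ca.pem
      # jaq transform: must emit a stream of objects, each with a `prefix` CIDR.
      transform: |
        .results[] | { prefix: .cidr, name: .description, role: .role, tenant: .tenant }
```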
+

## Available sources

Each is configured as a separate integration card. See the per-source card for setup details:

- **AWS IP Ranges** — public AWS prefix list with per-region and per-service tagging
- **GCP IP Ranges** — public GCP prefix list with per-scope and per-service tagging
- **Azure IP Ranges** — published per Azure Service Tags (requires an internal mirror because Azure's URL rotates weekly)
- **NetBox** — open-source IPAM / DCIM, REST API with bearer-token auth
- **Generic JSON-over-HTTP IPAM** — catch-all for Infoblox, BlueCat, phpIPAM, custom CMDBs

## What can go wrong

- **Endpoint is paginated.** Only the first page is fetched. Use a bulk endpoint or wrap with a server-side script.
- **The default interval is 60s**, which is more frequent than most sources need. Tune per source — daily is fine for cloud IP ranges, 5-15 minutes for IPAMs that change often.
- **TLS verify cannot be disabled.** Use `tls.ca_file` for internal CAs.
- **Empty result from the transform** is treated as failure. If your endpoint returns no prefixes (legitimate state for a quiet IPAM), the source backs off as if it errored. Workaround: have the upstream return at least one synthetic prefix.
- **Authorization header must be in `headers:`**, not in the URL. URLs with embedded credentials (`https://user:pass@host`) are not specially handled.
- **JSON parse errors are silent in the dashboard.** Watch the Netdata journal (`journalctl -u netdata | grep network_sources`) for warnings.
- **Static config silently wins ties.** When a prefix is defined in both a remote source and `networks:`, the static config's values overwrite the remote ones. This is by design but can surprise operators expecting the remote feed to be authoritative.

## What's next

- **AWS IP Ranges, GCP IP Ranges, Azure IP Ranges, NetBox, Generic JSON-over-HTTP IPAM** — per-source integration cards with concrete setup instructions and example jq transforms.
+- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Static `networks` block (overrides network-identity at the same prefix length). +- [IP Intelligence](/docs/network-flows/enrichment-concepts/ip-intelligence) — The base layer that network-identity merges on top of. +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) — How the per-row `asn` field plugs in. diff --git a/docs/Network Flows/Enrichment Concepts/Static Metadata.mdx b/docs/Network Flows/Enrichment Concepts/Static Metadata.mdx new file mode 100644 index 0000000000..ac7449c8b8 --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/Static Metadata.mdx @@ -0,0 +1,230 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/static-metadata.md" +sidebar_label: "Static Metadata" +learn_status: "Published" +learn_rel_path: "Network Flows/Enrichment Concepts" +sidebar_position: "40" +learn_link: "https://learn.netdata.cloud/docs/network-flows/enrichment-concepts/static-metadata" +slug: "/network-flows/enrichment-concepts/static-metadata" +--- + + +# Static metadata + +Static metadata is the foundational enrichment for any multi-exporter deployment. It lets you give your routers, your switches, your interfaces, and your own networks the names and labels you want to see on the dashboard — instead of raw IP addresses and SNMP indexes. + +There are two independent configuration blocks. 
They populate different fields and use different lookup keys, but you typically configure both: + +| Block | Lookup key | What it labels | +|---|---|---| +| `enrichment.metadata_static.exporters` | exporter IP / CIDR + ifIndex | The exporter device and its individual interfaces | +| `enrichment.networks` | source / destination IP | Your own networks (CIDRs you operate) | + +## What it populates + +### From `metadata_static.exporters` + +Per-exporter (matched by source IP of the UDP datagram): + +- `EXPORTER_NAME`, `EXPORTER_GROUP`, `EXPORTER_ROLE`, `EXPORTER_SITE`, `EXPORTER_REGION`, `EXPORTER_TENANT` + +Per-interface (matched by ifIndex from the flow record): + +- `IN_IF_NAME` / `OUT_IF_NAME` +- `IN_IF_DESCRIPTION` / `OUT_IF_DESCRIPTION` +- `IN_IF_SPEED` / `OUT_IF_SPEED` (in **bits per second**) +- `IN_IF_PROVIDER` / `OUT_IF_PROVIDER` +- `IN_IF_CONNECTIVITY` / `OUT_IF_CONNECTIVITY` +- `IN_IF_BOUNDARY` / `OUT_IF_BOUNDARY` (`1` = external, `2` = internal) + +### From `networks` + +Per source/destination IP (matched by CIDR): + +- `SRC_NET_NAME` / `DST_NET_NAME` +- `SRC_NET_ROLE` / `DST_NET_ROLE` +- `SRC_NET_SITE` / `DST_NET_SITE` +- `SRC_NET_REGION` / `DST_NET_REGION` +- `SRC_NET_TENANT` / `DST_NET_TENANT` +- `SRC_COUNTRY` / `DST_COUNTRY`, `SRC_GEO_STATE` / `DST_GEO_STATE`, `SRC_GEO_CITY` / `DST_GEO_CITY` — overrides for the GeoIP-derived fields +- `SRC_GEO_LATITUDE` / `DST_GEO_LATITUDE`, `SRC_GEO_LONGITUDE` / `DST_GEO_LONGITUDE` — overrides for the coordinate fields +- `SRC_AS_NAME` / `DST_AS_NAME` — only when the configured `asn` causes the chain to render `AS{n}` and the MMDB has a matching name + +The `networks` block can also override the AS **number** via the `asn` field, for matching prefixes. AS **names** still come from the ASN database — see [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution). 
+ +## Configuration + +### Naming exporters and interfaces + +```yaml +enrichment: + metadata_static: + exporters: + 192.0.2.10: # single IP (treated as /32) + name: edge-router-1 + site: par1 + region: eu-west + role: edge + tenant: tenant-a + default: # template applied to interfaces not in if_indexes + description: unclassified port + if_indexes: + 1: + name: Gi0/0/1 + description: uplink to ISP-A + speed: 10000000000 # 10 Gbps in bits per second + provider: isp-a + connectivity: transit + boundary: external + 2: + name: Gi0/0/2 + description: LAN core + speed: 1000000000 + connectivity: lan + boundary: internal +``` + +The `if_indexes` map keys by the integer ifIndex the router sends in flow records. If a flow arrives with an ifIndex not present in the map, the `default` interface block is used. The `skip_missing_interfaces: true` option overrides this — when set, missing entries get no interface labels at all. + +### Matching multiple exporters with one block + +CIDR prefixes work too. Longest-prefix match wins. 
+ +```yaml +enrichment: + metadata_static: + exporters: + 198.51.100.0/24: # all routers in this subnet + site: dc-fra1 + region: eu-central + role: spine + default: + connectivity: lan + boundary: internal + 198.51.100.10: # specific override for one IP + name: spine-fra1-a + if_indexes: + 1: + name: 100Ge-0/0/1 + description: leaf-uplink + speed: 100000000000 + connectivity: transit + boundary: external +``` + +### Tagging your own networks + +```yaml +enrichment: + networks: + 10.0.0.0/8: + name: corp-internal + role: internal + tenant: tenant-a + 172.16.0.0/12: + name: corp-internal + role: internal + tenant: tenant-a + 192.168.0.0/16: + name: corp-internal + role: internal + tenant: tenant-a + 198.51.100.0/24: # a public block you operate + name: customer-acme + role: customer + site: par1 + country: FR + city: Paris + latitude: 48.8566 + longitude: 2.3522 + asn: 64500 # forces SRC_AS / DST_AS for traffic in this prefix + 203.0.113.0/24: transit-a # shorthand: name only +``` + +Two things to know: + +- The `networks` map merges all containing CIDRs in ascending prefix-length order — least-specific first, with more-specific overrides. A `/24` entry inherits any non-empty fields from a containing `/16` entry, and adds or overwrites its own fields. +- The shorthand form (`203.0.113.0/24: transit-a`) sets only the `name`. All other fields are empty. + +## Lookup priority and pipeline order + +Within an exporter: + +1. **`metadata_static.exporters` longest-prefix match** wins for exporter labels and interface labels. +2. **`if_indexes` lookup** runs against the ifIndex from the flow record, falling back to `default` (or returning empty when `skip_missing_interfaces: true`). + +Within a flow's source/destination IP: + +1. **GeoIP** runs first as the base layer. +2. **`network_sources` (remote feeds)** merge on top. +3. **`networks` (static config)** merges last and wins on any non-empty field. + +The two paths run independently. 
An exporter IP that also matches a `networks` entry will get **both** treatments — exporter labels for the device, network labels for any traffic to or from that IP.

## Things to know

### `IN_IF_BOUNDARY` / `OUT_IF_BOUNDARY` semantics

These label **the interface itself**, not the direction of traffic:

- `1` = external — the port faces the outside world (Internet, peer, transit)
- `2` = internal — the port faces your own infrastructure
- `0` (or omitted) = undefined — the field is removed from the output

Filtering for `IN_IF_BOUNDARY=1` cleanly gives you "traffic that arrived from outside". The encoding is intentional even if `1` for "external" looks counter-intuitive.

The values `external` and `internal` are also accepted as strings (case-insensitive) in the YAML.

### `speed` is in bits per second

A 1 Gbps interface is `1000000000`, not `1000`. Operators thinking in megabits or gigabits will get the speed wrong by a factor of 1,000 to 1,000,000. The plugin treats `speed: 0` as "not set" and removes the field from the output.

### CIDR prefixes accept single IPs

`192.0.2.10` and `192.0.2.10/32` are equivalent. Use whichever is clearer.

### `networks..asn` overrides only the number

Setting `asn: 64500` overrides whatever the [ASN resolution chain](/docs/network-flows/enrichment-concepts/asn-resolution) computed. The AS *name* still comes from the ASN database — there is no `asn_name` config field.

### Coordinates are silently dropped if invalid

`latitude: 91.5` (out of range) sets the field to an empty string with no error. Same for non-finite values. Validate manually if your data is important.

### Renamed interfaces don't auto-track

`if_indexes` keys by the numeric ifIndex. If a router renumbers its interfaces (line-card reseat, stack rebuild), the old ifIndex no longer matches and the per-interface block silently no longer applies. Audit after hardware changes.
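The CIDR merge described under "Tagging your own networks" can be sketched in a few lines (illustrative Python, not the plugin's code; `merge_network_attrs` and the sample data are hypothetical):

```python
import ipaddress

def merge_network_attrs(networks: dict, ip: str) -> dict:
    """Merge attributes from every CIDR containing `ip`, least-specific
    first, so more-specific entries override any non-empty field."""
    addr = ipaddress.ip_address(ip)
    # Shorthand form: a bare string sets only `name`.
    normalized = {
        ipaddress.ip_network(cidr): (attrs if isinstance(attrs, dict) else {"name": attrs})
        for cidr, attrs in networks.items()
    }
    merged: dict = {}
    # Ascending prefix length == least-specific first.
    for net in sorted((n for n in normalized if addr in n), key=lambda n: n.prefixlen):
        for key, value in normalized[net].items():
            if value not in ("", None):  # blank fields inherit, they don't clear
                merged[key] = value
    return merged

networks = {
    "10.0.0.0/8":  {"name": "corp-internal", "role": "internal"},
    "10.1.0.0/16": {"name": "site-par1", "site": "par1", "role": ""},  # blank role inherits
    "203.0.113.0/24": "transit-a",  # shorthand: name only
}
print(merge_network_attrs(networks, "10.1.2.3"))
# {'name': 'site-par1', 'role': 'internal', 'site': 'par1'}
```

Note how the blank `role` on the more-specific `/16` inherits `internal` from the `/8`: blank fields inherit, they do not clear.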
+

### Static metadata blocks classifiers

If `metadata_static.exporters` sets **any** classification field (group / role / site / region / tenant) for an exporter, the [classifiers](/docs/network-flows/enrichment-concepts/classifiers) do not run for that exporter at all. The same applies to interfaces: if static metadata sets any of provider / connectivity / boundary, the interface classifiers don't run for that interface. Plan accordingly.

## Sampling rate overrides

Sampling rates can also be configured per exporter prefix here, in case your exporter doesn't carry the rate or you want to override it:

```yaml
enrichment:
  default_sampling_rate: 1 # global fallback
  override_sampling_rate:
    10.1.0.0/16: 1024 # override for this network of exporters
```

`default_sampling_rate` applies when the flow record doesn't carry a rate and no override matches. `override_sampling_rate` always wins when its prefix matches the exporter IP. Both accept either an integer (uniform rate) or a CIDR-keyed map.

## What can go wrong

- **Wrong CIDR matches.** Overlapping ranges merge ascending — a more-specific entry that leaves a field blank will inherit the supernet's value. To clear a field on a more-specific entry, you must set it explicitly to a sentinel value, not leave it blank.
- **Forgotten internal range.** Until you declare your RFC 1918 / RFC 6598 / link-local ranges as `networks` entries, GeoIP can return spurious data for them.
- **Renamed interface no longer matches.** ifIndex keys are numeric; renames or hardware changes break the mapping silently.
- **Stale exporter prefix.** A new device with a different management IP doesn't match an old block. Audit when you replace gear.
- **`speed: 1000` means 1 kbps.** Use bits per second.
- **`boundary: 0` is indistinguishable from "not set".** Both result in field removal. If you want an explicit "undefined", use the string `"undefined"`.
- **Lat / lng silent drop.** Invalid values become empty strings.
The map quietly stops drawing the marker. + +## What's next + +- [GeoIP](/docs/network-flows/enrichment-concepts/ip-intelligence) — How country / city / coordinates and AS names get resolved. +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) — The provider chain that picks AS numbers. +- [Classifiers](/docs/network-flows/enrichment-concepts/classifiers) — Rule-based labelling that runs only when static metadata didn't already classify the exporter or interface. +- [Network sources](/docs/network-flows/enrichment-concepts/network-identity) — Fetching `networks`-style data from remote endpoints. diff --git a/docs/Network Flows/Enrichment Concepts/_category_.json b/docs/Network Flows/Enrichment Concepts/_category_.json new file mode 100644 index 0000000000..884336de8e --- /dev/null +++ b/docs/Network Flows/Enrichment Concepts/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "Enrichment Concepts", + "position": 50 +} diff --git a/docs/Network Flows/Field Reference.mdx b/docs/Network Flows/Field Reference.mdx new file mode 100644 index 0000000000..1a7ec52369 --- /dev/null +++ b/docs/Network Flows/Field Reference.mdx @@ -0,0 +1,343 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/field-reference.md" +sidebar_label: "Field Reference" +learn_status: "Published" +learn_rel_path: "Network Flows" +description: "Complete list of flow fields with per-protocol availability." +sidebar_position: "60" +learn_link: "https://learn.netdata.cloud/docs/network-flows/field-reference" +slug: "/network-flows/field-reference" +--- + + +# Field Reference + +Each flow record carries up to 91 fields. Some come straight from the exporter, others are added by enrichment after decode. This page is the canonical list — what each field means, where it comes from, and which protocols populate it. + +In the dashboard, fields appear by their canonical name (uppercase, e.g., `SRC_AS_NAME`). 
The dashboard is case-insensitive when typing into the filter ribbon.

## How to read the protocol columns

| Symbol | Meaning |
|---|---|
| ✓ | Always populated by this protocol when the data is available |
| ◐ | Populated only when the exporter includes the relevant Information Element in its template (v9 / IPFIX) or the relevant record type (sFlow) |
| — | Never populated by this protocol; expect this field to be empty |

Enrichment-only fields are marked **enrichment** — the decoder never fills them; they come from configured GeoIP databases, static metadata, classifiers, or routing sources.

## Counters and sampling

The six most-used fields. Read these first.

| Field | Type | Description |
|---|---|---|
| `BYTES` | uint64 | Bytes in the flow, **already multiplied by `SAMPLING_RATE`** at ingest. The dashboard's volume numbers come from this. |
| `PACKETS` | uint64 | Packets in the flow, already multiplied by `SAMPLING_RATE`. |
| `RAW_BYTES` | uint64 | Bytes the exporter actually reported, before scaling. Use when sampling is uniform across all your exporters and you want exact counts. |
| `RAW_PACKETS` | uint64 | Packets the exporter actually reported, before scaling. |
| `FLOWS` | uint64 | Number of flows aggregated into this record. Always 1 for raw records. |
| `SAMPLING_RATE` | uint64 | Packets-per-sample reported by the exporter. `1` means unsampled. Used as the multiplier for `BYTES` and `PACKETS`. |

Every protocol populates these. sFlow always sends a sampling rate (per-sample). NetFlow v5 reads a header rate. NetFlow v7 has no rate field and is treated as unsampled. NetFlow v9 and IPFIX may include the rate per-record or via Sampling Options.

## Identity — who and what

| Field | Type | v5 | v7 | v9 | IPFIX | sFlow | Description |
|---|---|---|---|---|---|---|---|
| `FLOW_VERSION` | string | ✓ | ✓ | ✓ | ✓ | ✓ | One of `v5`, `v7`, `v9`, `ipfix`, `sflow`.
| +| `EXPORTER_IP` | IP | ✓ | ✓ | ✓ | ✓ | ✓ | The device that sent this flow. For sFlow, the agent address takes precedence over the UDP source IP. | +| `EXPORTER_PORT` | uint16 | ✓ | ✓ | ✓ | ✓ | ✓ | Source UDP port of the exporter. | +| `SRC_ADDR` | IP | ✓ | ✓ | ◐ | ◐ | ◐ | Source IP. v9/IPFIX from IE 8/27, sFlow from sampled header or `SampledIPv4`/`SampledIPv6`. | +| `DST_ADDR` | IP | ✓ | ✓ | ◐ | ◐ | ◐ | Destination IP. | +| `SRC_PORT` | uint16 | ✓ | ✓ | ◐ | ◐ | ◐ | Source L4 port. | +| `DST_PORT` | uint16 | ✓ | ✓ | ◐ | ◐ | ◐ | Destination L4 port. | +| `PROTOCOL` | uint8 | ✓ | ✓ | ✓ | ✓ | ◐ | IP protocol number. TCP=6, UDP=17, ICMP=1, ICMPv6=58, GRE=47, ESP=50. | +| `ETYPE` | uint16 | ✓ (IPv4) | ✓ (IPv4) | ◐ | ◐ | ◐ | EtherType. 2048 = IPv4, 34525 = IPv6. | +| `DIRECTION` | enum | — | — | ◐ | ◐ | — | `ingress`, `egress`, or `undefined`. | + +NetFlow v5 and v7 are IPv4-only. For v9, IPFIX, and sFlow, IPv6 fields populate when the exporter sends them. + +## Routing — addresses and AS + +| Field | Type | Source | Description | +|---|---|---|---| +| `SRC_PREFIX` | IP | decoder + enrichment | Source network prefix. | +| `DST_PREFIX` | IP | decoder + enrichment | Destination network prefix. | +| `SRC_MASK` | uint8 | decoder + enrichment | Source prefix length in bits. | +| `DST_MASK` | uint8 | decoder + enrichment | Destination prefix length in bits. | +| `NEXT_HOP` | IP | decoder | BGP next-hop or RIB next-hop, depending on the exporter. | +| `SRC_AS` | uint32 | decoder + enrichment | Source autonomous system. | +| `DST_AS` | uint32 | decoder + enrichment | Destination autonomous system. | +| `SRC_AS_NAME` | string | **enrichment** | Friendly AS name (e.g., `AS15169 Google LLC`). | +| `DST_AS_NAME` | string | **enrichment** | Friendly AS name. | +| `DST_AS_PATH` | string | sFlow `ExtendedGateway` / BGP enrichment | BGP AS path as comma-separated ASNs. | +| `DST_COMMUNITIES` | string | sFlow `ExtendedGateway` / BGP enrichment | BGP communities. 
| +| `DST_LARGE_COMMUNITIES` | string | BGP enrichment | RFC 8092 large communities. | + +Static-network configuration can override `SRC_MASK` / `DST_MASK` and `SRC_AS` / `DST_AS` with more specific values from your CIDR-to-attribute map. + +## Interfaces + +| Field | Type | Source | Description | +|---|---|---|---| +| `IN_IF` | uint32 | decoder | Ingress SNMP ifIndex. | +| `OUT_IF` | uint32 | decoder | Egress SNMP ifIndex. | +| `IN_IF_NAME` | string | **enrichment** | Friendly name. | +| `OUT_IF_NAME` | string | **enrichment** | Friendly name. | +| `IN_IF_DESCRIPTION` | string | **enrichment** | SNMP `ifDescr` or your label. | +| `OUT_IF_DESCRIPTION` | string | **enrichment** | SNMP `ifDescr` or your label. | +| `IN_IF_SPEED` | uint64 | **enrichment** | Interface speed in bps. | +| `OUT_IF_SPEED` | uint64 | **enrichment** | Interface speed in bps. | +| `IN_IF_PROVIDER` | string | **enrichment** | Your transit provider tag (e.g., `Cogent`, `Lumen`). | +| `OUT_IF_PROVIDER` | string | **enrichment** | Same. | +| `IN_IF_CONNECTIVITY` | string | **enrichment** | Connectivity type tag (`transit`, `peering`, `customer`, `cdn`, ...). | +| `OUT_IF_CONNECTIVITY` | string | **enrichment** | Same. | +| `IN_IF_BOUNDARY` | uint8 | **enrichment** | `1` = External (Internet-facing), `2` = Internal (LAN/private). | +| `OUT_IF_BOUNDARY` | uint8 | **enrichment** | Same. | + +`*_BOUNDARY` is counter-intuitive: 1 means "external" (the Internet side). It's defined that way so that filtering for `IN_IF_BOUNDARY=1` cleanly gives you "traffic that came in from the Internet". + +## Layer 2 + +| Field | Type | v5 | v7 | v9 | IPFIX | sFlow | Description | +|---|---|---|---|---|---|---|---| +| `SRC_MAC` | MAC | — | — | ◐ | ◐ | ◐ | Source MAC. v9 IE 56, IPFIX IE 56/81. sFlow from `SampledHeader` or `SampledEthernet`. | +| `DST_MAC` | MAC | — | — | ◐ | ◐ | ◐ | Destination MAC. v9 IE 80, IPFIX IE 80/57. | +| `SRC_VLAN` | uint16 | — | — | ◐ | ◐ | ◐ | Source VLAN. v9 IE 58, IPFIX IE 58/243. 
**For sFlow, only from `ExtendedSwitch` records — NOT from 802.1Q tags inside a sampled packet header.** | +| `DST_VLAN` | uint16 | — | — | ◐ | ◐ | ◐ | Destination VLAN. | +| `MPLS_LABELS` | string | — | — | ◐ | ◐ | ◐ | MPLS label stack as comma-separated decimal label values (label only, not EXP/S/TTL). | + +## NAT + +| Field | Type | v5/v7 | v9 | IPFIX | sFlow | Description | +|---|---|---|---|---|---|---| +| `SRC_ADDR_NAT` | IP | — | ◐ | ◐ | — | Post-NAT source address. v9 IE 225, IPFIX IE 225/281. | +| `DST_ADDR_NAT` | IP | — | ◐ | ◐ | — | Post-NAT destination address. | +| `SRC_PORT_NAT` | uint16 | — | ◐ | ◐ | — | Post-NAT source port. | +| `DST_PORT_NAT` | uint16 | — | ◐ | ◐ | — | Post-NAT destination port. | + +## Protocol metadata + +| Field | Type | Description | +|---|---|---| +| `IPTTL` | uint8 | IP TTL. v9 uses Min/MaxTtl; IPFIX uses IE 192/52. | +| `IPTOS` | uint8 | IP Type of Service / DSCP byte. | +| `IPV6_FLOW_LABEL` | uint32 | IPv6 flow label (20-bit). v9/IPFIX only. | +| `TCP_FLAGS` | uint8 | OR of all TCP control bits seen in the flow (SYN/ACK/FIN/RST/PSH/URG). | +| `IP_FRAGMENT_ID` | uint32 | IPv4 ident or IPv6 fragment ID. | +| `IP_FRAGMENT_OFFSET` | uint16 | Non-zero means fragmented. | +| `ICMPV4_TYPE` | uint8 | ICMPv4 type. | +| `ICMPV4_CODE` | uint8 | ICMPv4 code. | +| `ICMPV6_TYPE` | uint8 | ICMPv6 type. | +| `ICMPV6_CODE` | uint8 | ICMPv6 code. | +| `FORWARDING_STATUS` | uint8 | RFC 7270 outcome code: `64..127` = forwarded, `128..191` = dropped, `192..255` = consumed. | + +## Timestamps + +| Field | Type | Description | +|---|---|---| +| `FLOW_START_USEC` | uint64 | Microseconds since epoch. From v5/v7 first-switched + sysUptime; from v9 first-switched normalised against system init time; from IPFIX `flowStartMicroseconds` family. Not populated for sFlow. | +| `FLOW_END_USEC` | uint64 | Microseconds since epoch. Same sources. Not populated for sFlow. 
| +| `OBSERVATION_TIME_MILLIS` | uint64 | IPFIX observation time (`observationTimeMilliseconds`). | + +## Geolocation (enrichment-only) + +| Field | Type | Description | +|---|---|---| +| `SRC_COUNTRY` | string | ISO 3166 country code. | +| `DST_COUNTRY` | string | ISO 3166 country code. | +| `SRC_GEO_STATE` | string | State / province. | +| `DST_GEO_STATE` | string | State / province. | +| `SRC_GEO_CITY` | string | City. | +| `DST_GEO_CITY` | string | City. | +| `SRC_GEO_LATITUDE` | string | Decimal latitude (string-encoded). Hidden in tables by default. | +| `DST_GEO_LATITUDE` | string | Decimal latitude. | +| `SRC_GEO_LONGITUDE` | string | Decimal longitude. | +| `DST_GEO_LONGITUDE` | string | Decimal longitude. | + +City, latitude, and longitude are **not preserved in the rollup tiers** (1m, 5m, 1h). Aggregating on them forces the query to tier 0 (raw). Country and state survive into rollups. + +## Network labels (enrichment-only) + +These are the labels you assign to your own networks via static-metadata or network-sources configuration. The decoder never fills them. + +| Field | Type | Description | +|---|---|---| +| `SRC_NET_NAME` | string | Friendly name for the source network. | +| `DST_NET_NAME` | string | Friendly name for the destination network. | +| `SRC_NET_ROLE` | string | Role tag (e.g., `dmz`, `office`, `printing`, `iot`). | +| `DST_NET_ROLE` | string | Role tag. | +| `SRC_NET_SITE` | string | Physical site (e.g., `dc-fra1`). | +| `DST_NET_SITE` | string | Physical site. | +| `SRC_NET_REGION` | string | Region (e.g., `eu`, `us-east`). | +| `DST_NET_REGION` | string | Region. | +| `SRC_NET_TENANT` | string | Tenant (multi-tenant deployments). | +| `DST_NET_TENANT` | string | Tenant. | + +## Exporter labels (enrichment-only) + +Labels you attach to your exporters via static-metadata or classifiers. + +| Field | Type | Description | +|---|---|---| +| `EXPORTER_NAME` | string | Friendly name. Falls back to an IP-derived string if no enrichment match. 
| +| `EXPORTER_GROUP` | string | Group tag. | +| `EXPORTER_ROLE` | string | Role tag (e.g., `edge`, `core`, `wan`). | +| `EXPORTER_SITE` | string | Site tag. | +| `EXPORTER_REGION` | string | Region tag. | +| `EXPORTER_TENANT` | string | Tenant tag. | + +## Per-protocol availability summary + +For exporter-derived fields (not enrichment), the protocols differ. The shortest version: + +- **NetFlow v5**: IPv4 5-tuple, AS, interfaces, next-hop, IPTOS, TCP flags, bytes, packets, sampling rate (header), first/last switched timestamps. No IPv6, MAC, VLAN, NAT, ICMP, MPLS. +- **NetFlow v7**: same as v5 minus the sampling rate. +- **NetFlow v9**: depends on the template. Theoretically all the IEs Netdata maps (see [the IPFIX/v9 IE map](#what-ies-are-mapped) below). IPv6 supported. +- **IPFIX**: superset of v9. Adds biflow (initiator/responder counters and `reverseInformationElement` IEs). Wider IE coverage. ICMP type and code as separate IEs. +- **sFlow v5**: depends on which sFlow record types the agent emits. From `SampledHeader` you get most fields after parsing the truncated packet (Ethernet/IPv4/IPv6/TCP/UDP/ICMP/MPLS). VLANs come only from `ExtendedSwitch`. AS path and BGP communities come from `ExtendedGateway`. Counter samples are dropped. + +## What IEs are mapped + +For NetFlow v9 and IPFIX, only specific Information Elements end up in flow-record fields. The rest of the template is parsed (so the decoder can walk past them) but the values are dropped. 
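That parse-and-skip walk can be sketched like this (illustrative Python; `walk_record` and the tiny `MAPPED_IES` subset are hypothetical, not the collector's code, and fixed-length IEs are assumed to decode as big-endian integers):

```python
import struct

# Hypothetical subset of the mapped IEs (IANA IE id -> flow field name).
MAPPED_IES = {1: "RAW_BYTES", 2: "RAW_PACKETS", 4: "PROTOCOL", 7: "SRC_PORT", 11: "DST_PORT"}

def walk_record(template, data):
    """Walk one data record using its template of (IE id, length) pairs.
    Every field is consumed so the offset stays aligned, but only mapped
    IEs produce flow-record values; the rest are parsed past and dropped."""
    flow, offset = {}, 0
    for ie_id, length in template:
        raw = data[offset:offset + length]
        offset += length
        if ie_id in MAPPED_IES:
            flow[MAPPED_IES[ie_id]] = int.from_bytes(raw, "big")
    return flow

# Template: octetDeltaCount(4), protocolIdentifier(1), an unmapped IE(2), sourceTransportPort(2)
template = [(1, 4), (4, 1), (333, 2), (7, 2)]
data = struct.pack(">IBHH", 1500, 6, 0xBEEF, 443)
print(walk_record(template, data))
# {'RAW_BYTES': 1500, 'PROTOCOL': 6, 'SRC_PORT': 443}
```

The same mechanics apply whether the unmapped field is a standard IE or a vendor enterprise one; alignment is preserved either way.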
+

The mapped IEs cover the standard set: identity (8/12/27/28, 7/11), counters (1/2/23/24/231/232/298/299), interfaces (10/14/252/253), protocol (4/5/6), ToS/DSCP (5/55), TTL (52/192), VLANs (58/59/243/254), MACs (56/80/57/81), NAT (225/226/281/282/227/228), AS (16/17), prefixes (44/45), masks (9/13/29/30), MPLS (70-79), ICMP (32/176-179, 139), fragmentation (54/88), IPv6 flow label (31), forwarding status (89), direction (61/239), sampling (34/50/305/306), timestamps (21/22/152/153/322 and the seconds/microseconds variants), and the data-link section for decapsulation (315).

Vendor enterprise IEs are recognised only for one Juniper case (PEN 2636 `commonPropertiesId`) used to surface forwarding status. Cisco AVC, Cisco NEL/NSEL NAT events, and similar vendor-private fields are parsed (so the decoder doesn't fail) but their values are not exposed in flow records.

If you need a specific IE mapped, open an issue with sample fixtures.

## Filtering and aggregation hints

Some fields are queryable but not aggregatable:

- `BYTES`, `PACKETS`, `FLOWS`, `RAW_BYTES`, `RAW_PACKETS`, `SAMPLING_RATE` — these are summed in tables and Sankeys; you cannot use them as group-by facets.
- `FLOW_START_USEC`, `FLOW_END_USEC`, `OBSERVATION_TIME_MILLIS` — timestamps, used by the time-range picker; not used as facets.
- The four geo-coordinate fields (`SRC_GEO_LATITUDE/LONGITUDE`, `DST_GEO_LATITUDE/LONGITUDE`) are stored but hidden in the table by default and not exposed as facets.

The dashboard also exposes two **virtual facets** that don't exist in the canonical schema:

- `ICMPV4` — a synthesised string from `ICMPV4_TYPE` and `ICMPV4_CODE`, useful for filtering ICMPv4 messages by their named type/code combination (e.g., "echo-request").
- `ICMPV6` — same for ICMPv6.

Filtering on either of these virtual fields runs against the underlying `*_TYPE` and `*_CODE` fields.
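As a toy illustration of how such a facet can be synthesised (hypothetical `icmpv4_facet` and a tiny name table; the dashboard's real mapping covers far more combinations):

```python
# Hypothetical name table for a few well-known ICMPv4 type/code combinations.
ICMPV4_NAMES = {
    (0, 0): "echo-reply",
    (3, 1): "host-unreachable",
    (8, 0): "echo-request",
    (11, 0): "ttl-exceeded",
}

def icmpv4_facet(icmp_type: int, icmp_code: int) -> str:
    """Synthesise a single filterable string from the two underlying fields."""
    return ICMPV4_NAMES.get((icmp_type, icmp_code), f"{icmp_type}/{icmp_code}")

print(icmpv4_facet(8, 0))   # echo-request
print(icmpv4_facet(3, 13))  # 3/13  (no named combo, falls back to type/code)
```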
+ +## A note on field counts + +You may see "89 fields" or "91 fields" in different parts of the codebase. The current canonical list has **91 entries**. The schema has grown over time and not every reference has caught up. The list above is exhaustive for the current release. + +## Master index — every field at a glance + +Use this table as the single reference when you know the field name and want every dimension in one place. Sorted alphabetically. + +Column legend: + +- **v5 / v7 / v9 / IPFIX / sFlow** — `✓` always populated, `◐` only when the exporter sends the relevant IE/record, `—` never. +- **Source** — `decoder` (filled by parsing the protocol), `enrichment` (filled by post-decode lookups; the wire never carries it), or `both` (decoder may fill, enrichment may overlay/override). +- **Tiers** — which tiers preserve the field. `all` means raw + 1m + 5m + 1h. `raw` means raw only (dropped at rollup). +- **Selectivity** — which query roles the field plays. `facet` (autocomplete + filter ribbon), `group-by` (Sankey/timeseries/maps aggregation), `filter` (selections), `metric` (BYTES/PACKETS/FLOWS — sums in tables, not faceted), `time` (used by the time-range picker), `hidden` (queryable but not in the default columns). +- **Notes** — IE numbers / sFlow record types when relevant, plus the enrichment chain for enrichment-derived fields. + +| Field | Type | v5 | v7 | v9 | IPFIX | sFlow | Source | Tiers | Selectivity | Notes | +|---|---|---|---|---|---|---|---|---|---|---| +| `BYTES` | uint64 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | metric, filter | Counter; scaled by `SAMPLING_RATE` at ingest. sFlow derives from decoded L3 length | +| `DIRECTION` | string | — | — | ◐ | ◐ | — | decoder | all | facet, group-by, filter | v9 IE 61, IPFIX IE 61/239. sFlow has no native direction | +| `DST_ADDR` | IP | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 12/28; sFlow `SampledHeader`/`SampledIPv4`/`SampledIPv6`. 
Raw-only | +| `DST_ADDR_NAT` | IP | — | — | ◐ | ◐ | — | decoder | raw | facet, group-by, filter | v9 IE 226/282; IPFIX `postNATdestinationIPv4/IPv6Address` | +| `DST_AS` | uint32 | ✓ | ✓ | ◐ | ◐ | ◐ | both | all | facet, group-by, filter | decoder IE 17 / sFlow `ExtendedGateway` last AS in path. Enrichment chain: `asn_providers` (default `[flow, routing, geoip]`); per-CIDR `enrichment.networks..asn` overrides | +| `DST_AS_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `format_as_name(DST_AS, attrs.asn_name)` → `AS{n} {name}`; falls back to `AS0 Unknown ASN` or `AS0 Private IP Address Space` | +| `DST_AS_PATH` | string | — | — | — | — | ◐ | both | raw | filter | sFlow `ExtendedGateway` BGP path. Routing enrichment overlay (BMP / BioRIS) for non-sFlow exporters | +| `DST_COMMUNITIES` | string | — | — | — | — | ◐ | both | raw | filter | sFlow `ExtendedGateway` communities. Routing enrichment overlay (BMP / BioRIS) | +| `DST_COUNTRY` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | GeoIP MMDB on `DST_ADDR` → optional override from `enrichment.networks..country` | +| `DST_GEO_CITY` | string | — | — | — | — | — | enrichment | raw | facet, group-by, filter | GeoIP city MMDB. Raw-only (dropped at rollup) | +| `DST_GEO_LATITUDE` | string | — | — | — | — | — | enrichment | raw | filter, hidden | GeoIP coordinates. Raw-only; hidden in default table view | +| `DST_GEO_LONGITUDE` | string | — | — | — | — | — | enrichment | raw | filter, hidden | GeoIP coordinates. Raw-only; hidden in default table view | +| `DST_GEO_STATE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | GeoIP subdivision. Preserved in rollups | +| `DST_LARGE_COMMUNITIES` | string | — | — | — | — | — | enrichment | raw | filter | RFC 8092 large communities from routing enrichment (BMP / BioRIS) | +| `DST_MAC` | MAC | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9 IE 80/57; IPFIX same. 
sFlow from `SampledHeader` datalink or `SampledEthernet` | +| `DST_MASK` | uint8 | ✓ | ✓ | ◐ | ◐ | ◐ | both | raw | facet, group-by, filter | v9 IE 13/29; sFlow `ExtendedRouter`. Enrichment overlay via `net_providers` (default `[flow, routing]`) plus per-CIDR overrides | +| `DST_NET_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..name` (static) merged with network sources by ascending prefix length | +| `DST_NET_REGION` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..region` from static + network sources | +| `DST_NET_ROLE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..role` from static + network sources | +| `DST_NET_SITE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..site` from static + network sources | +| `DST_NET_TENANT` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..tenant` from static + network sources | +| `DST_PORT` | uint16 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 11. sFlow from `SampledIPv4`/`SampledIPv6` or `SampledHeader` transport parse. Raw-only | +| `DST_PORT_NAT` | uint16 | — | — | ◐ | ◐ | — | decoder | raw | facet, group-by, filter | v9 IE 228; IPFIX `postNAPTdestinationTransportPort` | +| `DST_PREFIX` | IP | ✓ | ✓ | ◐ | — | — | decoder | raw | filter | v5/v7 derived from `DST_ADDR` & `DST_MASK`. v9 IE 45 (`Ipv4DstPrefix`). IPFIX has no canonical mapping; sFlow none | +| `DST_VLAN` | uint16 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9 IE 59; IPFIX IE 254 (`PostVlanId`/`PostDot1qVlanId`). sFlow only via `ExtendedSwitch` (NOT from 802.1Q tag in `SampledHeader`) | +| `ETYPE` | uint16 | ✓ (IPv4) | ✓ (IPv4) | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v5/v7 hardcoded to 2048. v9/IPFIX IE 60 `IpProtocolVersion` (4→2048, 6→34525). 
sFlow from sampled L2 etype | +| `EXPORTER_GROUP` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..group`. Classifiers fill it when static metadata didn't | +| `EXPORTER_IP` | IP | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | facet, group-by, filter | UDP source IP for NetFlow. sFlow uses datagram `agent_address` (override) | +| `EXPORTER_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..name` → falls back to IP-derived name | +| `EXPORTER_PORT` | uint16 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | facet, group-by, filter | UDP source port from socket | +| `EXPORTER_REGION` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..region`. Classifiers may fill | +| `EXPORTER_ROLE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..role`. Classifiers may fill | +| `EXPORTER_SITE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..site`. Classifiers may fill | +| `EXPORTER_TENANT` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..tenant`. Classifiers may fill | +| `FLOWS` | uint64 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | metric, filter | Always 1 for raw records; sums during rollup aggregation | +| `FLOW_END_USEC` | uint64 | ✓ | ✓ | ◐ | ◐ | — | decoder | raw | time | v5/v7 from header `sysUpTime` + `LastSwitched`. v9 from `LastSwitched`/`flowEndMilliseconds` normalised against `system_init`. IPFIX from `flowEndMilliseconds` family. Not populated for sFlow | +| `FLOW_START_USEC` | uint64 | ✓ | ✓ | ◐ | ◐ | — | decoder | raw | time | Same sources as `FLOW_END_USEC`. 
Not populated for sFlow | +| `FLOW_VERSION` | string | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | facet, group-by, filter | One of `v5`, `v7`, `v9`, `ipfix`, `sflow` | +| `FORWARDING_STATUS` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9/IPFIX IE 89; IPFIX also from Juniper PEN 2636 `commonPropertiesId`. sFlow synthesises `128` (dropped) when `output_format` is `discarded` | +| `ICMPV4_CODE` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | IPFIX IE 177 `IcmpCodeIpv4` + IE 32 low byte. v9 IE 178 `IcmpCodeValue` + IE 32. sFlow from decoded ICMP header | +| `ICMPV4_TYPE` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | IPFIX IE 176 `IcmpTypeIpv4` + IE 32 high byte. v9 IE 32 `IcmpType` + IE 177 `IcmpTypeValue`. sFlow from decoded ICMP header | +| `ICMPV6_CODE` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | IPFIX IE 179 `IcmpCodeIpv6` + IE 139 low byte. v9 IE 179 `ImpIpv6CodeValue`. sFlow from decoded ICMPv6 header | +| `ICMPV6_TYPE` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | IPFIX IE 178 `IcmpTypeIpv6` + IE 139 high byte. v9 IE 178 `IcmpIpv6TypeValue`. sFlow from decoded ICMPv6 header | +| `IN_IF` | uint32 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9 IE 10 `InputSnmp`; IPFIX IE 10/252. sFlow flow-sample `input` (single index only; LOCAL→0) | +| `IN_IF_BOUNDARY` | uint8 | — | — | — | — | — | enrichment | all | facet, group-by, filter | Per-interface static metadata or interface classifier output. 
`1`=external, `2`=internal | +| `IN_IF_CONNECTIVITY` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | Per-interface static metadata or interface classifier (e.g., `transit`, `peering`, `customer`) | +| `IN_IF_DESCRIPTION` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..description` or set via classifier `SetDescription()` | +| `IN_IF_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..name` or set via classifier `SetName()` | +| `IN_IF_PROVIDER` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | Static metadata or interface classifier provider tag | +| `IN_IF_SPEED` | uint64 | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..speed` (bps) | +| `IPTOS` | uint8 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9 IE 5 `SrcTos` / IE 55 `DstTos`. IPFIX IE 5/55. sFlow from `SampledIPv4` tos / `SampledIPv6` priority / parsed L3 | +| `IPTTL` | uint8 | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9 IE 52/192 (`Min/MaxTtl`). IPFIX same. sFlow from parsed L3 header | +| `IPV6_FLOW_LABEL` | uint32 | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 31 `FlowLabelIpv6`. sFlow from parsed IPv6 header | +| `IP_FRAGMENT_ID` | uint32 | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9 IE 54 `Ipv4Ident`. IPFIX IE 54 `FragmentIdentification`. sFlow from parsed IPv4 header | +| `IP_FRAGMENT_OFFSET` | uint16 | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 88 `FragmentOffset`. sFlow from parsed IPv4 header | +| `MPLS_LABELS` | string | — | — | ◐ | ◐ | ◐ | decoder | raw | filter | v9 IE 70-79 `MplsLabel1..10`. IPFIX IE 70 `MplsTopLabelStackSection` + 71-79 `MplsLabelStackSection2..10`. sFlow from MPLS in `SampledHeader`. 
Comma-separated decimal labels | +| `NEXT_HOP` | IP | ✓ | ✓ | ◐ | ◐ | ◐ | both | all | facet, group-by, filter | v9 IE 15/18/62/63; IPFIX same. sFlow `ExtendedRouter`/`ExtendedGateway`. Enrichment overlay via `net_providers` chain (default `[flow, routing]`) | +| `OBSERVATION_TIME_MILLIS` | uint64 | — | — | ◐ | — | — | decoder | raw | time | v9 IE 323 `ObservationTimeMilliseconds`. IPFIX has no canonical mapping in this build | +| `OUT_IF` | uint32 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9 IE 14 `OutputSnmp`; IPFIX IE 14/253. sFlow flow-sample `output` (single index only; LOCAL→0) | +| `OUT_IF_BOUNDARY` | uint8 | — | — | — | — | — | enrichment | all | facet, group-by, filter | Same semantics as `IN_IF_BOUNDARY` | +| `OUT_IF_CONNECTIVITY` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | Static metadata or interface classifier connectivity tag | +| `OUT_IF_DESCRIPTION` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..description` | +| `OUT_IF_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..name` | +| `OUT_IF_PROVIDER` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | Static metadata or interface classifier provider tag | +| `OUT_IF_SPEED` | uint64 | — | — | — | — | — | enrichment | all | facet, group-by, filter | `metadata_static.exporters..if_indexes..speed` (bps) | +| `PACKETS` | uint64 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | all | metric, filter | Counter; scaled by `SAMPLING_RATE` at ingest. sFlow always 1 per sample | +| `PROTOCOL` | uint8 | ✓ | ✓ | ✓ | ✓ | ◐ | decoder | all | facet, group-by, filter | v5/v7 protocol_number; v9 IE 4; IPFIX IE 4 `ProtocolIdentifier`. 
sFlow from `SampledIPv4`/`SampledIPv6` or parsed L3 | +| `RAW_BYTES` | uint64 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | raw | metric | Pre-sampling byte count from the exporter | +| `RAW_PACKETS` | uint64 | ✓ | ✓ | ✓ | ✓ | ✓ | decoder | raw | metric | Pre-sampling packet count from the exporter | +| `SAMPLING_RATE` | uint64 | ✓ (header) | — | ◐ | ◐ | ✓ | decoder | raw | metric | v5 from header `sampling_interval`. v7 has no rate (treated as unsampled). v9/IPFIX from IE 34/305/306 or Sampling Options template. sFlow per-sample rate | +| `SRC_ADDR` | IP | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 8/27. sFlow `SampledHeader`/`SampledIPv4`/`SampledIPv6`. Raw-only | +| `SRC_ADDR_NAT` | IP | — | — | ◐ | ◐ | — | decoder | raw | facet, group-by, filter | v9 IE 225/281; IPFIX `postNATsourceIPv4/IPv6Address` | +| `SRC_AS` | uint32 | ✓ | ✓ | ◐ | ◐ | ◐ | both | all | facet, group-by, filter | decoder IE 16 / sFlow `ExtendedGateway` `src_as`. Enrichment chain: `asn_providers` (default `[flow, routing, geoip]`); per-CIDR `enrichment.networks..asn` overrides | +| `SRC_AS_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `format_as_name(SRC_AS, attrs.asn_name)` → `AS{n} {name}`; falls back to `AS0 Unknown ASN` or `AS0 Private IP Address Space` | +| `SRC_COUNTRY` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | GeoIP MMDB on `SRC_ADDR` → optional override from `enrichment.networks..country` | +| `SRC_GEO_CITY` | string | — | — | — | — | — | enrichment | raw | facet, group-by, filter | GeoIP city MMDB. Raw-only | +| `SRC_GEO_LATITUDE` | string | — | — | — | — | — | enrichment | raw | filter, hidden | GeoIP coordinates. Raw-only; hidden in default table view | +| `SRC_GEO_LONGITUDE` | string | — | — | — | — | — | enrichment | raw | filter, hidden | GeoIP coordinates. 
Raw-only; hidden in default table view | +| `SRC_GEO_STATE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | GeoIP subdivision. Preserved in rollups | +| `SRC_MAC` | MAC | — | — | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9 IE 56/81; IPFIX same. sFlow from `SampledHeader` datalink or `SampledEthernet` | +| `SRC_MASK` | uint8 | ✓ | ✓ | ◐ | ◐ | ◐ | both | raw | facet, group-by, filter | v9 IE 9/29; sFlow `ExtendedRouter`. Enrichment overlay via `net_providers` (default `[flow, routing]`) plus per-CIDR overrides | +| `SRC_NET_NAME` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..name` (static) merged with network sources by ascending prefix length | +| `SRC_NET_REGION` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..region` from static + network sources | +| `SRC_NET_ROLE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..role` from static + network sources | +| `SRC_NET_SITE` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..site` from static + network sources | +| `SRC_NET_TENANT` | string | — | — | — | — | — | enrichment | all | facet, group-by, filter | `enrichment.networks..tenant` from static + network sources | +| `SRC_PORT` | uint16 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | raw | facet, group-by, filter | v9/IPFIX IE 7. sFlow from `SampledIPv4`/`SampledIPv6` or transport parse. Raw-only | +| `SRC_PORT_NAT` | uint16 | — | — | ◐ | ◐ | — | decoder | raw | facet, group-by, filter | v9 IE 227; IPFIX `postNAPTsourceTransportPort` | +| `SRC_PREFIX` | IP | ✓ | ✓ | ◐ | — | — | decoder | raw | filter | v5/v7 derived from `SRC_ADDR` & `SRC_MASK`. v9 IE 44 (`Ipv4SrcPrefix`). 
IPFIX has no canonical mapping; sFlow none | +| `SRC_VLAN` | uint16 | — | — | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | v9 IE 58; IPFIX IE 58/243 (`VlanId`/`Dot1qVlanId`). sFlow only via `ExtendedSwitch` (NOT from 802.1Q tag in `SampledHeader`) | +| `TCP_FLAGS` | uint8 | ✓ | ✓ | ◐ | ◐ | ◐ | decoder | all | facet, group-by, filter | OR of all TCP control bits seen in the flow. v9/IPFIX IE 6. sFlow from parsed TCP header in `SampledHeader` | + +The two virtual facets (`ICMPV4`, `ICMPV6`) aren't in this table because they don't exist in the canonical schema — they are synthesised string facets that filter on `ICMPV4_TYPE`/`ICMPV4_CODE` (or v6) under the hood. See the previous section. + +## What's next + +- [Configuration](/docs/network-flows/configuration) — `netflow.yaml` reference. +- [Retention and Querying](/docs/network-flows/retention-and-querying) — How the four tiers store data and which fields they preserve. +- [Visualisation](/docs/network-flows/visualization/sankey-and-table) — Reading the dashboard. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to know your data is right. 
diff --git a/docs/Network Flows/IP Intelligence/Custom MMDB Database.mdx b/docs/Network Flows/IP Intelligence/Custom MMDB Database.mdx new file mode 100644 index 0000000000..6c855eb669 --- /dev/null +++ b/docs/Network Flows/IP Intelligence/Custom MMDB Database.mdx @@ -0,0 +1,167 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "Custom MMDB Database" +learn_status: "Published" +learn_rel_path: "Network Flows/IP Intelligence" +keywords: [mmdb, custom database, bring your own, ipinfo, ip intelligence] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/ip-intelligence/custom-mmdb-database" +slug: "/network-flows/ip-intelligence/custom-mmdb-database" +--- + + +# Custom MMDB Database + + + + + +Plugin: netflow-plugin +Module: custom-mmdb + + + +## Overview + +The plugin reads any MMDB file that conforms to the standard schema -- this catch-all +integration covers IPInfo, custom-built internal MMDBs, vendor-specific feeds, or +any provider that publishes MMDB data. + +The plugin reads `country.iso_code`, `city.names.en`, `subdivisions[].iso_code`, +`location.latitude`, `location.longitude`, `autonomous_system_number`, and +`autonomous_system_organization`. Vendor-specific extra fields are ignored. + +For the full IP-intelligence concept, see +[IP Intelligence](https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence). + + +You produce or download an MMDB file. Place it on the agent host. Point the +plugin at it via `netflow.yaml`. The plugin reloads on file change every 30 +seconds. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +Not auto-detected. You must configure paths explicitly. 
+ +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### A standards-compliant MMDB file + +The MMDB file must use the [standard MMDB schema](https://maxmind.github.io/MaxMind-DB/). +Validate with `mmdblookup` from the `libmaxminddb-tools` package before deploying. + +Common sources: IPInfo (`ipinfo.io`), custom internal builds via the `mmdbwriter` +Go tool, or vendor-specific feeds. + + + +### Configuration + +#### Options + +Point `enrichment.geoip.asn_database` and/or `enrichment.geoip.geo_database` at +your MMDB file paths. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enrichment.geoip.asn_database | List of MMDB paths providing AS data. Multiple files allowed; later entries override on overlap. | [] | no | +| enrichment.geoip.geo_database | List of MMDB paths providing geographic data. | [] | no | +| enrichment.geoip.optional | When true, missing files become startup warnings instead of fatal errors. | false | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### IPInfo MMDB + +Using IPInfo's MMDB feed (subscription required). + +```yaml +enrichment: + geoip: + asn_database: + - /opt/mmdb/ipinfo-asn.mmdb + geo_database: + - /opt/mmdb/ipinfo-city.mmdb + optional: false + +``` +###### Internal custom MMDB + +Built in-house with `mmdbwriter`. Combines public BGP data with internal CIDR labels. + +
+Config + +```yaml +enrichment: + geoip: + asn_database: + - /etc/netdata/internal-asn.mmdb + geo_database: + - /etc/netdata/internal-geo.mmdb + optional: false + +``` +
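Before deploying a custom file, it can help to script the field check rather than eyeball `mmdblookup` output. The following is a minimal sketch, assuming `mmdblookup` from `libmaxminddb-tools` is installed; the `has_keys` helper name is illustrative, and it only does a coarse substring check on the lookup output:

```bash
# has_keys KEY... : reads mmdblookup output on stdin and reports any
# key that does not appear anywhere in it. Returns non-zero if one or
# more keys are missing. Coarse by design: it checks presence, not
# structure or types.
has_keys() {
  local out rc=0 key
  out=$(cat)
  for key in "$@"; do
    case "$out" in
      *"$key"*) : ;;                              # key found, nothing to do
      *) printf 'missing: %s\n' "$key"; rc=1 ;;   # key absent, flag it
    esac
  done
  return $rc
}

# Example: a geographic database should expose country, city and location:
#   mmdblookup --file /opt/mmdb/custom-geo.mmdb --ip 8.8.8.8 \
#     | has_keys country city location
# An ASN database should expose the autonomous-system fields instead:
#   mmdblookup --file /opt/mmdb/custom-asn.mmdb --ip 8.8.8.8 \
#     | has_keys autonomous_system_number autonomous_system_organization
```

Run it against a handful of public and internal IPs; a clean exit for both the geo and ASN checks is a reasonable smoke test before pointing `netflow.yaml` at the file.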
+ + + +### Lookups silently return empty + +The MMDB schema is non-standard or the IP types don't match (some custom builds +use `string` instead of `array` for ASN). Validate with `mmdblookup -f file.mmdb -i 8.8.8.8` +and confirm the standard fields are present. + + +### Plugin fails to start with optional=false + +File missing or unreadable at the configured path. Check permissions; the netdata +user must be able to read the file. + + + diff --git a/docs/Network Flows/IP Intelligence/DB-IP IP Intelligence.mdx b/docs/Network Flows/IP Intelligence/DB-IP IP Intelligence.mdx new file mode 100644 index 0000000000..44d1623806 --- /dev/null +++ b/docs/Network Flows/IP Intelligence/DB-IP IP Intelligence.mdx @@ -0,0 +1,174 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "DB-IP IP Intelligence" +learn_status: "Published" +learn_rel_path: "Network Flows/IP Intelligence" +keywords: [geoip, asn, dbip, db-ip, mmdb, ip intelligence, flow enrichment] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/ip-intelligence/db-ip-ip-intelligence" +slug: "/network-flows/ip-intelligence/db-ip-ip-intelligence" +--- + + +# DB-IP IP Intelligence + + + + + +Plugin: netflow-plugin +Module: dbip + + + +## Overview + +DB-IP is the **default** IP intelligence source for the Netdata netflow plugin. Its +MMDB-format databases are bundled with native packages (DEB, RPM) under +`/usr/share/netdata/topology-ip-intel/`. Refreshing pulls newer data from +`download.db-ip.com` via the bundled `topology-ip-intel-downloader`. + +Populates `SRC_COUNTRY`, `DST_COUNTRY`, `SRC_GEO_STATE`, `DST_GEO_STATE`, +`SRC_GEO_CITY`, `DST_GEO_CITY`, `SRC_GEO_LATITUDE`, `DST_GEO_LATITUDE`, +`SRC_GEO_LONGITUDE`, `DST_GEO_LONGITUDE`, plus the AS-number and AS-name fields +when included in the resolution chain. 
+ +For the full IP-intelligence concept (MMDB format, lookup priority, internal-IP +handling, hot reload semantics), see +[IP Intelligence](https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence). + + +Files are read on plugin start and reloaded automatically every 30 seconds when +their mtime or size changes. Lookups happen in-process; there is no per-flow network +call. Auto-detection scans `${NETDATA_CACHE_DIR}/topology-ip-intel/` first, falling +back to the stock copy under `${NETDATA_STOCK_DATA_DIR}/topology-ip-intel/`. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +Native packages ship the stock DB-IP MMDB files; the plugin auto-detects them at startup. No configuration required for the default install. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### DB-IP MMDB files + +Ships with native packages. For source builds, run the bundled downloader once +to populate `/var/cache/netdata/topology-ip-intel/`: + +```bash +sudo /usr/sbin/topology-ip-intel-downloader +``` + +Subsequent refreshes (e.g., monthly cron) re-fetch from db-ip.com. + + + +### Configuration + +#### Options + +Configure DB-IP under `enrichment.geoip` in `netflow.yaml`. Empty `asn_database` +and `geo_database` enable auto-detection. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enrichment.geoip.asn_database | List of MMDB paths providing AS data. Empty = auto-detect under cache/stock dirs. | [] (auto-detect) | no | +| enrichment.geoip.geo_database | List of MMDB paths providing geo data. Empty = auto-detect. | [] (auto-detect) | no | +| enrichment.geoip.optional | When true, missing or unreadable MMDBs are warnings, not fatal. Auto-detected files default to optional. | false (true when auto-detected) | no | + + +
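If you are unsure which copy the plugin will pick up with empty `asn_database`/`geo_database`, the scan order described above (cache first, then stock) can be mimicked from the shell. This is an illustrative sketch of that order, not the plugin's actual implementation; the fallback paths are the native-package defaults, so adjust for static installs under `/opt/netdata`:

```bash
# pick_mmdb FILENAME : print the path auto-detection would prefer.
# Checks the cache copy first, then the stock copy, honouring the
# NETDATA_CACHE_DIR / NETDATA_STOCK_DATA_DIR environment variables.
pick_mmdb() {
  local name=$1
  local cache=${NETDATA_CACHE_DIR:-/var/cache/netdata}/topology-ip-intel
  local stock=${NETDATA_STOCK_DATA_DIR:-/usr/share/netdata}/topology-ip-intel
  local dir
  for dir in "$cache" "$stock"; do
    if [ -r "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1   # no readable copy found in either location
}

# pick_mmdb topology-ip-asn.mmdb
# pick_mmdb topology-ip-geo.mmdb
```

If this prints the stock path but you expected refreshed data, the downloader has not yet populated the cache directory.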
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Default (auto-detect stock files) + +Native package install. No explicit configuration; the plugin finds the stock or cache copy automatically. + +```yaml +enrichment: + geoip: + asn_database: [] + geo_database: [] + optional: true + +``` +###### Explicit DB-IP paths + +Override auto-detection by pointing to specific DB-IP MMDBs (for example, after running the downloader to a non-standard location). + +
+Config + +```yaml +enrichment: + geoip: + asn_database: + - /var/cache/netdata/topology-ip-intel/topology-ip-asn.mmdb + geo_database: + - /var/cache/netdata/topology-ip-intel/topology-ip-geo.mmdb + optional: false + +``` +
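The plugin hot-reloads changed files but never complains about old ones, so an age check is worth scripting alongside your refresh job. A sketch follows; `mmdb_age_days` is a hypothetical helper name, the `stat` invocation covers GNU with a BSD fallback, and the threshold you alert on is your choice:

```bash
# mmdb_age_days FILE : print the file's age in whole days, based on mtime.
# Returns 2 if the file does not exist.
mmdb_age_days() {
  local file=$1 now mtime
  [ -e "$file" ] || { echo "missing: $file" >&2; return 2; }
  now=$(date +%s)
  mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
  echo $(( (now - mtime) / 86400 ))
}

# Example cron-friendly check (path is the package default):
#   age=$(mmdb_age_days /var/cache/netdata/topology-ip-intel/topology-ip-geo.mmdb)
#   [ "$age" -gt 45 ] && echo "MMDB is $age days old; run topology-ip-intel-downloader"
```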
+ + + +### Internal IPs appearing in random countries + +GeoIP databases have no entry for RFC 1918 / private space. The stock DB-IP +build tags private ranges so `*_AS_NAME` renders as "AS0 Private IP Address Space" +with empty country. With third-party MMDBs, results may vary. Declare your +internal CIDRs under `enrichment.networks` to override -- see +[Static metadata](https://learn.netdata.cloud/docs/network-flows/enrichment/static-metadata). + + +### Stale databases + +The plugin does not alert on staleness. Check file mtime: `ls -la /var/cache/netdata/topology-ip-intel/`. +Schedule a weekly cron of `topology-ip-intel-downloader` to keep data fresh. + + + diff --git a/docs/Network Flows/IP Intelligence/IP Intelligence.mdx b/docs/Network Flows/IP Intelligence/IP Intelligence.mdx new file mode 100644 index 0000000000..fe2fe2ffda --- /dev/null +++ b/docs/Network Flows/IP Intelligence/IP Intelligence.mdx @@ -0,0 +1,33 @@ +--- +sidebar_position: "150" +sidebar_label: "IP Intelligence" + +hide_table_of_contents: true +learn_status: "AUTOGENERATED" +slug: "/network-flows/ip-intelligence" +learn_link: "https://learn.netdata.cloud/docs/network-flows/ip-intelligence" +--- + +# IP Intelligence + +import { Grid, Box } from '@site/src/components/Grid_integrations'; + + + + + + + + + + + + + + + + + + + + diff --git a/docs/Network Flows/IP Intelligence/IPtoASN.mdx b/docs/Network Flows/IP Intelligence/IPtoASN.mdx new file mode 100644 index 0000000000..2559bf0882 --- /dev/null +++ b/docs/Network Flows/IP Intelligence/IPtoASN.mdx @@ -0,0 +1,149 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "IPtoASN" +learn_status: "Published" +learn_rel_path: "Network Flows/IP Intelligence" +keywords: [iptoasn, asn, bgp, public asn, ip intelligence] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "30" +learn_link: 
"https://learn.netdata.cloud/docs/network-flows/ip-intelligence/iptoasn" +slug: "/network-flows/ip-intelligence/iptoasn" +--- + + +# IPtoASN + + + + + +Plugin: netflow-plugin +Module: iptoasn + + + +## Overview + +[IPtoASN](https://iptoasn.com/) is a free public IP-to-ASN database derived from +BGP RIB snapshots. Daily updates, no license required. Use it as a free, open +alternative to MaxMind ASN data when license cost or terms matter. + +IPtoASN provides ASN data only -- no geographic data. Pair with DB-IP, MaxMind, +or another geo source for country/city enrichment. + +For the full IP-intelligence concept, see +[IP Intelligence](https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence). + + +The bundled `topology-ip-intel-downloader` supports IPtoASN as an ASN provider, +fetching the latest TSV and converting it to MMDB format the plugin can read. +Configure the downloader to use IPtoASN with `--asn iptoasn:combined`. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +Not auto-detected as the default ASN source -- the plugin auto-detects DB-IP. To use IPtoASN as ASN, run the downloader explicitly. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### Run the downloader with IPtoASN as ASN source + +IPtoASN is a TSV file. The bundled downloader knows how to fetch and convert +it to MMDB: + +```bash +sudo /usr/sbin/topology-ip-intel-downloader \ + --asn iptoasn:combined \ + --geo dbip:city-lite +``` + +This produces ASN data from IPtoASN and geographic data from DB-IP. Schedule +this in cron (daily for ASN; weekly is enough for geo). 
+ + + +### Configuration + +#### Options + +Once the downloader has produced MMDB files in the cache directory, the plugin +auto-detects them. To pin the path explicitly, set `enrichment.geoip.asn_database`. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enrichment.geoip.asn_database | Path to the IPtoASN-derived MMDB. Empty = auto-detect from cache directory. | [] | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### After running the downloader with IPtoASN + +Auto-detection picks up the cache copy. + +```yaml +enrichment: + geoip: + asn_database: [] + geo_database: [] + optional: true + +``` + + +### ASN names not appearing + +IPtoASN's data does not always carry a human-readable ASN organization name. +The plugin renders `AS{n}` (without a name) for those records. This is data-source- +level, not a plugin issue. Use MaxMind GeoLite2-ASN if you need richer name data. + + +### Outdated ASN attribution + +IPtoASN is rebuilt daily from BGP. Cron the downloader at least daily to keep +ASN attribution current with real-world routing changes. 
+ + + diff --git a/docs/Network Flows/IP Intelligence/MaxMind GeoIP GeoLite2.mdx b/docs/Network Flows/IP Intelligence/MaxMind GeoIP GeoLite2.mdx new file mode 100644 index 0000000000..3d3fd4dd88 --- /dev/null +++ b/docs/Network Flows/IP Intelligence/MaxMind GeoIP GeoLite2.mdx @@ -0,0 +1,168 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "MaxMind GeoIP / GeoLite2" +learn_status: "Published" +learn_rel_path: "Network Flows/IP Intelligence" +keywords: [maxmind, geoip2, geolite2, geoip, asn, mmdb, ip intelligence] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "40" +learn_link: "https://learn.netdata.cloud/docs/network-flows/ip-intelligence/maxmind-geoip-geolite2" +slug: "/network-flows/ip-intelligence/maxmind-geoip-geolite2" +--- + + +# MaxMind GeoIP / GeoLite2 + + + + + +Plugin: netflow-plugin +Module: maxmind + + + +## Overview + +MaxMind GeoIP2 (commercial) and GeoLite2 (free tier with license key) MMDB databases +are read directly by the netflow plugin. The plugin uses any MMDB-format file that +exposes the standard schema -- it is not tied to MaxMind specifically, but MaxMind +is the canonical source and the format originator. + +Populates the same `SRC_COUNTRY`, `*_GEO_*`, and AS-name fields as DB-IP. Use this +integration when you have a MaxMind license and prefer their data over the bundled +DB-IP defaults. + +For the full IP-intelligence concept, see +[IP Intelligence](https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence). + + +You download the MaxMind MMDB files yourself (via `geoipupdate` or manual download), +then point the plugin at their paths in `netflow.yaml`. The plugin reloads on file +change every 30 seconds. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. 
+ + +### Default Behavior + +#### Auto-Detection + +Not auto-detected. You must configure the database paths explicitly. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### MaxMind license + downloaded MMDBs + +For GeoLite2 (free): create a MaxMind account, generate a license key, install +`geoipupdate`, and configure it to fetch `GeoLite2-City.mmdb` and +`GeoLite2-ASN.mmdb`. For GeoIP2 (paid): obtain a subscription and use the same +`geoipupdate` mechanism with your paid license key. + + + +### Configuration + +#### Options + +Override the default DB-IP auto-detection by pointing `asn_database` and `geo_database` +at your MaxMind MMDB files. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| enrichment.geoip.asn_database | Paths to MaxMind ASN MMDB files (typically GeoLite2-ASN.mmdb or GeoIP2-ISP.mmdb). | [] | yes | +| enrichment.geoip.geo_database | Paths to MaxMind geographic MMDB files (typically GeoLite2-City.mmdb or GeoIP2-City.mmdb). | [] | yes | +| enrichment.geoip.optional | When true, missing or unreadable MMDBs are warnings, not fatal at startup. | false | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### GeoLite2 (free tier) + +Standard `geoipupdate` install path. Free tier requires a license key. + +```yaml +enrichment: + geoip: + asn_database: + - /usr/share/GeoIP/GeoLite2-ASN.mmdb + geo_database: + - /usr/share/GeoIP/GeoLite2-City.mmdb + optional: false + +``` +###### GeoIP2 (paid) + +Commercial subscription. Higher accuracy, more frequent updates. + +
+Config + +```yaml +enrichment: + geoip: + asn_database: + - /usr/share/GeoIP/GeoIP2-ISP.mmdb + geo_database: + - /usr/share/GeoIP/GeoIP2-City.mmdb + optional: false + +``` +
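Since the plugin only ever sees the files `geoipupdate` leaves behind, wrapping the updater to record its exit status makes silent failures visible. A sketch, with the update command passed in as arguments so the wrapper stays generic and testable; `run_update` is an illustrative name:

```bash
# run_update CMD [ARGS...] : run the given update command, log the
# outcome to syslog, and propagate the command's exit status so cron
# mail / monitoring can pick up failures.
run_update() {
  "$@"
  local rc=$?
  if [ "$rc" -eq 0 ]; then
    logger -t geoipupdate "refresh ok" 2>/dev/null || :
  else
    logger -t geoipupdate "refresh FAILED (exit $rc)" 2>/dev/null || :
  fi
  return "$rc"
}

# Typical cron usage:
#   run_update /usr/bin/geoipupdate
```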
+ + + +### License key missing or expired + +`geoipupdate` fails silently and the MMDB files become stale. Set up a working +`geoipupdate` cron and monitor its exit code. + + +### Schema differences between GeoLite2 and GeoIP2 + +Both share the standard MMDB structure but the `Anonymous IP`, `ISP`, and +`Connection Type` databases have GeoIP2-only fields the plugin does not read. +Use `City` for geographic enrichment and `ASN` (GeoLite2) or `ISP` (GeoIP2) +for AS data. + + + diff --git a/docs/Network Flows/Installation.mdx b/docs/Network Flows/Installation.mdx new file mode 100644 index 0000000000..04439d55d7 --- /dev/null +++ b/docs/Network Flows/Installation.mdx @@ -0,0 +1,168 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/installation.md" +sidebar_label: "Installation" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/installation" +slug: "/network-flows/installation" +--- + + +# Installation + +The netflow plugin is **packaged separately from the main Netdata Agent**. You install it on the same host where Netdata runs, after Netdata itself is in place. + +The package name is **`netdata-plugin-netflow`** on both Debian and RPM distributions. It is not installed by the standard `netdata` package or by the netdata-updater on its own — you have to install it explicitly on native-package systems. + +The static install (the kickstart `--static-only` path) bundles the plugin automatically. If you used the kickstart installer with the static option, no extra step is needed. + +## Prerequisites + +- A working Netdata Agent on the host that will receive flow data. +- That host must be reachable on UDP from your routers and switches (default port `2055`). +- Linux. The plugin is Linux-only. 
+ +## Install on Debian / Ubuntu / Mint + +```bash +sudo apt update +sudo apt install netdata-plugin-netflow +sudo systemctl restart netdata +``` + +## Install on RHEL / Fedora / CentOS / Rocky / Alma + +```bash +sudo dnf install netdata-plugin-netflow +sudo systemctl restart netdata +``` + +(`yum install` works on older systems where `dnf` isn't present.) + +## Install on openSUSE + +```bash +sudo zypper install netdata-plugin-netflow +sudo systemctl restart netdata +``` + +## Static install (kickstart) + +If you installed Netdata using: + +```bash +wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && \ + sh /tmp/netdata-kickstart.sh --static-only +``` + +…the netflow plugin is already installed under `/opt/netdata/usr/libexec/netdata/plugins.d/netflow-plugin`. No extra step. + +To verify: + +```bash +ls /opt/netdata/usr/libexec/netdata/plugins.d/netflow-plugin +``` + +## Source build + +Building from source requires a Rust toolchain (rustc + cargo, version 1.83 or later). When CMake detects Rust, the plugin is built and installed alongside the rest of Netdata. + +```bash +git clone https://github.com/netdata/netdata.git +cd netdata +sudo ./netdata-installer.sh +``` + +**Caveat:** source builds do **not** include the stock GeoIP / IP-intelligence database files. The plugin starts fine without them, but country, city, and AS-name fields will be empty until you run the downloader once: + +```bash +sudo /usr/sbin/topology-ip-intel-downloader +``` + +This populates `/var/cache/netdata/topology-ip-intel/` with the DB-IP-based MMDB files. The plugin auto-detects the cache copy on its next 30-second poll. See [GeoIP enrichment](/docs/network-flows/enrichment-concepts/ip-intelligence) for details and refresh scheduling. 
+ +## What gets installed + +| Path | Purpose | +|---|---| +| `/usr/libexec/netdata/plugins.d/netflow-plugin` | The plugin binary (mode 0750, root:netdata) | +| `/usr/sbin/topology-ip-intel-downloader` | Helper for refreshing the GeoIP / IP-intel MMDBs | +| `/usr/lib/netdata/conf.d/netflow.yaml` | Stock configuration (read-only reference; copy to `/etc/netdata/netflow.yaml` to customise) | +| `/usr/lib/netdata/conf.d/topology-ip-intel.yaml` | IP-intel downloader configuration | +| `/usr/share/netdata/topology-ip-intel/topology-ip-asn.mmdb` | Stock ASN database (DB-IP) | +| `/usr/share/netdata/topology-ip-intel/topology-ip-geo.mmdb` | Stock geographic database (DB-IP) | + +(Paths assume native packages. Static installs put everything under `/opt/netdata/`.) + +## Verify the plugin is running + +After installation and restart: + +```bash +sudo journalctl -u netdata --since "5 minutes ago" | grep -E 'netflow|listener' +``` + +You should see entries indicating that the plugin loaded its config and that the UDP listener bound to its port. + +Quick sanity check: + +```bash +sudo ss -unlp | grep 2055 +``` + +A line for `netflow-plugin` confirms the listener is up. + +## Open Netdata to confirm + +Open the Netdata UI in your browser. The **Network Flows** tab should appear in the top navigation. The plugin's operational charts also appear under the standard charts page in the `netflow` family. + +If the tab doesn't appear, or appears empty: + +- Check that the plugin process is running: `pgrep -fa netflow-plugin`. +- Check Netdata Cloud SSO: the Network Flows function requires authenticated access to the agent's space. +- See [Troubleshooting](/docs/network-flows/troubleshooting). + +## Configuring flow sources + +Installing the plugin enables it. To actually see flow data, you need to configure a router, switch, or software exporter to send NetFlow / IPFIX / sFlow datagrams to this host's UDP port 2055. 
+ +That's the next step: + +- [Quick Start](/docs/network-flows/quick-start) — A 15-minute path to your first flow data. +- [Sources / NetFlow](/docs/network-flows/sources/netflow) — Vendor configurations for NetFlow. +- [Sources / IPFIX](/docs/network-flows/sources/ipfix) — Vendor configurations for IPFIX. +- [Sources / sFlow](/docs/network-flows/sources/sflow) — Vendor configurations for sFlow. + +## Uninstall + +```bash +# Debian / Ubuntu +sudo apt remove netdata-plugin-netflow + +# RHEL / Fedora / CentOS / Rocky / Alma +sudo dnf remove netdata-plugin-netflow + +# openSUSE +sudo zypper remove netdata-plugin-netflow +``` + +Remove the configuration if you also want to clean up: + +```bash +sudo rm /etc/netdata/netflow.yaml /etc/netdata/topology-ip-intel.yaml +``` + +The flow journals at `/var/cache/netdata/flows/` and `/var/cache/netdata/topology-ip-intel/` are not removed by the package manager. Delete them manually if you want to reclaim the disk: + +```bash +sudo rm -rf /var/cache/netdata/flows /var/cache/netdata/topology-ip-intel +``` + +(Warning: this deletes all your historical flow data.) + +## What's next + +- [Quick Start](/docs/network-flows/quick-start) — Configure your first source and see traffic in the dashboard. +- [Configuration](/docs/network-flows/configuration) — Tune the listener, retention, and enrichment. +- [Troubleshooting](/docs/network-flows/troubleshooting) — When something doesn't work. 
diff --git a/docs/Network Flows/Investigation Playbooks.mdx b/docs/Network Flows/Investigation Playbooks.mdx new file mode 100644 index 0000000000..a9c8fe069c --- /dev/null +++ b/docs/Network Flows/Investigation Playbooks.mdx @@ -0,0 +1,191 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/investigation-playbooks.md" +sidebar_label: "Investigation Playbooks" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "110" +learn_link: "https://learn.netdata.cloud/docs/network-flows/investigation-playbooks" +slug: "/network-flows/investigation-playbooks" +--- + + +# Investigation playbooks + +Step-by-step recipes for common questions, all using the Netdata Network Flows tab. Each playbook fits in a 5-15 minute investigation window. + +## Playbook 1 — "The link is saturated, who's responsible?" + +**The situation.** SNMP shows your Internet link at 95% utilisation. Users complain about slowness. + +**The goal.** Identify the talker(s) consuming the bandwidth. + +**Steps.** + +1. **Open the Network Flows tab** with the default view (Sankey + Table). Set the time range to **the last 15 minutes** — recent enough to be live, wide enough to smooth bursts. + +2. **Filter to the saturated interface.** In the filter ribbon, set: + - `Exporter Name` = the router with the saturated link + - `Output Interface Name` = the interface name (or `Input Interface Name` if you want incoming traffic) + + This eliminates the doubling effect and shows only one direction. + +3. **Change the aggregation to "who's responsible".** Click the group-by selector and change the fields to: + - `Source ASN` → `Destination ASN` (for an Internet-edge link) + + Or for an internal link: + - `Source IP` → `Destination IP` + +4. **Read the Sankey.** The widest band is your top talker pair. Click on the wide band to drill in. + +5. **If a single ASN/IP dominates** — that's your answer. 
Click the value to add it as a filter and look at the table for the specific 5-tuple details. + +6. **If traffic is evenly distributed** across many sources — the problem is aggregate demand, not a single offender. The link genuinely needs more capacity, or you need traffic shaping. Move to Playbook 3. + +**What to record.** + +- Timestamp range of the investigation +- Top 3 talker pairs and their byte volumes +- Whether this is a one-time spike or sustained +- The URL of the dashboard view (preserves all filters and aggregation) + +**Common findings.** + +- Backup software running during business hours. +- A SaaS sync or cloud upload. +- Misconfigured automation (e.g., logs being shipped to the wrong place). +- New user / new application doing something unexpected. + +## Playbook 2 — "Investigating a specific IP" + +**The situation.** A security alert references an IP address. You need to know what it talked to, when, and how much. + +**The goal.** Construct a timeline and traffic profile for that IP. + +**Steps.** + +1. **Open the Network Flows tab** with **the last 24 hours** as the time range. + +2. **Filter by the IP.** In the filter ribbon: + - For inbound investigation: `Destination IP` = the IP + - For outbound investigation: `Source IP` = the IP + - For both directions: filter both, separately, in two browser tabs + + IP filtering forces tier 0 (raw retention) — the time depth is bounded by your raw-tier retention. If you need to look further back than that, the data isn't there. + +3. **Switch to Time-Series view.** This shows when the IP was active. Look for: + - When did activity start? End? + - Is it constant, periodic, or bursty? + - Does it correlate with a known event (deployment, business hours, maintenance window)? + +4. 
**Switch back to Sankey + Table.** Change the group-by to surface the relevant context: + - `Destination IP` → `Destination Port` → `Destination Country` (if the suspect is a source) + - `Source IP` → `Source ASN` (if the suspect is a destination) + +5. **Read the table.** The top rows show the IP's most-talked-to peers, ranked by bytes. Look for: + - Unknown external IPs in unexpected geographies. + - Connections on unusual ports (anything not in your normal protocol mix). + - Sustained outbound transfers (potential exfiltration) vs short bursts (likely normal). + +6. **For each suspicious peer, drill in.** Add the peer IP to the filter ribbon, switch back to Time-Series. Confirm the timeline aligns with the original alert. + +**What to record.** + +- Time range of all activity by the IP +- Top destinations and their byte/packet counts +- Whether the activity is consistent with a legitimate use (backup, SaaS sync) or anomalous +- The URL of each dashboard view used in the investigation + +**Caveats.** + +- IP filter forces tier 0; older data may not be available. +- If the IP is internal and you haven't declared it under `enrichment.networks`, GeoIP may misrepresent its country. +- If the IP is a NAT public address, multiple internal hosts may be hidden behind it. Cross-check with NAT translation logs. + +## Playbook 3 — "Justifying a link upgrade" + +**The situation.** A WAN circuit is at 80% utilisation during peak hours. Finance wants justification before approving an upgrade. + +**The goal.** Produce a defensible trend showing growth and projecting the date of saturation. + +**Steps.** + +1. **Open the Network Flows tab** with **the last 30 days** as the time range. (Adjust based on tier-1/5/60 retention. If your retention is shorter, use whatever you have.) + +2. **Filter to the WAN interface.** Set `Exporter Name` and `Output Interface Name` (or input — pick one direction). This removes the doubling effect. + +3. 
**Switch to Time-Series view.** The chart now shows ~30 days of bandwidth on the link. The bucket size auto-adjusts to roughly 1 hour at this range. + +4. **Identify the trend.** Look at the daily peaks (one curve cycle = one day). The peak should be growing month-over-month. Eyeball the slope. + +5. **Identify the growth driver.** Switch back to Sankey + Table, group by `Destination ASN` or `Application` (port). Compare top consumers from the start of the period to the end. New entries that weren't there 30 days ago are growth drivers. + +6. **Compute the upgrade need.** Take the current peak (e.g., 80% of 100 Mbps = 80 Mbps), project forward at the observed monthly growth rate (e.g., 10%/month = ~30%/quarter), and find when it crosses 100% (or 70% if you want headroom). + + Example: if peak grows from 70 Mbps to 80 Mbps over 30 days, that's +10 Mbps (roughly 14%) in a month. At that rate it crosses 100 Mbps in ~2 months, and a 200 Mbps upgrade buys you ~1 year of linear growth (less if growth compounds). + +**What to record.** + +- Trend chart (screenshot or shareable URL) +- Growth driver: the specific applications / services consuming the new bandwidth +- Projected saturation date and recommended upgrade timeline +- Sampling rate of the exporter (so the numbers can be interpreted) + +**Caveats.** + +- Always note the sampling rate. A change in sampling rate during the analysis window invalidates the trend. +- A large spike one day shouldn't drive the projection. Use weekly peaks (averaged across same-day-of-week) for stability. +- If your retention is shorter than 30 days, use what you have but caveat the projection. + +## Playbook 4 — "Scoping a security alert" + +**The situation.** Your IDS / EDR / SIEM fired an alert: an internal host communicated with a known-malicious external IP. You have the internal IP, the external IP, and a rough time window. + +**The goal.** Determine the scope and timeline. Did other internal hosts talk to the same external IP? When did it start? How much data was exchanged?
+ +**Steps.** + +1. **Open the Network Flows tab** with the time range covering 24 hours before the alert through now. + +2. **Filter by the external IP.** In the filter ribbon: `Destination IP` = the external IP. + + This forces tier 0. Time depth is your raw-tier retention. + +3. **Switch to Time-Series view.** When did communication start? Is it ongoing? Did it correlate with the alert time? + +4. **Switch to Sankey + Table.** Group by `Source IP`. The result is "every internal IP that talked to this external IP, ranked by bytes". + + - If only one internal host appears, scope is contained. + - If multiple appear, you have a broader scope. Investigate each. + +5. **For the alerted internal host**, swap the filter: `Source IP` = the internal host (remove the external filter). Group by `Destination IP` → `Destination Country` → `Destination ASN`. Look for other suspicious peers. + +6. **Reverse-direction check.** Switch the filter to the external IP as `Source IP` (now you're looking at incoming traffic from it). Internal hosts that received connections from the external IP show up — useful for inbound C2 / probe analysis. + +7. **Geographic check.** Switch to Country Map. The location of the external IP gives a quick "where" — useful to compare against what your threat intelligence said. + +**What to record.** + +- All internal IPs that communicated with the external IP, time ranges, byte counts +- Other suspicious destinations the alerted host talked to in the same window +- Whether traffic is ongoing or stopped +- The dashboard URL of each view (for the incident report) + +**Caveats.** + +- Sampled flows can miss small connections. Beaconing at low rates may not be visible at 1-in-1000 sampling. If you sample, your security investigation has a floor. +- An external IP behind a CDN may be one of many destinations served by that infrastructure. ASN-level analysis (`Destination ASN`) is often more informative than IP. 
+- The external IP being flagged as malicious doesn't mean the internal host was compromised — false positives in threat intel are common. Cross-check with the host's logs. + +## A note on the dashboard + +All the playbooks above use the same controls: time range, filters, group-by fields, view switcher. Once you're comfortable with these four, every investigation becomes a permutation. Mastering the tool means knowing which permutation fits the question. + +The URL preserves all your selections — copy it and paste into your incident-management ticket so anyone reviewing has the exact same view you saw. + +## What's next + +- [Sankey and Table](/docs/network-flows/visualization/sankey-and-table) — Full reference for the default view. +- [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) — How to narrow effectively. +- [Time-Series](/docs/network-flows/visualization/time-series) — Trends over the time range. +- [Anti-patterns](/docs/network-flows/anti-patterns) — What "wrong" looks like and why. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — Confirming your numbers before acting on them.
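The projection arithmetic in Playbook 3 can be scripted so it is repeatable across reviews. A minimal awk sketch, assuming roughly linear growth and using the playbook's illustrative numbers (70 → 80 Mbps over a month on a 100 Mbps link) — substitute your own measurements:

```bash
# Linear saturation projection (a sketch -- plug in your own figures)
awk 'BEGIN {
  peak = 80      # current monthly peak, Mbps
  growth = 10    # Mbps added per month (70 -> 80 over the last 30 days)
  for (cap = 100; cap <= 200; cap += 100)
    printf "%d Mbps saturates in ~%d months\n", cap, (cap - peak) / growth
}'
```

At 10 Mbps of linear growth per month, the existing 100 Mbps link saturates in about 2 months, and a 200 Mbps upgrade lasts about a year.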
diff --git a/docs/Network Flows/Network Flows.mdx b/docs/Network Flows/Network Flows.mdx new file mode 100644 index 0000000000..71b82ad202 --- /dev/null +++ b/docs/Network Flows/Network Flows.mdx @@ -0,0 +1,72 @@ +--- +sidebar_label: "Network Flows" +sidebar_position: "110" +hide_table_of_contents: true +learn_status: "AUTOGENERATED" +slug: "/network-flows" +learn_link: "https://learn.netdata.cloud/docs/network-flows" +--- + +# Network Flows + +import { Grid, Box } from '@site/src/components/Grid_integrations'; + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/Network Flows/Network Identity Sources/AWS IP Ranges.mdx b/docs/Network Flows/Network Identity Sources/AWS IP Ranges.mdx new file mode 100644 index 0000000000..5d65fdd7c3 --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/AWS IP Ranges.mdx @@ -0,0 +1,177 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "AWS IP Ranges" +learn_status: "Published" +learn_rel_path: "Network Flows/Network Identity Sources" +keywords: [aws, amazon, cloud, ip ranges, vpc, ec2, prefix list] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources/aws-ip-ranges" +slug: "/network-flows/network-identity-sources/aws-ip-ranges" +--- + + +# AWS IP Ranges + + + + + +Plugin: netflow-plugin +Module: aws-ip-ranges + + + +## Overview + +AWS publishes a continuously updated JSON file listing every public IP prefix used +by AWS services -- per region, per service. This integration fetches that file +periodically, transforms it via a jq expression, and uses the result to label flow +records destined to / from AWS with `*_NET_TENANT="amazon"` plus a per-region tag. 
+ +The result: traffic to/from AWS shows up clearly in dashboards as "amazon", with +per-region and per-service breakdown if you customize the jq transform. + +For the full network-identity concept (merge order, jq transform, TLS verification), +see [Network Identity](https://learn.netdata.cloud/docs/network-flows/enrichment/network-identity). + + +The plugin issues a periodic GET to `https://ip-ranges.amazonaws.com/ip-ranges.json`, +parses the JSON body, runs the configured jq transform via the [jaq](https://github.com/01mf02/jaq) +library, and merges the resulting prefix-labeled rows into the network-attributes trie. + + +This integration is only supported on the following platforms: + +- Linux + +This integration supports multiple instances configured side-by-side. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Add an entry under enrichment.network_sources to enable. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### Outbound HTTPS to AWS + +The agent host must be able to reach `https://ip-ranges.amazonaws.com/ip-ranges.json`. +No AWS credentials needed -- the file is public. + + + +### Configuration + +#### Options + +Add a named entry under `enrichment.network_sources`. The `name` you choose appears +in flow records via the `*_NET_TENANT` field (when your jq transform sets it). + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| url | AWS publishes the master file at this URL. | https://ip-ranges.amazonaws.com/ip-ranges.json | yes | +| interval | How often to fetch. AWS updates the file roughly every 15 minutes; daily is enough for most uses. | 60s (loop floor) | no | +| timeout | Per-request timeout. | 60s | no | +| transform | jq expression that converts the AWS response into objects with `prefix` and label fields. | . | yes | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Tag all AWS prefixes by region and service + +Sets tenant=amazon, region=`aws-region`, role=`service-name`. + +```yaml +enrichment: + network_sources: + aws: + url: "https://ip-ranges.amazonaws.com/ip-ranges.json" + interval: 24h + timeout: 60s + transform: | + (.prefixes + .ipv6_prefixes)[] | { + prefix: (.ip_prefix // .ipv6_prefix), + tenant: "amazon", + region: .region, + role: (.service | ascii_downcase) + } + +``` +###### AWS S3 only + +Filter to a single AWS service for narrower tagging. + +
+Config + +```yaml +enrichment: + network_sources: + aws-s3: + url: "https://ip-ranges.amazonaws.com/ip-ranges.json" + interval: 24h + transform: | + (.prefixes + .ipv6_prefixes)[] + | select(.service == "S3") + | { + prefix: (.ip_prefix // .ipv6_prefix), + tenant: "amazon", + role: "s3", + region: .region + } + +``` +
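Because an empty transform result makes the source back off, it pays to test the jq expression locally before deploying. A sketch using the `jq` CLI, with an inline sample standing in for the real `ip-ranges.json` (swap the heredoc for `curl -s https://ip-ranges.amazonaws.com/ip-ranges.json` to test against live data):

```bash
# Tiny stand-in for https://ip-ranges.amazonaws.com/ip-ranges.json
cat > /tmp/ip-ranges-sample.json <<'EOF'
{
  "prefixes": [
    { "ip_prefix": "52.95.0.0/16", "region": "eu-west-1", "service": "S3" },
    { "ip_prefix": "3.5.0.0/16", "region": "us-east-1", "service": "EC2" }
  ],
  "ipv6_prefixes": [
    { "ipv6_prefix": "2600:1f00::/32", "region": "us-east-1", "service": "S3" }
  ]
}
EOF

# Same filter as the "AWS S3 only" example above: expect one compact JSON
# object per S3 prefix, each with `prefix` plus the label fields
jq -c '(.prefixes + .ipv6_prefixes)[]
       | select(.service == "S3")
       | { prefix: (.ip_prefix // .ipv6_prefix),
           tenant: "amazon", role: "s3", region: .region }' \
   /tmp/ip-ranges-sample.json
```

If this prints nothing, the deployed source would back off as failed — fix the filter before touching `netflow.yaml`.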
+ + + +### Empty result from the transform is treated as failure + +If the jq filter happens to produce nothing (e.g., AWS responds with no prefixes), +the source backs off as if it errored. Check the journal for `network-sources` warnings. + + +### TLS verification cannot be disabled + +`tls.skip_verify: true` is rejected by validation. Use `tls.ca_file` for +custom-CA paths if needed. + + + diff --git a/docs/Network Flows/Network Identity Sources/Azure IP Ranges.mdx b/docs/Network Flows/Network Identity Sources/Azure IP Ranges.mdx new file mode 100644 index 0000000000..5dc60e7345 --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/Azure IP Ranges.mdx @@ -0,0 +1,159 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "Azure IP Ranges" +learn_status: "Published" +learn_rel_path: "Network Flows/Network Identity Sources" +keywords: [azure, microsoft, cloud, ip ranges, service tags] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources/azure-ip-ranges" +slug: "/network-flows/network-identity-sources/azure-ip-ranges" +--- + + +# Azure IP Ranges + + + + + +Plugin: netflow-plugin +Module: azure-ip-ranges + + + +## Overview + +Azure publishes "Service Tags" data describing IP ranges per region and per service. +The Azure publication mechanism is **less convenient** than AWS / GCP -- the +authoritative URL contains a date stamp that changes weekly, so you cannot use +a single stable URL. + +For automated fetching, you have two options: +1. Mirror the file in your own infrastructure (a script that resolves the latest + URL via the Azure CLI, downloads, and serves at a stable internal URL). +2. Skip Azure IP Ranges entirely and rely on GeoIP / ASN data for Azure + attribution (Azure ASN is 8075). 
+ +For the full network-identity concept, see +[Network Identity](https://learn.netdata.cloud/docs/network-flows/enrichment/network-identity). + + +Periodic HTTPS GET against your stable mirror URL, jq transform, merge into +network-attributes trie. The plugin does not handle Azure's date-stamped URL +rotation -- you provide a stable URL via your own mirror. + + +This integration is only supported on the following platforms: + +- Linux + +This integration supports multiple instances configured side-by-side. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Set up your own URL mirror, then add an entry under enrichment.network_sources. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### A stable URL for the Azure Service Tags JSON + +Azure's authoritative URL embeds a date stamp that changes weekly. A simple +workaround: a cron job that calls +`az network list-service-tags --location global -o json` (Azure CLI) and +writes the result to a stable path on an internal HTTP server. The plugin then +fetches from that stable URL. + + +#### Outbound HTTPS to your mirror + +No Azure credentials needed by the plugin itself; credentials only matter on +the side that does the upstream Azure CLI call. + + + +### Configuration + +#### Options + +Add a named entry under `enrichment.network_sources` pointing at your mirror URL. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| url | Stable URL to your locally-mirrored Azure Service Tags JSON. | | yes | +| transform | jq expression mapping the values[] array to per-prefix objects. | . | yes | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Internal mirror of Azure Service Tags + +Tag every Azure prefix with tenant=azure plus region/service. + +```yaml +enrichment: + network_sources: + azure: + url: "https://internal.example/azure-service-tags.json" + interval: 24h + transform: | + .values[] + | .id as $id + | .properties.region as $region + | (.properties.systemService // "") as $service + | .properties.addressPrefixes[] + | { + prefix: ., + tenant: "azure", + region: ($region // ""), + role: ($service | ascii_downcase) + } + +``` + + +### Empty results + +The Azure Service Tags JSON has nested structure (`values[].properties.addressPrefixes[]`). +If your jq doesn't unwrap correctly, every fetch yields zero rows and the source +backs off. Test the jq locally with `jq < azure-service-tags.json`. 
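The cron-based mirror described in the prerequisites can be sketched as follows. This is an illustration, not part of the plugin: it assumes a logged-in Azure CLI on the mirror host, and that `/var/www/html/` is served at the stable internal URL you configure as `url`:

```bash
# Write the mirror script; install it as e.g. /etc/cron.daily/azure-service-tags
cat > azure-service-tags-mirror.sh <<'EOF'
#!/bin/sh
set -eu
TMP=$(mktemp)
trap 'rm -f "$TMP"' EXIT

# Resolve the current (date-stamped) Service Tags publication via the Azure CLI
az network list-service-tags --location global -o json > "$TMP"

# Refuse to publish an empty download
[ -s "$TMP" ] || exit 1

# Atomically replace the file behind the stable URL the plugin fetches
mv "$TMP" /var/www/html/azure-service-tags.json
EOF
chmod +x azure-service-tags-mirror.sh
```

The atomic `mv` matters: the plugin may fetch at any moment, and a partially written file would produce zero rows and trigger the back-off described above.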
+ + + diff --git a/docs/Network Flows/Network Identity Sources/GCP IP Ranges.mdx b/docs/Network Flows/Network Identity Sources/GCP IP Ranges.mdx new file mode 100644 index 0000000000..b95acb8f72 --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/GCP IP Ranges.mdx @@ -0,0 +1,137 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "GCP IP Ranges" +learn_status: "Published" +learn_rel_path: "Network Flows/Network Identity Sources" +keywords: [gcp, google cloud, cloud, ip ranges, prefix list] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "30" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources/gcp-ip-ranges" +slug: "/network-flows/network-identity-sources/gcp-ip-ranges" +--- + + +# GCP IP Ranges + + + + + +Plugin: netflow-plugin +Module: gcp-ip-ranges + + + +## Overview + +Google Cloud publishes its public IP prefixes at `https://www.gstatic.com/ipranges/cloud.json`, +updated periodically. This integration fetches the file and labels flow records +to/from Google Cloud with `*_NET_TENANT="gcp"` plus per-scope and per-service tags. + +For the full network-identity concept, see +[Network Identity](https://learn.netdata.cloud/docs/network-flows/enrichment/network-identity). + + +Periodic HTTPS GET, jq transform, merge into network-attributes trie. Same mechanism +as AWS IP Ranges, different URL and JSON shape. + + +This integration is only supported on the following platforms: + +- Linux + +This integration supports multiple instances configured side-by-side. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Add an entry under enrichment.network_sources to enable. + +#### Limits + +The default configuration for this integration does not impose any limits. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### Outbound HTTPS to Google + +The agent host must be able to reach `https://www.gstatic.com/ipranges/cloud.json`. +No GCP credentials needed -- the file is public. + + + +### Configuration + +#### Options + +Add a named entry under `enrichment.network_sources`. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| url | GCP publishes the master file here. | https://www.gstatic.com/ipranges/cloud.json | yes | +| transform | jq expression mapping `prefixes[]` to `prefix` + label objects. | . | yes | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Tag all GCP prefixes by service and scope + +Sets tenant=gcp, role=`service`, region=`scope`. + +```yaml +enrichment: + network_sources: + gcp: + url: "https://www.gstatic.com/ipranges/cloud.json" + interval: 24h + transform: | + .prefixes[] | { + prefix: (.ipv4Prefix // .ipv6Prefix), + tenant: "gcp", + role: .service, + region: .scope + } + +``` + + +### Customer-only ranges + +GCP also publishes a `goog.json` file (broader: includes Google services beyond +cloud). Use `cloud.json` for compute IP attribution; `goog.json` if you also +want to tag Google's other services. 
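Per Google's published guidance, prefixes that appear in `goog.json` but not in `cloud.json` belong to Google's own services, while `cloud.json` covers customer-usable Google Cloud ranges. A jq sketch of that set difference (inline samples stand in for the two live files from `https://www.gstatic.com/ipranges/`; assumes the `jq` CLI):

```bash
# Tiny stand-ins for goog.json (all Google) and cloud.json (Google Cloud only)
cat > /tmp/goog.json <<'EOF'
{ "prefixes": [ { "ipv4Prefix": "8.8.8.0/24" },
                { "ipv4Prefix": "34.0.0.0/15" } ] }
EOF
cat > /tmp/cloud.json <<'EOF'
{ "prefixes": [ { "ipv4Prefix": "34.0.0.0/15",
                  "service": "Google Cloud", "scope": "us-east1" } ] }
EOF

# Prefixes in goog.json but absent from cloud.json -> Google's own services
jq -n --slurpfile g /tmp/goog.json --slurpfile c /tmp/cloud.json '
  ($c[0].prefixes | map(.ipv4Prefix // .ipv6Prefix)) as $cloud
  | $g[0].prefixes[]
  | (.ipv4Prefix // .ipv6Prefix)
  | select(. as $p | $cloud | index($p) | not)'
```

The same shape works as a `transform` if you mirror both files and want to tag Google-service traffic separately from Google Cloud customer traffic.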
+ + + diff --git a/docs/Network Flows/Network Identity Sources/Generic JSON-over-HTTP IPAM.mdx b/docs/Network Flows/Network Identity Sources/Generic JSON-over-HTTP IPAM.mdx new file mode 100644 index 0000000000..f0fe8789a0 --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/Generic JSON-over-HTTP IPAM.mdx @@ -0,0 +1,228 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "Generic JSON-over-HTTP IPAM" +learn_status: "Published" +learn_rel_path: "Network Flows/Network Identity Sources" +keywords: [ipam, cmdb, infoblox, bluecat, phpipam, custom, prefix list] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "40" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources/generic-json-over-http-ipam" +slug: "/network-flows/network-identity-sources/generic-json-over-http-ipam" +--- + + +# Generic JSON-over-HTTP IPAM + + + + + +Plugin: netflow-plugin +Module: generic-ipam + + + +## Overview + +The catch-all integration. Any IPAM, CMDB, or service that exposes prefix metadata +via an HTTP-fetchable JSON endpoint can plug into Netdata's flow enrichment via this +mechanism. Examples: Infoblox WAPI, BlueCat REST API, phpIPAM, internal-built CMDB +endpoints, ServiceNow CMDB queries, custom Lambda functions producing JSON. + +You define the URL, the HTTP method, headers (for auth), and a jq transform that +converts the response into objects with `prefix` + label fields. + +For the full network-identity concept, see +[Network Identity](https://learn.netdata.cloud/docs/network-flows/enrichment/network-identity). + + +Periodic HTTPS GET (or POST) to a configured URL with optional headers, optional +custom CA / mTLS, jq transform of the response, merge into network-attributes trie. 
+ + +This integration is only supported on the following platforms: + +- Linux + +This integration supports multiple instances configured side-by-side. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Add an entry per IPAM source under enrichment.network_sources. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### An HTTP/HTTPS endpoint returning JSON + +The endpoint must produce a parseable JSON document. The plugin only supports +GET and POST. There is no pagination, no cursor following, no OAuth flow -- +if your IPAM needs those, wrap it in an internal aggregator. + + +#### Authentication via headers + +The plugin has no built-in auth helpers. Set whatever the API needs -- bearer +tokens, basic-auth header, custom API-key headers -- via `headers:`. Store +tokens carefully; they're written into the YAML. + + + +### Configuration + +#### Options + +Add a named entry under `enrichment.network_sources`. The keys below are the +full set of options. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| url | HTTP/HTTPS endpoint. | | yes | +| method | GET or POST. | GET | no | +| interval | Refresh interval (loop floors at 60s). | 60s | no | +| timeout | Per-request timeout. | 60s | no | +| headers | Map of additional HTTP request headers (e.g., authentication). | \{} | no | +| transform | jq expression converting response to \{prefix, name?, role?, site?, region?, country?, state?, city?, tenant?, asn?, asn_name?} stream. | . | yes | +| tls.enable | Use custom TLS settings (custom CA, mTLS). | false | no | +| tls.ca_file | PEM file with the CA bundle. | | no | +| tls.cert_file | PEM file with the client certificate (mTLS). | | no | +| tls.key_file | PEM file with the client private key. | | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### phpIPAM with API token + +phpIPAM exposes `/api//subnets/`. Replace `` with your phpIPAM app name. Use the standard transform. + +```yaml +enrichment: + network_sources: + phpipam: + url: "https://ipam.example/api/netdata/subnets/" + headers: + token: "abcdef..." + interval: 10m + transform: | + .data[] | { + prefix: (.subnet + "/" + (.mask|tostring)), + name: .description, + tenant: (.custom_tenant // ""), + site: (.location.name // "") + } + +``` +###### Custom internal CMDB (POST with body) + +When your CMDB requires POST with a query body. Define `method` and append the body via headers/url. The plugin's body support is limited -- prefer GET endpoints when possible. + +
+Config + +```yaml +enrichment: + network_sources: + cmdb: + url: "https://cmdb.example/query/networks" + method: POST + headers: + Authorization: "Bearer ..." + Content-Type: "application/json" + interval: 30m + transform: | + .results[] | { + prefix: .cidr, + tenant: .organization, + site: .datacenter, + role: .purpose + } + +``` +
+ +###### Internal IPAM with mTLS + +When the IPAM is behind your internal PKI. + +
+Config + +```yaml +enrichment: + network_sources: + corp_ipam: + url: "https://ipam.corp/api/networks" + tls: + enable: true + ca_file: /etc/netdata/ssl/corp-ca.pem + cert_file: /etc/netdata/ssl/netdata.crt + key_file: /etc/netdata/ssl/netdata.key + interval: 10m + transform: | + .[] | { + prefix: .cidr, + name: .label, + tenant: .tenant + } + +``` +
+ + + +### Endpoint requires pagination + +The plugin does not paginate. Either raise the page size to cover your inventory, +or wrap the endpoint with an internal aggregator that returns all results at one URL. + + +### TLS verification cannot be disabled + +`tls.skip_verify` and `tls.verify: false` are rejected by validation. Use +`tls.ca_file` to trust internal CAs. + + +### Empty result back-off + +An empty jq result is treated as a fetch failure. If your IPAM legitimately +returns no prefixes (quiet state), the source backs off as if it errored. +Workaround: have the upstream return at least one synthetic prefix. + + + diff --git a/docs/Network Flows/Network Identity Sources/NetBox.mdx b/docs/Network Flows/Network Identity Sources/NetBox.mdx new file mode 100644 index 0000000000..d4e04a7d0a --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/NetBox.mdx @@ -0,0 +1,198 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "NetBox" +learn_status: "Published" +learn_rel_path: "Network Flows/Network Identity Sources" +keywords: [netbox, ipam, dcim, source of truth, prefix list] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "50" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources/netbox" +slug: "/network-flows/network-identity-sources/netbox" +--- + + +# NetBox + + + + + +Plugin: netflow-plugin +Module: netbox + + + +## Overview + +[NetBox](https://netboxlabs.com/oss/netbox/) is the most widely deployed open-source +IPAM / DCIM. Many networks already curate prefix metadata there -- tenant, site, +role, VRF -- and want flow data to inherit those labels automatically rather than +duplicating them in `netflow.yaml`. + +This integration polls NetBox's Prefixes API at a configurable interval, transforms +the response with jq, and labels flow records with the prefix metadata. 
+ +For the full network-identity concept, see +[Network Identity](https://learn.netdata.cloud/docs/network-flows/enrichment/network-identity). + + +Periodic HTTPS GET to a NetBox API endpoint with a Bearer token in the +`Authorization` header. jq transform produces per-prefix objects with the labels +you want -- typically `tenant.name`, `site.name`, `role.name`, `description`. + +NetBox paginates results -- there is **no automatic pagination** in this plugin. +For inventories larger than the default page size (50), wrap NetBox with a +server-side aggregator that returns the full list at one URL. + + +This integration is only supported on the following platforms: + +- Linux + +This integration supports multiple instances configured side-by-side. + + +### Default Behavior + +#### Auto-Detection + +Disabled by default. Add an entry under enrichment.network_sources with your NetBox URL and API token. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### NetBox API token with read scope on Prefixes + +In NetBox, create or use a service account, generate an API token, scope it +read-only to the Prefixes endpoint. The token goes in the `Authorization` header. + + +#### A bulk endpoint or aggregator + +The plugin does not paginate. If your NetBox has more prefixes than fit in the +default page (`?limit=50`), either raise `limit` (`?limit=10000`) or expose +an internal endpoint that aggregates all pages and serves them at one URL. + + + +### Configuration + +#### Options + +Add a named entry under `enrichment.network_sources` pointing at your NetBox. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| url | NetBox Prefixes API endpoint, with `?limit=` if needed. | | yes | +| headers.Authorization | NetBox API token, prefixed with "Token ". | | yes | +| interval | How often to refresh. NetBox is your source of truth -- 5-15 minutes is typical. | 60s | no | +| transform | jq expression mapping `.results[]` (NetBox's response shape) to per-prefix objects. | . | yes | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### NetBox with API token and standard label set + +Tags prefixes with tenant, site, role, and the NetBox description. + +```yaml +enrichment: + network_sources: + netbox: + url: "https://netbox.example.internal/api/ipam/prefixes/?limit=10000" + headers: + Authorization: "Token abcdef0123456789" + interval: 5m + timeout: 30s + transform: | + .results[] | { + prefix: .prefix, + tenant: (.tenant.name // ""), + site: (.site.name // ""), + role: (.role.name // ""), + name: .description + } + +``` +###### NetBox with mTLS to internal CA + +When NetBox is behind your internal PKI; use tls.ca_file plus client cert. + +
+Config + +```yaml +enrichment: + network_sources: + netbox: + url: "https://netbox.example.internal/api/ipam/prefixes/?limit=10000" + headers: + Authorization: "Token abcdef0123456789" + interval: 5m + tls: + enable: true + ca_file: /etc/netdata/ssl/internal-ca.pem + cert_file: /etc/netdata/ssl/netdata.crt + key_file: /etc/netdata/ssl/netdata.key + transform: | + .results[] | { + prefix: .prefix, + tenant: (.tenant.name // ""), + site: (.site.name // ""), + role: (.role.name // ""), + name: .description + } + +``` +
+ + + +### Only first page of results loaded + +NetBox paginates by default at 50 results. The plugin does not follow `next` +links. Use `?limit=10000` (or the actual count) on the URL, or expose an +aggregating endpoint server-side. + + +### Token missing or wrong scope + +NetBox returns 403 silently consumed by the plugin's HTTP error path. Watch +the journal for `network-sources` warnings; verify with curl: +`curl -H "Authorization: Token " https://netbox/api/ipam/prefixes/`. + + + diff --git a/docs/Network Flows/Network Identity Sources/Network Identity Sources.mdx b/docs/Network Flows/Network Identity Sources/Network Identity Sources.mdx new file mode 100644 index 0000000000..2f3f60a6f4 --- /dev/null +++ b/docs/Network Flows/Network Identity Sources/Network Identity Sources.mdx @@ -0,0 +1,37 @@ +--- +sidebar_position: "160" +sidebar_label: "Network Identity Sources" + +hide_table_of_contents: true +learn_status: "AUTOGENERATED" +slug: "/network-flows/network-identity-sources" +learn_link: "https://learn.netdata.cloud/docs/network-flows/network-identity-sources" +--- + +# Network Identity Sources + +import { Grid, Box } from '@site/src/components/Grid_integrations'; + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/Network Flows/Overview.mdx b/docs/Network Flows/Overview.mdx new file mode 100644 index 0000000000..3c9d857ac9 --- /dev/null +++ b/docs/Network Flows/Overview.mdx @@ -0,0 +1,167 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/README.md" +sidebar_label: "Overview" +learn_status: "Published" +learn_rel_path: "Network Flows" +description: "Collect, enrich, and visualize NetFlow, IPFIX, and sFlow data with the Netdata Agent." +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/overview" +slug: "/network-flows/overview" +--- + + +# Network Flows + +Netdata can collect, store, and visualise network flow data from your routers and switches. 
You see who is talking to whom on your network, how much data they exchanged, over what protocols, and to which countries — without inspecting packet contents and without an external database. + +This section is for network engineers, security analysts, and IT managers who want to understand what's happening on the wire. The same dashboard answers all three audiences from the same data. + +## What flow data is + +A network flow is a *summary of a conversation*. Routers and switches watch packets as they pass through, group them by source IP, destination IP, source port, destination port, and protocol, and produce one record per flow when the conversation ends or after a timeout. That record contains: + +- The endpoints (IPs, ports, protocol) +- How much data moved (bytes, packets) +- When the flow started and ended +- Optional metadata: AS numbers, interface indexes, TCP flags, ToS, MAC addresses, VLAN IDs + +Think of flow data like an itemised phone bill. You can see who called whom, when, and for how long. You **cannot** read the conversation. That trade-off is the entire value proposition: low storage, no privacy intrusion, complete coverage of all traffic — but no payload visibility. + +## What you can answer + +- Who is using the most bandwidth right now? Last week? Last month? +- Where does our traffic go — by country, ASN, port, or protocol? +- A specific IP appeared in a security alert — what did it talk to, when, and how much? +- Is this a normal pattern for this time of day, or has something changed? +- Should we upgrade this Internet link, and when? +- Which interfaces of which routers are saturated? + +## What you cannot answer + +- **Is the application slow?** Flow data has no payload, no response times, no error messages. Use APM or application logs. +- **What's the latency?** Flow records show duration, not round-trip time. Duration is dominated by timeout configuration, not network performance. Use ICMP probes or hardware telemetry. 
+- **What did the user actually do?** Flow data sees ports and IPs, not user actions or URLs. +- **Did this packet arrive late?** Flow data is aggregated; sub-second jitter and microbursts are invisible. + +If those are your questions, flow data is the wrong tool. You probably need application performance monitoring (APM), logging, or packet capture. + +## Two things to know on day one + +These two facts are not Netdata-specific. They're how flow data works on every collector. Understanding them up-front saves a lot of head-scratching when you first see the dashboard. + +### Traffic appears doubled by default + +A router exports flow records for both ingress and egress on every monitored interface. A single packet entering interface A and leaving interface B produces two records: one tagged ingress on A, one tagged egress on B. + +If you sum every flow record without filtering, you see roughly **2× the actual traffic**. With a second router on the same path, **4×**. + +To see real numbers: filter by one exporter, one interface, in one direction. The dashboard makes this easy. See the [Anti-patterns page](/docs/network-flows/anti-patterns) for the full framing. + +### Conversations are mirrored + +A bidirectional conversation (host A talks to host B, B replies to A) produces at least two flow records — one for each direction. They're real, distinct flows. But on a Sankey diagram, country map, or sorted top-N table without direction filtering, you see both ends of every conversation. That's correct, but it can look like the same traffic appears twice. + +When you see "traffic from your country to a foreign country" *and* "traffic from that foreign country to your country" of similar volume, you're looking at one conversation, not two. 
+ +## What ships with the plugin + +The Netdata netflow plugin decodes: + +- **NetFlow v5** (legacy, IPv4-only) +- **NetFlow v7** (rare, Cisco Catalyst 5000) +- **NetFlow v9** (the modern Cisco / Juniper / FortiGate / Arista format) +- **IPFIX** (RFC 7011, the IETF-standardised successor to NetFlow v9) +- **sFlow v5** (the packet-sampling protocol most switches use) + +A single UDP listener (default `0.0.0.0:2055`) accepts all five. The plugin auto-detects each datagram's protocol from its header. + +Each flow record is enriched at ingestion with: + +- **Country, state, city, coordinates, ASN, AS name** — from a stock GeoIP database (DB-IP-based; refreshable) +- **Exporter name and labels** — from your static-metadata configuration +- **Interface name, description, speed, provider, connectivity, boundary** — from your static-metadata configuration +- **Network labels** for your own CIDRs (name, role, site, region, tenant) +- **Classifier-derived attributes** for rule-based tagging (Akvorado-compatible expression language) +- **Live BGP attributes** (AS path, communities, next-hop) — from BMP, BioRIS, or static prefix configuration +- **Decapsulated inner-packet fields** for SRv6 / VXLAN traffic + +Flow records land in a four-tier journal: raw + 1-minute + 5-minute + 1-hour rollups, with independent retention per tier. The dashboard auto-picks the best tier for each query. + +## What sampling does to your numbers + +Many routers sample. They export one packet in N — typically 1-in-100 to 1-in-2000. Netdata multiplies bytes and packets by the sampling rate at ingestion, so the numbers you see are estimates of actual traffic. + +This works correctly **only if all your exporters use the same sampling rate**. With mixed rates, the multiplication is per-flow and the aggregate becomes a blend of estimates that's hard to interpret. The clean path: keep sampling rates uniform across your network, or run unsampled where the flow rate allows. 
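The scale-up, and what it costs in visibility, is simple arithmetic you can check on the command line (a back-of-envelope sketch, not plugin code):

```shell
# 1-in-N sampling: observed counters are multiplied by N at ingestion,
# and a k-packet flow is sampled at least once with probability
# 1 - (1 - 1/N)^k.
awk 'BEGIN {
  n = 1000                                        # sampling rate 1-in-1000
  printf "scaled bytes: %d\n", 1500 * n           # one sampled 1500-byte packet
  printf "P(seen, 1-packet flow): %.4f\n", 1 - (1 - 1/n)^1
  printf "P(seen, 100-packet flow): %.4f\n", 1 - (1 - 1/n)^100
}'
```

At 1-in-1000, a single sampled full-size packet is booked as 1.5 MB of estimated traffic, while a one-packet flow is seen only 0.1% of the time.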
+ +Sampling at 1-in-1000 also misses small flows. A single-packet flow has a 99.9% chance of not being seen at all. If you need to detect small, rare events (security beaconing, scanning), use unsampled or 1-in-100 on critical exporters. + +## What the dashboard looks like + +Six visualisations, all driven by the same query engine: + +- **Sankey + Table** — the default. Top-N flows aggregated by 1-10 fields you pick. Best for "who's responsible". +- **Time-Series** — the same top-N over time. Best for "how does this change". +- **Country map / state map / city map** — geographic views. Best for "where". +- **Globe** — a 3D rendering of the city-level data. Visual demo, less useful for analysis. + +A filter ribbon between the visualisation and the table lets you narrow data by any combination of fields. Selections persist in the URL — copy and share to give a colleague exactly your view. + +Default settings on first open: last 15 minutes, top-25 flows by bytes, grouped as `Source ASN → Protocol → Destination ASN`. + +Default fields are tuned to surface meaningful traffic at a glance. From there, you adjust the time range, change the aggregation, add filters, and dig in. + +## Where to start + +Pick the page that matches your situation: + +- **You're setting up the plugin for the first time** — [Installation](/docs/network-flows/installation), then [Quick Start](/docs/network-flows/quick-start). +- **You have data, you want to find a bandwidth hog or trace an IP** — [Investigation Playbooks](/docs/network-flows/investigation-playbooks). +- **You want to make sure your data is trustworthy** — [Validation and Data Quality](/docs/network-flows/validation-and-data-quality). +- **You want to avoid the most common mistakes** — [Anti-patterns](/docs/network-flows/anti-patterns). +- **You want to understand a specific feature in depth** — see the section index below. 
+ +## Section index + +**Setup and configuration** + +- [Installation](/docs/network-flows/installation) — Package names, install commands, file locations +- [Quick Start](/docs/network-flows/quick-start) — Configure your first router, see traffic in 15 minutes +- [Configuration](/docs/network-flows/configuration) — `netflow.yaml` reference + +**Sources** + +- [NetFlow](/docs/network-flows/sources/netflow) — v5, v7, v9 +- [IPFIX](/docs/network-flows/sources/ipfix) — IETF-standardised, biflow-capable +- [sFlow](/docs/network-flows/sources/sflow) — packet-sampling, fundamentally different + +**Enrichment** + +- [GeoIP](/docs/network-flows/enrichment-concepts/ip-intelligence) — Country, city, AS-name lookups +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Naming exporters, interfaces, your networks +- [Classifiers](/docs/network-flows/enrichment-concepts/classifiers) — Rule-based tagging +- [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution) — Where AS numbers and names come from +- [BMP routing](/docs/network-flows/enrichment-concepts/bgp-routing) — Live BGP feed for routing attributes +- [BioRIS](/docs/network-flows/enrichment-concepts/bgp-routing) — RIPE RIS via gRPC +- [Network sources](/docs/network-flows/enrichment-concepts/network-identity) — HTTP-fetched prefix metadata +- [Decapsulation](/docs/network-flows/enrichment-concepts/decapsulation) — SRv6 and VXLAN inner-packet extraction + +**Reference** + +- [Field reference](/docs/network-flows/field-reference) — All 91 fields and which protocols populate each +- [Retention and querying](/docs/network-flows/retention-and-querying) — The four-tier model and how queries pick a tier +- [Sizing and capacity planning](/docs/network-flows/sizing-and-capacity-planning) — Hardware, throughput, storage estimates + +**Visualisation** + +- [Sankey and Table](/docs/network-flows/visualization/sankey-and-table) — The default view +- 
[Time-Series](/docs/network-flows/visualization/time-series) — Top-N over time +- [Maps and Globe](/docs/network-flows/visualization/maps-and-globe) — Geographic views +- [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) — Narrowing the data +- [Plugin Health Charts](/docs/network-flows/visualization/plugin-health-charts) — Operational metrics for the plugin itself + +**Operations** + +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — Cross-checks and silent failures +- [Investigation Playbooks](/docs/network-flows/investigation-playbooks) — Recipes for common questions +- [Anti-patterns](/docs/network-flows/anti-patterns) — Common mistakes and how to avoid them +- [Troubleshooting](/docs/network-flows/troubleshooting) — When something doesn't work diff --git a/docs/Network Flows/Quick Start.mdx b/docs/Network Flows/Quick Start.mdx new file mode 100644 index 0000000000..8dc5128746 --- /dev/null +++ b/docs/Network Flows/Quick Start.mdx @@ -0,0 +1,181 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/quick-start.md" +sidebar_label: "Quick Start" +learn_status: "Published" +learn_rel_path: "Network Flows" +description: "Get network flow monitoring running in five minutes." +sidebar_position: "30" +learn_link: "https://learn.netdata.cloud/docs/network-flows/quick-start" +slug: "/network-flows/quick-start" +--- + + +# Quick Start + +Get flow monitoring running in 15 minutes. The path: install the plugin, configure your first router, open the dashboard, and read it correctly. + +## Before you start + +- The Netdata Agent is running on the host that will collect flow data. +- The [netflow plugin is installed](/docs/network-flows/installation) on that host. +- You can configure flow export on at least one router or switch. +- The router can reach the agent's IP on UDP port 2055. 
+ +If the plugin isn't installed yet, follow the [Installation page](/docs/network-flows/installation) first. + +## Step 1 — Configure your router + +Pick the closest match to your platform. The configurations below set sensible defaults: 60-second active timeout (industry best practice), 60-second template refresh (so a collector restart recovers in under a minute), and monitoring on both directions of an interface. + +### Cisco IOS / IOS-XE (Flexible NetFlow, v9) + +``` +flow exporter NETDATA + destination 10.0.0.10 ! Netdata agent IP + source GigabitEthernet0/0/0 ! source interface + transport udp 2055 + export-protocol netflow-v9 + template data timeout 60 +! +flow record NETDATA-RECORD + match ipv4 source address + match ipv4 destination address + match transport source-port + match transport destination-port + match ipv4 protocol + match interface input + collect interface output + collect counter bytes + collect counter packets + collect timestamp sys-uptime first + collect timestamp sys-uptime last +! +flow monitor NETDATA-MONITOR + record NETDATA-RECORD + exporter NETDATA + cache timeout active 60 + cache timeout inactive 15 +! 
+interface GigabitEthernet0/0/1 + ip flow monitor NETDATA-MONITOR input + ip flow monitor NETDATA-MONITOR output +``` + +### Juniper JunOS (J-Flow v9) + +``` +set forwarding-options sampling instance NETDATA family inet output flow-server 10.0.0.10 port 2055 +set forwarding-options sampling instance NETDATA family inet output flow-server 10.0.0.10 version9 template ipv4-template +set services flow-monitoring version9 template ipv4-template flow-active-timeout 60 +set services flow-monitoring version9 template ipv4-template flow-inactive-timeout 15 +set services flow-monitoring version9 template ipv4-template template-refresh-rate seconds 60 +set interfaces ge-0/0/1 unit 0 family inet sampling input +set interfaces ge-0/0/1 unit 0 family inet sampling output +``` + +### Arista EOS (sFlow) + +``` +sflow run +sflow source-interface Loopback0 +sflow destination 10.0.0.10 2055 +sflow polling-interval 30 +sflow sample dangerous 2000 +! +interface Ethernet1 + sflow enable +``` + +EOS treats sample rates below 16 384 as "aggressive" — the `dangerous` keyword is required to opt in. For higher-rate interfaces, drop the `dangerous` keyword and use 16 384 or above. + +### Linux host (`softflowd`, NetFlow v9) + +For Linux servers, hypervisors, or any host that doesn't natively speak NetFlow: + +```bash +sudo softflowd -i eth0 -n 10.0.0.10:2055 -v 9 -t maxlife=60 -t expint=15 +``` + +For more vendors and details, see [Sources / NetFlow](/docs/network-flows/sources/netflow), [IPFIX](/docs/network-flows/sources/ipfix), and [sFlow](/docs/network-flows/sources/sflow). + +## Step 2 — Open the dashboard + +In your browser, open the Netdata UI and click the **Network Flows** tab. 
+
+By default you'll see:
+
+- A Sankey diagram on top, with a sortable table beneath
+- The default time range — last 15 minutes (Netdata's global picker)
+- Top-25 flows by bytes
+- Aggregated as **Source ASN → Protocol → Destination ASN**
+
+Within 60-90 seconds of the router being configured, flow records should start appearing.
+
+## Step 3 — Read the dashboard correctly
+
+Before drawing any conclusion, read this. It's the single biggest source of confusion when people first look at flow data.
+
+### Traffic looks doubled
+
+Routers normally export both ingress and egress flow records on every monitored interface. A packet that enters interface A and leaves interface B produces **two** records — one ingress on A, one egress on B.
+
+If you look at total bandwidth without filtering, you see roughly **2× the real traffic**. Add a second router on the same path and you see 4×.
+
+**To see real bandwidth on a specific link**, filter to one exporter and one direction:
+
+1. In the filter ribbon: `Exporter Name = <exporter>`.
+2. Add: `Input Interface Name = <interface>` (for incoming) **or** `Output Interface Name = <interface>` (for outgoing). Pick one. Not both.
+
+That's the actual traffic on that link in that direction.
+
+### Conversations look mirrored
+
+Each bidirectional conversation produces two flow records — one for the request direction, one for the response. The Sankey, country map, and time-series all show both. When you see traffic between Country X and Country Y *and* traffic between Country Y and Country X of similar volume, that's the same conversation, not two.
+
+This is correct behaviour. To see only one direction of a conversation, filter by `Source ASN` (your network) for outbound or `Destination ASN` for inbound.
+
+## Step 4 — Verify it's working
+
+If the Sankey is empty after 60-90 seconds, work through this:
+
+1. **Datagrams arriving at the host.**
+
+   ```bash
+   sudo tcpdump -i any -nn -c 20 'udp port 2055'
+   ```
+
+   If you see packets, the network path is fine.
If not, check the router's exporter status, the firewall, and the source IP the router uses. + +2. **Listener bound on the host.** + + ```bash + sudo ss -unlp | grep 2055 + ``` + + Should show `netflow-plugin` listening. If not, see [Troubleshooting](/docs/network-flows/troubleshooting). + +3. **Plugin actually decoding.** + + Open the standard Netdata charts page and find `netflow.input_packets`. If `udp_received` is rising but `parsed_packets` isn't, datagrams are arriving but failing to decode. Check `parse_errors` and `template_errors` to narrow down. See [Plugin Health Charts](/docs/network-flows/visualization/plugin-health-charts). + +4. **Plugin log lines.** + + ```bash + sudo journalctl -u netdata --since "5 minutes ago" | grep -i netflow + ``` + +## What's next + +You now have flow data flowing in. The natural next steps: + +- [Configuration](/docs/network-flows/configuration) — Tune retention so older data is preserved (the default 7-day shared retention is rarely enough). +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Give your routers and your internal networks friendly names and labels. Without this, dashboards show raw IPs. +- [Investigation Playbooks](/docs/network-flows/investigation-playbooks) — Concrete recipes for the questions flow data is good at answering. +- [Anti-patterns](/docs/network-flows/anti-patterns) — Mistakes to avoid as you develop confidence with the data. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to confirm your numbers are correct. + +For more sources or vendors: + +- [NetFlow](/docs/network-flows/sources/netflow) — More vendor configurations, sampling caveats. +- [IPFIX](/docs/network-flows/sources/ipfix) — When and why to prefer IPFIX over NetFlow v9. +- [sFlow](/docs/network-flows/sources/sflow) — Different protocol, different semantics. 
diff --git a/docs/Network Flows/Retention and Querying.mdx b/docs/Network Flows/Retention and Querying.mdx new file mode 100644 index 0000000000..71071da473 --- /dev/null +++ b/docs/Network Flows/Retention and Querying.mdx @@ -0,0 +1,156 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/retention-querying.md" +sidebar_label: "Retention and Querying" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "80" +learn_link: "https://learn.netdata.cloud/docs/network-flows/retention-and-querying" +slug: "/network-flows/retention-and-querying" +--- + + +# Retention and Querying + +Netdata stores flow data in four tiers. The tier model is transparent — you do not pick a tier when you query, the dashboard picks for you. Understanding how it picks helps you interpret what you're seeing and avoid surprises when older data isn't there. + +## The four tiers + +| Tier | Bucket | On-disk dir | YAML key | +|---|---|---|---| +| Raw | per-flow | `flows/raw/` | `raw` | +| 1-minute | 60 s | `flows/1m/` | `minute_1` | +| 5-minute | 300 s | `flows/5m/` | `minute_5` | +| 1-hour | 3600 s | `flows/1h/` | `hour_1` | + +The raw tier stores every flow record as it arrived. The other three are rollup tiers — they aggregate raw flows into time-bucketed groups by identity (exporter, ASN, country, ports — see below). + +## What survives the rollup + +Rollup tiers (1m, 5m, 1h) deliberately drop a few fields to keep cardinality manageable. **The dropped fields are: `SRC_ADDR`, `DST_ADDR`, `SRC_PORT`, `DST_PORT`, `SRC_GEO_CITY`, `DST_GEO_CITY`, `SRC_GEO_LATITUDE`, `DST_GEO_LATITUDE`, `SRC_GEO_LONGITUDE`, `DST_GEO_LONGITUDE`.** + +Everything else survives — country, state, ASN, AS path, BGP communities, exporter and interface labels, protocol, TCP flags, ToS/DSCP, ICMP type/code, MPLS labels, VLANs, MACs, next-hop, post-NAT addresses, and the bytes/packets sums. 
So rollups are perfectly fine for most country / ASN / interface / protocol questions, but useless if you need to ask "which IP". + +This is why filtering or grouping by IP/port/city/lat/lon forces the query to the raw tier — there is no other tier that has those fields. + +## How the dashboard picks a tier + +For every query the dashboard sends to the plugin, the planner makes a single decision: which tier (or tiers) can satisfy this? + +**Rules:** + +1. **Any IP/port/city/lat/lon filter or group-by → raw tier.** No exception. The rollup tiers don't have those fields. +2. **A non-empty full-text search → raw tier.** Full-text search runs as a regex against the raw journal payload, which only the raw tier carries. +3. **Otherwise, pick the coarsest tier that satisfies the time range and bucket-count requirement.** + - Time-Series view needs at least 100 buckets in the window. So: + - under 100 minutes → 1-minute tier + - 100 minutes to 8h20m → 5-minute tier + - 8h20m and longer → 1-hour tier + - Table / Sankey / Maps don't have a bucket-count constraint, but the configured query-window guardrails (`query_1m_max_window` default 6h, `query_5m_max_window` default 24h) skip a tier when the window is too wide. + +When the planner picks a tier and the time range crosses tier-aligned boundaries, the query is **stitched** — head fragment in a finer tier, aligned middle in the chosen tier, tail fragment in a finer tier. You don't see this; the results merge cleanly. It exists so wide windows that don't quite align to one-hour boundaries still work. + +The plugin reports the chosen tier in the response stats (`query_tier` = `0`, `1`, `5`, or `60`). The dashboard uses this for diagnostic banners. + +## What "no data" actually means + +If you ask for a 30-day window with an IP filter and tier-0 retention is 24 hours, you get an empty response. No error, no banner reading "data has expired" — just an empty result set. The dashboard renders this as "No data". 
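The selection rules reduce to a small decision function, sketched here in shell (simplified: no stitching, no query-window guardrails, no retention fallback):

```shell
# pick_tier WINDOW_MINUTES NEEDS_RAW
# NEEDS_RAW=1 when the query filters or groups on IP, port, city,
# lat/lon, or carries a full-text search. Thresholds follow the
# 100-bucket rule for the Time-Series view.
pick_tier() {
  local window_minutes=$1 needs_raw=$2
  if [ "$needs_raw" = 1 ]; then
    echo raw                          # only the raw tier has those fields
  elif [ "$window_minutes" -lt 100 ]; then
    echo 1m
  elif [ "$window_minutes" -lt 500 ]; then
    echo 5m                           # up to 8h20m
  else
    echo 1h
  fi
}

pick_tier 15 0      # 1m  -- the default 15-minute view
pick_tier 43200 0   # 1h  -- a 30-day window, rollup fields only
pick_tier 43200 1   # raw -- the same window with an IP filter
```

The last call is the "No data" case: the planner needs raw for the whole 30 days, so if raw retention is 24 hours, the remaining 29 days come back empty.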
+
+The reason is a layered fallback in the planner: if a span asks for tier 0 and the files for that span have been rotated out, the planner falls back to the coarser tiers (1m, 5m, 1h), but those don't have IP fields, so they cannot satisfy a query that filters on IP. Result: the span returns no flows.
+
+Other spans within the same query that don't need raw data may still return flows. So it's also possible to see partial coverage — half the time range filled, half empty.
+
+For Time-Series, "no data" appears as zero values in the affected buckets, not as a special "missing" indicator. The chart still draws; the empty regions are flat lines at zero.
+
+## What forces tier 0 in practice
+
+Quick reference for "why is my query slow / showing less time?":
+
+- Adding `SRC_ADDR`, `DST_ADDR`, `SRC_PORT`, or `DST_PORT` as a filter
+- Adding any of those fields to the group-by
+- Switching to the city map (it uses `SRC_GEO_CITY`/`DST_GEO_CITY` plus latitudes/longitudes)
+- Typing anything into the global search ribbon
+
+If you see the time depth in your dashboard suddenly shrink after you applied a filter, you've hit the raw-tier limit.
+
+## Default retention and the most common misconfiguration
+
+The default `size_of_journal_files: 10GB` and `duration_of_journal_files: 7d` apply to **every tier independently**. With defaults, all four tiers (raw, 1m, 5m, 1h) are capped at 10GB / 7d.
+
+This is rarely what you want. The whole point of having rollup tiers is to keep them around longer than raw.
A more useful production profile: + +```yaml +journal: + size_of_journal_files: 100GB # top-level inherited by tiers without an override + duration_of_journal_files: 7d + tiers: + raw: + size_of_journal_files: 200GB + duration_of_journal_files: 24h + minute_1: + duration_of_journal_files: 14d + minute_5: + duration_of_journal_files: 30d + hour_1: + duration_of_journal_files: 365d + size_of_journal_files: null # time-only, no size cap on the long tail +``` + +This gives you 24 hours of full-detail forensics, 14 days of 1-minute trends, 30 days of 5-minute snapshots, and a year of hourly aggregates. + +See [Sizing and Capacity Planning](/docs/network-flows/sizing-and-capacity-planning) for how to estimate the actual disk footprint per tier from your flow rate. + +## How queries work, briefly + +The dashboard sends one of two query modes to the plugin: + +- **`flows`** — the normal aggregation request. Returns top-N groups, sums of bytes and packets, optional facet counts. +- **`autocomplete`** — for the filter ribbon. Returns up to 100 facet values matching the user's term. Matching policy is per-field: text fields use substring matching, IP and numeric fields use prefix. Term is capped at 256 bytes. Runs against in-memory facet snapshots and on-disk FST sidecars; never scans tier files. Resulting filters apply as exact equality, not substring. + +A `flows` query carries: + +- A time range (`after` / `before`, or `last`). +- A list of `group_by` fields (up to 10). +- A list of `selections` — per-field IN-lists for filtering. +- Optional `facets` to enrich the response with per-facet value counts. +- A `top_n` (one of 25, 50, 100, 200, 500). +- A `sort_by` (`bytes` or `packets`). +- An optional regex `query` (full-text search; forces tier 0). +- A `view` (`table-sankey`, `timeseries`, `country-map`, `state-map`, `city-map`). 
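Assembled, a request carrying the documented defaults narrowed to TCP and UDP could look like this (built with `jq -n` purely for illustration; the actual wire format between dashboard and plugin is internal):

```shell
# An illustrative `flows` aggregation request: the documented default
# group-by and top-N, filtered to TCP/UDP via a per-field IN-list.
jq -n '{
  last: "15m",
  group_by: ["SRC_AS_NAME", "PROTOCOL", "DST_AS_NAME"],
  selections: { PROTOCOL: ["TCP", "UDP"] },
  top_n: 25,
  sort_by: "bytes",
  view: "table-sankey"
}'
```

Because no IP, port, city, or lat/lon field appears in the filter or group-by, a request like this can be served from a rollup tier.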
+ +Defaults if you don't specify: time range = last 15 minutes, `group_by = ["SRC_AS_NAME", "PROTOCOL", "DST_AS_NAME"]`, `top_n = 25`, `sort_by = bytes`, `view = table-sankey`. + +The plugin enforces a hard timeout of **30 seconds** per query. If your query is too wide, narrow the time range, add a filter that lets a higher tier serve it, or reduce the group-by depth. + +## Group-by limits and overflow + +Two configuration limits guard against pathological queries: + +- `query_max_groups` (default `50000`) — total distinct groups in an aggregation. Past this, results overflow into a single `__overflow__` bucket and the response carries a warning. +- `query_facet_max_values_per_field` (default `5000`) — distinct values returned per facet field. + +If you see `__overflow__` rows, your query is too wide for the current limit. Either narrow the filter, drop a high-cardinality `group_by` field, or raise the limit (carefully — the limit exists for memory reasons). + +## Full-text search + +The global search ribbon supports full-text search. It runs as a **regex** match against the raw journal payload. A search of `8.8.8.8` is the regex `8.8.8.8`, where each `.` matches any byte — so it can match unrelated text. To match the literal string, escape with backslashes: `8\.8\.8\.8`. + +Any non-empty full-text search forces the query to tier 0. Time depth is therefore limited by raw-tier retention. + +## URL sharing + +The dashboard URL preserves all of: time range, view, top-N, sort, group-by, selections, full-text search. Copy the URL and share it — the recipient sees exactly what you see, provided they have access to the same Netdata Cloud space. + +## Things that surprise people + +- **An IP filter shrinks the time depth.** This is correct behaviour, but the dashboard doesn't always make it obvious. If your time range is wider than tier-0 retention, drop the IP filter to see the broader rollup data. 
+- **The city map can't go back as far as the country map.** The city map needs the city/lat/lon fields (raw-only); the country map only needs `SRC_COUNTRY`/`DST_COUNTRY` (preserved in rollups). +- **`__overflow__` is a real value.** It will show up in result tables, sankey diagrams, and group-by listings. It means "everything that didn't fit in the top groups for this query" — narrow the filter or raise the limit. +- **30-second timeout is hard.** A query that runs to the timeout returns whatever it has so far with a warning. Don't expect more than 30s of work per query. +- **Tier files use short names** (`1m`, `5m`, `1h` on disk) but YAML uses the explicit names (`minute_1`, `minute_5`, `hour_1`). Mind the difference. + +## What's next + +- [Configuration](/docs/network-flows/configuration) — `netflow.yaml` reference, including per-tier retention overrides. +- [Sizing and Capacity Planning](/docs/network-flows/sizing-and-capacity-planning) — Disk and CPU estimates from your flow rate. +- [Field Reference](/docs/network-flows/field-reference) — Which fields exist and which survive into rollups. +- [Visualisation](/docs/network-flows/visualization/sankey-and-table) — How the dashboard uses the tier model to render views. diff --git a/docs/Network Flows/Sizing and Capacity Planning.mdx b/docs/Network Flows/Sizing and Capacity Planning.mdx new file mode 100644 index 0000000000..c7cdbe8d6b --- /dev/null +++ b/docs/Network Flows/Sizing and Capacity Planning.mdx @@ -0,0 +1,141 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/sizing-capacity.md" +sidebar_label: "Sizing and Capacity Planning" +learn_status: "Published" +learn_rel_path: "Network Flows" +description: "Storage estimation, memory guidance, and performance benchmarks." 
+sidebar_position: "90" +learn_link: "https://learn.netdata.cloud/docs/network-flows/sizing-and-capacity-planning" +slug: "/network-flows/sizing-and-capacity-planning" +--- + + +# Sizing and Capacity Planning + +Use the following benchmarks and formulas to estimate the CPU, memory, and storage requirements for your Network Flows deployment. + +## What was measured + +These numbers come from a release-mode benchmark on an Intel i9-12900K workstation with a Seagate FireCuda 530 NVMe SSD (ext4). The benchmark runs the full ingest pipeline — raw journal plus the 1-minute, 5-minute, and 1-hour tiers — writing to real disk-backed journals. **Enrichment is not loaded** (no GeoIP/MMDB, no static metadata, no classifiers); enrichment adds CPU on top of the figures below. + +Cardinality is synthetic: low-cardinality cycles 256 unique flow records, high-cardinality cycles 4 096 unique records. Real exporter traffic falls between the two. + +CPU is reported as percent of one core (100% = one core fully consumed). The post-decode ingest path is currently single-threaded, so the practical ceiling per agent is bounded by one core's worth of CPU. + +## Practical headline + +On this hardware class, plan for: + +| Scenario | Practical ceiling | +|---|---| +| High-cardinality, all three protocols, four storage tiers, no enrichment, including UDP receive and decode | **~20-25 000 flows/s** | +| Low-cardinality | comfortably above 60 000 flows/s | + +The 20-25k figure is conservative and includes decode cost (~10 µs/flow on top of the post-decode numbers below). + +## Detailed measurements (post-decode, paced) + +These tables show the cost of pre-decoded flows traversing the full ingest pipeline. To get the full UDP-to-disk cost, add roughly 10 µs/flow for protocol decoding. 
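The arithmetic connecting these figures to the headline is easy to check. A minimal sketch, assuming the ~10 µs/flow decode cost stated here and the ~30 000 flows/s post-decode saturation this host reaches under high cardinality (benchmark-specific reference points, not constants):

```python
def single_core_ceiling(post_decode_saturation_fps: float,
                        decode_us_per_flow: float) -> float:
    """Estimate the end-to-end flows/s ceiling on one core.

    post_decode_saturation_fps: rate where the post-decode pipeline saturates
    decode_us_per_flow: additional decode cost per flow, in microseconds
    """
    post_decode_us = 1_000_000 / post_decode_saturation_fps  # us per flow, post-decode
    return 1_000_000 / (post_decode_us + decode_us_per_flow)

# ~30k post-decode saturation plus ~10 us/flow decode
# -> about 23 000 flows/s, in line with the ~20-25k headline
print(round(single_core_ceiling(30_000, 10.0)))
```

On different hardware, substitute your own measured saturation point; the decode cost scales with clock speed as well.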
+ +### Low cardinality + +| offered flows/s | NetFlow v9 CPU | IPFIX CPU | sFlow CPU | RAM peak | +|---:|---:|---:|---:|---| +| 1 000 | 1.3% | 1.0% | 1.5% | ~25 MiB | +| 10 000 | 12.6% | 11.5% | 16.9% | ~75 MiB | +| 30 000 | 35.7% | 32.9% | 46.2% | ~85 MiB | +| 60 000 | 70.3% | 64.1% | 87.1% | ~85 MiB | + +All three protocols deliver 100% of offered rate at every tested point. Saturation is above 60 000 flows/s and was not reached in this matrix. + +### High cardinality + +| offered flows/s | NetFlow v9 achieved | IPFIX achieved | sFlow achieved | CPU at saturation | +|---:|---:|---:|---:|---| +| 10 000 | 10 000 | 9 970 | 9 990 | 28-37% | +| 20 000 | 20 000 | 19 970 | 19 980 | 56-74% | +| 30 000 | 29 331 | 29 985 | 29 257 | 84-98% | +| 40 000 | 29 087 | 35 771 | 30 543 | 99% (plateau) | +| 60 000 | 26 475 | 28 835 | 30 227 | 99% (plateau) | + +Saturation is around 30 000 flows/s on this host. Beyond the knee, the achieved rate plateaus at roughly the saturation value while the offered rate grows. + +:::warning +These are host-specific reference points. Actual throughput depends on your CPU clock, disk speed, real flow cardinality, the number of populated fields, and any enrichment you enable (GeoIP, classifiers, static networks, ASN providers, BMP routing). +::: + +## Storage + +Storage is governed by two things, not by the flow rate alone: + +- **Retention policy per tier** — caps how long each tier is kept and how much disk it can use. +- **Cardinality and dedup** — flow records are indexed and key-value pairs are deduplicated. Low-cardinality traffic stores fewer bytes per flow than high-cardinality traffic, because repeated values share dictionary entries. + +Because the journals are not append-only logs, `flow_rate × bytes_per_flow × time` is not a valid estimator. 
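A toy model makes the point. The byte costs below are invented for illustration (the real journal format is more involved), but the shape is right: every flow pays a small index entry, while repeated field-value combinations are stored once:

```python
def toy_disk_bytes(flows: int, unique_records: int,
                   index_row_bytes: int = 64,
                   dict_entry_bytes: int = 256) -> int:
    """Illustrative dedup model: each flow pays one index row; each
    unique field-value combination pays one dictionary entry, once."""
    return flows * index_row_bytes + unique_records * dict_entry_bytes

low = toy_disk_bytes(9_000_000, 256)     # low cardinality
high = toy_disk_bytes(9_000_000, 4_096)  # 16x more unique combinations

# Per-flow cost barely moves despite 16x the uniques: the same
# sub-linear behaviour the measured bytes-per-flow figures show.
print(low / 9_000_000, high / 9_000_000)
```

The invented constants understate the real spread, but they show why cardinality shifts the per-flow cost far less than linearly.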
+ +### Empirical measurement on this hardware class + +A 15-minute run of paced ingest at 10 000 flows/s with the full pipeline active (raw + 1m + 5m + 1h tiers, real disk-backed journals) produced: + +| | Low cardinality (256 unique records) | High cardinality (4 096 unique records) | +|---|---:|---:| +| Flows ingested | 9.00 million | 8.97 million | +| On-disk total | 6.46 GiB | 7.29 GiB | +| Bytes per stored flow | **771** | **872** | +| Write amplification (real I/O / logical encoded) | 1.79× | 2.00× | +| Raw tier (final) | 6.45 GiB | 7.13 GiB | +| 1-minute tier | 8 MiB | 112 MiB | +| 5-minute tier | 8 MiB | 40 MiB | +| 1-hour tier | 0 (rollup not reached in 15 min) | 16 MiB | + +Two key observations: + +- **Dedup is effective.** High cardinality stores only 13% more per flow despite 16× more unique field combinations. Real exporter traffic, which has heavy repetition (same src/dst/protocol patterns), will compress closer to the low-cardinality figure. +- **Raw is 99% of the on-disk cost** at 15 minutes. The rollup tiers are small in absolute size because each rollup row aggregates many raw flows. + +### Bounding storage for capacity planning + +Set retention limits explicitly and let them bound the disk footprint: + +- raw: typically 24 hours +- 1-minute tier: 14 days +- 5-minute tier: 30 days +- 1-hour tier: 365 days + +Configure per-tier `size_of_journal_files` (hard cap) and `duration_of_journal_files` (time cap). The plugin enforces whichever limit is hit first. + +For your own measurement, run the plugin against representative traffic for at least 15 minutes and inspect `du -sh` on each tier directory. The `bench_storage_footprint_child` test in this repository ships the same measurement harness used to produce the table above. 
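If you want a number before measuring, a hedged bound calculator works: it combines the measured bytes-per-stored-flow (which, unlike the naive formula, already includes dedup) with the `size_of_journal_files` cap. The 872 B/flow default is the high-cardinality measurement above, a pessimistic stand-in for your traffic; the cap and rate are illustrative:

```python
def tier_bound_gib(flows_per_sec: float, retention_hours: float,
                   bytes_per_flow: float = 872.0,
                   size_cap_gib: float = float("inf")) -> float:
    """Upper bound on one tier's footprint: the dedup-aware volume
    estimate or the configured size cap, whichever is smaller."""
    estimate = flows_per_sec * retention_hours * 3600 * bytes_per_flow / 2**30
    return min(estimate, size_cap_gib)

# Raw tier at 10 000 flows/s with 24 h retention and a 50 GiB hard cap:
# the unbounded estimate (~700 GiB) is moot, the cap wins.
print(tier_bound_gib(10_000, 24, size_cap_gib=50.0))
```

This mirrors the plugin's "whichever limit is hit first" enforcement: the duration cap maps to `retention_hours`, the size cap to `size_cap_gib`.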
+ +## Memory + +Memory consumption is dominated by: + +- **Active journal rows** — flow records currently being accumulated before they are flushed to disk +- **Field indexes** — structures that map field values (IPs, ASNs, ports) for fast filtering +- **Facet indexes** — structures that power the filter sidebar +- **GeoIP MMDB** — the IP-intelligence database (DB-IP-based by default) loaded into memory for enrichment + +The plugin exposes memory charts you can monitor: + +- `netflow.memory_resident_bytes` — total memory in use +- `netflow.memory_allocator_bytes` — memory from the system allocator +- `netflow.memory_accounted_bytes` — memory broken down by component (indexes, GeoIP, facets) +- `netflow.memory_tier_index_bytes` — memory used by tiered storage indexes +- `netflow.decoder_scopes` — protocol decoder memory usage + +## Disk I/O + +The plugin writes flow records to journal files continuously. Writes are dominated by the raw tier; the rollup tiers add a small amount on top. SSDs are recommended for collectors that handle thousands of flows per second — the index updates and frequent fsync calls benefit substantially from low-latency storage. + +Read operations only happen during queries. There is no background read activity in steady state. + +:::tip +For production deployments, monitor the `netflow.memory_resident_bytes` chart and set a threshold alert. If resident memory grows steadily without stabilising, check your cardinality and consider reducing retention or increasing the sync interval. +::: + +## What's next + +- [Configuration](/docs/network-flows/configuration) — Per-tier retention configuration and tuning knobs. +- [Retention and Querying](/docs/network-flows/retention-and-querying) — How tiers are picked at query time. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to confirm the numbers in your environment. 
+- [Plugin Health Charts](/docs/network-flows/visualization/plugin-health-charts) — Monitoring the plugin itself. diff --git a/docs/Network Flows/Sources/IPFIX.mdx b/docs/Network Flows/Sources/IPFIX.mdx new file mode 100644 index 0000000000..34a8571846 --- /dev/null +++ b/docs/Network Flows/Sources/IPFIX.mdx @@ -0,0 +1,144 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "IPFIX" +learn_status: "Published" +learn_rel_path: "Network Flows/Sources" +keywords: [ipfix, netflow v10, flows, network flows, flow collector, rfc 7011] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/sources/ipfix" +slug: "/network-flows/sources/ipfix" +--- + + +# IPFIX + + + + + +Plugin: netflow-plugin +Module: ipfix + + + +## Overview + +Collects IPFIX (NetFlow v10) records from one or more exporters and stores them in tiered +journal files. IPFIX extends NetFlow v9 with variable-length fields, vendor-specific +information elements, and template withdrawal. Each record exposes the same core fields +as NetFlow plus any additional IEs the exporter provides. + +For full documentation including vendor configuration examples (Cisco, Juniper, Arista, +ASA NSEL), biflow handling, sampling caveats, and verification steps, see +[IPFIX](https://learn.netdata.cloud/docs/network-flows/sources/ipfix) and the +[Network Flows Overview](https://learn.netdata.cloud/docs/network-flows/). + + +The plugin listens on the same UDP socket as NetFlow. IPFIX messages are identified by +version number 10 and decoded using cached templates. Decoded records are enriched and +appended to disk-backed journal tiers. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. 
+ + +### Default Behavior + +#### Auto-Detection + +The plugin starts when enabled in netflow.yaml and listens on the configured UDP port. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### IPFIX-capable exporter + +A router, switch, or firewall configured to export IPFIX datagrams to the +Netdata agent's UDP listener. + + + +### Configuration + +#### Options + +IPFIX shares the same `netflow.yaml` configuration file as NetFlow and sFlow. +Enable IPFIX via the `protocols.ipfix` option. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| listener.listen | UDP endpoint for IPFIX datagrams. | 0.0.0.0:2055 | no | +| protocols.ipfix | Enable IPFIX decoding. | yes | no | +| journal.journal_dir | Directory for journal files (relative to NETDATA_CACHE_DIR). | flows | no | +| journal.size_of_journal_files | Maximum total size of all journal files. | 10GB | no | +| journal.duration_of_journal_files | Maximum age of journal files. | 7d | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### IPFIX collection + +Listen for IPFIX records on the standard port. + +```yaml +enabled: true +listener: + listen: "0.0.0.0:2055" +protocols: + v5: false + v7: false + v9: false + ipfix: true + sflow: false + +``` + + +### Verifying flow data is arriving and diagnosing failures + +See [Troubleshooting](https://learn.netdata.cloud/docs/network-flows/troubleshooting) for +the full diagnostic recipe. For IPFIX specifically, watch the `template_errors` dimension +on `netflow.input_packets` -- IPFIX is template-driven and data records arriving before +their templates are dropped. See also +[Validation and Data Quality](https://learn.netdata.cloud/docs/network-flows/validation). 
+ + + diff --git a/docs/Network Flows/Sources/NetFlow.mdx b/docs/Network Flows/Sources/NetFlow.mdx new file mode 100644 index 0000000000..7d5310ccb7 --- /dev/null +++ b/docs/Network Flows/Sources/NetFlow.mdx @@ -0,0 +1,165 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "NetFlow" +learn_status: "Published" +learn_rel_path: "Network Flows/Sources" +keywords: [netflow, netflow v5, netflow v7, netflow v9, cisco, flows, network flows, flow collector] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/sources/netflow" +slug: "/network-flows/sources/netflow" +--- + + +# NetFlow + + + + + +Plugin: netflow-plugin +Module: netflow + + + +## Overview + +Collects NetFlow v5, v7, and v9 records from one or more exporters (routers, switches, firewalls) +and stores them in tiered journal files. Each record exposes source and destination IP, ports, +protocol, bytes, packets, ToS, TCP flags, and ingress/egress interface indices. +Enrichment adds GeoIP country/city/ASN, static metadata, and classifier tags. + +For full documentation including vendor configuration examples, sampling caveats, template +handling and verification steps, see [NetFlow](https://learn.netdata.cloud/docs/network-flows/sources/netflow) +and the [Network Flows Overview](https://learn.netdata.cloud/docs/network-flows/). + + +The plugin listens on a configurable UDP socket for NetFlow datagrams. +NetFlow v5 and v7 records are decoded directly. NetFlow v9 records are decoded using +dynamic templates cached from the exporter. Decoded records are enriched in-memory +and appended to disk-backed journal tiers (raw, 1-minute, 5-minute, 1-hour rollups). + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. 
+ + +### Default Behavior + +#### Auto-Detection + +The plugin starts when enabled in netflow.yaml and listens on the configured UDP port. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### NetFlow-capable exporter + +A router, switch, or firewall configured to export NetFlow v5, v7, or v9 datagrams to the +Netdata agent's UDP listener. + + + +### Configuration + +#### Options + +The plugin is configured via `netflow.yaml` in the Netdata configuration directory. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| listener.listen | UDP endpoint for NetFlow datagrams. | 0.0.0.0:2055 | no | +| protocols.v5 | Enable NetFlow v5 decoding. | yes | no | +| protocols.v7 | Enable NetFlow v7 decoding. | yes | no | +| protocols.v9 | Enable NetFlow v9 decoding. | yes | no | +| journal.journal_dir | Directory for journal files (relative to NETDATA_CACHE_DIR). | flows | no | +| journal.size_of_journal_files | Maximum total size of all journal files. | 10GB | no | +| journal.duration_of_journal_files | Maximum age of journal files. | 7d | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### Basic NetFlow v5/v9 collection + +Listen on the standard NetFlow port for v5 and v9 records. + +```yaml +enabled: true +listener: + listen: "0.0.0.0:2055" +protocols: + v5: true + v9: true + +``` +###### NetFlow v9 only with extended retention + +Accept only v9 records and keep 30 days of journal data. + +
+Config + +```yaml +enabled: true +listener: + listen: "0.0.0.0:2055" +protocols: + v5: false + v7: false + v9: true +journal: + journal_dir: flows + size_of_journal_files: 50GB + duration_of_journal_files: 30d + +``` +
+ + + +### Verifying flow data is arriving and diagnosing failures + +See [Troubleshooting](https://learn.netdata.cloud/docs/network-flows/troubleshooting) for +the full diagnostic recipe -- including UDP path checks, template-error analysis, +and the "looks like a bug but isn't" section (doubling, mirroring, internal-IP geolocation). +See also [Validation and Data Quality](https://learn.netdata.cloud/docs/network-flows/validation) +and [Anti-patterns](https://learn.netdata.cloud/docs/network-flows/anti-patterns). + + + diff --git a/docs/Network Flows/Sources/Sources.mdx b/docs/Network Flows/Sources/Sources.mdx new file mode 100644 index 0000000000..637a994543 --- /dev/null +++ b/docs/Network Flows/Sources/Sources.mdx @@ -0,0 +1,29 @@ +--- +sidebar_position: "170" +sidebar_label: "Sources" + +hide_table_of_contents: true +learn_status: "AUTOGENERATED" +slug: "/network-flows/sources" +learn_link: "https://learn.netdata.cloud/docs/network-flows/sources" +--- + +# Sources + +import { Grid, Box } from '@site/src/components/Grid_integrations'; + + + + + + + + + + + + + + + + diff --git a/docs/Network Flows/Sources/sFlow.mdx b/docs/Network Flows/Sources/sFlow.mdx new file mode 100644 index 0000000000..b52ca0170f --- /dev/null +++ b/docs/Network Flows/Sources/sFlow.mdx @@ -0,0 +1,147 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml" +sidebar_label: "sFlow" +learn_status: "Published" +learn_rel_path: "Network Flows/Sources" +keywords: [sflow, sflow v5, sampled flows, flows, network flows, flow collector, inmon] +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml FILE" +sidebar_position: "30" +learn_link: "https://learn.netdata.cloud/docs/network-flows/sources/sflow" +slug: "/network-flows/sources/sflow" +--- + + +# sFlow + + + + + +Plugin: netflow-plugin +Module: sflow + + + +## Overview + +Collects sFlow v5 datagrams from one or more agents and stores them in tiered journal 
files. +sFlow provides statistically sampled packet headers, interface counters, or extended +gateway data. Each flow record exposes source and destination IP, ports, protocol, bytes, +packets, and sampling rate information. + +For full documentation including how sFlow differs fundamentally from NetFlow (packet +sampling vs aggregated flows), vendor configuration examples (Arista, Juniper, Aruba CX, +Ruckus, hsflowd), and the limits of sampled data, see +[sFlow](https://learn.netdata.cloud/docs/network-flows/sources/sflow) and the +[Network Flows Overview](https://learn.netdata.cloud/docs/network-flows/). + + +The plugin listens on the same UDP socket as NetFlow. sFlow datagrams are identified by +their distinct header format and decoded per the sFlow v5 specification. Decoded records +are enriched and appended to disk-backed journal tiers. + + +This integration is only supported on the following platforms: + +- Linux + +This integration runs as a single instance per Netdata Agent. + + +### Default Behavior + +#### Auto-Detection + +The plugin starts when enabled in netflow.yaml and listens on the configured UDP port. + +#### Limits + +The default configuration for this integration does not impose any limits. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + +## Setup + + +### Prerequisites + +#### sFlow-capable agent + +A switch, router, or host-based sFlow agent (such as Host sFlow) configured to send +sFlow v5 datagrams to the Netdata agent's UDP listener. + + + +### Configuration + +#### Options + +sFlow shares the same `netflow.yaml` configuration file as NetFlow and IPFIX. +Enable sFlow via the `protocols.sflow` option. + + +
+Config options + + + +| Option | Description | Default | Required | +|:-----|:------------|:--------|:---------:| +| listener.listen | UDP endpoint for sFlow datagrams. | 0.0.0.0:2055 | no | +| protocols.sflow | Enable sFlow decoding. | yes | no | +| journal.journal_dir | Directory for journal files (relative to NETDATA_CACHE_DIR). | flows | no | +| journal.size_of_journal_files | Maximum total size of all journal files. | 10GB | no | +| journal.duration_of_journal_files | Maximum age of journal files. | 7d | no | + + +
+ + + +#### via File + +The configuration file name for this integration is `netflow.yaml`. + + +You can edit the configuration file using the [`edit-config`](/docs/netdata-agent/configuration#edit-configuration-files) script from the +Netdata [config directory](/docs/netdata-agent/configuration#locate-your-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netflow.yaml +``` + +##### Examples + +###### sFlow collection + +Listen for sFlow v5 datagrams on the standard port. + +```yaml +enabled: true +listener: + listen: "0.0.0.0:2055" +protocols: + v5: false + v7: false + v9: false + ipfix: false + sflow: true + +``` + + +### Verifying sFlow is arriving and diagnosing failures + +See [Troubleshooting](https://learn.netdata.cloud/docs/network-flows/troubleshooting) for +the full diagnostic recipe. sFlow-specific gotchas: counter samples are not surfaced +(only flow samples), bytes/packets are statistical estimates that won't match SNMP +byte-for-byte, and VLAN information comes from `ExtendedSwitch` records only -- not +from 802.1Q tags inside the sampled header. See also +[Validation and Data Quality](https://learn.netdata.cloud/docs/network-flows/validation) +and the sFlow section of [Anti-patterns](https://learn.netdata.cloud/docs/network-flows/anti-patterns). 
+ + + diff --git a/docs/Network Flows/Troubleshooting.mdx b/docs/Network Flows/Troubleshooting.mdx new file mode 100644 index 0000000000..1df35a90bd --- /dev/null +++ b/docs/Network Flows/Troubleshooting.mdx @@ -0,0 +1,246 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/troubleshooting.md" +sidebar_label: "Troubleshooting" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "130" +learn_link: "https://learn.netdata.cloud/docs/network-flows/troubleshooting" +slug: "/network-flows/troubleshooting" +--- + + +# Troubleshooting + +Concrete recipes for the most common failures, organised by symptom. Most issues are diagnosable from the [plugin health charts](/docs/network-flows/visualization/plugin-health-charts), the Netdata journal logs, and a couple of OS-level commands. + +## The plugin doesn't start + +The plugin won't come up at all, or starts and immediately exits. + +**Symptoms:** +- Netdata reports `netflow-plugin` as not running, or restart-looping. +- Nothing in the Network Flows tab. +- An error in `journalctl -u netdata`. + +**Likely causes:** + +| Cause | What to check | +|---|---| +| YAML typo or unknown key | `journalctl -u netdata --since "5 minutes ago" \| grep -E 'failed to load configuration\|netflow'`. The plugin uses strict YAML — any unknown key fails parsing. | +| Required GeoIP DB missing (`optional: false`) | Same log search. Look for `failed to load database`. Either fix the path or set `optional: true`. | +| Listen address conflict | Look for `failed to bind`. Another process is on the configured port (default 2055). | +| Validation error | Look for `must be greater than 0` and similar. The plugin validates the full config at startup. | +| `enabled: false` was set | Look for `netflow plugin disabled by config`. The plugin honours this and shuts down cleanly — looks like "not running" if you don't read the log. 
| + +**Recovery:** + +```bash +# Read the failure +sudo journalctl -u netdata --since "5 minutes ago" | grep -E 'netflow|failed to|error' + +# Validate the YAML (use an online linter or `yamllint`) +yamllint /etc/netdata/netflow.yaml + +# After fixing, restart +sudo systemctl restart netdata +``` + +## The plugin starts, but no flows appear + +The plugin is running, but the Network Flows tab is empty. + +**First check:** is anything reaching the plugin? + +```bash +sudo tcpdump -i any -nn -c 50 'udp port 2055' +``` + +- **No packets in 30 seconds** — exporter not sending, or firewall blocking. Check the exporter's status (`show flow exporter` on Cisco, equivalents elsewhere) and the network path. The plugin can't help here; the data isn't reaching it. +- **Packets arriving** — keep going. + +**Second check:** is the listener bound? + +```bash +sudo ss -unlp | grep -E ':2055|netflow' +``` + +If nothing matches, the plugin isn't listening. See "doesn't start" above. + +**Third check:** what do the plugin's own counters say? + +Open `netflow.input_packets` on the standard Netdata charts page. The dimensions tell the story: + +- `udp_received > 0`, `parsed_packets == 0` — datagrams arriving, none decoding successfully. Wrong protocol on the listener, or all datagrams malformed. +- `udp_received > 0`, `parsed_packets > 0`, but no per-protocol counter (`netflow_v9`, `ipfix`, etc.) is moving — the protocol you're sending may be disabled in the plugin config. Check `protocols.v9`, `protocols.ipfix`, etc. in `netflow.yaml`. +- `parse_errors` rising in lockstep with `udp_received` — datagrams aren't valid for the protocols the plugin supports. Capture a sample (`tcpdump -w sample.pcap`) and inspect with Wireshark. + +## Partial data — some flows are dropped + +Counters show received traffic but you suspect data loss. 
+
+**Template errors (NetFlow v9, IPFIX):**
+
+```bash
+# Watch the template_errors dimension
+# In the dashboard: netflow.input_packets > template_errors
+```
+
+If it's climbing, the exporter is sending data records before their templates. Either:
+
+- The exporter restarted and the plugin's template cache is stale. Wait for the exporter to send the next template (typically every 30-60 seconds, depending on its config), or restart the exporter to force an immediate template refresh.
+- Templates are sent rarely (Cisco's default template refresh is 30 minutes). After a plugin restart, you'll see template errors for that long. **Fix on the router side**: lower the template refresh interval to 60 seconds.
+- The exporter is using template IDs that collide with another exporter's templates. Most common cause: two exporters NATted behind the same public IP. Place the plugin inside the NAT boundary or give each exporter a distinct address.
+
+**UDP kernel drops:**
+
+The plugin doesn't count these. Check at the OS level:
+
+```bash
+sudo ss -uam sport = :2055    # check 'd' columns for drops
+cat /proc/net/udp | head -20  # per-socket drops in the last column
+grep '^Udp:' /proc/net/snmp   # system-wide RcvbufErrors counter
+```
+
+If drops are occurring, the kernel UDP receive buffer is too small for the burst rate. Tune:
+
+```bash
+sudo sysctl -w net.core.rmem_max=33554432
+sudo sysctl -w net.core.rmem_default=8388608
+sudo sysctl -w net.core.netdev_max_backlog=250000
+```
+
+Persist in `/etc/sysctl.d/99-netflow.conf`.
+
+**Per-protocol switch off:**
+
+```yaml
+protocols:
+  v5: false # are you accidentally rejecting v5 datagrams?
+```
+
+## Data is wrong — numbers don't match expectations
+
+**Volume looks doubled:**
+
+This is the most common report. With one router, traffic appears 2× because every packet generates an ingress record AND an egress record. With two routers on the same path, 4×. Filter to one exporter + one direction (input interface OR output interface) to see real volume. See [Anti-patterns](/docs/network-flows/anti-patterns).
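The doubling is easy to reproduce on paper. A sketch with invented records (the `direction` field is illustrative, not the plugin's actual schema):

```python
# One 1 GB transfer crossing one router that exports both directions.
records = [
    {"exporter": "rtr1", "direction": "ingress", "bytes": 10**9},
    {"exporter": "rtr1", "direction": "egress",  "bytes": 10**9},
]

# Summing everything counts the same traffic twice.
naive = sum(r["bytes"] for r in records)

# Filtering to one direction recovers the real volume.
one_direction = sum(r["bytes"] for r in records
                    if r["direction"] == "ingress")

print(naive // 10**9, one_direction // 10**9)  # 2 vs 1: the "2x" report
```

With two routers on the same path, four such records exist per transfer, hence the 4× figure.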
+**Bandwidth doesn't match SNMP:**
+
+Several legitimate causes:
+
+- **Doubling**, as above. Filter properly before comparing.
+- **Sampling rate not honoured.** The plugin auto-multiplies bytes by the sampling rate, but if the exporter doesn't carry the rate (NetFlow v7 has no field for it; v5 sometimes sends 0 instead of the actual rate; v9 may not send the Sampling Options Template), the result is undercounted.
+- **Mixed sampling rates across exporters.** If your dashboard aggregates exporters with different rates, the result blends estimates and isn't comparable to any single SNMP measurement.
+- **SNMP counts layer-2 and control-plane traffic** (ARP, STP, LLDP, routing protocols) that flow data filters out. Expect SNMP to be 5-15% higher than flow on a healthy collector. More than that, investigate.
+
+See [Validation and Data Quality](/docs/network-flows/validation-and-data-quality).
+
+**Internal IPs in random countries:**
+
+GeoIP databases don't have entries for RFC 1918 / private space. The plugin doesn't skip private IPs — it just hands the IP to the database and uses what comes back. For the stock DB-IP build, private ranges are tagged so they render as "AS0 Private IP Address Space" with empty country. For other MMDBs, private ranges may resolve to weird countries.
+
+**Fix:** declare your internal CIDRs under `enrichment.networks` with country / role / name labels. See [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata).
+
+**AS resolution chain misbehaving:**
+
+If `SRC_AS` / `DST_AS` are zero everywhere despite the exporter sending them, check the `asn_providers` chain:
+
+- `[geoip, ...]` — `geoip` is a terminal short-circuit: it always produces an answer (0 when it has none), so providers listed after it never run. Reorder: `[flow, routing, geoip]`.
+- `[]` (empty) — no validation rejects this. Every AS is forced to 0.
+
+See [ASN resolution](/docs/network-flows/enrichment-concepts/asn-resolution).
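For reference, the corrected ordering as a config fragment. This is a sketch: the nesting under `enrichment:` is an assumption, so confirm the exact key path in the [Configuration](/docs/network-flows/configuration) reference before copying it:

```yaml
# Hedged sketch -- put the terminal provider last so the flow-carried
# and routing-derived ASNs get a chance to answer first.
enrichment:
  asn_providers: [flow, routing, geoip]
```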
+ +**Decapsulation eating non-tunnel traffic:** + +If you've enabled `decapsulation_mode: vxlan` and traffic that isn't VXLAN suddenly disappears from the L2-section path, that's by design — the decap is destructive on non-matching traffic. Standard NetFlow / IPFIX records (no IE 104 / IE 315) are unaffected. + +## Performance issues + +**High CPU:** + +```bash +top -p $(pgrep -f netflow-plugin) +``` + +If `netflow-plugin` is using a lot of CPU: + +- Check `netflow.input_packets` — high `udp_received` rate? You're at the limit of what one core can do for the post-decode hot path. Each instance is single-process; you can't scale horizontally on one host. +- If `udp_received` is moderate but CPU is high, classifier rules with complex regex might be the cause. Check `enrichment.classifier_cache_duration` — if too short, classifiers re-evaluate too often. +- Investigate with `perf top` or similar to find the hot function. + +See [Sizing and Capacity Planning](/docs/network-flows/sizing-and-capacity-planning) for measured throughput limits on this hardware class. + +**Memory growth:** + +```bash +# Watch the resident memory chart over time +# netflow.memory_resident_bytes - rss dimension +``` + +- If `rss` climbs and `netflow.memory_accounted_bytes` shows `unaccounted` growing, that's an unattributed allocation — could be allocator fragmentation, possibly a leak. +- If `tier_indexes` or `open_tiers` is the climbing dimension, ingest is outpacing tier flushes. Check `netflow.materialized_tier_ops` for `flushes` rate and `*_errors`. +- If `netflow.decoder_scopes` is growing without bound, your exporter is rotating template IDs. Investigate per-router behaviour. + +**Disk fill:** + +```bash +sudo du -sh /var/cache/netdata/flows/* +``` + +Default retention is `10GB / 7d` per tier — the same budget applies to all four tiers, so total can reach roughly 40 GB plus some. If your config left this default and your collector is busy, expect to hit it. 
See [Configuration](/docs/network-flows/configuration) for per-tier overrides — most production deployments need them. + +## Things that look like bugs but aren't + +- **Traffic appears 2×.** Standard ingress + egress monitoring. Filter to one direction. +- **Bidirectional conversations show twice.** A→B and B→A are real, distinct flows. Filter to one direction or one ASN to see one side. +- **Internal IPs in odd countries.** GeoIP doesn't know about your private space. Declare it explicitly. +- **City map empty over long windows.** City + lat/lon are tier-0-only. Default tier-0 retention is short. Use the country map for long ranges. +- **`__overflow__` row in results.** Your aggregation produced more groups than `query_max_groups`. Narrow the filter or reduce group-by depth. +- **30-second query timeout.** Hard limit. Narrow time range, add filters, or reduce group-by depth. +- **Sampled byte counts not exact.** sFlow is statistical by design; even NetFlow with sampling is an estimate. Cross-check against SNMP for sanity, accept some divergence. +- **`enabled: false` makes the plugin look crashed.** It's intentional — the plugin tells the parent to stop respawning it. Look for the "disabled by config" line in the journal. 
+ +## Diagnostic command quick reference + +```bash +# What's happening +sudo journalctl -u netdata --since "10 minutes ago" | grep -iE 'netflow|geoip|bmp|bioris|network-sources' + +# What's arriving on the wire +sudo tcpdump -i any -nn -c 50 'udp port 2055' + +# Is the listener bound +sudo ss -unlp | grep 2055 + +# UDP kernel drops +sudo ss -uam sport = :2055 +cat /proc/net/udp + +# Disk usage by tier +sudo du -sh /var/cache/netdata/flows/* + +# Process resources +top -p $(pgrep -f netflow-plugin) + +# Capture a sample for offline analysis +sudo tcpdump -w /tmp/netflow-sample.pcap -c 200 'udp port 2055' +``` + +## When to file an issue + +Collect this before opening a bug report: + +- Plugin version (`netdata --version` from the running daemon). +- A sample of `netflow.input_packets` chart for the failure window — all dimensions visible. +- A sample of `netflow.memory_resident_bytes` if performance-related. +- A captured pcap (`tcpdump -w` from the agent's interface) reproducing the issue. +- Sanitised `netflow.yaml` (redact internal IPs, customer names, secrets). +- Relevant log lines from `journalctl -u netdata`. + +Open issues against [github.com/netdata/netdata](https://github.com/netdata/netdata) with `area/collectors/netflow` in the title. + +## What's next + +- [Plugin Health Charts](/docs/network-flows/visualization/plugin-health-charts) — The charts referenced above. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — How to spot silent data corruption. +- [Anti-patterns](/docs/network-flows/anti-patterns) — Why some "weird" results are actually normal. +- [Configuration](/docs/network-flows/configuration) — Tuning that affects most of the symptoms above. 
diff --git a/docs/Network Flows/Validation and Data Quality.mdx b/docs/Network Flows/Validation and Data Quality.mdx new file mode 100644 index 0000000000..894dafedb4 --- /dev/null +++ b/docs/Network Flows/Validation and Data Quality.mdx @@ -0,0 +1,136 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/validation.md" +sidebar_label: "Validation and Data Quality" +learn_status: "Published" +learn_rel_path: "Network Flows" +sidebar_position: "100" +learn_link: "https://learn.netdata.cloud/docs/network-flows/validation-and-data-quality" +slug: "/network-flows/validation-and-data-quality" +--- + + +# Validation and Data Quality + +Flow data is statistical. It can be wrong in subtle ways that the dashboard cannot detect — silent UDP drops, undocumented sampling rate changes, exporters that stopped sending. This page is the routine you should run when you set up the plugin, when something looks suspicious, and periodically thereafter. + +The goal: distinguish "the data is correct" from "the data looks plausible but isn't". + +## The biggest risks are silent failures + +The most dangerous failures don't generate alerts. They look like data is flowing — just less of it, or skewed, or scaled wrong. Six common silent failures: + +1. **UDP datagram drops** — kernel drops happen when the receive buffer fills. Plugin sees fewer datagrams than the network sent. Counters are smaller; nothing logs the drop. +2. **Sampling rate misinterpretation** — exporter samples 1-in-1000, no one documented it. Bytes look 1000× smaller than reality. +3. **Sampling rate change** — someone reconfigures a router. Trends show a phantom 10× spike. No alert fires. +4. **Wrong interfaces being exported** — flow export was enabled on three of five interfaces. Some traffic is invisible. +5. **Template loss after collector restart** — v9 / IPFIX records arrive but cannot be decoded until the next template arrives. Counts dip silently. +6. 
**Stale GeoIP / ASN database** — country and AS-name fields drift away from reality over weeks. + +For each, the system appears to be working. The only way to detect them is cross-validation against an independent source. + +## The minimum viable validation routine + +Run this once after deployment, then quarterly, plus whenever something looks off. + +### 1. SNMP cross-check (every 5 minutes if you have an SNMP collector handy) + +Compare flow-derived bandwidth on a specific interface to the SNMP `ifInOctets` / `ifOutOctets` counter for that same interface. They should be close. + +The flow-derived bandwidth: filter the dashboard to one exporter, one input interface (or one output interface — pick a direction), and read the bytes/s rate. + +The SNMP-derived bandwidth: from your SNMP monitoring (Netdata's snmp.d, your separate SNMP system, or your network team). + +**Acceptable difference: roughly 5-15%.** SNMP includes layer-2 traffic (ARP, STP, LLDP, routing protocols, interface-level multicast) that flow data filters out. Expect SNMP slightly higher. + +**Not acceptable: more than 30% gap.** That indicates one of: + +- UDP drops (kernel-level). Run `sudo ss -uam sport = :2055` and check the `dRcv` column. +- Sampling rate not honoured. The exporter is sampling but not communicating the rate to the plugin (NetFlow v7, NetFlow v5 with rate=0, v9 / IPFIX without the Sampling Options Template). +- Wrong interfaces being exported. Cross-check `show flow exporter` (or vendor equivalent) against your expectations. +- Template loss. Watch `netflow.input_packets > template_errors` on the plugin health charts. + +**Plugin reporting wildly more than SNMP** indicates the doubling effect (see below). + +### 2. Doubling sanity check + +If your dashboard's total bandwidth exceeds the **physical link capacity**, you're double-counting. Standard NetFlow / IPFIX configuration produces two flow records per packet (one ingress, one egress). 
With multiple monitored routers on the same path, even more. + +To verify: filter to one exporter and one interface in one direction (input OR output, not both). Compare to SNMP for that same interface. They should agree within 5-15%. The difference between "all flows summed" and "filtered to one direction" is exactly the doubling factor. + +### 3. Sampling rate sanity check + +For each exporter, document: + +- Does it sample? At what rate? +- Does it carry the rate in flow records (NetFlow v9 / IPFIX) or in the header (v5)? +- For NetFlow v9 / IPFIX, does the exporter send a Sampling Options Template? At what frequency? + +If the exporter samples and the plugin doesn't see the rate, bytes are undercounted. + +To verify the plugin sees the rate: query a known flow on the dashboard and look at `RAW_BYTES` and `BYTES`. If they differ, the plugin is multiplying — sampling rate is being honoured. If they're identical, the plugin sees rate 1 (no scaling). + +### 4. Per-exporter health check + +The plugin doesn't publish per-exporter ingest counters today. To verify each exporter is sending: + +- Filter the dashboard to one exporter at a time. Check the byte rate. A healthy edge router during business hours should show non-zero traffic. +- An exporter that abruptly drops to zero has gone offline silently. The plugin won't tell you — your monitoring practice has to. + +### 5. Template cache health (NetFlow v9 / IPFIX) + +On the plugin health chart `netflow.input_packets`, watch `template_errors`. In steady state, it should be near zero. A sustained non-zero rate means data records are arriving before their templates — usually because the exporter sends templates rarely (every 30 minutes is a common Cisco default) and the plugin's template cache was wiped (restart with no persistence, or first-time setup). + +The plugin persists template state across restarts to `decoder_state_dir`, so a routine restart shouldn't cause this. If it does, check the cache directory permissions.
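The `RAW_BYTES` / `BYTES` check from step 3 reduces to a simple ratio. A sketch for spot-checking values copied off the dashboard:

```python
def effective_sampling_rate(raw_bytes: int, scaled_bytes: int) -> float:
    """Rate the plugin applied: scaled = raw * rate, so rate = scaled / raw."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return scaled_bytes / raw_bytes

# rate 1.0    → plugin saw no sampling rate (or the exporter truly doesn't sample)
# rate 1000.0 → 1-in-1000 sampling is being honoured
print(effective_sampling_rate(raw_bytes=4_200, scaled_bytes=4_200_000))  # → 1000.0
```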
+ +### 6. GeoIP / ASN database freshness + +The plugin doesn't publish a "MMDB last loaded" signal. To verify your databases aren't stale: + +```bash +ls -la /var/cache/netdata/topology-ip-intel/ /usr/share/netdata/topology-ip-intel/ +``` + +Files older than ~60 days are likely stale. Refresh: + +```bash +sudo /usr/sbin/topology-ip-intel-downloader +``` + +The plugin polls the files every 30 seconds — a successful refresh picks up automatically without restart. + +### 7. Internal IP enrichment validation + +Before relying on geographic analysis, spot-check that internal IPs are properly handled. Filter to an internal source IP you know and look at the `SRC_COUNTRY` and `SRC_AS_NAME` fields: + +- Empty / "AS0 Private IP Address Space" — correct. +- Some random country — your GeoIP database is returning data for private space. Declare the range under `enrichment.networks` (see [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata)). + +## Quick reference: what to monitor and what alerts to consider + +| Signal | Where | What to alert on | +|---|---|---| +| `udp_received` rate dropped | `netflow.input_packets` chart | Sustained 0 during business hours | +| `template_errors` rising | `netflow.input_packets` chart | Sustained > 1% of `udp_received` | +| `parse_errors` rising | `netflow.input_packets` chart | Sustained > 5% of `udp_received` | +| Memory growing (`unaccounted`) | `netflow.memory_accounted_bytes` | RSS grows linearly without ingest growth | +| `decoder_scopes` unbounded growth | `netflow.decoder_scopes` chart | Monotonic growth over hours | +| Disk full warnings | `netflow.raw_journal_ops` `write_errors` | Any non-zero | +| SNMP-flow gap | external | More than 30% on a steady-state link | +| Sampling rate change | router config diff (yours) | Any change to active timeout or sampling | + +## When to file a "data is wrong" investigation + +Start an investigation when **two independent signals disagree**: + +- SNMP says 500 Mbps; flow data 
says 50 Mbps. Investigate sampling, drops, exporter coverage. +- Flow data shows traffic to a country; threat intelligence says the ASNs involved don't host known infrastructure there. Investigate GeoIP or anycast. +- Last week's top talker disappeared this week. Investigate exporter health, routing changes, business-side changes. + +For each, read the [Anti-patterns](/docs/network-flows/anti-patterns) page first — most "data is wrong" reports are actually expected behaviour misread. + +## What's next + +- [Plugin Health Charts](/docs/network-flows/visualization/plugin-health-charts) — The charts referenced above. +- [Anti-patterns](/docs/network-flows/anti-patterns) — Misreadings to rule out before declaring a bug. +- [Investigation Playbooks](/docs/network-flows/investigation-playbooks) — Concrete recipes for common questions. +- [Troubleshooting](/docs/network-flows/troubleshooting) — Recovery for the symptoms above. diff --git a/docs/Network Flows/Visualization/Filters and Facets.mdx b/docs/Network Flows/Visualization/Filters and Facets.mdx new file mode 100644 index 0000000000..297c828510 --- /dev/null +++ b/docs/Network Flows/Visualization/Filters and Facets.mdx @@ -0,0 +1,87 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/filters-facets.md" +sidebar_label: "Filters and Facets" +learn_status: "Published" +learn_rel_path: "Network Flows/Visualization" +sidebar_position: "40" +learn_link: "https://learn.netdata.cloud/docs/network-flows/visualization/filters-and-facets" +slug: "/network-flows/visualization/filters-and-facets" +--- + + +# Filters and Facets + +The filter ribbon (between the visualisation and the table) is how you narrow flow data to the subset you want. Filters apply to every view — Sankey, table, time-series, country/state/city maps, globe — at once. + +## What you can filter on + +Around 80 fields are available as facets.
They're a subset of the full 91-field schema: + +- **Excluded** — metric fields (`BYTES`, `PACKETS`, `FLOWS`, `RAW_BYTES`, `RAW_PACKETS`, `SAMPLING_RATE`), timestamp fields (`FLOW_START_USEC`, `FLOW_END_USEC`, `OBSERVATION_TIME_MILLIS`), and the four geo coordinate fields (latitude/longitude). These don't make sense as categorical filters. +- **Two virtual facets**: `ICMPV4` and `ICMPV6` — synthesised on the fly from `PROTOCOL` plus the type and code fields. Filtering on `ICMPV4 = "Echo Request"` gives you that ICMP message type without writing two separate filters. + +Everything else — IPs, ports, protocol, AS numbers and names, country, state, city, exporter labels, interfaces, MACs, VLANs, NAT addresses, TCP flags, ToS, etc. — is filterable. + +## Filter logic + +Within a single field: **OR**. Selecting `PROTOCOL = TCP` and `PROTOCOL = UDP` shows TCP-or-UDP. + +Across different fields: **AND**. Adding `SRC_COUNTRY = US` to the above shows TCP-or-UDP from the US. + +## No negative match (yet) + +You cannot directly say "everything except X". The workaround is to select all values and remove the unwanted one — works for low-cardinality fields like `PROTOCOL` (a handful of values). For high-cardinality fields like `SRC_AS_NAME`, the autocomplete only surfaces the top 100 values, so there's no practical way to "select all and remove". + +This is a real limitation. Negative match is a known feature gap. + +## Autocomplete + +Type into a facet field and the dashboard suggests existing values from your live data. The list: + +- Shows up to **100 matching values**, sorted alphabetically. +- Matching policy is per-field. Free-form text fields (`SRC_AS_NAME`, `EXPORTER_NAME`, `IN_IF_DESCRIPTION`, MAC addresses, AS paths, BGP communities, country/city/state names) match by **substring**, so typing `Akamai` finds `AS20940 Akamai International`. 
IPs and short numeric fields (ports, protocols, ASN numbers, interface speeds) match by **prefix**, so typing `10.0.` narrows to that range. +- Runs against an **in-memory snapshot of the live journal** plus on-disk FST sidecars for promoted high-cardinality fields. Autocomplete never reads the raw flow tiers, and is fast even on busy collectors. +- The autocomplete `term` is hard-capped at 256 bytes; longer requests are rejected. + +For high-cardinality fields, autocomplete is the only practical way to discover values. You can't scroll a list of millions of IP addresses, but you can find one by typing what you remember. + +**Autocomplete and regular filtering are different paths.** When you select a value from the dropdown, the resulting filter is **exact equality**, not substring. The dropdown only helps you discover values; the filter that gets applied is `key = value` (or `key in [values]`) and uses indexes — never a substring scan over flow data. + +## Full-text search + +The search box at the top of the filter ribbon performs a regex match against the raw journal payload bytes. Notes: + +- The search is **regex**, not literal. `8.8.8.8` is a regex where `.` matches any byte — so it can match `8a8b8c8`, `888x888`, etc. To match the literal string, escape with backslashes: `8\.8\.8\.8`. +- The match is **byte-level** against the journal payload, so it can find substrings inside enriched fields (AS names, exporter names, country codes). +- Any non-empty search **forces tier 0**. The full-text search only works against the raw journal — it doesn't apply to the rollup tiers. Time depth is therefore bounded by raw-tier retention. +- The plugin's "fast aggregation" path is also disabled when full-text search is active, because aggregation needs to scan every record. Expect somewhat slower responses than tier-based aggregation queries. + +For "find anything containing this string in any field", the search is the right tool. 
For "filter by an exact value of a specific field", use the facet on that field — it's faster and doesn't trigger tier-0 mode. + +## URL preservation + +Every filter and selection is preserved in the dashboard URL. Copy the URL and share it; the recipient sees the exact same view, provided they have access to the same Netdata Cloud space. The dashboard also remembers your last selections per session, so you'll land on the same configuration when you return. + +A practical note: filters use a structured representation (per-field IN-list) that's easy to encode as a JSON payload but awkward to URL-encode. The dashboard handles this transparently for sharing — but anyone scripting their own queries against the function should use JSON-payload requests to the function, not GET-style args. + +## Facet limits + +`query_facet_max_values_per_field` (default `5000`) caps how many distinct values a single facet can return per query. Past that, the facet stops accumulating; the response carries an indicator. Useful when you have an extremely high-cardinality field — autocomplete still surfaces the top 100, but the full list is bounded. + +You can raise this limit in `netflow.yaml`. Higher values use more memory at query time. + +## Things that go wrong + +- **Search for `192.168.1.1` matches unrelated rows.** Regex semantics: each `.` is "any byte". Escape: `192\.168\.1\.1`. +- **Time depth shrinks unexpectedly after typing in search.** Full-text search forces tier 0. Clear the search to use rollup tiers and longer time ranges. +- **Negative match isn't there.** Workaround: select-all-minus-one for low-cardinality fields. For high-cardinality fields, no good workaround exists today. +- **Filter on an ICMP virtual facet seems slower than expected.** `ICMPV4` / `ICMPV6` virtual facets aren't optimised by the journal index — they're evaluated per-record. The query still returns; the cost shows up as longer wall time on busy collectors. 
+- **`query_max_groups` exceeded.** Result rows after the limit fold into `__overflow__`. Narrow the filter or reduce group-by depth. +- **GET-style args don't carry selections.** When integrating the function call yourself, send a JSON payload — the dashboard does this automatically. + +## What's next + +- [Sankey and Table](/docs/network-flows/visualization/sankey-and-table) — The view that filters drive most often. +- [Retention and Querying](/docs/network-flows/retention-and-querying) — Why filters can shift the tier the query uses. +- [Field Reference](/docs/network-flows/field-reference) — Which fields are available as facets. +- [Investigation Playbooks](/docs/network-flows/investigation-playbooks) — Practical filter-driven workflows. diff --git a/docs/Network Flows/Visualization/Maps and Globe.mdx b/docs/Network Flows/Visualization/Maps and Globe.mdx new file mode 100644 index 0000000000..830e303b76 --- /dev/null +++ b/docs/Network Flows/Visualization/Maps and Globe.mdx @@ -0,0 +1,117 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/maps-globe.md" +sidebar_label: "Maps and Globe" +learn_status: "Published" +learn_rel_path: "Network Flows/Visualization" +sidebar_position: "30" +learn_link: "https://learn.netdata.cloud/docs/network-flows/visualization/maps-and-globe" +slug: "/network-flows/visualization/maps-and-globe" +--- + + +# Maps and Globe + +Four geographic views, all driven by the same aggregation engine as the Sankey and Time-Series: + +- **Country map** — countries connected by edges weighted by traffic +- **State map** — same, at state/province granularity +- **City map** — same, at city level (down to street-level granularity, depending on your GeoIP database) +- **Globe** — a 3D view of city-level connections rendered as arcs over the globe + +Use these to spot geographic patterns at a glance — unexpected destinations, asymmetric traffic, CDN routing. 
+ +![Country map, top 500](https://github.com/user-attachments/assets/f9f09cf2-40c5-4bda-bf56-19b04b6cddf1) + +Country map with top-N pushed to 500, so practically every country with traffic shows up. Edge thickness is bandwidth aggregated per country pair. + +## How they work + +For each map view, the dashboard: + +1. Forces a specific aggregation. You don't pick `group_by` for these views — the view picks for you. +2. Runs the aggregation across your time range and filters. +3. Renders the top-N (25/50/100/200/500) results as edges on the map. +4. Same aggregation drives the side-panel list of countries / cities. The list and the map are two views of the same data. + +The forced aggregations are: + +| View | Forced group-by | +|---|---| +| Country map | `SRC_COUNTRY`, `DST_COUNTRY` | +| State map | `SRC_COUNTRY`, `SRC_GEO_STATE`, `DST_COUNTRY`, `DST_GEO_STATE` | +| City map | `SRC_COUNTRY`, `SRC_GEO_STATE`, `SRC_GEO_CITY`, latitude, longitude (source + destination) | +| Globe | Same as city map | + +Edge width is proportional to your sort metric (bytes or packets). The geographic coordinates needed to draw cities and arcs come from the response itself — they're already enriched into each flow record by the time the dashboard renders. You don't need a separate city-coordinates database in the dashboard. + +## Country and state vs city / globe + +The country map and state map can use the rollup tiers. They're cheap over long time windows. + +The city map and the globe **need raw-tier data**. City, latitude, and longitude are dropped from the rollup tiers (1m / 5m / 1h) to keep cardinality manageable. So: + +- Country / state map over the last 30 days — fine, uses the 1-hour tier. +- City map over the last 30 days — likely empty. Tier 0 retention defaults to 7 days (shared budget across all tiers); often less in practice. 
+ +If your city map looks empty over a long window, try the country map first to confirm data is arriving, then narrow the time range until the city map fills in. + +## Tooltips + +Hover over a country, state, city, or arc to see a tooltip. The tooltip shows the same fields as the underlying row — endpoints, byte and packet counts. Click does **not** drill down to a different view; the maps are read-only with respect to navigation. To change perspective (e.g., "show me traffic for this country only"), use the filter ribbon to add a `SRC_COUNTRY` or `DST_COUNTRY` selection. + +![State map zoomed over the US, hovering an Attica↔California link](https://github.com/user-attachments/assets/6f124a7c-e12f-453f-8599-59e48bc839e8) + +State map with top-N at 500, zoomed over the US. The tooltip on the link between Attica (Greece) and California shows bidirectional traffic — bytes and packets in each direction. + +![City map zoomed over Europe](https://github.com/user-attachments/assets/e752e1e3-4f6a-4366-b2e2-6af04d4bc2fe) + +City map with top-N at 500, zoomed over Europe. Dozens of European cities appear connected by edges weighted by bandwidth. + +![Globe view over the Atlantic, US ↔ EU links](https://github.com/user-attachments/assets/c83a963d-797f-44f8-9ae1-e9aba7e16eec) + +Globe view, top-N at 500, rotated over the Atlantic. The 3D projection shows US and EU cities along the curved horizon, bridged by arcs whose thickness encodes bandwidth. + +## Things to know + +### GeoIP is required + +Without a GeoIP database, country / state / city / coordinate fields are empty and the maps are blank. The default install includes a stock DB-IP database — see [GeoIP enrichment](/docs/network-flows/enrichment-concepts/ip-intelligence). Source builds need the operator to run the downloader once. + +### Internal IPs in random countries + +If you see "traffic from China" or "traffic to Russia" coming from your own network, that's almost always GeoIP misidentifying internal IPs.
The fix is to declare your internal CIDRs explicitly under `enrichment.networks` with a country override. See [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata). Don't trust GeoIP for RFC 1918 / RFC 6598 / link-local addresses. + +### CDN traffic shifts + +Your traffic to a SaaS provider may resolve to one country today and another tomorrow because the CDN's routing changed. This is normal CDN behaviour, not a security incident. ASN-based aggregation is more stable for cloud / CDN traffic than country-based — see the [Anti-patterns page](/docs/network-flows/anti-patterns) "Geographic firewall of shame". + +### Mirroring + +Bidirectional conversations show up as two arcs (A→B and B→A). With the default 25 top-N, that means about 12 actual conversations get rendered, not 25. To see one direction only, filter on a specific source or destination. + +### Globe vs City Map + +The globe and city map use the same data. The globe is purely a different rendering of the same response — useful for visual presentation, less useful for analysis (the 3D projection makes precise reading harder than a 2D map). + +## What controls are available + +- **Time range** — Netdata's global time picker +- **Filters** — facet selections + autocomplete + full-text search +- **Top-N** — 25 / 50 / 100 / 200 / 500 +- **Sort by** — bytes or packets (determines edge weight and the side-list ranking) +- **Group-by** — locked to the view-specific aggregation; not user-configurable for maps + +## Things that go wrong + +- **City map empty.** Time range exceeds tier-0 retention. Narrow the range, or use country/state map for a wider view. +- **Random countries appearing for internal traffic.** Declare your internal CIDRs in `enrichment.networks`. +- **Ireland or Singapore showing up unexpectedly.** Probably AWS/GCP/Azure shifting CDN routing. ASN-based aggregation is more stable. +- **A whole country disappears.** Your filter excluded it. Check the filter ribbon. 
+- **No data on globe but city map works.** Both should fail or succeed identically — they consume the same response. If they diverge, that's a dashboard bug worth reporting. + +## What's next + +- [GeoIP enrichment](/docs/network-flows/enrichment-concepts/ip-intelligence) — Required for any geographic visualization. +- [Static metadata](/docs/network-flows/enrichment-concepts/static-metadata) — Declare your internal networks to override GeoIP for RFC 1918. +- [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) — Narrowing geographic views. +- [Anti-patterns](/docs/network-flows/anti-patterns) — Why "alert on traffic to country X" is fragile. diff --git a/docs/Network Flows/Visualization/Plugin Health Charts.mdx b/docs/Network Flows/Visualization/Plugin Health Charts.mdx new file mode 100644 index 0000000000..a0b18ffdd5 --- /dev/null +++ b/docs/Network Flows/Visualization/Plugin Health Charts.mdx @@ -0,0 +1,106 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/dashboard-cards.md" +sidebar_label: "Plugin Health Charts" +learn_status: "Published" +learn_rel_path: "Network Flows/Visualization" +sidebar_position: "50" +learn_link: "https://learn.netdata.cloud/docs/network-flows/visualization/plugin-health-charts" +slug: "/network-flows/visualization/plugin-health-charts" +--- + + +# Plugin Health Charts + +The netflow plugin publishes its own operational charts under the `netdata.netflow.*` chart context. These appear on the standard Netdata charts page (alongside system metrics like CPU and memory), **not** inside the Network Flows tab. They are how you monitor the plugin itself: is it receiving data, are templates flowing, is memory growing, is disk being written. + +This is also where you look first when something seems wrong — long before opening the Network Flows tab. + +All charts update every 1 second. 
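One way to confirm these charts are being published is to query the agent's `/api/v1/charts` endpoint and keep only the `netflow.` contexts. A sketch against a canned, abbreviated response; on a live agent you would fetch `http://localhost:19999/api/v1/charts` instead:

```python
import json

# Abbreviated stand-in for a /api/v1/charts response (real responses
# carry many more fields per chart; the "charts" key is the real shape).
sample_response = json.loads("""
{"charts": {
  "netflow.input_packets": {"family": "netflow"},
  "netflow.decoder_scopes": {"family": "netflow"},
  "system.cpu": {"family": "cpu"}
}}
""")

# Keep only the plugin's own charts
netflow_charts = sorted(
    chart_id for chart_id in sample_response["charts"]
    if chart_id.startswith("netflow.")
)
print(netflow_charts)  # → ['netflow.decoder_scopes', 'netflow.input_packets']
```

An empty list on a live agent means the plugin isn't running or isn't registering its charts at all, which narrows the problem before you look at any dimension.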
+ +## The charts + +| Chart | Type | What it shows | +|---|---|---| +| `netflow.input_packets` | line, packets/s | Datagrams: received, parsed, errored, per-protocol counts | +| `netflow.input_bytes` | line, bytes/s | UDP byte rate | +| `netflow.raw_journal_ops` | line, ops/s | Raw journal: writes, sync calls, errors | +| `netflow.raw_journal_bytes` | line, bytes/s | Raw journal logical bytes written | +| `netflow.materialized_tier_ops` | line, ops/s | Rollup tiers: rows produced per tier, flushes, errors | +| `netflow.materialized_tier_bytes` | stacked, bytes/s | Rollup tier byte rate, broken down by tier | +| `netflow.open_tiers` | stacked, rows | Rows currently open per tier | +| `netflow.journal_io_ops` | line, ops/s | Decoder-state persist operations and errors | +| `netflow.journal_io_bytes` | line, bytes/s | Decoder-state persist byte rate | +| `netflow.decoder_scopes` | line, scopes | Distinct (exporter, observation domain) scopes the decoder tracks | +| `netflow.memory_resident_bytes` | line, bytes | Process RSS, peak RSS, breakdown | +| `netflow.memory_resident_mapping_bytes` | stacked, bytes | RSS broken down by what's in it | +| `netflow.memory_allocator_bytes` | line, bytes | Allocator-internal stats | +| `netflow.memory_accounted_bytes` | stacked, bytes | RSS attributed to known components, plus `unaccounted` | +| `netflow.memory_tier_index_bytes` | stacked, bytes | Tier-index memory drilldown | + +## Reading the most useful charts + +### `netflow.input_packets` + +The single most important chart. Five families of dimensions: + +- **`udp_received`** — datagrams pulled off the socket. If this is zero, nothing is reaching the plugin (firewall, no exporter, wrong port). +- **`parse_attempts`, `parsed_packets`** — should track each other on a healthy collector. If `parse_attempts` is high but `parsed_packets` is low, datagrams are arriving but failing to decode. 
+- **`parse_errors`** — counts datagrams that failed parsing for any reason (truncated, malformed, unsupported version). +- **`template_errors`** — counts data records arriving before their template (v9 / IPFIX). Should be near zero in steady state. A sustained non-zero rate means the exporter is sending templates too rarely or your collector has lost template state. +- **`netflow_v5`, `netflow_v7`, `netflow_v9`, `ipfix`, `sflow`** — per-protocol successful counts. Useful to identify "which protocol is actually arriving". + +### `netflow.decoder_scopes` + +Cardinality of decoder state. Reports how many distinct `(exporter, observation domain)` template caches the plugin currently holds. Watch for unbounded growth — an exporter that frequently rotates template IDs (rare but real) will inflate this without bound. + +### `netflow.materialized_tier_*` + +Show the rollup pipeline working. `*_rows` should track ingest. `flushes` should tick steadily; if it stops, tiering is stalled. + +### `netflow.memory_resident_bytes` and `netflow.memory_accounted_bytes` + +If RSS climbs over time: + +- Check `netflow.memory_accounted_bytes` to see where it's going. +- The `unaccounted` dimension is `RSS - sum(known components)`. **A growing `unaccounted` is your leak signal.** +- `tier_indexes` and `open_tiers` are normal sources of growth — they should track ingest rate. +- `geoip_asn` and `geoip_geo` are mmap'd MMDB files. Their size grows as the kernel pages the file in under read pressure. + +### `netflow.memory_resident_mapping_bytes` + +This one breaks RSS down by what's mapped. Useful when you want to attribute "this process is using 800 MB" — heap, journals (per tier), MMDB files, anonymous mappings, etc. + +## What's NOT in these charts + +A few signals that aren't published today: + +- **Per-exporter ingest counter.** No per-source rate dimension. Decoder-scope cardinality tells you how many sources, not how busy each one is. 
+- **UDP socket drops.** Kernel-level drops (full receive buffer, NIC drops) are not surfaced. Use the OS-level metrics: `cat /proc/net/udp` (column `RcvbufErrors`) or `ss -uam`. +- **Template cache hit ratio.** `template_errors` counts misses; there's no corresponding "hits" counter to form a ratio. +- **GeoIP staleness signal.** No "MMDB last loaded" timestamp or version. The mapping memory dimensions tell you if a database is loaded, not how old it is. +- **Per-tier query latency.** These charts cover ingest and storage; query-side performance isn't observable. +- **BioRIS counters.** They're collected internally but not published as chart dimensions today. + +If you need any of these, mention it in an issue — they're not hard to add but haven't been needed enough yet. + +## How to use these charts for diagnosis + +| Symptom | Look at | What it means | +|---|---|---| +| Network Flows tab is empty | `netflow.input_packets` `udp_received` | Zero = no datagrams arriving (firewall? wrong port?). Non-zero with `parsed_packets` zero = wrong protocol or all datagrams malformed. | +| Sudden drop in flows | per-protocol dimensions | Identifies which protocol stopped (helps narrow whether it's a router, a router class, or all routers). | +| Templates failing | `template_errors` rising | Exporter not sending templates often enough; collector lost cache; cache mismatch after firmware update. | +| Cache growing without bound | `decoder_scopes` rising over hours | Exporter churn or unstable template IDs. Investigate per-router behaviour. | +| Memory pressure | `netflow.memory_resident_bytes`, `netflow.memory_accounted_bytes` | If `rss` climbs and `unaccounted` is the dimension growing → unattributed allocation, possibly a leak. If `tier_indexes` or `open_tiers` climbs → ingest backpressure, flushing stalled. | +| Disk write stalls | `netflow.raw_journal_ops` `write_errors`, `sync_errors` | Disk full, permission denied, fs error. 
| +| Decoder state not persisting | `netflow.journal_io_ops` | `decoder_state_persist_calls` should tick periodically. `*_errors` should be 0. | + +## Where these are NOT shown + +These charts are **not** in the Network Flows tab. Look for them on the standard Netdata charts page, in the family `netflow`. The Network Flows tab itself shows traffic data, not plugin health. + +## What's next + +- [Troubleshooting](/docs/network-flows/troubleshooting) — Concrete diagnostic workflows. +- [Validation and Data Quality](/docs/network-flows/validation-and-data-quality) — Cross-checking plugin counters against SNMP. +- [Configuration](/docs/network-flows/configuration) — Tuning that affects what these charts show. diff --git a/docs/Network Flows/Visualization/Sankey and Table.mdx b/docs/Network Flows/Visualization/Sankey and Table.mdx new file mode 100644 index 0000000000..e36a6db5f5 --- /dev/null +++ b/docs/Network Flows/Visualization/Sankey and Table.mdx @@ -0,0 +1,120 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/summary-sankey.md" +sidebar_label: "Sankey and Table" +learn_status: "Published" +learn_rel_path: "Network Flows/Visualization" +sidebar_position: "10" +learn_link: "https://learn.netdata.cloud/docs/network-flows/visualization/sankey-and-table" +slug: "/network-flows/visualization/sankey-and-table" +--- + + +# Sankey and Table + +The Network Flows tab opens with two views stacked: a Sankey diagram on top, a sortable table beneath. Both render the same data, the same top-N aggregation, the same field selection. Selecting fields, filtering, and sorting affects both at once. + +This is your default view. Most investigative workflows start here. + +![Sankey top 25 with table, filtered by source IP](https://github.com/user-attachments/assets/314e7195-3fff-4bc7-a01a-0d59ded513d7) + +Sankey + table side by side, with a `SRC_ADDR` filter applied. 
The table at the bottom shows the same 25 rows as the diagram above; clicking a column header re-sorts both. The filter ribbon at the top is how you narrow to one source. + +## What you see by default + +When you first open the tab: + +- **Time range**: last 15 minutes (Netdata's global time picker default — it applies to metrics, logs, flows, and topology together) +- **View**: Sankey + Table +- **Top-N**: 25 (selectable: 25 / 50 / 100 / 200 / 500) +- **Sort by**: bytes (alternative: packets) +- **Aggregation fields**: `Source ASN → Protocol → Destination ASN` +- **No filters applied** — the dashboard remembers your last selections, so on subsequent visits you'll land on whatever you had open + +The Sankey shows the top 25 conversations between Source ASN, Protocol, and Destination ASN, weighted by bytes. The table below shows the same 25 rows, with bytes and packets columns appended. + +## How to read the Sankey + +A Sankey diagram has columns of nodes and weighted bands flowing between them. + +- Each **column** corresponds to one of your selected aggregation fields, in the order you specified. +- Each **node** is one distinct value in that column (e.g., one ASN, or one country). +- Each **band** is one row in the underlying top-N — its width is proportional to the bytes (or packets) for that combination. + +With the default 3-column setup (Source ASN → Protocol → Destination ASN), you see the top 25 (Source ASN, Protocol, Destination ASN) tuples by traffic volume. A wide band from `AS65000` to `tcp` to `AS15169` says "AS65000 sent a lot of TCP to AS15169 in this time window". + +You can pick **1 to 10 fields** as columns. Order matters — the Sankey draws bands left-to-right in the order you list. There are roughly 84-85 fields available for aggregation; metric fields (`BYTES`, `PACKETS`, sampling rate, timestamps) and the geo coordinates (latitude/longitude) are not selectable here. 
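
To make the column / node / band mapping concrete, here is a minimal sketch of how top-N rows turn into Sankey links. This is illustrative only, not the dashboard's code; the field names and byte counts are invented:

```python
from collections import defaultdict

# Hypothetical top-N rows: one tuple per aggregation-field combination,
# weighted by bytes (the Sankey's band widths).
fields = ["src_asn", "protocol", "dst_asn"]
rows = [
    ("AS65000", "tcp", "AS15169", 9_400_000),
    ("AS65000", "udp", "AS15169",   800_000),
    ("AS64512", "tcp", "AS13335", 2_100_000),
]

# A band exists between each pair of adjacent columns; its width is the
# sum of bytes for that (left value, right value) combination.
links = defaultdict(int)
for *values, nbytes in rows:
    for col in range(len(fields) - 1):
        left = (fields[col], values[col])
        right = (fields[col + 1], values[col + 1])
        links[(left, right)] += nbytes

for (left, right), nbytes in sorted(links.items(), key=lambda kv: -kv[1]):
    print(f"{left[1]} -> {right[1]}: {nbytes} bytes")
```

Each adjacent column pair contributes one set of bands, which is why reordering the aggregation fields changes the picture.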
+
+![Sankey top 25, table collapsed](https://github.com/user-attachments/assets/c846e80f-b1d1-4330-bb46-162d021604fa)
+
+Same view with the table folded away — useful when you want the Sankey's full vertical real estate. Click the table header to expand it again.
+
+## Top-N is "top-N grouped tuples"
+
+When you set top-N to 25, the response contains the **25 top group-by tuples**, ranked by your sort metric. The 26th-largest and beyond are folded into a synthetic `__other__` row that represents "everything else, summed".
+
+If your aggregation produces a very large number of distinct tuples (more than `query_max_groups`, default 50 000), an additional `__overflow__` row appears, summing everything that didn't fit in the in-memory accumulator. Both `__other__` and `__overflow__` are real rows in the response and may show up in the Sankey and the table — they aren't bugs, they're "everything off the bottom of the list".
+
+To narrow further: filter, or change the aggregation columns. Bumping top-N higher (200, 500) helps for shallow searches; for serious investigation, filter.
+
+## How to read the Table
+
+The same data, sortable and column-customisable.
+
+- One row per top-N tuple
+- One column per aggregation field
+- Plus `bytes` and `packets` columns, both sortable
+
+`SRC_AS_NAME` and `DST_AS_NAME` columns get extra width because AS names are long. Latitude / longitude columns are present in the underlying data but **hidden by default** — they're carried through so the city map and globe views can use them, but they aren't useful in tabular form.
+
+Click any column header to re-sort. Click a value to add a filter on that field. The same filter applies to the Sankey.
+
+## The filter ribbon
+
+A filter strip sits between the Sankey and the table. Three things you can do here:
+
+- **Select facet values** — click a field, pick one or more values. The query updates. 
+- **Autocomplete** — type into a facet field; the dashboard suggests existing values from the live data. Useful for high-cardinality fields like AS names. +- **Free-text search** — anything you type in the search box runs as a regex against the raw journal data. + +Filter logic is "AND across fields, OR within a field". Selecting `PROTOCOL = TCP` and `PROTOCOL = UDP` shows TCP-or-UDP. Selecting `PROTOCOL = TCP` plus `SRC_COUNTRY = US` shows only US-source TCP. + +There is no negative match. To exclude a value, select all values and remove the unwanted one — works for low-cardinality fields, becomes impractical for high-cardinality ones (the autocomplete cap is 100). + +See [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) for the full mechanics. + +## Choosing aggregation fields + +Some shapes that work well: + +- **Default**: `Source ASN → Protocol → Destination ASN`. The "who, on what, to whom" overview. Good first look. +- **Country flow**: `Source Country → Destination Country`. Cleanest geographic view. Combine with `protocol` for service-level detail. +- **Per-router slice**: `Exporter Name → Input Interface → Destination ASN`. Use when you have per-router questions. +- **Service drill-down**: `Destination Port → Source ASN`. Who's hitting your services. +- **Internal/external split**: `IN_IF_BOUNDARY → DST_COUNTRY → Destination ASN`. After labelling your boundaries via static metadata. + +The order of fields determines the visual flow. Reorder to change which dimension is "left" and "right" in the Sankey. + +## Things to know + +### Doubling + +Without filtering, aggregate volume on a single router is roughly 2× the actual traffic — every packet generates two flow records (one ingress, one egress). To see real volume on a specific link, filter to one exporter and one direction (input interface OR output interface, not both). See [Anti-patterns](/docs/network-flows/anti-patterns) for the full framing. 
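
A toy sketch of the doubling effect, with invented records (not real flow output): each packet stream through a router is observed twice, once on its input interface and once on its output interface:

```python
# Each physical packet stream through one router appears twice in the
# flow export: once observed on the input interface, once on the output.
records = [
    # (exporter, direction, interface, bytes)
    ("router1", "in",  "eth0", 1000),  # ingress observation
    ("router1", "out", "eth1", 1000),  # same traffic, egress observation
    ("router1", "in",  "eth2",  500),
    ("router1", "out", "eth3",  500),
]

unfiltered = sum(b for _, _, _, b in records)
one_direction = sum(b for _, d, _, b in records if d == "in")

print(unfiltered)     # 3000 — roughly 2x the real traffic
print(one_direction)  # 1500 — filter to one direction for real volume
```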
+ +### Sharing your view + +The dashboard URL preserves all state — time range, filters, aggregation fields, top-N, sort. Copy the URL and share with anyone who has access to the same Netdata Cloud space. + +### Limits + +- `top_n` clamps to one of \{25, 50, 100, 200, 500}. +- Maximum 10 group-by fields. More are silently truncated. +- Maximum 50 000 distinct group tuples per query (`query_max_groups`); over that, surplus folds into `__overflow__`. +- The query itself has a 30-second hard timeout. + +## What's next + +- [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) — Filtering mechanics in detail. +- [Time-Series](/docs/network-flows/visualization/time-series) — How traffic evolves over the time window. +- [Maps and Globe](/docs/network-flows/visualization/maps-and-globe) — Geographic views. +- [Field Reference](/docs/network-flows/field-reference) — Which fields are available for aggregation. +- [Anti-patterns](/docs/network-flows/anti-patterns) — How to read the numbers correctly. diff --git a/docs/Network Flows/Visualization/Time-Series.mdx b/docs/Network Flows/Visualization/Time-Series.mdx new file mode 100644 index 0000000000..9c2ab2b526 --- /dev/null +++ b/docs/Network Flows/Visualization/Time-Series.mdx @@ -0,0 +1,99 @@ +--- +custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/time-series.md" +sidebar_label: "Time-Series" +learn_status: "Published" +learn_rel_path: "Network Flows/Visualization" +sidebar_position: "20" +learn_link: "https://learn.netdata.cloud/docs/network-flows/visualization/time-series" +slug: "/network-flows/visualization/time-series" +--- + + +# Time-Series + +The Time-Series view plots traffic over time. Same top-N selection as the Sankey + Table, but rendered as a stacked chart across the time range you've selected. + +Use it for: anomaly detection, trending, comparing now to last week, capacity planning. 
Use the Sankey/Table view for: "what's the breakdown right now".
+
+![Time-Series top 25 with table](https://github.com/user-attachments/assets/0bb2637d-632b-4e97-900f-b14155ab0771)
+
+Stacked chart on top, table at the bottom. Each colored band is one of the 25 top groups, summed into time buckets. The table holds the same 25 rows with totals across the whole window.
+
+## How it works
+
+The view runs the same aggregation as the Sankey + Table:
+
+1. The plugin scans the journal across your time range, aggregating by your group-by fields, summing bytes and packets.
+2. It picks the top-N groups by your sort metric (bytes or packets) over the **whole** window.
+3. It re-scans the journal and accumulates those top-N groups into time buckets.
+4. The result is a stacked chart with one dimension per top-N group.
+
+The top-N is computed once over the entire window — not per bucket. A flow that's huge for 5 minutes and absent the rest of the time may not make the top-N if a steady mid-volume flow accumulates more total bytes over the same window. If you want to see those bursts, narrow the time range or filter to the conversation you're investigating.
+
+## Bucket size
+
+The view auto-picks a bucket size based on the time range:
+
+| Time range | Tier used | Bucket size |
+|---|---|---|
+| ≤ ~100 minutes | 1-minute | 60 seconds (the floor) |
+| 100 minutes to ~8h20m | 5-minute | 300 seconds |
+| ≥ ~8h20m | 1-hour | 3600 seconds |
+
+The rule: use the finest tier that keeps the window at roughly 100 buckets or fewer, with bucket size `max(tier_bucket, 60)`. For longer ranges, the bucket grows in proportion to keep the chart readable (capped at 500 buckets total).
+
+The window is **rounded outward** to align with bucket boundaries — your "11:23:00 to 11:48:00" request may render as "11:23:00 to 11:48:30" if the bucket size doesn't divide your range evenly. This is intentional; it ensures every record in your window is reachable. 
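
The tier and bucket selection in the table above can be sketched as follows. This is an approximation of the described behaviour, not the plugin's actual code:

```python
TIERS = [3600, 300, 60]  # rollup tiers, coarsest to finest, in seconds

def pick_bucket(window_seconds: int) -> int:
    """Finest tier that keeps the window at ~100 buckets or fewer,
    never below the 60-second floor."""
    for tier in reversed(TIERS):       # try the finest tier first
        if window_seconds / tier <= 100:
            return max(tier, 60)
    return TIERS[0]                    # very long range: coarsest tier, more buckets

def align_window(start: int, end: int, bucket: int) -> tuple[int, int]:
    """Round outward so every record in [start, end] is covered."""
    lo = (start // bucket) * bucket
    hi = -(-end // bucket) * bucket    # ceiling division
    return lo, hi

print(pick_bucket(25 * 60))    # 25-minute window -> 60
print(pick_bucket(4 * 3600))   # 4-hour window    -> 300
print(pick_bucket(24 * 3600))  # 24-hour window   -> 3600
```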
+ +### Sub-minute zoom + +The minimum bucket size is **60 seconds**. Zoom in past one minute and the chart silently widens to 60-second buckets. There's no warning — sub-minute jitter just smooths out. For sub-second analysis, flow data is the wrong tool ([microbursts are invisible](/docs/network-flows/anti-patterns)). + +## What forces tier 0 (raw) + +Some queries can't use the rollup tiers. They drop to tier 0 and inherit raw-tier retention: + +- Filtering or grouping by `SRC_ADDR`, `DST_ADDR`, `SRC_PORT`, `DST_PORT`, or any geo city / latitude / longitude field +- Any non-empty full-text search + +In those cases the 100-bucket rule still applies, but the source tier is tier 0. Time depth is bounded by raw-tier retention (default: shared 10GB / 7d budget across all tiers — almost always less than 7 days for a busy collector). + +If you've been working at a higher tier and add an IP filter, the time depth on your chart may suddenly shrink — that's the tier switch. + +## "No data" buckets + +Buckets that received no contributing records render as zero. There's no special "missing data" indicator on the chart — the plot is flat at zero in those regions. + +That includes the case where the time range crosses the retention boundary of a tier. Tier 0 (raw) holds the most recent data; older fragments fall back to coarser tiers when available, and emptiness when no tier has the span. + +The dashboard's diagnostic side-panels surface tier coverage in the response stats (`query_tier`, `query_files`, etc.), but the chart itself doesn't visually distinguish "no data" from "zero". + +## Group overflow + +Same overflow semantics as the Sankey + Table view. If your aggregation produces more than `query_max_groups` (default 50 000) distinct group tuples, the surplus is folded into a synthetic `__overflow__` group, which appears as one of the chart's dimensions. Look for the warning in the response stats; narrow your filter or reduce the group-by depth to avoid it. 
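
The overflow semantics shared by this view and the Sankey + Table can be sketched with a toy accumulator. The helper and its tiny limits are invented for illustration; the real defaults are the top-N picker values and `query_max_groups` = 50 000:

```python
from collections import defaultdict

def aggregate(records, top_n, max_groups):
    """Sum bytes per group key; fold rank > top_n into __other__ and
    groups beyond the accumulator limit into __overflow__."""
    groups = defaultdict(int)
    overflow = 0
    for key, nbytes in records:
        if key in groups or len(groups) < max_groups:
            groups[key] += nbytes
        else:
            overflow += nbytes          # accumulator full: fold immediately
    ranked = sorted(groups.items(), key=lambda kv: -kv[1])
    result = dict(ranked[:top_n])
    other = sum(v for _, v in ranked[top_n:])
    if other:
        result["__other__"] = other
    if overflow:
        result["__overflow__"] = overflow
    return result

records = [("a", 900), ("b", 500), ("c", 300), ("d", 100), ("a", 50)]
print(aggregate(records, top_n=2, max_groups=3))
# → {'a': 950, 'b': 500, '__other__': 300, '__overflow__': 100}
```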
+ +## What controls are available + +Same controls as the other views: + +- **Time range** — Netdata's global time picker +- **Filters** — facet selections + autocomplete + full-text search (in the filter ribbon) +- **Top-N** — 25 / 50 / 100 / 200 / 500 +- **Sort by** — bytes or packets (determines what "top" means and what units the chart uses) +- **Group-by fields** — same as Sankey, 1-10 fields. The chart shows one stacked dimension per surviving top-N group + +The default group-by is `Source ASN → Protocol → Destination ASN`, same as Sankey + Table. + +## Things that go wrong + +- **Bursty flow not in top-N.** Top-N is over the whole window. Narrow the time range or filter to that conversation. +- **Sub-minute zoom doesn't render finer.** The 60-second floor is hard. For finer detail, use packet capture. +- **Wide range plus IP filter shows less than expected.** IP filter forced tier 0; raw retention is your bound. +- **Window appears to extend slightly beyond what you asked.** Bucket alignment rounds outward. +- **`__overflow__` shows up as the biggest dimension.** Your group-by is producing more distinct tuples than `query_max_groups` (50 000). Narrow the filter or drop a high-cardinality group-by field. + +## What's next + +- [Sankey and Table](/docs/network-flows/visualization/sankey-and-table) — The default view; same aggregation, point-in-time. +- [Retention and Querying](/docs/network-flows/retention-and-querying) — How tiers map to time ranges. +- [Filters and Facets](/docs/network-flows/visualization/filters-and-facets) — Narrowing the data. +- [Anti-patterns](/docs/network-flows/anti-patterns) — Why time-shifted comparison beats absolute thresholds. 
diff --git a/docs/Network Flows/Visualization/_category_.json b/docs/Network Flows/Visualization/_category_.json new file mode 100644 index 0000000000..0456cade26 --- /dev/null +++ b/docs/Network Flows/Visualization/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "Visualization", + "position": 70 +} diff --git a/docs/Security and Privacy Design/Security and Privacy Design.mdx b/docs/Security and Privacy Design/Security and Privacy Design.mdx index df7ebb6867..92450d1fdc 100644 --- a/docs/Security and Privacy Design/Security and Privacy Design.mdx +++ b/docs/Security and Privacy Design/Security and Privacy Design.mdx @@ -3,7 +3,7 @@ custom_edit_url: "https://github.com/netdata/netdata/edit/master/docs/security-a sidebar_label: "Security and Privacy Design" learn_status: "Published" learn_rel_path: "Security and Privacy Design" -sidebar_position: "160" +sidebar_position: "170" learn_link: "https://learn.netdata.cloud/docs/security-and-privacy-design" slug: "/security-and-privacy-design" --- diff --git a/docs/Welcome to Netdata/Monitor Anything.mdx b/docs/Welcome to Netdata/Monitor Anything.mdx index cb1b56b51b..a2d019e92a 100644 --- a/docs/Welcome to Netdata/Monitor Anything.mdx +++ b/docs/Welcome to Netdata/Monitor Anything.mdx @@ -274,6 +274,7 @@ Need a dedicated integration? [Submit a feature request](https://github.com/netd | [Hitron CODA Cable Modem](/docs/collecting-metrics/collectors/networking/hitron-coda-cable-modem) | Track Hitron CODA cable modem metrics for optimized internet connectivity and performance. | | [InfiniBand](/docs/collecting-metrics/collectors/networking/infiniband) | This integration monitors InfiniBand network inteface statistics. | | [IP Virtual Server](/docs/collecting-metrics/collectors/networking/ip-virtual-server) | This integration monitors IP Virtual Server statistics | +| [IPFIX](/docs/network-flows/sources/ipfix) | Collects IPFIX (NetFlow v10) records from one or more exporters and stores them in tiered journal files. 
| | [ipfw](/docs/collecting-metrics/collectors/networking/ipfw) | Collect information about FreeBSD firewall. | | [IPv6 Socket Statistics](/docs/collecting-metrics/collectors/networking/ipv6-socket-statistics) | This integration provides IPv6 socket statistics. | | [ISC DHCP](/docs/collecting-metrics/collectors/networking/isc-dhcp) | This collector monitors ISC DHCP lease usage by reading the DHCP client lease database (dhcpd.leases). | @@ -288,6 +289,7 @@ Need a dedicated integration? [Submit a feature request](https://github.com/netd | [net.inet6.ip6.stats](/docs/collecting-metrics/collectors/networking/net.inet6.ip6.stats) | Collect information abou IPv6 stats. | | [net.isr](/docs/collecting-metrics/collectors/networking/net.isr) | Collect information about system softnet stat. | | [Netfilter](/docs/collecting-metrics/collectors/networking/netfilter) | Monitor Netfilter metrics for optimal packet filtering and manipulation. | +| [NetFlow](/docs/network-flows/sources/netflow) | Collects NetFlow v5, v7, and v9 records from one or more exporters (routers, switches, firewalls) and stores them in tiered journal files. | | [Network Connections](/docs/collecting-metrics/collectors/networking/network-connections) | This plugin reads the system's socket tables to enumerate all active network connections, including TCP and UDP sockets in all states, for both IPv4 and IPv6. | | [Network interfaces](/docs/collecting-metrics/collectors/networking/network-interfaces) | Monitor network interface metrics about bandwidth, state, errors and more. | | [Network statistics](/docs/collecting-metrics/collectors/networking/network-statistics) | This integration provides metrics from the `netstat`, `snmp` and `snmp6` modules. | @@ -305,6 +307,7 @@ Need a dedicated integration? [Submit a feature request](https://github.com/netd | [PowerDNS Recursor](/docs/collecting-metrics/collectors/networking/powerdns-recursor) | This collector monitors PowerDNS Recursor instances. 
| | [RIPE Atlas](/docs/collecting-metrics/collectors/networking/ripe-atlas) | Keep tabs on RIPE Atlas Internet measurement platform metrics for efficient network monitoring and performance. | | [SCTP Statistics](/docs/collecting-metrics/collectors/networking/sctp-statistics) | This integration provides statistics about the Stream Control Transmission Protocol (SCTP). | +| [sFlow](/docs/network-flows/sources/sflow) | Collects sFlow v5 datagrams from one or more agents and stores them in tiered journal files. | | [SNMP devices](/docs/collecting-metrics/collectors/networking/snmp-devices) | This collector discovers and monitors any SNMP-enabled network device. | | [Socket statistics](/docs/collecting-metrics/collectors/networking/socket-statistics) | This integration provides socket statistics. | | [SoftEther VPN Server](/docs/collecting-metrics/collectors/networking/softether-vpn-server) | Monitor SoftEther VPN Server metrics for efficient virtual private network (VPN) management and performance. | diff --git a/ingest/generated_map.yaml b/ingest/generated_map.yaml index 90e0e33bc6..7fa979ea4b 100644 --- a/ingest/generated_map.yaml +++ b/ingest/generated_map.yaml @@ -5739,6 +5739,330 @@ description: null meta_yaml: .nan message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/README.md + sidebar_label: Overview + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: Collect, enrich, and visualize NetFlow, IPFIX, and sFlow data with + the Netdata Agent. 
+ meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/installation.md + sidebar_label: Installation + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/quick-start.md + sidebar_label: Quick Start + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: Get network flow monitoring running in five minutes. + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/bio-rd_-_ripe_ris.md + sidebar_label: bio-rd / RIPE RIS + learn_status: Published + learn_rel_path: Network Flows/BGP Routing + keywords: '[''bioris'', ''bio-rd'', ''ripe ris'', ''bgp'', ''grpc'', ''route information + service'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/bmp_bgp_monitoring_protocol.md + sidebar_label: BMP (BGP Monitoring Protocol) + learn_status: Published + learn_rel_path: Network Flows/BGP Routing + keywords: '[''bmp'', ''bgp'', ''rfc 7854'', ''route monitoring'', ''cisco'', ''juniper'', + ''frr'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/custom_mmdb_database.md + sidebar_label: Custom MMDB Database + learn_status: Published + learn_rel_path: Network Flows/IP Intelligence + keywords: '[''mmdb'', ''custom 
database'', ''bring your own'', ''ipinfo'', ''ip + intelligence'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/db-ip_ip_intelligence.md + sidebar_label: DB-IP IP Intelligence + learn_status: Published + learn_rel_path: Network Flows/IP Intelligence + keywords: '[''geoip'', ''asn'', ''dbip'', ''db-ip'', ''mmdb'', ''ip intelligence'', + ''flow enrichment'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/iptoasn.md + sidebar_label: IPtoASN + learn_status: Published + learn_rel_path: Network Flows/IP Intelligence + keywords: '[''iptoasn'', ''asn'', ''bgp'', ''public asn'', ''ip intelligence'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/maxmind_geoip_-_geolite2.md + sidebar_label: MaxMind GeoIP / GeoLite2 + learn_status: Published + learn_rel_path: Network Flows/IP Intelligence + keywords: '[''maxmind'', ''geoip2'', ''geolite2'', ''geoip'', ''asn'', ''mmdb'', + ''ip intelligence'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: 
https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/aws_ip_ranges.md + sidebar_label: AWS IP Ranges + learn_status: Published + learn_rel_path: Network Flows/Network Identity Sources + keywords: '[''aws'', ''amazon'', ''cloud'', ''ip ranges'', ''vpc'', ''ec2'', ''prefix + list'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/azure_ip_ranges.md + sidebar_label: Azure IP Ranges + learn_status: Published + learn_rel_path: Network Flows/Network Identity Sources + keywords: '[''azure'', ''microsoft'', ''cloud'', ''ip ranges'', ''service tags'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/gcp_ip_ranges.md + sidebar_label: GCP IP Ranges + learn_status: Published + learn_rel_path: Network Flows/Network Identity Sources + keywords: '[''gcp'', ''google cloud'', ''cloud'', ''ip ranges'', ''prefix list'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/generic_json-over-http_ipam.md + sidebar_label: Generic JSON-over-HTTP IPAM + learn_status: Published + learn_rel_path: Network Flows/Network Identity Sources + keywords: '[''ipam'', ''cmdb'', ''infoblox'', ''bluecat'', ''phpipam'', ''custom'', + ''prefix list'']' + description: .nan + 
meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/netbox.md + sidebar_label: NetBox + learn_status: Published + learn_rel_path: Network Flows/Network Identity Sources + keywords: '[''netbox'', ''ipam'', ''dcim'', ''source of truth'', ''prefix list'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/ipfix.md + sidebar_label: IPFIX + learn_status: Published + learn_rel_path: Network Flows/Sources + keywords: '[''ipfix'', ''netflow v10'', ''flows'', ''network flows'', ''flow collector'', + ''rfc 7011'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/netflow.md + sidebar_label: NetFlow + learn_status: Published + learn_rel_path: Network Flows/Sources + keywords: '[''netflow'', ''netflow v5'', ''netflow v7'', ''netflow v9'', ''cisco'', + ''flows'', ''network flows'', ''flow collector'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/sflow.md + sidebar_label: sFlow + learn_status: Published + learn_rel_path: Network 
Flows/Sources + keywords: '[''sflow'', ''sflow v5'', ''sampled flows'', ''flows'', ''network flows'', + ''flow collector'', ''inmon'']' + description: .nan + meta_yaml: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/metadata.yaml + message: DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE FLOWS' metadata.yaml + FILE +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/configuration.md + sidebar_label: Configuration + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: Full reference for netflow.yaml configuration options. + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/ip-intelligence.md + sidebar_label: IP Intelligence + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: How GeoIP and ASN data combine to enrich flow records with country, + city, and AS-name labels. + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/bgp-routing.md + sidebar_label: BGP Routing + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: How live BGP routes (BMP, BioRIS) feed AS path, communities, and next-hop + into flow records. + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/network-identity.md + sidebar_label: Network Identity + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: How external feeds (cloud IP ranges, IPAM systems) label your network + prefixes. 
+ meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/static-metadata.md + sidebar_label: Static Metadata + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/classifiers.md + sidebar_label: Classifiers + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/asn-resolution.md + sidebar_label: ASN Resolution + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/decapsulation.md + sidebar_label: Decapsulation + learn_status: Published + learn_rel_path: Network Flows/Enrichment Concepts + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/field-reference.md + sidebar_label: Field Reference + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: Complete list of flow fields with per-protocol availability. 
+ meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/summary-sankey.md + sidebar_label: Sankey and Table + learn_status: Published + learn_rel_path: Network Flows/Visualization + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/time-series.md + sidebar_label: Time-Series + learn_status: Published + learn_rel_path: Network Flows/Visualization + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/maps-globe.md + sidebar_label: Maps and Globe + learn_status: Published + learn_rel_path: Network Flows/Visualization + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/filters-facets.md + sidebar_label: Filters and Facets + learn_status: Published + learn_rel_path: Network Flows/Visualization + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/dashboard-cards.md + sidebar_label: Plugin Health Charts + learn_status: Published + learn_rel_path: Network Flows/Visualization + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/retention-querying.md + sidebar_label: Retention and Querying + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/sizing-capacity.md + sidebar_label: Sizing and Capacity Planning + learn_status: Published + learn_rel_path: Network Flows + keywords: 
null + description: Storage estimation, memory guidance, and performance benchmarks. + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/validation.md + sidebar_label: Validation and Data Quality + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/investigation-playbooks.md + sidebar_label: Investigation Playbooks + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/anti-patterns.md + sidebar_label: Anti-patterns + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/troubleshooting.md + sidebar_label: Troubleshooting + learn_status: Published + learn_rel_path: Network Flows + keywords: null + description: null + meta_yaml: .nan + message: .nan - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/collectors/systemd-journal.plugin/README.md sidebar_label: Systemd Journal Plugin Reference learn_status: Published diff --git a/ingest/one_commit_back_file-dict.yaml b/ingest/one_commit_back_file-dict.yaml index 0f3930351e..88d8c4e174 100644 --- a/ingest/one_commit_back_file-dict.yaml +++ b/ingest/one_commit_back_file-dict.yaml @@ -1272,6 +1272,34 @@ learn_path: /docs/collecting-metrics/secrets-management/secret-stores/aws-secrets-manager - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/go/plugin/agent/secrets/secretstore/backends/gcp/README.md learn_path: /docs/collecting-metrics/secrets-management/secret-stores/google-secret-manager +- custom_edit_url: 
https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/aws_ip_ranges.md + learn_path: /docs/network-flows/network-identity-sources/aws-ip-ranges +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/ipfix.md + learn_path: /docs/network-flows/sources/ipfix +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/netflow.md + learn_path: /docs/network-flows/sources/netflow +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/sflow.md + learn_path: /docs/network-flows/sources/sflow +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/azure_ip_ranges.md + learn_path: /docs/network-flows/network-identity-sources/azure-ip-ranges +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/generic_json-over-http_ipam.md + learn_path: /docs/network-flows/network-identity-sources/generic-json-over-http-ipam +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/custom_mmdb_database.md + learn_path: /docs/network-flows/ip-intelligence/custom-mmdb-database +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/netbox.md + learn_path: /docs/network-flows/network-identity-sources/netbox +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/maxmind_geoip_-_geolite2.md + learn_path: /docs/network-flows/ip-intelligence/maxmind-geoip-geolite2 +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/gcp_ip_ranges.md + learn_path: /docs/network-flows/network-identity-sources/gcp-ip-ranges +- custom_edit_url: 
https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/db-ip_ip_intelligence.md + learn_path: /docs/network-flows/ip-intelligence/db-ip-ip-intelligence +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/iptoasn.md + learn_path: /docs/network-flows/ip-intelligence/iptoasn +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/bio-rd_-_ripe_ris.md + learn_path: /docs/network-flows/bgp-routing/bio-rd-ripe-ris +- custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netflow-plugin/integrations/bmp_bgp_monitoring_protocol.md + learn_path: /docs/network-flows/bgp-routing/bmp-bgp-monitoring-protocol - custom_edit_url: https://github.com/netdata/netdata/edit/master/src/crates/netdata-otel/otel-plugin/README.md learn_path: /docs/collecting-metrics/opentelemetry/opentelemetry-metrics - custom_edit_url: https://github.com/netdata/netdata/edit/master/tests/health_mgmtapi/README.md @@ -1484,6 +1512,52 @@ learn_path: /docs/developer-and-contributor-corner/dynamic-configuration - custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/developer-and-contributor-corner/running-through-cf-tunnels.md learn_path: /docs/developer-and-contributor-corner/running-a-local-dashboard-through-cloudflare-tunnels +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/README.md + learn_path: /docs/network-flows/overview +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/field-reference.md + learn_path: /docs/network-flows/field-reference +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/anti-patterns.md + learn_path: /docs/network-flows/anti-patterns +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/troubleshooting.md + learn_path: /docs/network-flows/troubleshooting +- custom_edit_url: 
https://github.com/netdata/netdata/edit/master/docs/network-flows/sizing-capacity.md + learn_path: /docs/network-flows/sizing-and-capacity-planning +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/investigation-playbooks.md + learn_path: /docs/network-flows/investigation-playbooks +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/validation.md + learn_path: /docs/network-flows/validation-and-data-quality +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/quick-start.md + learn_path: /docs/network-flows/quick-start +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/installation.md + learn_path: /docs/network-flows/installation +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/configuration.md + learn_path: /docs/network-flows/configuration +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/retention-querying.md + learn_path: /docs/network-flows/retention-and-querying +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/classifiers.md + learn_path: /docs/network-flows/enrichment-concepts/classifiers +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/ip-intelligence.md + learn_path: /docs/network-flows/enrichment-concepts/ip-intelligence +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/static-metadata.md + learn_path: /docs/network-flows/enrichment-concepts/static-metadata +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/asn-resolution.md + learn_path: /docs/network-flows/enrichment-concepts/asn-resolution +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/decapsulation.md + learn_path: 
/docs/network-flows/enrichment-concepts/decapsulation +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/bgp-routing.md + learn_path: /docs/network-flows/enrichment-concepts/bgp-routing +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/enrichment/network-identity.md + learn_path: /docs/network-flows/enrichment-concepts/network-identity +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/filters-facets.md + learn_path: /docs/network-flows/visualization/filters-and-facets +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/summary-sankey.md + learn_path: /docs/network-flows/visualization/sankey-and-table +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/time-series.md + learn_path: /docs/network-flows/visualization/time-series +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/dashboard-cards.md + learn_path: /docs/network-flows/visualization/plugin-health-charts +- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/network-flows/visualization/maps-globe.md + learn_path: /docs/network-flows/visualization/maps-and-globe - custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/install/windows-release-channels.md learn_path: /docs/netdata-agent/installation/windows/switching-install-types-and-release-channels - custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/deployment-guides/README.md