Add collector for SR-IOV network Virtual Function statistics #3544
aharivel wants to merge 1 commit into prometheus:master
Are the MAC and PCI addresses somewhat stable? Otherwise I'd be worried about the cardinality.
@discordianfish PCI address cardinality is bounded and stable (hardware topology). The MAC cardinality risk is real but contained to the info gauge, where stale series age out naturally; it was already there before this change.
collector/netvf_linux.go
// parseVFInfo extracts VF information from link messages for testing.
// sysClassPath is the path to the sysfs class directory used to resolve VF PCI addresses.
func parseVFInfo(links []rtnetlink.LinkMessage, filter *deviceFilter, logger *slog.Logger, sysClassPath string) []vfMetrics {
Is this used anywhere except the tests?
Also, all these functions should ideally go into procfs if needed.
Indeed, resolveVFPCIAddress is the sole function that reads sysfs — everything else goes through rtnetlink.
For parseVFInfo: I'll keep it as a test utility since it allows unit testing the VF parsing logic without a live netlink socket. WDYT?
For resolveVFPCIAddress: agreed it would be a natural fit in prometheus/procfs. I can open a follow-up issue/PR there to contribute it upstream — would you prefer that happens before merging this, or as a separate follow-up?
Yeah, let's move it before merging.
If parseVFInfo is only used in tests, you can also define it there.
Add a new netvf collector that exposes SR-IOV network VF statistics and
configuration via rtnetlink. The collector queries netlink for
interfaces with Virtual Functions and exposes per-VF metrics:
- node_net_vf_info: VF configuration (MAC, VLAN, link state, spoof
check, trust, PCI address, NUMA node)
- node_net_vf_{receive,transmit}_{packets,bytes}_total: traffic counters
- node_net_vf_{broadcast,multicast}_packets_total: packet type counters
- node_net_vf_{receive,transmit}_dropped_total: drop counters
All metrics include a pci_address label resolved from the sysfs virtfn
symlink, enabling direct correlation with workloads that reference VFs
by PCI BDF address (e.g. OpenStack Nova, libvirt, DPDK).
All metrics also include a numa_node label resolved from the PF's PCI
device sysfs entry, enabling NUMA alignment verification and cross-NUMA
traffic ratio queries in PromQL.
The collector is disabled by default and can be enabled with
--collector.netvf. PF device filtering is supported via
--collector.netvf.device-include/exclude flags.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Bump github.com/jsimonetti/rtnetlink/v2 from 2.1.0 to 2.2.0; this adds the VF stats used by the next commit.
Tested on MT2894 Family [ConnectX-6 Lx] and Ethernet Controller E810-XXV with VFs bound to both kernel driver and vfio-pci driver (for direct assignment to Virtual Machines).