
Implement Sonic Switch Metrics in sonic-agent#95

Draft
peanball wants to merge 3 commits into main from enh/sonic-metrics

peanball (Contributor) commented Mar 20, 2026

Proposed Changes

Implement the switch metrics on a Prometheus endpoint :9100/metrics, as described in EP-0003.

Notes to Reviewers

A few of the metrics are only borderline useful. Opinions are welcome:

  1. The example output below is filtered to Ethernet124 as the interface. Each active SFP module creates such a set of metrics (on up to 32 or 52 ports, depending on the switch)!
  2. Transceiver DOM thresholds are static for the lifetime of an SFP module.
  3. Histograms for packet sizes could be interesting, but maybe not.
  4. SFP metadata (vendor, serial, etc.) is also stable over the lifetime of the SFP module.
  5. LLDP neighbor information might be better collected in the operator than via metrics.

Fortunately, all of these are "simple" config entries that we can test-drive and remove later if they turn out not to be useful or to use too much metrics storage.
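To put a number on the storage concern, the per-interface series count can be read straight off the exposition text. The following is a rough sketch in Python, not part of sonic-agent: `series_per_interface` is a hypothetical helper, and `SAMPLE` is a shortened excerpt of the example output, so the totals it prints are far smaller than for the full list.

```python
from collections import Counter


def series_per_interface(exposition: str, interface: str) -> Counter:
    """Count sample lines per metric name that carry the given interface label."""
    needle = f'interface="{interface}"'
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        if needle not in line:
            continue
        # The metric name is everything before the label set.
        name = line.split("{", 1)[0]
        counts[name] += 1
    return counts


# Shortened excerpt of the example output (assumption: the real scrape is much longer).
SAMPLE = """\
# TYPE sonic_switch_interface_admin_state gauge
sonic_switch_interface_admin_state{interface="Ethernet124"} 1
# TYPE sonic_switch_interface_bytes_total counter
sonic_switch_interface_bytes_total{direction="rx",interface="Ethernet124"} 6.57158677e+08
sonic_switch_interface_bytes_total{direction="tx",interface="Ethernet124"} 1.2311851e+08
# TYPE sonic_switch_ports_total gauge
sonic_switch_ports_total 32
"""

counts = series_per_interface(SAMPLE, "Ethernet124")
per_port = sum(counts.values())
ports = 32  # value of sonic_switch_ports_total in the example output
worst_case = per_port * ports

print(dict(counts))
print(f"{per_port} series per port, up to {worst_case} if all {ports} ports are active")
```

Feeding the full `curl` output into the same function instead of `SAMPLE` gives the real per-port figure, which can then be compared with and without the borderline metric families.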

Example Prometheus Metrics Output

Example metrics retrieved on swi1-wdf4g-3, limited to the interface Ethernet124 plus all non-interface metrics.
admin@swi1-wdf4g-3:~$ curl -s localhost:9100/metrics | awk '!/Ethernet/ || /Ethernet124/'
# HELP sonic_scrape_duration_seconds Duration of the last metrics scrape in seconds
# TYPE sonic_scrape_duration_seconds gauge
sonic_scrape_duration_seconds 0.043446274
# HELP sonic_switch_info Device metadata as labels, always 1
# TYPE sonic_switch_info gauge
sonic_switch_info{asic="broadcom",firmware="11",hwsku="Accton-AS7726-32X",mac="94:ef:97:94:5e:52",platform="x86_64-accton_as7726_32x-r0"} 1
# HELP sonic_switch_interface_admin_state Admin state of the interface (1=up, 0=down)
# TYPE sonic_switch_interface_admin_state gauge
sonic_switch_interface_admin_state{interface="Ethernet124"} 1
# HELP sonic_switch_interface_anomaly_packets_total Total anomalous packets
# TYPE sonic_switch_interface_anomaly_packets_total counter
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="fragments"} 0
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="jabbers"} 0
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="rx_oversize"} 0
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="tx_oversize"} 0
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="undersize"} 0
sonic_switch_interface_anomaly_packets_total{interface="Ethernet124",type="unknown_protos"} 0
# HELP sonic_switch_interface_bytes_total Total bytes transferred
# TYPE sonic_switch_interface_bytes_total counter
sonic_switch_interface_bytes_total{direction="rx",interface="Ethernet124"} 6.57158677e+08
sonic_switch_interface_bytes_total{direction="tx",interface="Ethernet124"} 1.2311851e+08
# HELP sonic_switch_interface_discards_total Total interface discards
# TYPE sonic_switch_interface_discards_total counter
sonic_switch_interface_discards_total{direction="rx",interface="Ethernet124"} 46588
sonic_switch_interface_discards_total{direction="tx",interface="Ethernet124"} 0
# HELP sonic_switch_interface_dropped_packets_total Total SAI-level dropped packets
# TYPE sonic_switch_interface_dropped_packets_total counter
sonic_switch_interface_dropped_packets_total{direction="rx",interface="Ethernet124"} 0
sonic_switch_interface_dropped_packets_total{direction="tx",interface="Ethernet124"} 0
# HELP sonic_switch_interface_errors_total Total interface errors
# TYPE sonic_switch_interface_errors_total counter
sonic_switch_interface_errors_total{direction="rx",interface="Ethernet124"} 0
sonic_switch_interface_errors_total{direction="tx",interface="Ethernet124"} 0
# HELP sonic_switch_interface_fec_frames_total Total FEC frames
# TYPE sonic_switch_interface_fec_frames_total counter
sonic_switch_interface_fec_frames_total{interface="Ethernet124",type="correctable"} 0
sonic_switch_interface_fec_frames_total{interface="Ethernet124",type="symbol_errors"} 0
sonic_switch_interface_fec_frames_total{interface="Ethernet124",type="uncorrectable"} 0
# HELP sonic_switch_interface_neighbor_info LLDP neighbor metadata as labels, always 1
# TYPE sonic_switch_interface_neighbor_info gauge
sonic_switch_interface_neighbor_info{interface="Ethernet124",neighbor_mac="94:ef:97:94:21:42",neighbor_name="swi2-wdf4g-2",neighbor_port="Ethernet8"} 1
# HELP sonic_switch_interface_oper_state Operational state of the interface (1=up, 0=down)
# TYPE sonic_switch_interface_oper_state gauge
sonic_switch_interface_oper_state{interface="Ethernet124"} 1
# HELP sonic_switch_interface_packets_total Total packets transferred
# TYPE sonic_switch_interface_packets_total counter
sonic_switch_interface_packets_total{direction="rx",interface="Ethernet124",type="broadcast"} 0
sonic_switch_interface_packets_total{direction="rx",interface="Ethernet124",type="multicast"} 162981
sonic_switch_interface_packets_total{direction="rx",interface="Ethernet124",type="non_unicast"} 162981
sonic_switch_interface_packets_total{direction="rx",interface="Ethernet124",type="unicast"} 1.199271e+06
sonic_switch_interface_packets_total{direction="tx",interface="Ethernet124",type="broadcast"} 0
sonic_switch_interface_packets_total{direction="tx",interface="Ethernet124",type="multicast"} 162982
sonic_switch_interface_packets_total{direction="tx",interface="Ethernet124",type="non_unicast"} 162982
sonic_switch_interface_packets_total{direction="tx",interface="Ethernet124",type="unicast"} 953331
# HELP sonic_switch_interface_pfc_packets_total Total PFC packets
# TYPE sonic_switch_interface_pfc_packets_total counter
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="0"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="1"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="2"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="3"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="4"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="5"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="6"} 0
sonic_switch_interface_pfc_packets_total{direction="rx",interface="Ethernet124",priority="7"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="0"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="1"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="2"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="3"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="4"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="5"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="6"} 0
sonic_switch_interface_pfc_packets_total{direction="tx",interface="Ethernet124",priority="7"} 0
# HELP sonic_switch_interface_queue_length Current output queue length
# TYPE sonic_switch_interface_queue_length gauge
sonic_switch_interface_queue_length{interface="Ethernet124"} 0
# HELP sonic_switch_interface_rx_packet_size_bytes RX packet size distribution
# TYPE sonic_switch_interface_rx_packet_size_bytes histogram
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="64"} 0
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="127"} 925589
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="255"} 926310
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="511"} 967493
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="1023"} 967705
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="1518"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="2047"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="4095"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="9216"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="16383"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_bucket{interface="Ethernet124",le="+Inf"} 1.362252e+06
sonic_switch_interface_rx_packet_size_bytes_sum{interface="Ethernet124"} 0
sonic_switch_interface_rx_packet_size_bytes_count{interface="Ethernet124"} 1.362252e+06
# HELP sonic_switch_interface_tx_packet_size_bytes TX packet size distribution
# TYPE sonic_switch_interface_tx_packet_size_bytes histogram
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="64"} 1
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="127"} 1.063977e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="255"} 1.075338e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="511"} 1.116235e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="1023"} 1.116245e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="1518"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="2047"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="4095"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="9216"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="16383"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_bucket{interface="Ethernet124",le="+Inf"} 1.116313e+06
sonic_switch_interface_tx_packet_size_bytes_sum{interface="Ethernet124"} 0
sonic_switch_interface_tx_packet_size_bytes_count{interface="Ethernet124"} 1.116313e+06
# HELP sonic_switch_interfaces_total Number of interfaces by operational status
# TYPE sonic_switch_interfaces_total gauge
sonic_switch_interfaces_total{operational_status="down"} 30
sonic_switch_interfaces_total{operational_status="up"} 2
# HELP sonic_switch_ports_total Total number of physical ports
# TYPE sonic_switch_ports_total gauge
sonic_switch_ports_total 32
# HELP sonic_switch_ready Whether the switch is ready (1) or not (0)
# TYPE sonic_switch_ready gauge
sonic_switch_ready 1
# HELP sonic_switch_temperature_celsius Chassis temperature sensor reading in Celsius
# TYPE sonic_switch_temperature_celsius gauge
sonic_switch_temperature_celsius{sensor="CB_temp(0x4B)"} 27
sonic_switch_temperature_celsius{sensor="CPU_Core_0_temp"} 37
sonic_switch_temperature_celsius{sensor="CPU_Core_1_temp"} 37
sonic_switch_temperature_celsius{sensor="CPU_Core_2_temp"} 37
sonic_switch_temperature_celsius{sensor="CPU_Core_3_temp"} 37
sonic_switch_temperature_celsius{sensor="CPU_Package_temp"} 37
sonic_switch_temperature_celsius{sensor="FB_temp(0x4C)"} 32.5
sonic_switch_temperature_celsius{sensor="MB_FrontMAC_temp(0x49)"} 33.5
sonic_switch_temperature_celsius{sensor="MB_LeftCenter_temp(0x4A)"} 28
sonic_switch_temperature_celsius{sensor="MB_RearMAC_temp(0x48)"} 34.5
sonic_switch_temperature_celsius{sensor="PSU-1 temp sensor 1"} 46
sonic_switch_temperature_celsius{sensor="PSU-2 temp sensor 1"} 44
# HELP sonic_switch_temperature_high_threshold_celsius Chassis temperature sensor high threshold in Celsius
# TYPE sonic_switch_temperature_high_threshold_celsius gauge
sonic_switch_temperature_high_threshold_celsius{sensor="CB_temp(0x4B)"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="CPU_Core_0_temp"} 82
sonic_switch_temperature_high_threshold_celsius{sensor="CPU_Core_1_temp"} 82
sonic_switch_temperature_high_threshold_celsius{sensor="CPU_Core_2_temp"} 82
sonic_switch_temperature_high_threshold_celsius{sensor="CPU_Core_3_temp"} 82
sonic_switch_temperature_high_threshold_celsius{sensor="CPU_Package_temp"} 82
sonic_switch_temperature_high_threshold_celsius{sensor="FB_temp(0x4C)"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="MB_FrontMAC_temp(0x49)"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="MB_LeftCenter_temp(0x4A)"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="MB_RearMAC_temp(0x48)"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="PSU-1 temp sensor 1"} 80
sonic_switch_temperature_high_threshold_celsius{sensor="PSU-2 temp sensor 1"} 80
# HELP sonic_switch_temperature_warning Chassis temperature sensor warning status (1=warning, 0=ok)
# TYPE sonic_switch_temperature_warning gauge
sonic_switch_temperature_warning{sensor="CB_temp(0x4B)"} 0
sonic_switch_temperature_warning{sensor="CPU_Core_0_temp"} 0
sonic_switch_temperature_warning{sensor="CPU_Core_1_temp"} 0
sonic_switch_temperature_warning{sensor="CPU_Core_2_temp"} 0
sonic_switch_temperature_warning{sensor="CPU_Core_3_temp"} 0
sonic_switch_temperature_warning{sensor="CPU_Package_temp"} 0
sonic_switch_temperature_warning{sensor="FB_temp(0x4C)"} 0
sonic_switch_temperature_warning{sensor="MB_FrontMAC_temp(0x49)"} 0
sonic_switch_temperature_warning{sensor="MB_LeftCenter_temp(0x4A)"} 0
sonic_switch_temperature_warning{sensor="MB_RearMAC_temp(0x48)"} 0
sonic_switch_temperature_warning{sensor="PSU-1 temp sensor 1"} 0
sonic_switch_temperature_warning{sensor="PSU-2 temp sensor 1"} 0
# HELP sonic_switch_transceiver_dom_rx_power_dbm Transceiver RX power in dBm
# TYPE sonic_switch_transceiver_dom_rx_power_dbm gauge
sonic_switch_transceiver_dom_rx_power_dbm{interface="Ethernet124",lane="1"} -0.6248210798265337
sonic_switch_transceiver_dom_rx_power_dbm{interface="Ethernet124",lane="2"} 0.19116290447072778
sonic_switch_transceiver_dom_rx_power_dbm{interface="Ethernet124",lane="3"} 0.29789470831855613
sonic_switch_transceiver_dom_rx_power_dbm{interface="Ethernet124",lane="4"} -0.24108863598207259
# HELP sonic_switch_transceiver_dom_temperature_celsius Transceiver temperature in Celsius
# TYPE sonic_switch_transceiver_dom_temperature_celsius gauge
sonic_switch_transceiver_dom_temperature_celsius{interface="Ethernet124"} 37.645
# HELP sonic_switch_transceiver_dom_threshold Transceiver DOM threshold value
# TYPE sonic_switch_transceiver_dom_threshold gauge
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="alarm",sensor="rx_power"} 3.5
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="alarm",sensor="temperature"} 75
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="alarm",sensor="tx_bias"} 90
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="alarm",sensor="tx_power"} 3.5
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="alarm",sensor="voltage"} 3.63
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="warning",sensor="rx_power"} 2.5
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="warning",sensor="temperature"} 70
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="warning",sensor="tx_bias"} 80
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="warning",sensor="tx_power"} 2.5
sonic_switch_transceiver_dom_threshold{direction="high",interface="Ethernet124",level="warning",sensor="voltage"} 3.465
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="alarm",sensor="rx_power"} -12.503
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="alarm",sensor="temperature"} -5
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="alarm",sensor="tx_bias"} 10
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="alarm",sensor="tx_power"} -7.501
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="alarm",sensor="voltage"} 3.05
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="warning",sensor="rx_power"} -11.5
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="warning",sensor="temperature"} 0
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="warning",sensor="tx_bias"} 20
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="warning",sensor="tx_power"} -6.499
sonic_switch_transceiver_dom_threshold{direction="low",interface="Ethernet124",level="warning",sensor="voltage"} 3.135
# HELP sonic_switch_transceiver_dom_tx_bias_milliamps Transceiver TX bias current in milliamps
# TYPE sonic_switch_transceiver_dom_tx_bias_milliamps gauge
sonic_switch_transceiver_dom_tx_bias_milliamps{interface="Ethernet124",lane="1"} 50.564
sonic_switch_transceiver_dom_tx_bias_milliamps{interface="Ethernet124",lane="2"} 51.556
sonic_switch_transceiver_dom_tx_bias_milliamps{interface="Ethernet124",lane="3"} 51.828
sonic_switch_transceiver_dom_tx_bias_milliamps{interface="Ethernet124",lane="4"} 53.384
# HELP sonic_switch_transceiver_dom_voltage_volts Transceiver supply voltage in Volts
# TYPE sonic_switch_transceiver_dom_voltage_volts gauge
sonic_switch_transceiver_dom_voltage_volts{interface="Ethernet124"} 3.264
# HELP sonic_switch_transceiver_info Transceiver static metadata as labels, always 1
# TYPE sonic_switch_transceiver_info gauge
sonic_switch_transceiver_info{interface="Ethernet124",model="S-QSFP-100G-CWDM",serial="F7Z2G4L         ",type="QSFP28 or later",vendor="SWITCH2OPEN     "} 1
# HELP sonic_switch_transceiver_rxlos Transceiver RX loss of signal (1=loss, 0=ok)
# TYPE sonic_switch_transceiver_rxlos gauge
sonic_switch_transceiver_rxlos{interface="Ethernet124",lane="1"} 0
sonic_switch_transceiver_rxlos{interface="Ethernet124",lane="2"} 0
sonic_switch_transceiver_rxlos{interface="Ethernet124",lane="3"} 0
sonic_switch_transceiver_rxlos{interface="Ethernet124",lane="4"} 0
# HELP sonic_switch_transceiver_txfault Transceiver TX fault (1=fault, 0=ok)
# TYPE sonic_switch_transceiver_txfault gauge
sonic_switch_transceiver_txfault{interface="Ethernet124",lane="1"} 0
sonic_switch_transceiver_txfault{interface="Ethernet124",lane="2"} 0
sonic_switch_transceiver_txfault{interface="Ethernet124",lane="3"} 0
sonic_switch_transceiver_txfault{interface="Ethernet124",lane="4"} 0
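
The DOM readings and the `sonic_switch_transceiver_dom_threshold` series above are meant to be compared against each other. A minimal sketch of that comparison in Python, using the Ethernet124 values from the output: the `Thresholds` class and the `classify` function are hypothetical helpers for illustration, not part of sonic-agent, and treating a reading that sits exactly on a threshold as exceeding it is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """The four DOM thresholds exported per sensor (direction x level)."""
    low_alarm: float
    low_warning: float
    high_warning: float
    high_alarm: float


def classify(value: float, t: Thresholds) -> str:
    """Return "alarm", "warning" or "ok" for a single DOM reading."""
    # Assumption: a reading exactly on a threshold counts as exceeding it.
    if value <= t.low_alarm or value >= t.high_alarm:
        return "alarm"
    if value <= t.low_warning or value >= t.high_warning:
        return "warning"
    return "ok"


# Threshold values copied from the example output for Ethernet124.
RX_POWER = Thresholds(low_alarm=-12.503, low_warning=-11.5,
                      high_warning=2.5, high_alarm=3.5)
TEMPERATURE = Thresholds(low_alarm=-5, low_warning=0,
                         high_warning=70, high_alarm=75)

# Readings copied from the example output (RX power of lane 1, module temperature).
print("rx_power lane 1:", classify(-0.6248210798265337, RX_POWER))
print("temperature:", classify(37.645, TEMPERATURE))
```

Since the thresholds are static per module, the same comparison could also live in an alerting rule that joins the reading with the threshold series, which is one argument for keeping the thresholds as metrics.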

Initial implementation exposing transceiver metrics, error rates, temperatures, LLDP neighbors, and static metadata.
Replace 20 individual counter metrics for packet size buckets with
two Prometheus histograms (RX/TX). This maps SAI port stat
fields to cumulative histogram buckets via a new `histogram`
transform in the metrics config.
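
The idea behind the `histogram` transform can be illustrated as follows. This is a sketch in Python under stated assumptions, not the code from this PR: it assumes the SAI port stats provide one packet counter per size range (ten ranges per direction, which would match the 20 replaced counters), and `to_cumulative` is a hypothetical name. The per-range counts are back-calculated from the RX buckets in the example output.

```python
from itertools import accumulate

# Upper bounds of the size ranges, taken from the `le` labels in the example output.
UPPER_BOUNDS = [64, 127, 255, 511, 1023, 1518, 2047, 4095, 9216, 16383]


def to_cumulative(per_range_counts):
    """Turn per-range packet counts into cumulative (le, count) buckets."""
    if len(per_range_counts) != len(UPPER_BOUNDS):
        raise ValueError("expected one count per size range")
    # Prometheus buckets are cumulative: each bucket includes all smaller ones.
    running = list(accumulate(per_range_counts))
    buckets = [(str(bound), count) for bound, count in zip(UPPER_BOUNDS, running)]
    # Assumption: there is no separate counter for packets above the last bound,
    # so the +Inf bucket and the _count sample equal the last running total.
    total = running[-1]
    buckets.append(("+Inf", total))
    return buckets, total


# Per-range RX counts for Ethernet124, derived by differencing the example buckets.
rx_per_range = [0, 925589, 721, 41183, 212, 394547, 0, 0, 0, 0]

rx_buckets, rx_count = to_cumulative(rx_per_range)
for le, count in rx_buckets:
    print(f'le="{le}" {count}')
print("count", rx_count)
```

The resulting bucket values line up with the `sonic_switch_interface_rx_packet_size_bytes_bucket` samples in the example output above.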
github-actions bot added the size/XXL, documentation, and enhancement labels Mar 20, 2026
hardikdr added the area/metal-automation label Mar 21, 2026
hardikdr added this to Roadmap Mar 21, 2026
