This is motivated by two different needs:
- we recently had a rectifier die on us at a customer site (customer-support#898), and we could not determine whether the rectifier was so dead that we were unable to talk PMBus to it, because it was in a state where the PSC sequencer does not automatically attempt to read and record the PMBus status. It would have been nice to be able to use faux-mgs to talk to it while debugging.
- for automated fault management in the control plane, reading the status registers of a PMBus device is an important health endpoint (as described in RFD 589) for that device.
It would be nice if there were a way to read the contents of all of the standard PMBus status registers over the management network. Ideally, this would happen automagically via codegen for any device with a
power = { /* ... */ pmbus = true }
in its declaration in the TOML file.
This could be done either with a new ComponentDetails entry for PMBus status or with a separate, PMBus-specific MGS API. I'm a bit on the fence as to which is better. Personally, I think hanging it off of ComponentDetails might be a bit of a shame: every PMBus device already has sensor data in its ComponentDetails, so we would do a bunch more I2C traffic whenever anything reads other component details from that device. But it is kinda the standard interface for most of this sort of thing.
Since status registers are paged in PMBus, we would want to ensure that we can make it clear to the caller which rail a status register value refers to. This could be done either by having the control plane request a specific rail when reading, or by having the caller read them by device ID and reading every rail on the device, but including the rail names in the response. Doing it by rail kinda seems nicer, especially since ereports for power-related faults will include the rail name, but this would require control-plane-agent to maintain an additional index of devices by rail name in addition to devices by refdes, which probably eats up a bunch more flash/RAM, and means we can't just hang it off ComponentDetails.
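To make the "read by device, but label every rail" option concrete, here is a rough sketch of what the per-rail read could look like. Everything in it is hypothetical and for illustration only: the `PmbusBus` trait, `RailStatus`, `read_device_status`, and `FakeBus` are made-up names, not existing Hubris or MGS APIs, and it uses `Vec` for brevity where real task code would presumably want fixed-size buffers. The only things taken as given are the standard PMBus command codes (`PAGE` and the `STATUS_*` registers).

```rust
// Standard PMBus command codes (from the PMBus specification).
const PAGE: u8 = 0x00;
const STATUS_WORD: u8 = 0x79;
const STATUS_VOUT: u8 = 0x7A;
const STATUS_IOUT: u8 = 0x7B;
const STATUS_INPUT: u8 = 0x7C;
const STATUS_TEMPERATURE: u8 = 0x7D;
const STATUS_CML: u8 = 0x7E;
const STATUS_MFR_SPECIFIC: u8 = 0x80;

#[derive(Debug, PartialEq, Eq)]
pub struct BusError;

/// Hypothetical bus abstraction; stands in for whatever I2C/PMBus
/// driver interface the real code would use.
pub trait PmbusBus {
    fn write_byte(&mut self, cmd: u8, value: u8) -> Result<(), BusError>;
    fn read_byte(&mut self, cmd: u8) -> Result<u8, BusError>;
    fn read_word(&mut self, cmd: u8) -> Result<u16, BusError>;
}

/// Hypothetical response entry: one set of status registers, tagged with
/// the rail name so the caller knows which page the values came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RailStatus {
    pub rail: &'static str,
    pub page: u8,
    pub status_word: u16,
    pub status_vout: u8,
    pub status_iout: u8,
    pub status_input: u8,
    pub status_temperature: u8,
    pub status_cml: u8,
    pub status_mfr_specific: u8,
}

/// Reads the status registers for every (rail name, page) pair on one
/// device, selecting the page before each group of reads.
pub fn read_device_status<B: PmbusBus>(
    bus: &mut B,
    rails: &[(&'static str, u8)],
) -> Result<Vec<RailStatus>, BusError> {
    let mut out = Vec::with_capacity(rails.len());
    for &(rail, page) in rails {
        // Status registers are paged, so select the rail's page first.
        bus.write_byte(PAGE, page)?;
        out.push(RailStatus {
            rail,
            page,
            status_word: bus.read_word(STATUS_WORD)?,
            status_vout: bus.read_byte(STATUS_VOUT)?,
            status_iout: bus.read_byte(STATUS_IOUT)?,
            status_input: bus.read_byte(STATUS_INPUT)?,
            status_temperature: bus.read_byte(STATUS_TEMPERATURE)?,
            status_cml: bus.read_byte(STATUS_CML)?,
            status_mfr_specific: bus.read_byte(STATUS_MFR_SPECIFIC)?,
        });
    }
    Ok(out)
}

/// Fake device for illustration: every read echoes the currently
/// selected page back, so it is easy to see paging take effect.
pub struct FakeBus {
    pub page: u8,
}

impl PmbusBus for FakeBus {
    fn write_byte(&mut self, cmd: u8, value: u8) -> Result<(), BusError> {
        if cmd == PAGE {
            self.page = value;
        }
        Ok(())
    }
    fn read_byte(&mut self, cmd: u8) -> Result<u8, BusError> {
        // High nibble: selected page; low nibble: low bits of the command.
        Ok((self.page << 4) | (cmd & 0x0F))
    }
    fn read_word(&mut self, cmd: u8) -> Result<u16, BusError> {
        // High byte: selected page; low byte: the command code.
        Ok(((self.page as u16) << 8) | cmd as u16)
    }
}
```

With this shape, the by-device option needs only the list of (rail name, page) pairs that codegen already knows for each refdes, and the caller can match a rail name from an ereport against the `rail` field in the response without control-plane-agent keeping a separate by-rail index.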
Personally, I think it would also make sense to add a new DeviceCapabilities flag for PMBus things, to advertise that they have the standard set of PMBus status registers in addition to sensor values.
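As a rough illustration of the capability-flag idea, something like the following could work. This is a self-contained stand-in, not the real DeviceCapabilities type: the `Capabilities` struct, the flag names and bit positions, and `capabilities_for` are all hypothetical, and the `pmbus` argument simply mirrors the `pmbus = true` key mentioned above.

```rust
/// Hypothetical stand-in for a device-capabilities bitfield.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities(pub u32);

impl Capabilities {
    pub const NONE: Self = Self(0);
    /// Hypothetical flag: the device exposes sensor values.
    pub const HAS_SENSORS: Self = Self(1 << 0);
    /// Hypothetical new flag: the device has the standard set of
    /// PMBus status registers.
    pub const HAS_PMBUS_STATUS: Self = Self(1 << 1);

    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    pub const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

/// What generated code might compute for one device from its TOML
/// declaration (both inputs are assumptions for this sketch).
pub fn capabilities_for(has_sensors: bool, pmbus: bool) -> Capabilities {
    let mut caps = Capabilities::NONE;
    if has_sensors {
        caps = caps.union(Capabilities::HAS_SENSORS);
    }
    if pmbus {
        caps = caps.union(Capabilities::HAS_PMBUS_STATUS);
    }
    caps
}
```

A caller could then check `contains(Capabilities::HAS_PMBUS_STATUS)` before asking for status registers, rather than issuing the request and handling an error for non-PMBus devices.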