
want to read PMBus status registers via MGS #2463

@hawkw

This is motivated by two different needs:

  1. We recently had a rectifier die at a customer site (customer-support#898), and we could not determine whether it was so dead that we could no longer talk PMBus to it. It was in a state where the PSC sequencer does not automatically attempt to read and record the PMBus status. It would have been nice to be able to use faux-mgs to talk to it while debugging.
  2. For automated fault management in the control plane, a PMBus device's status registers are an important health endpoint for that device (as described in RFD 589).

It would be nice if there were a way to read the contents of all the standard PMBus status registers over the management network. Ideally, this would happen automagically via codegen for any device with a

power = { /* ... */ pmbus = true }

in its declaration in the TOML file.
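For concreteness, "the standard PMBus status registers" are the STATUS_* commands defined in the PMBus specification, from STATUS_BYTE (0x78) through STATUS_FANS_3_4 (0x82). Below is a minimal sketch of what a codegen'd reader might collect for each `pmbus = true` device. It assumes a hypothetical `PmbusBus` trait over the device's I2C accessors; `PmbusBus`, `PmbusStatus`, and `read_status` are illustrative names, not existing Hubris APIs. The sketch also makes one policy assumption. A failed STATUS_WORD read is surfaced as an error, because it suggests we can't talk PMBus to the device at all, which is exactly the question from the rectifier incident. Failures on the other registers become `None`, on the assumption that the device may simply not implement them.

```rust
// Sketch only: `PmbusBus`, `PmbusStatus`, and `read_status` are hypothetical
// names, not existing Hubris or driver APIs.

/// Standard PMBus status command codes (PMBus Part II specification).
pub mod cmd {
    pub const STATUS_WORD: u8 = 0x79; // its low byte is STATUS_BYTE (0x78)
    pub const STATUS_VOUT: u8 = 0x7A;
    pub const STATUS_IOUT: u8 = 0x7B;
    pub const STATUS_INPUT: u8 = 0x7C;
    pub const STATUS_TEMPERATURE: u8 = 0x7D;
    pub const STATUS_CML: u8 = 0x7E;
    pub const STATUS_OTHER: u8 = 0x7F;
    pub const STATUS_MFR_SPECIFIC: u8 = 0x80;
    pub const STATUS_FANS_1_2: u8 = 0x81;
    pub const STATUS_FANS_3_4: u8 = 0x82;
}

/// Hypothetical minimal interface to one PMBus device on an I2C bus.
pub trait PmbusBus {
    type Error;
    fn read_byte(&mut self, cmd: u8) -> Result<u8, Self::Error>;
    fn read_word(&mut self, cmd: u8) -> Result<u16, Self::Error>;
}

/// Raw contents of the standard status registers for the current page.
/// `None` means the read failed (assumed here: register not implemented).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct PmbusStatus {
    pub word: u16,
    pub vout: Option<u8>,
    pub iout: Option<u8>,
    pub input: Option<u8>,
    pub temperature: Option<u8>,
    pub cml: Option<u8>,
    pub other: Option<u8>,
    pub mfr_specific: Option<u8>,
    pub fans_1_2: Option<u8>,
    pub fans_3_4: Option<u8>,
}

/// Reads every standard status register. An error on STATUS_WORD is
/// returned to the caller; errors on the byte-wide registers become `None`.
pub fn read_status<B: PmbusBus>(bus: &mut B) -> Result<PmbusStatus, B::Error> {
    let word = bus.read_word(cmd::STATUS_WORD)?;
    // Byte-wide registers are optional: discard the error, keep the value.
    let mut byte = |c: u8| bus.read_byte(c).ok();
    Ok(PmbusStatus {
        word,
        vout: byte(cmd::STATUS_VOUT),
        iout: byte(cmd::STATUS_IOUT),
        input: byte(cmd::STATUS_INPUT),
        temperature: byte(cmd::STATUS_TEMPERATURE),
        cml: byte(cmd::STATUS_CML),
        other: byte(cmd::STATUS_OTHER),
        mfr_specific: byte(cmd::STATUS_MFR_SPECIFIC),
        fans_1_2: byte(cmd::STATUS_FANS_1_2),
        fans_3_4: byte(cmd::STATUS_FANS_3_4),
    })
}
```

Because STATUS_BYTE is the low byte of STATUS_WORD, the sketch reads only the word, which saves one I2C transaction per device.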

This could be done either with a new ComponentDetails entry for PMBus status or with a separate, PMBus-specific MGS API. I'm a bit on the fence as to which is better. Every PMBus device already reports sensor data in its ComponentDetails, so hanging status off ComponentDetails means we would do a bunch more I2C traffic whenever we access that device's other component details; personally, I think that might be a bit of a shame. But ComponentDetails is kinda the standard interface for most of this sort of thing.
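To make the I2C-traffic trade-off concrete, here is a sketch of the two API shapes. The `Device` and `ComponentDetail` types are hypothetical stand-ins, not the real gateway-messages or driver definitions, and the device reads only STATUS_WORD to keep the example small. With option A, every component-details request also reads the status registers. With option B, the sensor path is unchanged and status gets its own request.

```rust
// Hypothetical stand-ins; not the real gateway-messages or Hubris types.

/// Minimal view of one PMBus device's I2C accessors.
pub trait Device {
    /// Reads all sensor channels (one or more I2C transactions).
    fn read_sensors(&mut self) -> Vec<(&'static str, f32)>;
    /// Reads STATUS_WORD (one I2C transaction).
    fn read_status_word(&mut self) -> u16;
}

#[derive(Clone, Debug, PartialEq)]
pub enum ComponentDetail {
    Measurement { name: &'static str, value: f32 },
    PmbusStatusWord(u16),
}

fn measurements<D: Device>(dev: &mut D) -> Vec<ComponentDetail> {
    dev.read_sensors()
        .into_iter()
        .map(|(name, value)| ComponentDetail::Measurement { name, value })
        .collect()
}

/// Option A: status is one more ComponentDetails entry, so every
/// details request also pays for the status-register reads.
pub fn component_details_a<D: Device>(dev: &mut D) -> Vec<ComponentDetail> {
    let mut details = measurements(dev);
    details.push(ComponentDetail::PmbusStatusWord(dev.read_status_word()));
    details
}

/// Option B: ComponentDetails stays sensor-only...
pub fn component_details_b<D: Device>(dev: &mut D) -> Vec<ComponentDetail> {
    measurements(dev)
}

/// ...and a separate PMBus-specific request reads only the status registers.
pub fn read_pmbus_status_b<D: Device>(dev: &mut D) -> u16 {
    dev.read_status_word()
}
```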

Since status registers are paged in PMBus, we would want to make it clear to the caller which rail a status register value refers to. This could be done either by having the control plane request a specific rail when reading, or by having the caller read by device ID, reading every rail on the device and including the rail names in the response. Doing it by rail seems nicer, especially since ereports for power-related faults will include the rail name. However, it would require control-plane-agent to maintain an additional index of devices by rail name alongside the existing index by refdes, which probably eats up a bunch more flash/RAM, and it means we can't just hang it off ComponentDetails.
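In PMBus, the PAGE command (0x00) selects the rail: the host writes a page number, and subsequent paged commands such as STATUS_WORD apply to that page. The following minimal sketch covers both options. It assumes a hypothetical `PagedBus` trait and a per-device `Rail` table mapping rail names to page numbers, the kind of table codegen could emit from the TOML; none of these names exist today. Reading by device walks every page and tags each result with its rail name. Reading by rail needs a name-to-page lookup, which is the extra index described above.

```rust
// Sketch only: `PagedBus`, `Rail`, `RailStatus`, `read_device`, and
// `read_rail` are hypothetical, not existing Hubris APIs.

/// PMBus command codes (PMBus Part II specification).
const PAGE: u8 = 0x00;
const STATUS_WORD: u8 = 0x79;

/// Hypothetical minimal interface to one PMBus device.
pub trait PagedBus {
    type Error;
    fn write_byte(&mut self, cmd: u8, value: u8) -> Result<(), Self::Error>;
    fn read_word(&mut self, cmd: u8) -> Result<u16, Self::Error>;
}

/// One rail on a device, as codegen might emit it from the TOML.
#[derive(Clone, Copy, Debug)]
pub struct Rail {
    pub name: &'static str,
    pub page: u8,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RailStatus {
    pub rail: &'static str,
    pub status_word: u16,
}

/// Selects `page`, then reads STATUS_WORD for that page.
fn read_page<B: PagedBus>(bus: &mut B, page: u8) -> Result<u16, B::Error> {
    bus.write_byte(PAGE, page)?;
    bus.read_word(STATUS_WORD)
}

/// Option 1: read by device, returning every rail tagged with its name.
pub fn read_device<B: PagedBus>(
    bus: &mut B,
    rails: &[Rail],
) -> Result<Vec<RailStatus>, B::Error> {
    let mut out = Vec::with_capacity(rails.len());
    for r in rails {
        out.push(RailStatus { rail: r.name, status_word: read_page(bus, r.page)? });
    }
    Ok(out)
}

/// Option 2: read one rail by name. Returns `None` if the name is unknown.
/// The linear search stands in for the name -> page index the agent would need.
pub fn read_rail<B: PagedBus>(
    bus: &mut B,
    rails: &[Rail],
    name: &str,
) -> Option<Result<RailStatus, B::Error>> {
    let r = rails.iter().find(|r| r.name == name)?;
    Some(read_page(bus, r.page).map(|status_word| RailStatus { rail: r.name, status_word }))
}
```

Either way, the PAGE write and the status read must not be interleaved with other traffic to the same device, or the read may come from the wrong rail.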

Personally, I think it would also make sense to add a new DeviceCapabilities flag for PMBus devices, advertising that they expose the standard set of PMBus status registers in addition to sensor values.
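A minimal sketch of such a flag follows. It uses a plain bitflag newtype rather than the real `DeviceCapabilities` definition, whose existing flags and bit assignments aren't reproduced here. `HAS_PMBUS_STATUS`, both bit positions, and `capabilities_for` are hypothetical. The idea is that codegen would set the flag for every device declared with `pmbus = true`, and a client such as faux-mgs would only issue a status read to devices that advertise it.

```rust
// Hypothetical sketch; not the real gateway-messages `DeviceCapabilities`.

/// Bitflags describing what a device supports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DeviceCapabilities(u32);

impl DeviceCapabilities {
    /// Device reports sensor measurements (illustrative bit position).
    pub const HAS_MEASUREMENT_CHANNELS: Self = Self(1 << 0);
    /// Proposed: device exposes the standard PMBus status registers.
    pub const HAS_PMBUS_STATUS: Self = Self(1 << 1);

    pub const fn empty() -> Self {
        Self(0)
    }

    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// What codegen might emit for a device, given its TOML `power` table.
pub fn capabilities_for(has_sensors: bool, pmbus: bool) -> DeviceCapabilities {
    let mut caps = DeviceCapabilities::empty();
    if has_sensors {
        caps = caps.union(DeviceCapabilities::HAS_MEASUREMENT_CHANNELS);
    }
    if pmbus {
        caps = caps.union(DeviceCapabilities::HAS_PMBUS_STATUS);
    }
    caps
}
```

Keeping the status flag separate from the measurement flag lets a client tell "has sensors" apart from "has PMBus status registers", since non-PMBus sensors would set only the former.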

Labels:
  - Observer (PSC + 5.5 kW fun.)
  - fault-management (Everything related to Oxide's Fault Management architecture implementation)
  - psc (Related to the power shelf controller)
  - service processor (Related to the service processor)