fix: Add sysfs fallback for RDMA detection on InfiniBand interfaces #9
anson627
commented
Dec 11, 2025
Welcome @anson627!
Thanks for contributing, @anson627! Am I understanding correctly that once the upstream fix is made in Mellanox/rdmamap#15, we won't need a separate fix within DRANET? If so, and assuming we can wait, we can try to get this upstreamed and then bump the dependency to the fixed version.
Sure, I opened a PR, Mellanox/rdmamap#16, and will try to get a review from Mellanox upstream.
@anson627 I don't see that the rdmamap issue has moved. Is this PR still required?
@MikeZappa87 I may need some help getting a review from NVIDIA on Mellanox/rdmamap#16.
I reached out to him on Slack to see if he can review this.
@gauravkghildiyal this seems stuck between a rock and a hard place. @anson627 if this PR merges and then the PR you have in the other repo merges, nothing breaks, correct?
Yes, the current logic is a fallback. I've tested it on GB200 nodes on Azure.
@MikeZappa87, if this is a hard blocker for you, I think it's okay to merge this one. As you described, the change is backward compatible and implemented as a fallback. My intention earlier was to hopefully avoid any potential difference between the implementation in this PR (which checks the existence of a directory within

In the interest of making some progress, I consider the implementation here to be "safe enough". (Feel free to LGTM as you find appropriate.)

/approve
/assign |
My preference would be to have the PR in the other repo merge; however, it looks like that repo's last PR was over two years ago. I attempted to reach out to a couple of maintainers but got no response. @anson627's fix might be the permanent one. I will try to get the other one merged, but I will approve for now.
/approve |
/lgtm Thanks |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anson627, aojea, gauravkghildiyal, MikeZappa87