Fix PinnedMemoryResource IPC NUMA ID derivation #1699
Andy-Jost wants to merge 8 commits into NVIDIA:main from
Conversation
…ce-specific concerns

Move _dev_id, device_id, and peer_accessible_by from _MemPool into DeviceMemoryResource. Eliminate _MemPoolOptions and refactor pool initialization into freestanding cdef functions (MP_init_create_pool, MP_init_current_pool, MP_raise_release_threshold) for cross-module visibility. Extract __init__ bodies into inline cdef helpers (_DMR_init, _PMR_init, _MMR_init) for consistency and shorter class definitions. Implement device_id as -1 for PinnedMemoryResource and ManagedMemoryResource since they are not device-bound.

Made-with: Cursor
…IDIA#1603)

PinnedMemoryResource(ipc_enabled=True) hardcoded host NUMA ID 0, causing failures on multi-NUMA systems where the active device is attached to a different NUMA node. Now derives the NUMA ID from the current device's host_numa_id attribute, and adds an explicit numa_id option for manual override. Removes the _check_numa_nodes warning machinery in favor of proper NUMA node selection.

Made-with: Cursor
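In rough terms, the resolution order described above might look like the following sketch. The function name `resolve_numa_id` and its arguments are illustrative stand-ins, not the actual cuda.core internals.

```python
# Hedged sketch of the described behavior; names here are hypothetical.
def resolve_numa_id(ipc_enabled, numa_id, device_host_numa_id):
    """Explicit numa_id wins; an IPC-enabled pool otherwise derives the
    node from the active device (previously hardcoded to 0)."""
    if numa_id is not None:
        if numa_id < 0:
            raise ValueError(f"numa_id must be non-negative, got {numa_id}")
        return numa_id  # explicit override
    if ipc_enabled:
        # Derived from the current device's host_numa_id, per the fix above.
        return device_host_numa_id
    return None  # without IPC, NUMA placement is not required
```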
/ok to test e55a26b
```cython
@property
def device_id(self) -> int:
    """Return -1. Managed memory migrates automatically and is not tied to a specific device."""
    return -1
```
Q: Wouldn't this be a breaking change? In the old implementation, device_id was used to initialize _MemPool._dev_id, which backs .device_id, but the new implementation returns .device_id = -1 unconditionally. I understand we meant to say the pages are migratable (not pinned), but maybe there is a better way to restore the capability of querying the preferred location?
I didn't mean for this to be reviewed yet. Now marked as Draft.
In the old API, device_id has an overloaded meaning. For DeviceMemoryResource, it refers to the device owning memory. In ManagedMemoryResource, it indicates the preferred location if that location is a device. That is not expressive enough because there are four possible types of preferred location:
- A device
- The host
- A host NUMA node
- None
I'm still considering options, but perhaps we should add a new property, preferred_location, for managed pools. There is, unfortunately, no uniform way to refer to all preferred locations, so we might need to resort to something like the following:
```python
ManagedMemoryResource.preferred_location ->
    ('device_id', 0)   # device 0
    ('host', None)     # host preferred
    ('numa_id', 0)     # NUMA node 0
    None               # no preferred location
```
I'm not in a position to say whether we need to retain backwards compatibility for this attribute. Do you think so? If yes, then I can retain the original meaning, possibly with a deprecation warning.
I created #1710 to track this as a separate issue, but I'm including it in this PR.
I restored the old behavior where MMR.device_id returns the preferred location if it is a device and -1 otherwise. I'd prefer to unconditionally return -1 if the breaking change is tolerable. A middle ground is to issue a deprecation warning. What do you think @leofang?
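The deprecation-warning middle ground mentioned here could look roughly like this sketch. The class name, warning message, and constructor are hypothetical; only the return behavior (preferred device ordinal or -1) comes from the discussion above.

```python
import warnings

class ManagedMemoryResourceSketch:
    """Illustrative stand-in, not the real ManagedMemoryResource."""

    def __init__(self, preferred_device=None):
        self._preferred_device = preferred_device  # device ordinal or None

    @property
    def device_id(self):
        # Middle ground: keep the old behavior but warn callers.
        warnings.warn(
            "device_id on managed pools is deprecated; "
            "query the preferred location instead",
            DeprecationWarning, stacklevel=2)
        # Old behavior: preferred device ordinal if set, else -1.
        return -1 if self._preferred_device is None else self._preferred_device
```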
/ok to test 11d22a2
Force-pushed 11d22a2 to 96b663a
/ok to test 96b663a
Force-pushed 96b663a to ff4ee01
/ok to test ff4ee01
Force-pushed ff4ee01 to 363ddb7
/ok to test 363ddb7
…yResource
Extends ManagedMemoryResourceOptions with a preferred_location_type field
("device", "host", "host_numa", or None) enabling NUMA-aware managed memory
pool placement. Adds ManagedMemoryResource.preferred_location property to
query the resolved setting. Fully backwards-compatible: existing code using
preferred_location alone continues to work unchanged.
Made-with: Cursor
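The resolution described in this commit might be sketched as follows. The function name echoes the `_resolve_preferred_location` helper mentioned later in the PR summary, but the signature, return shape, and error messages here are assumptions, not the actual implementation.

```python
# Hedged sketch: resolve a (preferred_location, preferred_location_type)
# option pair into a tagged value, per the description above.
_VALID_TYPES = ("device", "host", "host_numa", None)

def resolve_preferred_location(preferred_location, preferred_location_type):
    if preferred_location_type not in _VALID_TYPES:
        raise ValueError(
            f"invalid preferred_location_type: {preferred_location_type!r}")
    if preferred_location_type is None:
        # Legacy behavior: an int is a device ordinal, None means no preference.
        if preferred_location is None:
            return None
        return ("device", preferred_location)
    if preferred_location_type == "host":
        return ("host", None)
    # "device" and "host_numa" both need an ordinal / NUMA node id.
    if preferred_location is None:
        raise ValueError(
            f"{preferred_location_type!r} requires preferred_location")
    return (preferred_location_type, preferred_location)
```

This keeps the backwards-compatible path (type omitted) identical to the legacy behavior while making the four-way distinction explicit.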
Force-pushed 363ddb7 to 2dd3d1a
/ok to test 2dd3d1a
```cython
def __dealloc__(self):
    try:
        self.close()
```
You're not supposed to call methods in __dealloc__: https://cython.readthedocs.io/en/stable/src/userguide/special_methods.html#finalization-methods-dealloc-and-del
In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.
_peer_accessible_by is a Python object. _DMR_set_peer_accessible_by looks like it's modifying the object based on the way it's being invoked.
Sounds like a job for __del__ and also adding a context manager.
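In plain Python terms, the suggested shape might look like the generic sketch below (not the actual cuda.core implementation): an idempotent close() callable from __del__, __exit__, or user code, while Cython's __dealloc__ would be left to free raw C data only.

```python
class PoolWrapper:
    """Generic resource-wrapper sketch; the class name is hypothetical."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Idempotent: safe to call from __del__, __exit__, or directly.
        if not self.closed:
            self.closed = True
            # Python-visible cleanup goes here (e.g. peer-access state).

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

    def __del__(self):
        # Unlike Cython's __dealloc__, __del__ may safely call methods.
        self.close()
```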
This was to work around a driver bug (5698116). When a mempool is destroyed and a new one allocated, if the new pool happens to reuse the previous memory address, it incorrectly inherits the peer access attributes of the destroyed pool. This code is all about clearing the peer access attributes on destruction to avoid that.
The real fix is in C++ under resource_handles.* and should only be applied to owned pools created by cuda.core. The changes here, in Python, were a misadventure. This was resetting peer access for the shared default pool, which is not being destroyed but just released.
```cython
return False
```

```cython
cdef inline _DMR_set_peer_accessible_by(DeviceMemoryResource self, devices):
```
This seems like something that should almost certainly never be invoked in __dealloc__. It's reading from and writing to the Python object whose __dealloc__ method is being called.
Thanks, yeah this was a mess. I removed all peer-access-clearing logic from Python. The real problem was that certain tests were using the default memory pool, which is persistent and should keep its peer access settings. Those tests incorrectly assumed the beginning state should be "no peer access." I updated those to simply use fresh pools.
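The distinction matters because the default pool is shared process-wide. A toy model (all names illustrative, no CUDA involved) of why the updated tests use fresh owned pools:

```python
# Toy model: state set on a shared default pool leaks between "tests",
# while a fresh owned pool starts clean. All names are hypothetical.
class ToyPool:
    def __init__(self):
        self.peer_accessible_by = set()

_default_pool = ToyPool()  # shared, like a device's default mempool

def get_default_pool():
    return _default_pool

def make_owned_pool():
    return ToyPool()  # fresh state, like a pool built from explicit options

# "Test 1" enables peer access on the shared default pool:
get_default_pool().peer_accessible_by.add(1)
# "Test 2" now sees stale state if it assumes a clean start:
stale = get_default_pool().peer_accessible_by
# A fresh owned pool has no carry-over:
clean = make_owned_pool().peer_accessible_by
```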
- Remove __dealloc__ and close() override from DeviceMemoryResource that cleared peer access before destruction. The C++ RAII deleter already handles this for owned pools (nvbug 5698116 workaround). For non-owned pools (default device pool), clearing peer access on handle disposal was incorrect behavior. - Update peer access tests to use owned pools (DeviceMemoryResourceOptions()) instead of default pools. Default pools are shared and may have stale peer access state from prior tests, causing test failures. Made-with: Cursor
/ok to test c7f85ff
Made-with: Cursor
/ok to test 4c36462
@cpcloud Please take another look
Summary

- `PinnedMemoryResource(ipc_enabled=True)` hardcoded host NUMA ID `0`, causing failures on multi-NUMA systems when the active device is on a different NUMA node
- Adds a `preferred_location_type` option to `ManagedMemoryResourceOptions` and a `preferred_location` query property to `ManagedMemoryResource`
- Adds a `numa_id` option to `PinnedMemoryResourceOptions` for explicit NUMA node selection
- Refactors the `_MemPool` hierarchy to separate shared pool machinery from device-specific concerns

Changes

Bugfix (`_pinned_memory_resource.pyx`):
- When `ipc_enabled=True` and `numa_id` is not specified, derives the NUMA node from the current device's `host_numa_id` attribute (requires an active CUDA context)
- Adds `numa_id: int | None = None` to `PinnedMemoryResourceOptions`
- Adds a `numa_id` property to `PinnedMemoryResource`
- Removes the `_check_numa_nodes` warning machinery in favor of proper NUMA node selection

`ManagedMemoryResource` preferred location (`_managed_memory_resource.pyx`):
- Adds a `preferred_location_type: str | None` field to `ManagedMemoryResourceOptions` with values `"device"`, `"host"`, `"host_numa"`, or `None` (legacy behavior)
- Adds a `preferred_location` query property to `ManagedMemoryResource`
- Adds a `_resolve_preferred_location` cdef function with a CUDA 13 compile-time guard
- `device_id` property restored to backwards-compatible behavior (returns the preferred device ordinal or `-1`)

Refactor (`_MemPool` hierarchy):
- Moves `_dev_id`, `device_id`, and `peer_accessible_by` from `_MemPool` into `DeviceMemoryResource`
- Eliminates `_MemPoolOptions`; pool initialization refactored into freestanding `cdef` functions (`MP_init_create_pool`, `MP_init_current_pool`, `MP_raise_release_threshold`)
- Extracts `__init__` bodies into inline `cdef` helpers (`_DMR_init`, `_PMR_init`, `_MMR_init`)
- `ManagedMemoryResource.device_id` returns the preferred device ordinal when the location is a device, otherwise `-1`
- `PinnedMemoryResource.device_id` always returns `-1`

Test Coverage

- `numa_id` behavior matrix: default without IPC, default with IPC, explicit NUMA ID, negative NUMA ID error
- `ManagedMemoryResource` preferred location: default, device, host, host_numa, validation errors, NUMA auto-resolve failure
- `skip_if_managed_memory_unsupported` updated to handle CUDA 12 builds
- `numa_id` values