Fix PinnedMemoryResource IPC NUMA ID derivation#1699

Open
Andy-Jost wants to merge 8 commits into NVIDIA:main from Andy-Jost:refactor-mempool-hierarchy

Conversation


@Andy-Jost Andy-Jost commented Feb 27, 2026

Summary

Changes

Bugfix (_pinned_memory_resource.pyx):

  • When ipc_enabled=True and numa_id is not specified, the NUMA node is derived from the current device's host_numa_id attribute (requires an active CUDA context)
  • Adds numa_id: int | None = None to PinnedMemoryResourceOptions
  • Adds numa_id property to PinnedMemoryResource
  • Removes _check_numa_nodes warning machinery in favor of proper NUMA node selection
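The resolution rule described above can be sketched in plain Python. This is an illustrative model only, not the actual `_pinned_memory_resource.pyx` implementation; the function name and the treatment of the non-IPC default are assumptions here.

```python
def resolve_numa_id(numa_id, ipc_enabled, device_host_numa_id):
    """Illustrative sketch of the numa_id resolution described above.

    numa_id             -- value from PinnedMemoryResourceOptions (may be None)
    ipc_enabled         -- whether the pool is IPC-enabled
    device_host_numa_id -- the current device's host_numa_id attribute
    """
    if numa_id is not None:
        # Explicit negative IDs are rejected (mirrors the
        # "negative NUMA ID error" test case).
        if numa_id < 0:
            raise ValueError(f"numa_id must be non-negative, got {numa_id}")
        return numa_id
    if ipc_enabled:
        # Previously hardcoded to 0; now follows the active device's
        # host NUMA node, which matters on multi-NUMA systems.
        return device_host_numa_id
    # Assumption in this sketch: non-IPC pools need no NUMA binding.
    return None
```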

ManagedMemoryResource preferred location (_managed_memory_resource.pyx):

  • Adds preferred_location_type: str | None field to ManagedMemoryResourceOptions with values "device", "host", "host_numa", or None (legacy behavior)
  • Adds preferred_location query property to ManagedMemoryResource
  • Adds _resolve_preferred_location cdef function with CUDA 13 compile-time guard
  • device_id property restored to backwards-compatible behavior (returns preferred device ordinal or -1)

Refactor (_MemPool hierarchy):

  • Moves _dev_id, device_id, and peer_accessible_by from _MemPool into DeviceMemoryResource
  • Eliminates _MemPoolOptions; pool initialization refactored into freestanding cdef functions (MP_init_create_pool, MP_init_current_pool, MP_raise_release_threshold)
  • Extracts __init__ bodies into inline cdef helpers (_DMR_init, _PMR_init, _MMR_init)
  • ManagedMemoryResource.device_id returns preferred device ordinal when location is a device, otherwise -1
  • PinnedMemoryResource.device_id always returns -1
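The device_id semantics across the three resource types can be summarized in one illustrative table-as-code. The function and its arguments are hypothetical; only the return-value rules come from the bullets above.

```python
def device_id_of(kind, preferred_location=None, device_ordinal=None):
    """Sketch of the device_id rules listed above (names illustrative).

    kind               -- "device", "pinned", or "managed"
    preferred_location -- a ("device", ordinal) tuple or None (managed only)
    device_ordinal     -- the owning device (device pools only)
    """
    if kind == "device":
        return device_ordinal  # DeviceMemoryResource: the owning device
    if kind == "pinned":
        return -1              # PinnedMemoryResource: never device-bound
    if kind == "managed":
        # Preferred device ordinal when the preferred location is a device,
        # otherwise -1 (the restored backwards-compatible behavior).
        if preferred_location is not None and preferred_location[0] == "device":
            return preferred_location[1]
        return -1
    raise ValueError(f"unknown resource kind: {kind!r}")
```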

Test Coverage

  • 4 new tests covering the numa_id behavior matrix: default without IPC, default with IPC, explicit NUMA ID, negative NUMA ID error
  • 6 new tests for ManagedMemoryResource preferred location: default, device, host, host_numa, validation errors, NUMA auto-resolve failure
  • skip_if_managed_memory_unsupported updated to handle CUDA 12 builds
  • Updated existing IPC tests to assert numa_id values
  • All existing memory and IPC tests pass (pinned, device, managed, peer access, IPC)

…ce-specific concerns

Move _dev_id, device_id, and peer_accessible_by from _MemPool into
DeviceMemoryResource. Eliminate _MemPoolOptions and refactor pool
initialization into freestanding cdef functions (MP_init_create_pool,
MP_init_current_pool, MP_raise_release_threshold) for cross-module
visibility. Extract __init__ bodies into inline cdef helpers (_DMR_init,
_PMR_init, _MMR_init) for consistency and shorter class definitions.

Implements device_id as -1 for PinnedMemoryResource and
ManagedMemoryResource since they are not device-bound.

Made-with: Cursor
…IDIA#1603)

PinnedMemoryResource(ipc_enabled=True) hardcoded host NUMA ID 0, causing
failures on multi-NUMA systems where the active device is attached to a
different NUMA node. Now derives the NUMA ID from the current device's
host_numa_id attribute, and adds an explicit numa_id option for manual
override. Removes the _check_numa_nodes warning machinery in favor of
proper NUMA node selection.

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Feb 27, 2026
@Andy-Jost Andy-Jost added the bug (Something isn't working) and cuda.core (Everything related to the cuda.core module) labels Feb 27, 2026
@Andy-Jost Andy-Jost self-assigned this Feb 27, 2026

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test e55a26b

@github-actions

Comment on lines +69 to +72
@property
def device_id(self) -> int:
    """Return -1. Managed memory migrates automatically and is not tied to a specific device."""
    return -1
Member

Q: Wouldn't this be a breaking change? In the old implementation, device_id was used to initialize _MemPool._dev_id, which backs .device_id, but the new implementation returns .device_id = -1 unconditionally. I understand we meant to say the pages are migratable (not pinned), but maybe there is a better way to restore the capability of querying the preferred location?

Contributor Author

I didn't mean for this to be reviewed yet. Now marked as Draft.

In the old API, device_id has an overloaded meaning. For DeviceMemoryResource, it refers to the device owning memory. In ManagedMemoryResource, it indicates the preferred location if that location is a device. That is not expressive enough because there are four possible types of preferred location:

  1. A device
  2. The host
  3. A host NUMA node
  4. None

I'm still considering options, but perhaps we should add a new property, preferred_location, for managed pools. There is, unfortunately, no uniform way to refer to all preferred locations, so we might need to resort to something like the following:

ManagedMemoryResource.preferred_location ->
    ('device_id', 0)  # device 0
    ('host', None)    # host preferred
    ('numa_id', 0)    # NUMA node 0
    None              # no preferred location

I'm not in a position to say whether we need to retain backwards compatibility for this attribute. Do you think so? If yes, then I can retain the original meaning, possibly with a deprecation warning.

Contributor Author

I created #1710 to track this as a separate issue, but I'm including it in this PR.

Contributor Author

I restored the old behavior where MMR.device_id returns the preferred location if it is a device and -1 otherwise. I'd prefer to unconditionally return -1 if the breaking change is tolerable. A middle ground is to issue a deprecation warning. What do you think @leofang?

@Andy-Jost Andy-Jost marked this pull request as draft March 2, 2026 14:32
@Andy-Jost
Contributor Author

/ok to test 11d22a2

@Andy-Jost Andy-Jost force-pushed the refactor-mempool-hierarchy branch from 11d22a2 to 96b663a Compare March 2, 2026 21:47
@Andy-Jost
Contributor Author

/ok to test 96b663a

@Andy-Jost Andy-Jost force-pushed the refactor-mempool-hierarchy branch from 96b663a to ff4ee01 Compare March 2, 2026 22:02
@Andy-Jost
Contributor Author

/ok to test ff4ee01

@Andy-Jost Andy-Jost marked this pull request as ready for review March 2, 2026 22:03
@Andy-Jost Andy-Jost force-pushed the refactor-mempool-hierarchy branch from ff4ee01 to 363ddb7 Compare March 2, 2026 23:10
@Andy-Jost
Contributor Author

/ok to test 363ddb7

…yResource

Extends ManagedMemoryResourceOptions with a preferred_location_type field
("device", "host", "host_numa", or None) enabling NUMA-aware managed memory
pool placement. Adds ManagedMemoryResource.preferred_location property to
query the resolved setting. Fully backwards-compatible: existing code using
preferred_location alone continues to work unchanged.

Made-with: Cursor
@Andy-Jost Andy-Jost force-pushed the refactor-mempool-hierarchy branch from 363ddb7 to 2dd3d1a Compare March 2, 2026 23:46
@Andy-Jost
Contributor Author

/ok to test 2dd3d1a


@cpcloud cpcloud left a comment


The __dealloc__ violates explicit Cython documentation.

Not a blocker, but can this be split up? It seems like it's coupling two changesets that don't need to be in the same PR.


def __dealloc__(self):
    try:
        self.close()
Contributor

You're not supposed to call methods in __dealloc__: https://cython.readthedocs.io/en/stable/src/userguide/special_methods.html#finalization-methods-dealloc-and-del

In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.

  • _peer_accessible_by is a Python object
  • _DMR_set_peer_accessible_by appears to modify the object, judging by how it is invoked.

Contributor

Sounds like a job for __del__ and also adding a context manager.
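The pattern the reviewer is suggesting can be sketched in plain Python: an idempotent close(), __del__ as a safety net, and a context manager. The class and attribute names here are hypothetical stand-ins, not cuda.core code; unlike Cython's __dealloc__, __del__ runs while the object is still fully alive, so calling other methods from it is safe.

```python
class PoolHandle:
    """Illustrative sketch of close()/__del__/context-manager cleanup."""

    def __init__(self):
        self._handle = object()  # stands in for a real mempool handle

    def close(self):
        # Idempotent: safe to call from __del__, __exit__, or user code.
        if self._handle is not None:
            self._handle = None  # real code would destroy the pool here

    def __del__(self):
        # Safety net if the user never called close(); __del__ (unlike
        # Cython's __dealloc__) may call other methods of the object.
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions
```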

Contributor Author

This was to work around a driver bug (5698116). When a mempool is destroyed and a new one allocated, if the new pool happens to reuse the previous memory address, it incorrectly inherits the peer access attributes of the destroyed pool. This code is all about clearing the peer access attributes on destruction to avoid that.

The real fix is in C++ under resource_handles.* and should only be applied to owned pools created by cuda.core. The changes here, in Python, were a misadventure. This was resetting peer access for the shared default pool, which is not being destroyed but just released.

return False


cdef inline _DMR_set_peer_accessible_by(DeviceMemoryResource self, devices):
Contributor

This seems like something that should almost certainly never be invoked in __dealloc__. It's reading from and writing to the Python object whose __dealloc__ method is being called.

Contributor Author

Thanks, yeah this was a mess. I removed all peer-access-clearing logic from Python. The real problem was that certain tests were using the default memory pool, which is persistent and should keep its peer access settings. Those tests incorrectly assumed the beginning state should be "no peer access." I updated those to simply use fresh pools.

- Remove __dealloc__ and close() override from DeviceMemoryResource
  that cleared peer access before destruction. The C++ RAII deleter
  already handles this for owned pools (nvbug 5698116 workaround).
  For non-owned pools (default device pool), clearing peer access
  on handle disposal was incorrect behavior.

- Update peer access tests to use owned pools (DeviceMemoryResourceOptions())
  instead of default pools. Default pools are shared and may have stale
  peer access state from prior tests, causing test failures.

Made-with: Cursor
@Andy-Jost
Contributor Author

/ok to test c7f85ff

@Andy-Jost
Contributor Author

/ok to test 4c36462

@Andy-Jost
Contributor Author

@cpcloud Please take another look

@Andy-Jost Andy-Jost requested a review from cpcloud March 5, 2026 17:10

Labels

bug: Something isn't working
cuda.core: Everything related to the cuda.core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Add preferred_location_type option and query property to ManagedMemoryResource
  • [BUG]: IPC-enabled pinned pool uses a fixed host NUMA node

4 participants