6 changes: 4 additions & 2 deletions README.md
@@ -5,8 +5,8 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c
* [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionality
* [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs
* [cuda.pathfinder](https://nvidia.github.io/cuda-python/cuda-pathfinder/latest): Utilities for locating CUDA components installed in the user's Python environment
- * [cuda.coop](https://nvidia.github.io/cccl/python/coop): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
- * [cuda.compute](https://nvidia.github.io/cccl/python/compute): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
+ * [cuda.coop](https://nvidia.github.io/cccl/unstable/python/coop.html): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+ * [cuda.compute](https://nvidia.github.io/cccl/unstable/python/compute/index.html): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
* [numba.cuda](https://nvidia.github.io/numba-cuda/): A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions
* [cuda.tile](https://docs.nvidia.com/cuda/cutile-python/): A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels
* [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest): Pythonic access to NVIDIA CPU & GPU Math Libraries, with [*host*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis), [*device*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis), and [*distributed*](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html) APIs. It also provides low-level Python bindings to host C APIs ([nvmath.bindings](https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html)).
@@ -44,4 +44,6 @@ The list of available interfaces is:
* NVRTC
* nvJitLink
* NVVM
* nvFatbin
* cuFile
* NVML
4 changes: 4 additions & 0 deletions cuda_core/docs/nv-versions.json
@@ -3,6 +3,10 @@
"version": "latest",
"url": "https://nvidia.github.io/cuda-python/cuda-core/latest/"
},
+ {
+ "version": "1.0.0",
+ "url": "https://nvidia.github.io/cuda-python/cuda-core/1.0.0/"
+ },
{
"version": "0.7.0",
"url": "https://nvidia.github.io/cuda-python/cuda-core/0.7.0/"
9 changes: 4 additions & 5 deletions cuda_core/docs/source/api.rst
@@ -6,11 +6,10 @@
``cuda.core`` API Reference
===========================

- This is the main API reference for ``cuda.core``. The package has not yet
- reached version 1.0.0, and APIs may change between minor versions, possibly
- without deprecation warnings. Once version 1.0.0 is released, APIs will
- be considered stable and will follow semantic versioning with appropriate
- deprecation periods for breaking changes.
+ This is the main API reference for ``cuda.core``. As of version 1.0.0, all
+ APIs are considered stable and follow `Semantic Versioning <https://semver.org/>`_
+ with appropriate deprecation periods for breaking changes. See the
+ :doc:`support policy <support>` for details.


Devices and execution
1 change: 1 addition & 0 deletions cuda_core/docs/source/index.rst
@@ -21,6 +21,7 @@ Welcome to the documentation for ``cuda.core``.
.. toctree::
:maxdepth: 1

+ support
conduct
license

2 changes: 1 addition & 1 deletion cuda_core/docs/source/install.rst
@@ -32,7 +32,7 @@ dependencies are as follows:
Free-threading Build Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- As of cuda-core 0.4.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.
+ As of cuda-core 1.0.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.

1. Support for these builds is best effort, due to heavy use of `built-in
modules that are known to be thread-unsafe`_, such as ``ctypes``.
186 changes: 156 additions & 30 deletions cuda_core/docs/source/release/1.0.0-notes.rst
@@ -20,11 +20,79 @@ New features
including string process state queries, lock/checkpoint/restore/unlock
operations, and GPU UUID remapping support for restore.
(`#1343 <https://github.com/NVIDIA/cuda-python/issues/1343>`__)
- Added green context support (CUDA 12.4+). New types :class:`Context`,
:class:`ContextOptions`, :class:`SMResource`, :class:`SMResourceOptions`,
:class:`WorkqueueResource`, and :class:`WorkqueueResourceOptions` enable GPU
SM and workqueue resource partitioning. Create green contexts via
:meth:`Device.create_context`, then use :meth:`Context.create_stream` and
:attr:`Context.resources` to work within the partitioned resources.
(`#1976 <https://github.com/NVIDIA/cuda-python/pull/1976>`__)
- Added the :mod:`cuda.core.system` module for NVIDIA Management Library (NVML)
[Review comment (Contributor)] This was added in 0.6.0. Below is just changes (additions) to it.
Suggested change:
  old: - Added the :mod:`cuda.core.system` module for NVIDIA Management Library (NVML)
  new: - Changes to :mod:`cuda.core.system` module for NVIDIA Management Library (NVML)

access:

- :attr:`system.Device.mig` for querying and setting MIG mode, enumerating
MIG device instances, and navigating parent/child relationships.
(`#1916 <https://github.com/NVIDIA/cuda-python/pull/1916>`__)
- :attr:`system.Device.compute_running_processes` for querying running compute
processes on a device, returning :class:`~system.ProcessInfo` objects with
PID, GPU memory usage, and MIG instance IDs.
(`#1917 <https://github.com/NVIDIA/cuda-python/pull/1917>`__)
- :meth:`system.Device.get_nvlink` for querying NVLink version and state per
link, and :attr:`system.Device.utilization` returning current GPU and memory
utilization rates.
(`#1918 <https://github.com/NVIDIA/cuda-python/pull/1918>`__)

- Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw
integer re-exports from ``cuda.bindings.nvml``. Added
:class:`~system.typing.GpuP2PCapsIndex`, :class:`~system.typing.GpuP2PStatus`,
and :class:`~system.typing.GpuTopologyLevel` enums.
(`#2014 <https://github.com/NVIDIA/cuda-python/pull/2014>`__)
[Review comment on lines +45 to +49 (Contributor)]
Suggested change:
  old: - Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw
       integer re-exports from ``cuda.bindings.nvml``. Added
       :class:`~system.typing.GpuP2PCapsIndex`, :class:`~system.typing.GpuP2PStatus`,
       and :class:`~system.typing.GpuTopologyLevel` enums.
       (`#2014 <https://github.com/NVIDIA/cuda-python/pull/2014>`__)
  new: - Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw
       integer re-exports from ``cuda.bindings.nvml``. These are available in
       ``cuda.core.system.typing``.

[Review comment (Contributor)] It will probably generate a merge conflict as a reminder, but note to selves: /IF/ we choose to merge #2037, we should make sure typing -> enums here.

[Review comment (Contributor)] #2037 is not being merged, so we are good as-is here.

- Moved all :mod:`cuda.core.system` enums into a new :mod:`cuda.core.system.typing`
module. Imports from ``cuda.core.system`` continue to work but the canonical
location is now ``cuda.core.system.typing``.
(`#2022 <https://github.com/NVIDIA/cuda-python/pull/2022>`__)
[Review comment on lines +50 to +53 (Contributor @mdboom, May 6, 2026)]
This is really part of the note above. The fact that we merged it and then immediately moved it is uninteresting to our users. It's also dangerously incorrect (their old locations will not continue to work, publicly).
Suggested change: remove these lines.

- Enums are now available in places where a small number of string values are
accepted or returned. You may continue to use the string values, or use
enumerations for better linting and type-checking.
(`#2016 <https://github.com/NVIDIA/cuda-python/issues/2016>`__)
The new enums are:

- :class:`cuda.core.typing.CompilerBackendType`
- :class:`cuda.core.typing.GraphConditionalType`
- :class:`cuda.core.typing.GraphMemoryType`
- :class:`cuda.core.typing.ManagedMemoryLocationType`
- :class:`cuda.core.typing.ObjectCodeFormatType`
- :class:`cuda.core.typing.PCHStatusType`
- :class:`cuda.core.typing.SourceCodeType`
- :class:`cuda.core.typing.VirtualMemoryAccessType`
- :class:`cuda.core.typing.VirtualMemoryAllocationType`
- :class:`cuda.core.typing.VirtualMemoryGranularityType`
- :class:`cuda.core.typing.VirtualMemoryHandleType`
- :class:`cuda.core.typing.VirtualMemoryLocationType`
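
The string-or-enum convention above can be illustrated with the stdlib ``enum`` module. This is a toy sketch only; the ``SourceCodeType`` member values shown here are assumptions, not the real ``cuda.core`` definitions:

```python
from enum import Enum


# Hypothetical stand-in for an enum like cuda.core.typing.SourceCodeType;
# the member names and values are assumptions for illustration.
class SourceCodeType(str, Enum):
    CXX = "c++"
    PTX = "ptx"


def compile_source(code: str, code_type) -> str:
    # Normalizing to the enum lets callers pass either the plain string
    # "c++" or SourceCodeType.CXX interchangeably.
    code_type = SourceCodeType(code_type)
    return f"compiled {len(code)} bytes of {code_type.value}"


print(compile_source("int main(){}", "c++"))
print(compile_source("int main(){}", SourceCodeType.CXX))
```

Because the enum subclasses ``str``, ``SourceCodeType("c++")`` resolves by value, so existing string-based call sites keep working while new code gains linting and type-checking.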


Breaking changes
----------------

- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
is passed to any ``from_*`` classmethod (``from_dlpack``,
``from_cuda_array_interface``, ``from_array_interface``, or
``from_any_interface``), tensor metadata is read directly from the underlying
C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
This yields ~7–20x faster ``StridedMemoryView`` construction for PyTorch
tensors (depending on whether stream ordering is required). Proper CUDA stream
ordering is established between PyTorch's current stream and the consumer
stream, matching the DLPack synchronization contract.
Requires PyTorch >= 2.3.

This is a *behavioral* breaking change: because the AOTI tensor bridge reads
raw metadata without re-enacting PyTorch's export guardrails, tensors that
PyTorch would reject at the DLPack boundary (notably ``requires_grad``,
conjugated, non-strided/sparse, and wrong-current-device CUDA tensors) are
now accepted. This is intentional — ``StridedMemoryView`` is designed for
low-level interop where those checks are not needed.
(`#749 <https://github.com/NVIDIA/cuda-python/issues/749>`__)
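
The dispatch described above amounts to a type check before falling back to the protocol path. A toy sketch with stub classes (not the real ``cuda.core`` implementation, which reads AOTI C structs):

```python
class FakeTensor:
    """Stand-in for torch.Tensor; real code checks the actual type."""
    shape = (2, 3)
    strides = (3, 1)


class DLPackArray:
    """Stand-in for an arbitrary array exporting __dlpack__."""
    def __dlpack__(self):
        return "capsule"  # the real protocol returns a PyCapsule


class StridedView:
    def __init__(self, shape, strides, via):
        self.shape, self.strides, self.via = shape, strides, via

    @classmethod
    def from_any_interface(cls, obj):
        # Fast path: for the known tensor type, read shape/strides
        # directly from the object, skipping protocol negotiation.
        if isinstance(obj, FakeTensor):
            return cls(obj.shape, obj.strides, via="aoti-fast-path")
        # Slow path: fall back to the DLPack protocol (details elided).
        obj.__dlpack__()
        return cls(None, None, via="dlpack")


assert StridedView.from_any_interface(FakeTensor()).via == "aoti-fast-path"
assert StridedView.from_any_interface(DLPackArray()).via == "dlpack"
```
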
- Renamed :class:`~graph.GraphDef` to :class:`~graph.GraphDefinition` for
consistency with the rest of the API, which spells words out (e.g.
``TensorMapDescriptor``, not ``TensorMapDesc``).
@@ -125,36 +193,94 @@ Breaking changes
- :obj:`cuda.core.typing.DevicePointerT` -> :obj:`cuda.core.typing.DevicePointerType`
- :obj:`cuda.core.typing.IsStreamT` -> :obj:`cuda.core.typing.IsStreamType`

Fixes and enhancements
-----------------------
- Renamed and converted multiple :class:`~system.Device` properties and methods
for naming consistency
(`#1946 <https://github.com/NVIDIA/cuda-python/pull/1946>`__):

- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
is passed to any ``from_*`` classmethod (``from_dlpack``,
``from_cuda_array_interface``, ``from_array_interface``, or
``from_any_interface``), tensor metadata is read directly from the underlying
C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
This yields ~7-20x faster ``StridedMemoryView`` construction for PyTorch
tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch's current
stream and the consumer stream, matching the DLPack synchronization contract.
Requires PyTorch >= 2.3.
(`#749 <https://github.com/NVIDIA/cuda-python/issues/749>`__)
On :class:`~system.Device`:

- Enums are not available in places where a small number of string values are
accepted or returned. You may continue to use the string values, or use
enumerations for better linting and type-checking.
(`#2016 <https://github.com/NVIDIA/cuda-python/issues/2016>`__)
The new enums are:
- ``is_c2c_mode_enabled`` -> ``is_c2c_enabled``
- ``persistence_mode_enabled`` -> ``is_persistence_mode_enabled``
- ``clock(clock_type)`` -> ``get_clock(clock_type)``
- ``get_auto_boosted_clocks_enabled()`` -> ``is_auto_boosted_clocks_enabled``
(method -> property)
- ``get_current_clock_event_reasons()`` -> ``current_clock_event_reasons``
(method -> property)
- ``get_supported_clock_event_reasons()`` -> ``supported_clock_event_reasons``
(method -> property)
- ``display_mode`` -> ``is_display_connected``
- ``display_active`` -> ``is_display_active``
- ``fan(fan=0)`` -> ``get_fan(fan=0)``
- ``get_supported_pstates()`` -> ``supported_pstates``
(method -> property)

- :class:`cuda.core.typing.CompilerBackendType`
- :class:`cuda.core.typing.GraphConditionalType`
- :class:`cuda.core.typing.GraphMemoryType`
- :class:`cuda.core.typing.ManagedMemoryLocationType`
- :class:`cuda.core.typing.ObjectCodeFormatType`
- :class:`cuda.core.typing.PCHStatusType`
- :class:`cuda.core.typing.SourceCodeType`
- :class:`cuda.core.typing.VirtualMemoryAccessType`
- :class:`cuda.core.typing.VirtualMemoryAllocationType`
- :class:`cuda.core.typing.VirtualMemoryGranularityType`
- :class:`cuda.core.typing.VirtualMemoryHandleType`
- :class:`cuda.core.typing.VirtualMemoryLocationType`
On ``PciInfo``:

- ``get_max_pcie_link_generation()`` -> ``link_generation`` (method -> property)
- ``get_gpu_max_pcie_link_generation()`` -> ``max_link_generation``
(method -> property)
- ``get_max_pcie_link_width()`` -> ``max_link_width`` (method -> property)
- ``get_current_pcie_link_generation()`` -> ``current_link_generation``
(method -> property)
- ``get_current_pcie_link_width()`` -> ``current_link_width``
(method -> property)
- ``get_pcie_throughput(counter)`` -> ``get_throughput(counter)``
- ``get_pcie_replay_counter()`` -> ``replay_counter`` (method -> property)

On ``Temperature``:

- ``sensor(sensor=...)`` -> ``get_sensor(sensor=...)``
- ``threshold(threshold_type)`` -> ``get_threshold(threshold_type)``
- ``thermal_settings(sensor_index)`` -> ``get_thermal_settings(sensor_index)``

On ``FanInfo``:

- ``set_default_fan_speed()`` -> ``set_default_speed()``
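
The method-to-property conversions listed above follow the standard Python ``@property`` pattern; a toy before/after sketch (not the real ``system.Device`` implementation):

```python
class ToyDevice:
    # Before 1.0.0: a zero-argument getter method.
    def get_supported_pstates(self):
        return ["P0", "P8"]

    # After 1.0.0: the same data exposed as a read-only property,
    # matching the new name (supported_pstates).
    @property
    def supported_pstates(self):
        return ["P0", "P8"]


dev = ToyDevice()
assert dev.get_supported_pstates() == dev.supported_pstates
```
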

- Removed 18 helper/data-container classes from ``cuda.core.system.__all__``:
``BAR1MemoryInfo``, ``ClockInfo``, ``ClockOffsets``, ``CoolerInfo``,
``DeviceAttributes``, ``DeviceEvents``, ``EventData``, ``FanInfo``,
``FieldValue``, ``FieldValues``, ``GpuDynamicPstatesInfo``,
``GpuDynamicPstatesUtilization``, ``InforomInfo``, ``PciInfo``,
``RepairStatus``, ``Temperature``, ``ThermalSensor``, ``ThermalSettings``.
These classes are still returned by :class:`~system.Device` properties and
methods but should not be directly instantiated by users.
(`#1942 <https://github.com/NVIDIA/cuda-python/pull/1942>`__)
- Removed ``BrandType``, ``NvlinkVersion``, ``PcieUtilCounter``, ``Pstates``,
and ``TemperatureSensors`` enums from ``cuda.core.system``; the underlying
values are now returned as plain strings or accessed through other APIs.
(`#2014 <https://github.com/NVIDIA/cuda-python/pull/2014>`__)
[Review comment on lines +249 to +252 (Contributor)]
Covered by the enum changes above.
Suggested change: remove these lines.

- :attr:`system.Device.uuid` now returns the full NVML UUID with prefix
(e.g. ``GPU-...``). Use :attr:`system.Device.uuid_without_prefix` for
the previous behavior.
(`#1916 <https://github.com/NVIDIA/cuda-python/pull/1916>`__)

Fixes and enhancements
-----------------------

- Fixed :attr:`Buffer.is_managed` returning ``False`` for pool-allocated managed
memory (:class:`ManagedMemoryResource`), which caused DLPack interop to
misclassify managed buffers as ``kDLCUDAHost``. The fix queries both the
driver pointer attribute and the memory resource.
(`#1924 <https://github.com/NVIDIA/cuda-python/pull/1924>`__)
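
Schematically, the fix combines two signals rather than trusting the driver attribute alone. A sketch with stub objects (the helper names here are assumptions, not the real internals):

```python
class StubMemoryResource:
    """Stand-in for a ManagedMemoryResource-backed pool."""
    is_managed = True


def driver_says_managed(ptr):
    # Stand-in for querying the CUDA driver's pointer attribute;
    # pool-allocated managed memory may report False here.
    return False


def buffer_is_managed(ptr, memory_resource):
    # Either signal is sufficient: the driver pointer attribute,
    # or the memory resource the buffer was allocated from.
    return driver_says_managed(ptr) or getattr(memory_resource, "is_managed", False)


assert buffer_is_managed(0x1000, StubMemoryResource())
```
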
- :attr:`system.Device.arch` now returns ``UNKNOWN`` instead of raising
``ValueError`` when NVML reports an architecture not yet in the enum.
(`#1937 <https://github.com/NVIDIA/cuda-python/pull/1937>`__)
- :meth:`system.Device.get_field_values` and
:meth:`system.Device.clear_field_values` with an empty list no longer raise
``InvalidArgumentError``.
(`#1982 <https://github.com/NVIDIA/cuda-python/pull/1982>`__)
- :class:`Linker` error and info log retrieval now properly checks return codes
from nvJitLink, raising exceptions on failure instead of silently ignoring
errors.
(`#1993 <https://github.com/NVIDIA/cuda-python/pull/1993>`__)
- Fixed a potential crash when NVML event set creation failed, due to
``__dealloc__`` freeing an uninitialized handle.
(`#1992 <https://github.com/NVIDIA/cuda-python/pull/1992>`__)
[Review comment on lines +277 to +279 (Contributor)]
Suggested change:
  old: - Fixed a potential crash when NVML event set creation failed, due to
       ``__dealloc__`` freeing an uninitialized handle.
       (`#1992 <https://github.com/NVIDIA/cuda-python/pull/1992>`__)
  new: - Fixed a potential crash when NVML event set creation failed on Windows, due to
       ``__dealloc__`` freeing an uninitialized handle.
       (`#1992 <https://github.com/NVIDIA/cuda-python/pull/1992>`__)

- CUDA Runtime error messages are now more reliable, especially on Windows
where the runtime DLL name table could disagree with the installed bindings.
(`#2003 <https://github.com/NVIDIA/cuda-python/pull/2003>`__)
- Linux release wheels are now stripped of debug symbols, significantly reducing
package size. Debug builds are now supported via
``--config-settings=debug=true``.
(`#1890 <https://github.com/NVIDIA/cuda-python/pull/1890>`__)
75 changes: 75 additions & 0 deletions cuda_core/docs/source/support.rst
@@ -0,0 +1,75 @@
.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: Apache-2.0

.. _cuda-core-support:

``cuda.core`` Support Policy
============================

Versioning Scheme
-----------------

``cuda.core`` follows `Semantic Versioning (SemVer) <https://semver.org/>`_ with the version
format ``major.minor.patch``:

- **Major**: Bumped when a new CUDA major release is out and support for the oldest CUDA major
version is dropped. Breaking API changes only happen at major-version boundaries.
- **Minor**: Bumped when new, backward-compatible features are added, or when a new Python minor
[Review comment (Contributor)]
Python calls these "feature releases". While "minor" sort of makes sense because it's the second number, it undersells the importance of them (and it doesn't follow semver anyway).
Suggested change:
  old: - **Minor**: Bumped when new, backward-compatible features are added, or when a new Python minor
  new: - **Minor**: Bumped when new, backward-compatible features are added, or when a new Python feature

release is out and the oldest supported Python version reaches EOL.
- **Patch**: Bumped for bug fixes and backward-compatible maintenance updates.
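
For plain ``major.minor.patch`` versions, SemVer precedence matches lexicographic comparison of the numeric components; a minimal sketch:

```python
def parse(version: str) -> tuple:
    """Split 'major.minor.patch' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


# Tuples compare lexicographically, which matches SemVer precedence
# for plain versions without pre-release tags.
assert parse("1.0.0") < parse("1.1.0") < parse("2.0.0")
assert parse("0.7.0") < parse("1.0.0")
```
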

Unlike ``cuda.bindings``, the ``cuda.core`` version is *not* aligned with the CUDA Toolkit version.
Consult the table below or the :doc:`release notes <release>` to determine which CUDA versions are
supported by a given ``cuda.core`` release.

CUDA Version Support
--------------------

``cuda.core`` is actively maintained to support the two (2) most recent CUDA major versions. For
example, ``cuda.core`` 1.x supports CUDA 12 and 13. Any fix in the latest release would be
backported as needed.

When a new CUDA major version is released and support for the oldest major version is dropped,
``cuda.core`` will release a new major version (e.g., 1.x → 2.0.0).

.. list-table:: CUDA Version Support Matrix
:header-rows: 1

* - ``cuda.core`` version
- Supported CUDA versions
* - 1.x
- 12, 13

[Review comment (Contributor)]
Do we want to add that certain features in cuda_core may require a specific minimum version of cuda_bindings? (I'm thinking of NVML support specifically, but that may not be the only case.) Saying it supports "any 12 or 13" is a bit misleading.

As a minimum, I'm going to file another pull request to document the minimum required cuda_bindings version that will sit next to the cuda.core.system docs.

Python Version Support
----------------------

``cuda.core`` supports all Python versions following the `CPython EOL schedule
<https://devguide.python.org/versions/>`_. As of writing, Python 3.10 – 3.14 are supported.

When a new Python minor version is released and the oldest supported version reaches EOL,
[Review comment (Contributor)]
Suggested change:
  old: When a new Python minor version is released and the oldest supported version reaches EOL,
  new: When a new Python feature version is released and the oldest supported version reaches EOL,

``cuda.core`` will bump its minor version accordingly.

Free-threading Build Support
----------------------------

As of ``cuda.core`` 1.0.0, wheels for the `free-threaded interpreter
<https://docs.python.org/3/howto/free-threading-python.html>`_ are shipped to PyPI. This support
is currently *experimental*.

1. For now, you are responsible for making sure that calls into the underlying CUDA libraries
are thread-safe. This is subject to change.
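
To check whether a given interpreter is a free-threaded build before relying on these wheels, you can query the build configuration with the standard library:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"free-threaded build: {free_threaded}")
```
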

Release Cadence
---------------

- ``cuda.core`` follows its own release cadence, independent of CUDA Toolkit releases, as long as
SemVer guarantees are maintained.
- We currently aim for bimonthly releases, though this is subject to change.
- Major version releases are aligned to CUDA major version releases.
- New features may be delivered in minor releases at any time — not gated by the CUDA Toolkit
release schedule.

----

The NVIDIA CUDA Python team reserves the right to amend the above support policy. Any major changes,
however, will be announced to users in advance.