Add atomic fetch_max/fetch_min (P0493R5) for integral and pointer types#9205
Add atomic fetch_max/fetch_min (P0493R5) for integral and pointer types#9205edenfunf wants to merge 3 commits into
fetch_max/fetch_min (P0493R5) for integral and pointer types#9205Conversation
Surface fetch_max/fetch_min on cuda::std::atomic and atomic_ref for the integral and pointer specializations, add the atomic_fetch_max/min and atomic_fetch_max/min_explicit free functions, and define the __cccl_lib_atomic_min_max feature-test macro. This reuses the existing __atomic_fetch_max_dispatch/__atomic_fetch_min_dispatch machinery (host CAS loop + device atom.max / CAS fallback) that the cuda::atomic extension already exposes. Floating-point min/max (P3008R6) is tracked separately by NVIDIA#9176 and is intentionally out of scope here. Adds std-layer tests covering integral and pointer types, host + device. Fixes NVIDIA#9173
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
suggestion: WalkthroughAdds atomic min/max member implementations via a new macro, wires them into owned/reference atomics, exposes public free-function APIs (implicit and explicit memory-order variants) for integral and pointer atomics, and adds conformance tests covering host/device and multiple memory scopes. ChangesAtomic Min/Max Operations
Assessment against linked issues
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found ... [truncated 1285 characters] ... "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp:27:10: fatal error: 'cuda/std/atomic' file not found ... [truncated 1285 characters] ... "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" Comment |
fbusato
left a comment
There was a problem hiding this comment.
thanks for the contribution, @edenfunf!
The PR is really good. The only gap that I see is that atomic_ref functions are not tested directly. Outside that, there are stylistic issues, but unfortunately this not mimic the surrounding code, e.g. SFINAE->require, namespace qualification, inline with template, etc.
Another comment is about the accepted types. P0493R5 leaves open the door for supporting pointers, as the PR on the host side. On the other hand, the c++-draft only specifies integral types.
I leave the final decision to @wmaxey
miscco
left a comment
There was a problem hiding this comment.
Thanks a lot for working on this. It is already looking great 🎉
I believe we only need to modify the testing of pointers, because I would prefer to have a well defined ordering we can test.
For now its only integral types, so that is fine and we can easily extend that to floating point types later
The pointer overloads of the atomic_fetch_max/min tests fabricated pointers via integer-to-pointer casts and compared them, which is undefined behavior. Use a real array and compare pointers into it, which have a well-defined ordering. Also add std-layer tests that exercise cuda::std::atomic_ref::fetch_max and fetch_min directly for integral types.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8c6556c2-aac5-41ab-951a-5935168a3aef
📒 Files selected for processing (6)
libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_max_explicit.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_max.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_ref_fetch_min.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min_explicit.pass.cpp
- libcudacxx/test/libcudacxx/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_min.pass.cpp
Sweep atomic_ref<int*> and atomic_ref<const int*> in addition to the integral types, since __atomic_ref_pointer also provides fetch_max/fetch_min. Pointers into a real array are used so the ordering is well-defined.
|
A final step to this PR is going to be removing the overloads for See here cccl/libcudacxx/include/cuda/__atomic/atomic.h Lines 58 to 74 in c47f140 cccl/libcudacxx/include/cuda/__atomic/atomic.h Lines 102 to 110 in c47f140 We'll also need to update testing to include both interfaces. |
Description
closes #9173
Implements P0493R5 atomic minimum/maximum for
cuda::std::atomicandcuda::std::atomic_ref.P0493R5 covers integer and pointer types only, so this PR adds:
fetch_max/fetch_minon the integral and pointer specializations ofcuda::std::atomicandcuda::std::atomic_ref, via a new_LIBCUDACXX_ATOMIC_MINMAX_IMPLmacro.atomic_fetch_max/atomic_fetch_minand their_explicitforms (integral and pointer overloads).__cccl_lib_atomic_min_max(202403L).atomic_fetch_addones, covering integral and pointer types on host and device.The implementation only wires up the std-conforming API surface: it reuses the existing
__atomic_fetch_max_dispatch/__atomic_fetch_min_dispatchmachinery (host CAS loop + deviceatom.max/atom.minwith a CAS fallback) that thecuda::atomicextension already exposes. No code generation is touched.Floating-point min/max (P3008R6) is intentionally out of scope and tracked separately by #9176.
A couple of notes for reviewers:
__atomic_bitwise) and pointer specializations only, not to the floating-point one. Consistent with the existingfetch_addfree functions, the free functions excludeboolvia SFINAE._CCCL_STD_VER), since the functionality is provided as an extension regardless of the language standard, following the__cccl_lib_saturation_arithmeticprecedent. Happy to gate it behind C++26 instead if preferred.Checklist