Documentation for rocThrust is available at https://rocm.docs.amd.com/projects/rocThrust/en/latest/.
- Updated the required version of Google Benchmark from 1.8.0 to 1.9.0.
- `device_malloc_allocator.h` has been removed. This header file was unused and should not impact users.
- Removed C++14 support; only C++17 is supported.
- `thrust::device_malloc_allocator` is deprecated as of this version. It will be removed in an upcoming version.
- Added gfx950 support.
- Merged changes from upstream CCCL/thrust 2.6.0
- The order of the values being compared by thrust::exclusive_scan_by_key and thrust::inclusive_scan_by_key can change between runs when integers are being compared. This can cause incorrect output when a non-commutative operator such as division is being used.
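The following plain C++ sketch shows why operand order matters for a non-commutative operator. It does not call rocThrust. Instead, it runs a sequential inclusive scan by key using integer division. The `swap_operands` flag and the helper names are hypothetical. The flag only models an implementation that passes the running value and the next element to the operator in the opposite order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential inclusive scan by key: the running value restarts whenever the key changes.
// `swap_operands` is a hypothetical switch (not a rocThrust parameter) that models an
// implementation calling op(next, running) instead of op(running, next).
template <typename BinaryOp>
std::vector<int> inclusive_scan_by_key_sketch(const std::vector<int>& keys,
                                              const std::vector<int>& values,
                                              BinaryOp op,
                                              bool swap_operands)
{
    assert(keys.size() == values.size());
    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        if (i == 0 || keys[i] != keys[i - 1])
        {
            out[i] = values[i]; // first element of a segment is copied unchanged
        }
        else
        {
            out[i] = swap_operands ? op(values[i], out[i - 1])
                                   : op(out[i - 1], values[i]);
        }
    }
    return out;
}

// Integer division: a non-commutative binary operator.
inline int divide(int a, int b) { return a / b; }
```

Take keys `{0, 0, 0, 1, 1}` and values `{2, 8, 64, 3, 9}`. The two operand orders give `{2, 0, 0, 3, 0}` and `{2, 4, 16, 3, 3}`. A commutative operator such as `+` would give the same result either way.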
- Added a section to install Threading Building Blocks (TBB) inside `cmake/Dependencies.cmake` if TBB is not already available.
- Made Threading Building Blocks (TBB) an optional dependency with the new `BUILD_HIPSTDPAR_TEST_WITH_TBB` flag. The default is `OFF`. When the flag is `OFF` and TBB is not already on the machine, it compiles without TBB. Otherwise, it compiles with TBB.
- Added extended tests to `rtest.py`. These are extra tests that did not fit the criteria of smoke and regression tests. They take much longer to run than smoke and regression tests. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests.
- Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests.
- Added smoke test options, which run a subset of the unit tests and ensure that less than 2 GB of VRAM is used. Use `python rtest.py [--emulation|-e|--test|-t]=smoke` to run these tests.
- Added the `--emulation` option for `rtest.py`.
- Merged changes from upstream CCCL/thrust 2.4.0
- Merged changes from upstream CCCL/thrust 2.5.0
- Added `find_first_of` to HIPSTDPAR
- Added `search` and `find_end` to HIPSTDPAR
- Added `search_n` to HIPSTDPAR
- Updated HIPSTDPAR's `adjacent_find` to use rocPRIM's implementation
- Changed the C++ version from 14 to 17. C++14 will be deprecated in the next major release.
- `--test|-t` is no longer a required flag for `rtest.py`. Instead, the user can use either `--emulation|-e` or `--test|-t`, but not both.
- Split the contents of HIPSTDPAR's forwarding header into several implementation headers.
- Fixed `copy_if` to work with large data types (512 bytes)
- `thrust::inclusive_scan_by_key` might produce incorrect results when it's used with `-O2` or `-O3` optimization.
- The error is caused by a recent compiler change. There is a fix available that will be released at a later date.
- Merged changes from upstream CCCL/thrust 2.3.2
- Only the NVIDIA backend uses `tuple` and `pair` types from libcu++. Other backends continue to use the original Thrust implementations and hence do not require libcu++ (CCCL) as a dependency.
- Added the `thrust::hip::par_det` execution policy to enable bitwise reproducibility on algorithms that are not bitwise reproducible by default.
- Updated the default value for the `-a` argument of `rmake.py` to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201`.
- Enabled the upstream (thrust) test suite for execution by default. It can still be disabled with the CMake option `-DENABLE_UPSTREAM_TESTS=OFF`.
- Fixed an issue in `rmake.py` where the list storing CMake options would contain individual characters instead of a full string of options.
- Fixed the HIP backend not passing `TestCopyIfNonTrivial` from the upstream (thrust) test suite.
- Fixed tests failing when compiled with `-D_GLIBCXX_ASSERTIONS=ON`.
- Merged changes from upstream CCCL/thrust 2.2.0
- Updated the contents of `system/hip` and `test` with the upstream changes to `system/cuda` and `testing`
- Added HIPSTDPAR library as part of rocThrust.
- Updated internal calls to `rocprim::detail::invoke_result` to use the public API `rocprim::invoke_result`.
- Used `rocprim::device_adjacent_difference` for the `adjacent_difference` API call.
- Updated internal use of custom iterator in `thrust::detail::unique_by_key` to use rocPRIM's `rocprim::unique_by_key`.
- Updated `adjacent_difference` to use `rocprim::adjacent_difference` when the iterators are comparable and not equal. Otherwise, it uses `rocprim::adjacent_difference_inplace`.
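The changelog does not say how `rocprim::adjacent_difference_inplace` works internally. This sequential C++ sketch only illustrates why equal (aliased) input and output ranges need a separate path. It compares a naive in-place loop with one that saves the previous input value. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Naive in-place adjacent difference. From i = 2 onward, v[i - 1] already holds a
// difference rather than the original input value, so the results are wrong.
inline void naive_adjacent_difference_inplace(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        v[i] = v[i] - v[i - 1];
    }
}

// Aliasing-safe variant: remember the original previous element before overwriting.
inline void safe_adjacent_difference_inplace(std::vector<int>& v)
{
    if (v.empty())
    {
        return;
    }
    int prev = v[0]; // v[0] is copied to the output unchanged
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        const int current = v[i];
        v[i] = current - prev;
        prev = current;
    }
}
```

For `{1, 4, 9, 16}`, the naive loop produces `{1, 3, 6, 10}`. The aliasing-safe loop produces the expected `{1, 3, 5, 7}`.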
- Fixed incorrect implementation of `thrust::optional<T&>::emplace()`.
- `thrust::reduce_by_key` outputs are not bitwise reproducible, as run-to-run results for pseudo-associative reduction operators (for example, floating-point arithmetic operators) are not deterministic on the same device.
- Currently, rocThrust memory allocation is performed in such a way that most algorithmic API functions cannot be called from within hipGraphs.
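Floating-point addition is not associative. A reduction whose grouping varies between runs can therefore return different bits. This sequential C++ sketch models that effect by summing the same `float` values in two orders. `reduce_in_order` is a hypothetical helper, and the permutation only stands in for the grouping a parallel reduction might choose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `values` in the order given by `order` (a permutation of indices).
// This is a sequential model of a varying reduction order, not rocThrust's implementation.
inline float reduce_in_order(const std::vector<float>& values,
                             const std::vector<std::size_t>& order)
{
    assert(order.size() == values.size());
    float sum = 0.0f;
    for (std::size_t idx : order)
    {
        sum += values[idx]; // every partial sum is rounded to float
    }
    return sum;
}
```

This assumes IEEE 754 single precision with round-to-nearest. Summing `{1e8f, -1e8f, 1.0f}` in index order yields `1.0f`. The order `{0, 2, 1}` yields `0.0f`, because `1e8f + 1.0f` rounds back to `1e8f`.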
- Updated to match upstream Thrust 2.0.1
- `NV_IF_TARGET` macro from libcu++ for the NVIDIA backend and HIP implementation for the HIP backend.
- The CMake build system now accepts `GPU_TARGETS` in addition to `AMDGPU_TARGETS` for setting the targeted GPU architectures. `GPU_TARGETS=all` will compile for all supported architectures. `AMDGPU_TARGETS` is only provided for backwards compatibility; `GPU_TARGETS` should be preferred.
- Removed cub symlink from the root of the repository.
- Removed support for deprecated macros (THRUST_DEVICE_BACKEND and THRUST_HOST_BACKEND).
- Fixed a segmentation fault when binary search, upper bound, lower bound, or equal range was invoked with the `hip_rocprim::execute_on_stream_base` policy.
- The `THRUST_HAS_CUDART` macro, which is no longer used in Thrust (it's provided only for legacy support), is replaced with `NV_IF_TARGET` and `THRUST_RDC_ENABLED` in the NVIDIA backend. The HIP backend doesn't have a `THRUST_RDC_ENABLED` macro, so some branches in Thrust code may be unreachable in the HIP backend.
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
- Fixed an issue where `transform_iterator` would not compile with `__device__`-only operators.
- Updated `docs` directory structure to match the standard of rocm-docs-core.
- Removed references to and workarounds for deprecated hcc
- Updates to match upstream Thrust 1.17.2
- `partition_copy` now uses `rocprim::partition_two_way` for increased performance
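`partition_copy` writes the elements that satisfy a predicate to one output and the remaining elements to another. The sketch below shows that two-output shape using the host-side `std::partition_copy` from the C++ standard library, not rocThrust or rocPRIM. The helper `split_even_odd` is hypothetical. The ordering guarantees of the device implementation may differ from the standard library version.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Splits `input` into (even values, odd values) with std::partition_copy.
// The standard algorithm visits elements in order, so each output keeps input order.
inline std::pair<std::vector<int>, std::vector<int>>
split_even_odd(const std::vector<int>& input)
{
    std::vector<int> evens;
    std::vector<int> odds;
    std::partition_copy(input.begin(), input.end(),
                        std::back_inserter(evens), std::back_inserter(odds),
                        [](int x) { return x % 2 == 0; });
    return {evens, odds};
}
```

For `{5, 2, 8, 3, 6}`, this sketch returns `{2, 8, 6}` as the first output and `{5, 3}` as the second.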
- `set_difference` and `set_intersection` no longer hang if the number of items is above `UINT_MAX` (the unit tests for `set_difference` and `set_intersection` used to fail `TestSetDifferenceWithBigIndexes`)
- Updates to match upstream Thrust 1.16.0
- rocThrust functionality dependent on device malloc is functional (ROCm 5.2 re-enabled device malloc); you can now use device-launched `thrust::sort` and `thrust::sort_by_key`
- Packages for tests and benchmark executables on all supported operating systems using CPack
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.15.0
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.13.0
- Updates to match upstream Thrust 1.14.0
- Added async scan
- Scan algorithms: `inclusive_scan` now uses the `input-type` as `accumulator-type`; `exclusive_scan` uses `initial-value-type`.
- This changes the behavior of small-size input types with large-size output types (for example, `short` input, `int` output) and low-resolution input with high-resolution output (for example, `float` input, `double` output).
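The accumulator type matters because every intermediate sum is rounded to it. This sequential C++ sketch models the two rules above using the hypothetical helpers `inclusive_scan_acc` and `exclusive_scan_acc`. The inclusive scan keeps its running total in an explicitly chosen type. The exclusive scan keeps its running total in the type of the initial value.

```cpp
#include <cstddef>
#include <vector>

// Inclusive scan whose running total is stored in `Acc`; each partial sum is rounded
// to `Acc` before it is converted to the output type `Out`.
template <typename Acc, typename Out, typename In>
std::vector<Out> inclusive_scan_acc(const std::vector<In>& in)
{
    std::vector<Out> out;
    out.reserve(in.size());
    Acc acc{};
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        acc = (i == 0) ? static_cast<Acc>(in[i]) : static_cast<Acc>(acc + in[i]);
        out.push_back(static_cast<Out>(acc));
    }
    return out;
}

// Exclusive scan seeded with `init`; the running total uses the type of `init`.
template <typename Out, typename In, typename Init>
std::vector<Out> exclusive_scan_acc(const std::vector<In>& in, Init init)
{
    std::vector<Out> out;
    out.reserve(in.size());
    Init acc = init;
    for (const In& x : in)
    {
        out.push_back(static_cast<Out>(acc)); // write the total before adding x
        acc = static_cast<Init>(acc + x);
    }
    return out;
}
```

This assumes IEEE 754 `float` with round-to-nearest. Take `float` input `{16777216.0f, 1.0f, 1.0f}` written to `double` output. A `float` accumulator gives `{16777216, 16777216, 16777216}`, because 16777217 is not representable as a `float`. A `double` accumulator gives `{16777216, 16777217, 16777218}`.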
- Initial HIP on Windows support
- Packaging has changed to a development package (called `rocthrust-dev` for `.deb` packages and `rocthrust-devel` for `.rpm` packages). Because rocThrust is a header-only library, there is no runtime package. To aid in the transition, the development package sets the `provides` field to `rocthrust`, so that existing packages that are dependent on rocThrust can continue to work. This `provides` feature is introduced as a deprecated feature because it will be removed in a future ROCm release.
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Mixed-type exclusive scan algorithm is not using the initial value type for the results type
- gfx1030 support
- AddressSanitizer build option
- async_transform unit test failure
- Updates to match upstream Thrust 1.11
- gfx90a support
- gfx803 support re-enabled
- Updates to match upstream Thrust 1.10
- rocThrust now requires CMake version 3.10.2 or greater
- Size zero inputs are now properly handled with newer ROCm builds, which no longer allow zero-size kernel grid/block dimensions
- Warning of unused results
- There are no changes with this release
- Updated to upstream Thrust 1.10.0
- Implemented runtime error for unsupported algorithms and disabled respective tests
- Updated CMake to use downloaded rocPRIM
- `copy_if` on device test case
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
- A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1. `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.8
- New test cases for device-side algorithms
- Bug for binary search
- Implemented workarounds for `hipStreamDefault` hang
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
- A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1. `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
- A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1. `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.4
- Package dependency has changed to rocPRIM only
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
- A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1. `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work; avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
- A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1. `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Improved tests with fixed and random seeds for test data
- CMake searches for rocThrust locally first; if it isn't found, CMake downloads it from GitHub
- HCC build has been deprecated