Describe the bug
Failing shape and parameters:
- Dataset: COIL20
- n_rows = 1440
- n_features = 16384
- dtype = float32
- DLPack device: kDLCPU
- Layout: contiguous row-major, strides = nullptr
- Metric: L2Expanded
- graph_degree = 10
- intermediate_graph_degree = 20
- max_iterations = 20
- return_distances = true
A lower-dimensional float32 dataset, for example 70000 x 784, works through the same C API path.
Actual Error
CUDA error encountered at:
cpp/src/neighbors/detail/nn_descent.cuh line=1564
call='cudaPeekAtLastError()'
Reason=cudaErrorInvalidValue: invalid argument
Expected behavior
cuvsNNDescentBuild succeeds for this valid float32 host tensor shape.
Environment details (please complete the following information):
- GPU: NVIDIA GeForce RTX 5060 Ti, 16 GB
- Driver: 595.71.05
- CUDA: 13.2
- libcuvs: 26.06.00, CUDA 13 build
- libraft: 26.06.00, CUDA 13 build
- OS: Linux x86_64
Observed Behavior
The build fails inside cuVS with cudaErrorInvalidValue after cudaPeekAtLastError() in nn_descent.cuh.
Additional context
The same installation can run cuVS NN-descent on lower-dimensional float32 input such as 70000 x 784, so the failure appears related to high feature count rather than general CUDA/cuVS setup.
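To put numbers on the difference between the two shapes, here is a small sketch that compares the size of one float32 row against CUDA's default 48 KiB per-block dynamic shared memory limit. The helper names (`row_bytes`, `row_fits_default_shared_mem`, `kDefaultSharedMemLimit`) are mine and are not part of cuVS. Whether NN-descent actually stages a full row in shared memory is an assumption I have not verified against the cuVS source; I include it only because an over-limit shared memory request is one common way a kernel launch ends in cudaErrorInvalidValue.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes occupied by one dataset row of float32 values.
constexpr std::size_t row_bytes(std::int64_t n_features) {
    return static_cast<std::size_t>(n_features) * sizeof(float);
}

// Default per-block dynamic shared memory limit in CUDA for a kernel that has
// not opted in to a larger allocation (48 KiB).
constexpr std::size_t kDefaultSharedMemLimit = 48 * 1024;

// ASSUMPTION (hypothetical check, not taken from cuVS): treats "one full row
// must fit in default dynamic shared memory" as the condition of interest.
constexpr bool row_fits_default_shared_mem(std::int64_t n_features) {
    return row_bytes(n_features) <= kDefaultSharedMemLimit;
}

// The two shapes from this report.
static_assert(row_bytes(784) == 3136, "working case: 784 float32 features");
static_assert(row_bytes(16384) == 65536, "failing case: 16384 float32 features");
```

Under that assumption the 784-feature row (3136 bytes) fits comfortably and the 16384-feature row (65536 bytes) does not, which would match the observed pattern. It does not prove that this is the root cause.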
Steps/Code to reproduce bug
#include <cstdint>
#include <vector>
#include <dlpack/dlpack.h>
#include <cuvs/core/c_api.h>
#include <cuvs/neighbors/nn_descent.h>
static DLManagedTensor make_tensor(void* data, int64_t* shape) {
    DLManagedTensor tensor{};
    tensor.dl_tensor.data = data;
    tensor.dl_tensor.device.device_type = kDLCPU;
    tensor.dl_tensor.device.device_id = 0;
    tensor.dl_tensor.ndim = 2;
    tensor.dl_tensor.dtype.code = kDLFloat;
    tensor.dl_tensor.dtype.bits = 32;
    tensor.dl_tensor.dtype.lanes = 1;
    tensor.dl_tensor.shape = shape;
    tensor.dl_tensor.strides = nullptr;
    tensor.dl_tensor.byte_offset = 0;
    tensor.manager_ctx = nullptr;
    tensor.deleter = nullptr;
    return tensor;
}
int main() {
    const int64_t n_rows = 1440;
    const int64_t n_features = 16384;
    std::vector<float> x(static_cast<size_t>(n_rows) * n_features);
    // Fill x with finite float32 values.
    int64_t dataset_shape[2] = {n_rows, n_features};
    DLManagedTensor dataset_tensor = make_tensor(x.data(), dataset_shape);
    cuvsResources_t res;
    cuvsResourcesCreate(&res);
    cuvsNNDescentIndexParams_t params;
    cuvsNNDescentIndexParamsCreate(&params);
    params->metric = L2Expanded;
    params->graph_degree = 10;
    params->intermediate_graph_degree = 20;
    params->max_iterations = 20;
    params->return_distances = true;
    cuvsNNDescentIndex_t index;
    cuvsNNDescentIndexCreate(&index);
    auto status = cuvsNNDescentBuild(
        res,
        params,
        &dataset_tensor,
        nullptr,
        index
    );
    cuvsNNDescentIndexDestroy(index);
    cuvsNNDescentIndexParamsDestroy(params);
    cuvsResourcesDestroy(res);
    return static_cast<int>(status);
}