[FEA]: Add improved latency test for cuda.bindings benchmarks, add C++ comparison

### Is this a duplicate?

- [x] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cuda-python/issues) for this request and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md)

### Area

cuda.bindings

### Is your feature request related to a problem? Please describe.

With the new release of cuda.bench, which uses nvbench as its backend, we want to use the statistical models it employs to measure the runtime, latency, and overhead of our binding calls more accurately. Adding C++ comparisons will allow a quick comparison of overall cuda.bindings performance against the raw C++ calls.

### Describe the solution you'd like

A `/benchmarks/` folder containing identical benchmarks of CUDA APIs, written once in Python through the bindings and once as raw C++ function calls. NVBench would wrap both and generate JSON files that can be compared to view the latency differences.
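To make the comparison step concrete, here is a minimal sketch of how two result sets could be diffed. It assumes a simplified, hypothetical layout (a mapping from benchmark name to a list of per-call latencies in nanoseconds), not nvbench's actual JSON schema. The benchmark names and all numbers are placeholders for illustration, not measurements.

```python
import json
import statistics


def compare(python_results, cpp_results):
    """Compute the per-benchmark overhead of the Python bindings relative to C++.

    Both arguments map a benchmark name to a list of latency samples in
    nanoseconds. This layout is an assumption made for this sketch; real
    nvbench output would first need to be converted into it.
    """
    report = {}
    for name, py_samples in python_results.items():
        if name not in cpp_results:
            continue  # only benchmarks present on both sides are comparable
        py_median = statistics.median(py_samples)
        cpp_median = statistics.median(cpp_results[name])
        report[name] = {
            "python_median_ns": py_median,
            "cpp_median_ns": cpp_median,
            "overhead_ns": py_median - cpp_median,
            "ratio": py_median / cpp_median,
        }
    return report


# Placeholder inputs standing in for the two generated JSON files.
python_results = json.loads(
    '{"cuMemAlloc": [1200, 1250, 1300], "cuStreamCreate": [900, 950, 1000]}'
)
cpp_results = json.loads(
    '{"cuMemAlloc": [1000, 1000, 1000], "cuStreamCreate": [500, 500, 500]}'
)

report = compare(python_results, cpp_results)
for name, row in report.items():
    print(f"{name}: +{row['overhead_ns']} ns ({row['ratio']:.2f}x)")
```

The median is used here only because it is robust to outliers. A real comparison would more likely reuse whatever summary statistics nvbench itself reports.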

### Describe alternatives you've considered

The current implementation uses pytest, which does not offer the same granularity as nvbench.

### Additional context

_No response_

[FEA]: Add improved latency test for cuda.bindings benchmarks, add C++ comparison #1220
