fix: store cuda_available variable and extend perf_counter fix to all get_computational_cost functions by vinitjain2005 · Pull Request #467 · JdeRobot/PerceptionMetrics

vinitjain2005 · 2026-03-21T03:37:46Z

Summary

This PR extends the timing fix originally proposed in the closed PR to all get_computational_cost() functions across the library, and also addresses the review comment to store torch.cuda.is_available() in a variable instead of calling it on every loop iteration.

Closes #453

Problem

The get_computational_cost() functions across multiple files had three issues:

- torch.cuda.synchronize() was called unconditionally on every iteration, although it only applies to CUDA devices and serves no purpose on CPU
- time.time() was used for timing; on Windows its resolution is coarse (roughly 15 ms), so measured times can show 0 ms or jump in 15 ms steps
- torch.cuda.is_available() was called on every loop iteration instead of being stored once in a variable
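
To see how the two clocks differ on a given machine, a minimal sketch using only the standard library is shown here. The helper name clock_summary is hypothetical and is not part of PerceptionMetrics; the reported values depend on the operating system and Python build, so no particular output is assumed.

```python
import time


def clock_summary(name: str) -> dict:
    """Return the properties Python reports for one of its named clocks."""
    info = time.get_clock_info(name)
    return {
        "implementation": info.implementation,  # underlying OS call
        "resolution_s": info.resolution,        # reported resolution in seconds
        "monotonic": info.monotonic,            # True if the clock cannot go backwards
        "adjustable": info.adjustable,          # True if it can be changed (e.g. by NTP)
    }


if __name__ == "__main__":
    # Compare the wall clock behind time.time() with the benchmark clock
    # behind time.perf_counter().
    for name in ("time", "perf_counter"):
        print(name, clock_summary(name))
```

Comparing the reported resolution for "time" and "perf_counter" on the target platform is a quick way to check whether the coarse-timer problem applies there.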

Fix

# Before
for _ in range(runs):
    torch.cuda.synchronize()        # unconditional, no-op on CPU
    start = time.time()             # low resolution on Windows
    ...
    torch.cuda.synchronize()
    inference_times.append(time.time() - start)

# After
cuda_available = torch.cuda.is_available()  # stored once
for _ in range(runs):
    if cuda_available:
        torch.cuda.synchronize()
    start = time.perf_counter()     # high resolution on all platforms
    ...
    if cuda_available:
        torch.cuda.synchronize()
    inference_times.append(time.perf_counter() - start)
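
The "After" pattern can also be written as a small standalone helper. This is only a sketch under stated assumptions: time_runs and its sync parameter are hypothetical names, not functions from PerceptionMetrics, and the synchronization call is passed in as an argument so the example runs without PyTorch installed.

```python
import time
from typing import Callable, List, Optional


def time_runs(
    fn: Callable[[], object],
    runs: int,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Time `fn` over `runs` iterations, calling `sync` around each run if given."""
    times: List[float] = []
    for _ in range(runs):
        if sync is not None:
            sync()  # finish pending device work so it is not charged to this run
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() before stopping the clock
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # CPU-only usage: no synchronization callable is passed.
    cpu_times = time_runs(lambda: sum(range(10_000)), runs=5)
    print(f"mean: {sum(cpu_times) / len(cpu_times) * 1e3:.3f} ms")
```

With PyTorch, one could pass sync=torch.cuda.synchronize when the model runs on a CUDA device and leave it as None otherwise, which mirrors the conditional in the snippet above.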

Files Changed

perceptionmetrics/models/torch_detection.py
- Fixed standalone get_computational_cost() function
perceptionmetrics/models/torch_segmentation.py
- Fixed TorchImageSegmentationModel.get_computational_cost()
- Fixed TorchLiDARSegmentationModel.get_computational_cost()
perceptionmetrics/models/tf_segmentation.py
- Fixed TensorflowImageSegmentationModel.get_computational_cost()
- Note: Uses has_gpu variable (already stored before loop)
  and replaces time.time() with time.perf_counter()

Impact

Every user running computational cost estimation on CPU gets accurate timing results. This is especially relevant since CUDA is optional and most contributors and new users run on CPU-only machines.

References

PyTorch docs: torch.cuda.synchronize() only synchronizes CUDA device operations
Python docs: time.perf_counter() is the recommended high-resolution timer for benchmarking

GitHub: @vinitjain2005

dpascualhe

Good! Upon fixing these small issues, we can merge.

vinitjain2005 · 2026-03-26T02:25:42Z

@dpascualhe I have updated the changes

dpascualhe · 2026-04-17T10:22:30Z

Thanks for addressing the previous concerns! Upon closer inspection, there is one last change that would improve your PR. Could you use the self.device attribute in the condition for calling torch.cuda.synchronize() instead of torch.cuda.is_available()? That way, even if CUDA is available, we avoid calling synchronize unnecessarily when the model is supposed to run on CPU. Sorry for the back and forth, just realized that 😅
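
One way the device-based condition could look is sketched below. The helper needs_cuda_sync is hypothetical and is not taken from PerceptionMetrics or from #553. It also assumes that the device is stored either as a string such as "cpu" or "cuda:0" or as an object whose str() has that form, as torch.device does.

```python
def needs_cuda_sync(device) -> bool:
    """Return True when `device` names a CUDA device, e.g. "cuda" or "cuda:1".

    Accepts a plain string or any object whose str() looks like a device
    string (for example, str(torch.device("cuda:0")) is "cuda:0").
    """
    # Drop an optional ":<index>" suffix and compare the device type only.
    return str(device).split(":", 1)[0] == "cuda"


if __name__ == "__main__":
    for dev in ("cpu", "cuda", "cuda:1"):
        print(dev, needs_cuda_sync(dev))
```

Inside a model class, the result could be computed once before the timing loop, for example from self.device, and then used to guard each torch.cuda.synchronize() call.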

vinitjain2005 · 2026-04-18T04:02:13Z

I have made the changes and opened a new pull request, #553, so I am closing this PR in favor of #553.

fix: store cuda_available variable and extend perf_counter fix to all…

299719d

… get_computational_cost functions

vinitjain2005 mentioned this pull request Mar 21, 2026

[Bug] get_computational_cost() uses torch.cuda.synchronize() unconditionally causing inaccurate CPU timing #453

Open

dpascualhe reviewed Mar 25, 2026

Comment thread perceptionmetrics/models/torch_segmentation.py Outdated

Comment thread perceptionmetrics/models/torch_segmentation.py Outdated

vinitjain2005 added 2 commits March 26, 2026 07:47

Add random sampling reset for Open3D-ML models

f53a6fd

Reset random sampling for Open3D-ML models during warm-up runs.

Update torch_segmentation.py

fc39747

dpascualhe mentioned this pull request Apr 17, 2026

[Bug] Default device hardcoded to 'cuda' causes RuntimeError on CPU-only machines. #523

Closed

vinitjain2005 closed this Apr 18, 2026

vinitjain2005 wants to merge 3 commits into JdeRobot:master from vinitjain2005:fix/improve-computational-cost-timing
