Skip to content

fix: store cuda_available variable and extend perf_counter fix to all get_computational_cost functions#467

Closed
vinitjain2005 wants to merge 3 commits into
JdeRobot:masterfrom
vinitjain2005:fix/improve-computational-cost-timing
Closed

fix: store cuda_available variable and extend perf_counter fix to all get_computational_cost functions#467
vinitjain2005 wants to merge 3 commits into
JdeRobot:masterfrom
vinitjain2005:fix/improve-computational-cost-timing

Conversation

@vinitjain2005
Copy link
Copy Markdown
Contributor

Summary

This PR extends the timing fix originally proposed in the closed PR to all get_computational_cost() functions across the library, and also addresses the review comment to store torch.cuda.is_available() in a variable instead of calling it on every loop iteration.

Closes #453

Problem

The get_computational_cost() functions across multiple files had two issues:

  1. torch.cuda.synchronize() was called unconditionally on every iteration — it is a no-op on CPU and was designed only for CUDA devices

  2. time.time() was used for timing — it has low resolution on Windows (15ms granularity) causing timing to show 0ms or jump in 15ms chunks

  3. torch.cuda.is_available() was being called on every loop iteration instead of being stored once as a variable

Fix

# Before
for _ in range(runs):
    torch.cuda.synchronize()        # unconditional, no-op on CPU
    start = time.time()             # low resolution on Windows
    ...
    torch.cuda.synchronize()
    inference_times.append(time.time() - start)

# After
cuda_available = torch.cuda.is_available()  # stored once
for _ in range(runs):
    if cuda_available:
        torch.cuda.synchronize()
    start = time.perf_counter()     # high resolution on all platforms
    ...
    if cuda_available:
        torch.cuda.synchronize()
    inference_times.append(time.perf_counter() - start)

Files Changed

  • perceptionmetrics/models/torch_detection.py

    • Fixed standalone get_computational_cost() function
  • perceptionmetrics/models/torch_segmentation.py

    • Fixed TorchImageSegmentationModel.get_computational_cost()
    • Fixed TorchLiDARSegmentationModel.get_computational_cost()
  • perceptionmetrics/models/tf_segmentation.py

    • Fixed TensorflowImageSegmentationModel.get_computational_cost()
    • Note: Uses has_gpu variable (already stored before loop)
      and replaces time.time() with time.perf_counter()

Impact

Every user running computational cost estimation on CPU gets accurate timing results. This is especially relevant since CUDA is optional and most contributors and new users run on CPU-only machines.

References

  • PyTorch docs: torch.cuda.synchronize() only synchronizes CUDA device operations
  • Python docs: time.perf_counter() is the recommended high-resolution timer for benchmarking

Github: @vinitjain2005

Copy link
Copy Markdown
Collaborator

@dpascualhe dpascualhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! Upon fixing these small issues, we can merge.

Comment thread perceptionmetrics/models/torch_segmentation.py Outdated
Comment thread perceptionmetrics/models/torch_segmentation.py Outdated
Reset random sampling for Open3D-ML models during warm-up runs.
@vinitjain2005
Copy link
Copy Markdown
Contributor Author

@dpascualhe I have updated the changes

@dpascualhe
Copy link
Copy Markdown
Collaborator

Thanks for addressing the previous concerns! Upon closer inspection, there is a last change that would improve your PR. Could you use the self.device attribute in the condition for calling torch.cuda.synchronize instead of the torch.cuda.is_available? In that way, even if CUDA is available, if model is supposed to be CPU, we will avoid calling synchronize unnecessarily. Sorry for the back and forth, just realized that 😅

@vinitjain2005
Copy link
Copy Markdown
Contributor Author

vinitjain2005 commented Apr 18, 2026

I have done the changes and created a new pr request #553 . So i am closing these pr as fix in #553

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] get_computational_cost() uses torch.cuda.synchronize() unconditionally causing inaccurate CPU timing

2 participants