filters: address TODOs for getMinMax3D to be templated and optimized#6409
filters: address TODOs for getMinMax3D to be templated and optimized#6409Stado1 wants to merge 3 commits intoPointCloudLibrary:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the pcl::getMinMax3D overloads for pcl::PCLPointCloud2 to address TODOs by templating coordinate/distance types and applying a few micro-optimizations (e.g., skipping finiteness checks for dense clouds).
Changes:
- Removed the non-templated
PCLPointCloud2getMinMax3Dimplementations fromfilters/src/voxel_grid.cpp. - Replaced the exported non-template declarations in
voxel_grid.hwith templated declarations forPCLPointCloud2. - Added the new templated
PCLPointCloud2getMinMax3Ddefinitions intofilters/include/pcl/filters/impl/voxel_grid.hpp, plus minor formatting cleanups.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| filters/src/voxel_grid.cpp | Deletes the previous float-only PCLPointCloud2 min/max implementations (now moved to templates). |
| filters/include/pcl/filters/voxel_grid.h | Replaces exported non-template API with template declarations for PCLPointCloud2 overloads. |
| filters/include/pcl/filters/impl/voxel_grid.hpp | Adds templated PCLPointCloud2 min/max implementations and dense-branch optimizations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mvieth
left a comment
There was a problem hiding this comment.
First of all, thank you for your pull request!
I am wondering: did you benchmark your new solution with separate x, y, z variables against the previous implementation with Array4f? Because with the previous implementation, the compiler could use SSE (SIMD) instructions to compare all coordinates at once, which might be faster even if the fourth element is unused.
| if (limit_negative) | ||
| { | ||
| if ((distance_value < max_distance) && (distance_value > min_distance)) |
There was a problem hiding this comment.
Minor optimization: We can check limit_negative == (distance_value < max_distance && distance_value > min_distance) here and skip this point if true (3 more like this below)
There was a problem hiding this comment.
These are the results with this minor optimisation.
Environment:
- CPU: Intel(R) Xeon(R) CPU E5462 @ 2.80GHz
- OS: Ubuntu 24.04.3 LTS
- Compiler: GCC 13.3.0 (Optimization:
-O3)
- Basic Function (No Filters)
10M random points, 100 loops | 1k points, 1M loops
points have x, y and z values from -100.0 to 100.0
| Dataset Size | Configuration | Original | PR Code | Improvement |
|---|---|---|---|---|
| 10M Points | Dense | 8.4s | 4.4s | +47.6% |
| 10M Points | Non-Dense | 8.4s | 5.9s | +29.7% |
| 1k Points | Dense | 8.0s | 3.2s | +60.0% |
| 1k Points | Non-Dense | 8.0s | 5.2s | +35.0% |
- Selective Access (Indices Only)
Indices are set to every other entry
| Dataset Size | Configuration | Original | PR Code | Improvement |
|---|---|---|---|---|
| 10M Points | Dense | 4.8s | 4.6s | +4.1% |
| 10M Points | Non-Dense | 5.0s | 4.8s | +4.0% |
| 1k Points | Dense | 4.0s | 2.6s | +35.0% |
| 1k Points | Non-Dense | 4.2s | 3.4s | +19.0% |
- Coordinate Filtering (Bounds Only)
Bounded for x-axis -50.0 to 50.0
| Dataset Size | Configuration | Original | PR Code | Improvement |
|---|---|---|---|---|
| 10M Points | Dense | 9.4s | 8.6s | +8.5% |
| 10M Points | Non-Dense | 9.4s | 9.4s | 0.0% |
| 1k Points | Dense | 9.1s | 8.0s | +12.1% |
| 1k Points | Non-Dense | 9.1s | 8.5s | +6.6% |
- Combined (Bounds & Indices)
Bounded for x-axis -50.0 to 50.0 and indices are every other entry
| Dataset Size | Configuration | Original | PR Code | Delta |
|---|---|---|---|---|
| 10M Points | Dense | 5.5s | 5.7s | −3.6% |
| 10M Points | Non-Dense | 5.8s | 5.9s | −1.7% |
| 1k Points | Dense | 5.0s | 5.1s | −2.0% |
| 1k Points | Non-Dense | 4.5s | 4.8s | −6.6% |
This PR addresses the TODOs for getMinMax3D functions for PCLPointCloud2 to improve performance and allow for different types than just float to be used.
implemented templates:
Performance Optimization: