From 3d19e9d4a7f7a45753ca3f472c58c92cc197c81e Mon Sep 17 00:00:00 2001
From: Hyungtae Lim
Date: Sat, 23 May 2026 14:09:39 +0900
Subject: [PATCH 1/3] chore(release): v1.4.1
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Patch bump for the per-patch heap-traffic refactor (#100).
pypatchworkpp.patchworkpp gains 14.8% Hz (97.5 -> 111.9 Hz on KITTI
seq 00, i7-12700), driven by killing short-lived vector /
Eigen::Matrix allocations in R-VPF + R-GPF. Closes part of #96.

Numerical equivalence verified end-to-end on KITTI seq 00 (4541
frames): F1 delta 0.00 for patchwork classic (bit-identical), +0.01
for patchwork++ (within the ±0.05 macro budget).

Bumps:
- python/pyproject.toml 1.4.0 -> 1.4.1
- cpp/CMakeLists.txt 1.4.0 -> 1.4.1

CHANGELOG.md updated with the full v1.4.1 entry. See #100.
---
 CHANGELOG.md          | 66 +++++++++++++++++++++++++++++++++++++++++++
 cpp/CMakeLists.txt    |  2 +-
 python/pyproject.toml |  2 +-
 3 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8a934cb..0e36884 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,71 @@
 # Changelog
 
+## v1.4.1
+
+### Perf — `pypatchworkpp.patchworkpp` per-patch heap traffic eliminated
+
+Three changes inside `PatchWorkpp::extract_piecewiseground` and
+`PatchWorkpp::estimate_plane` take KITTI seq 00 from 10.26 ms to
+8.94 ms per frame (**97.5 Hz → 111.9 Hz, +14.8% Hz**, median of 3
+runs, i7-12700). This closes the part of #96 that was driven by
+short-lived allocations in R-VPF + R-GPF.
+
+**High-impact (this is where the +14.8% comes from):**
+
+- `estimate_plane`: drop `Eigen::MatrixX3f eigen_ground` + `centered`
+  + `centered.adjoint() * centered`. Replace with a single-pass
+  scalar accumulation of mean and 9 cross-products, then build the
+  3x3 covariance on the stack. No more per-call Eigen heap
+  allocations.
+- `extract_piecewiseground`: promote `src_wo_verticals` and
+  `src_tmp` to reused instance scratch members. `vector::clear()`
+  keeps capacity, so per-patch malloc pressure on the glibc heap
+  (which was serialising the loop, see #96) drops away after the
+  first few patches.
+- `estimateGround` main loop: `auto& zone` instead of `auto zone`
+  for `ConcentricZoneModel_[zone_idx]`. Avoids a deep-copy of the
+  full 3-level nested vector per outer iteration. Safe because each
+  `(zone, ring, sector)` patch is read once and the CZM is flushed
+  at the top of every `estimateGround` call.
+
+**Lower-impact, kept for cleanliness:**
+
+- `JacobiSVD` to `SelfAdjointEigenSolver::computeDirect`
+  for the 3x3 PSD covariance in both `cpp/common/src/plane_fit.cpp`
+  and the in-place `PatchWorkpp::estimate_plane`. Closed-form, no
+  Jacobi iterations. `singular_values_` is repacked descending so
+  every consumer (`linearity_` / `planarity_` in `common`,
+  `flatness_thr` index `(2)` in patchwork classic,
+  `ground_flatness=minCoeff()` and `line_variable=sv(0)/sv(1)` in
+  patchwork++) keeps the same convention bit-for-bit.
+- `const&` on `addCloud`'s `add` parameter, `RevertCandidate` loop
+  vars, and the `temporal_ground_revert` /
+  `calc_point_to_plane_d` / `calc_mean_stdev` signatures.
+
+Patchwork classic is unaffected on the perf side: TBB
+`parallel_for` already amortises allocations across cores and SVD
+is sub-us/patch.
+
+### Numerical equivalence
+
+KITTI seq 00 (4541 frames), v1.4.0 to v1.4.1:
+
+| Method (protocol)   | Before                     | After                      |  Δ F1 |
+| ------------------- | -------------------------- | -------------------------- | ----: |
+| `patchwork` (pw)    | P 92.34, R 94.64, F1 93.41 | P 92.34, R 94.64, F1 93.41 |  0.00 |
+| `patchworkpp` (pp)  | P 94.88, R 98.47, F1 96.62 | P 94.89, R 98.48, F1 96.63 | +0.01 |
+
+Algebraic identity of `JacobiSVD` vs `eigh` verified on 500 real
+KITTI patch covariances: `normal_` (up to sign),
+`singular_values_`, `linearity_`, `planarity_`, `ground_flatness`,
+`line_variable` all match to FP precision. Both within the ±0.05
+macro budget.
+
+### References
+
+- #100 — PR (perf: alloc-free + eigh)
+- #96 — Issue (R-VPF / R-GPF allocation profile)
+
 ## v1.4.0
 
 ### Refactor — shared `common` library + optional TBB parallelisation
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index 4fca928..fdfb9b4 100755
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -1,5 +1,5 @@
 cmake_minimum_required(VERSION 3.11)
-project(patchworkpp VERSION 1.4.0)
+project(patchworkpp VERSION 1.4.1)
 
 option(USE_SYSTEM_EIGEN3 "Use system pre-installed Eigen" OFF)
 option(INCLUDE_CPP_EXAMPLES "Include C++ example codes, which require Open3D for visualization" OFF)
diff --git a/python/pyproject.toml b/python/pyproject.toml
index 626551c..3e36e30 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "scikit_build_core.build"
 
 [project]
 name = "pypatchworkpp"
-version = "1.4.0"
+version = "1.4.1"
 requires-python = ">=3.8"
 description = "ground segmentation"
 dependencies = [

From 40b81192213fa57c8cbc3d71f48e184269dcf8ce Mon Sep 17 00:00:00 2001
From: Hyungtae Lim
Date: Sat, 23 May 2026 14:51:04 +0900
Subject: [PATCH 2/3] chore(release): mdformat CHANGELOG.md

---
 CHANGELOG.md | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0e36884..d3d1246 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,11 +12,10 @@ short-lived allocations in R-VPF + R-GPF.
 
 **High-impact (this is where the +14.8% comes from):**
 
-- `estimate_plane`: drop `Eigen::MatrixX3f eigen_ground` + `centered`
-  + `centered.adjoint() * centered`. Replace with a single-pass
-  scalar accumulation of mean and 9 cross-products, then build the
-  3x3 covariance on the stack. No more per-call Eigen heap
-  allocations.
+- `estimate_plane`: drop the `Eigen::MatrixX3f eigen_ground`,
+  `centered`, and `centered.adjoint() * centered` heap allocations.
+  Replace with a single-pass scalar accumulation of mean and 9
+  cross-products, then build the 3x3 covariance on the stack.
 - `extract_piecewiseground`: promote `src_wo_verticals` and
   `src_tmp` to reused instance scratch members. `vector::clear()`
   keeps capacity, so per-patch malloc pressure on the glibc heap
@@ -50,10 +49,10 @@ is sub-us/patch.
 
 KITTI seq 00 (4541 frames), v1.4.0 to v1.4.1:
 
-| Method (protocol)   | Before                     | After                      |  Δ F1 |
-| ------------------- | -------------------------- | -------------------------- | ----: |
-| `patchwork` (pw)    | P 92.34, R 94.64, F1 93.41 | P 92.34, R 94.64, F1 93.41 |  0.00 |
-| `patchworkpp` (pp)  | P 94.88, R 98.47, F1 96.62 | P 94.89, R 98.48, F1 96.63 | +0.01 |
+| Method (protocol)  | Before                     | After                      |  Δ F1 |
+| ------------------ | -------------------------- | -------------------------- | ----: |
+| `patchwork` (pw)   | P 92.34, R 94.64, F1 93.41 | P 92.34, R 94.64, F1 93.41 |  0.00 |
+| `patchworkpp` (pp) | P 94.88, R 98.47, F1 96.62 | P 94.89, R 98.48, F1 96.63 | +0.01 |
 
 Algebraic identity of `JacobiSVD` vs `eigh` verified on 500 real
 KITTI patch covariances: `normal_` (up to sign),
@@ -95,10 +94,10 @@ order so numerical results are byte-identical to the sequential path.
 
 Measured on KITTI seq 00 (i7-12700, 24 logical cores):
 
-| Configuration | Median ms/frame | Median Hz |
-| -- | --: | --: |
-| `--method patchwork` single-thread (taskset -c 0) | 8.31 | 120.4 |
-| `--method patchwork` parallel (TBB default scheduler) | **4.81** | **207.8** |
+| Configuration                                         | Median ms/frame | Median Hz |
+| ----------------------------------------------------- | --------------: | --------: |
+| `--method patchwork` single-thread (taskset -c 0)     |            8.31 |     120.4 |
+| `--method patchwork` parallel (TBB default scheduler) |        **4.81** | **207.8** |
 
 **1.73× speedup**. TBB is an **optional** build dependency: missing
 TBB causes a CMake STATUS message and falls back to a sequential
@@ -129,10 +128,10 @@ or a real user CPU complaint).
 KITTI 00-10 full sweep (23,201 frames), Patchwork++ paper protocol,
 v1.3.1 → v1.4.0:
 
-| Method | F1 v1.3.1 | F1 v1.4.0 | Δ |
-| --- | --- | --- | --- |
-| `--method patchwork` | 96.0172 | 96.0172 | 0 (byte-identical) |
-| `--method patchworkpp` | 96.2918 | 96.2919 | +0.0001 (float noise) |
+| Method                 | F1 v1.3.1 | F1 v1.4.0 | Δ                     |
+| ---------------------- | --------- | --------- | --------------------- |
+| `--method patchwork`   | 96.0172   | 96.0172   | 0 (byte-identical)    |
+| `--method patchworkpp` | 96.2918   | 96.2919   | +0.0001 (float noise) |
 
 Both well within the ±0.05 budget set in the refactor plan.
 
@@ -177,12 +176,12 @@ parameters (`uprightness_thr=0.707`, `using_global_thr=false`) on
 SemanticKITTI sequences 00–10 (23,201 frames), under the Patchwork++ paper evaluation
 protocol (Sec. IV.A — VEGETATION excluded):
 
-| Configuration | Precision | Recall | F1 |
-| --- | --- | --- | --- |
-| v1.2.0 (`pypatchworkpp.patchwork`) | 89.70 | 98.49 | 93.73 |
-| **v1.3.0 (`pypatchworkpp.patchwork`)** | **94.64** | **97.58** | **96.02** |
-| Original Patchwork ROS 2 (reference) | 94.38 | 97.90 | 96.05 |
-| Patchwork++ paper Table I, Patchwork \[1\] | 94.23 | 97.62 | 95.88 |
+| Configuration                            | Precision | Recall    | F1        |
+| ---------------------------------------- | --------- | --------- | --------- |
+| v1.2.0 (`pypatchworkpp.patchwork`)       | 89.70     | 98.49     | 93.73     |
+| **v1.3.0 (`pypatchworkpp.patchwork`)**   | **94.64** | **97.58** | **96.02** |
+| Original Patchwork ROS 2 (reference)     | 94.38     | 97.90     | 96.05     |
+| Patchwork++ paper Table I, Patchwork [1] | 94.23     | 97.62     | 95.88     |
 
 **+2.29 F1** vs v1.2.0; within ±0.14 F1 of the original Patchwork ROS 2
 build and within paper run-to-run variance of Table I.
@@ -195,7 +194,7 @@ Fixes:
    effectively never fired for normal ground.
 1. Plane-distance comparison now uses uncentred `normal · p` against
    `th_dist_d_ = th_dist − d_`, which is equivalent to "signed distance to
-   plane \< th_dist". The previous centred form shifted the cutoff by an
+   plane < th_dist". The previous centred form shifted the cutoff by an
    extra `−d_ ≈ |normal · mean| ≈ 1.6 m` on KITTI ground.
 1. The elevation/flatness tier index is now the GLOBAL ring index across all
    zones, so each of the first `elevation_thr.size()` rings gets its own

From e45dfa1c694ecfbe99b10b2f8cfd87a4e578dc71 Mon Sep 17 00:00:00 2001
From: Hyungtae Lim
Date: Sat, 23 May 2026 15:22:06 +0900
Subject: [PATCH 3/3] chore(release): mdformat 0.7.9 escapes

---
 CHANGELOG.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d3d1246..ae1c5a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -181,7 +181,7 @@ protocol (Sec. IV.A — VEGETATION excluded):
 | v1.2.0 (`pypatchworkpp.patchwork`)       | 89.70     | 98.49     | 93.73     |
 | **v1.3.0 (`pypatchworkpp.patchwork`)**   | **94.64** | **97.58** | **96.02** |
 | Original Patchwork ROS 2 (reference)     | 94.38     | 97.90     | 96.05     |
-| Patchwork++ paper Table I, Patchwork [1] | 94.23     | 97.62     | 95.88     |
+| Patchwork++ paper Table I, Patchwork \[1\] | 94.23     | 97.62     | 95.88     |
 
 **+2.29 F1** vs v1.2.0; within ±0.14 F1 of the original Patchwork ROS 2
 build and within paper run-to-run variance of Table I.
@@ -194,7 +194,7 @@ Fixes:
    effectively never fired for normal ground.
 1. Plane-distance comparison now uses uncentred `normal · p` against
    `th_dist_d_ = th_dist − d_`, which is equivalent to "signed distance to
-   plane < th_dist". The previous centred form shifted the cutoff by an
+   plane \< th_dist". The previous centred form shifted the cutoff by an
    extra `−d_ ≈ |normal · mean| ≈ 1.6 m` on KITTI ground.
 1. The elevation/flatness tier index is now the GLOBAL ring index across all
    zones, so each of the first `elevation_thr.size()` rings gets its own