Add adaptive frame skipping to VoxelGridMapper when processing can't keep up with incoming lidar rate. Self-regulates by using last add_frame() duration as the throttle interval, with a configurable minimum frequency floor.

Config:
    autoscale: bool = True
    autoscale_min_frequency: float = 1.0  # Hz floor

Telemetry via rerun:
    voxel_mapper/ingest_time_ms - add_frame() cost per frame
    voxel_mapper/publish_time_ms - get_global_pointcloud2() cost
    voxel_mapper/map_size - total voxel count
    voxel_mapper/frames_skipped - cumulative skipped frames

Closes #1270
Greptile Summary
Adds adaptive frame skipping to
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant LCM as LCM Transport Thread
    participant OF as _on_frame()
    participant AF as add_frame()
    participant RR as Rerun Telemetry
    participant PT as _publish_trigger
    participant PGM as publish_global_map()
    LCM->>OF: frame arrives
    alt autoscale ON & frames_processed > 0
        OF->>OF: elapsed = now - _last_ingest_time
        OF->>OF: throttle = min(last_duration, 1/min_freq)
        alt elapsed < throttle
            OF-->>LCM: skip (return early)
            Note right of OF: _frames_skipped++
        end
    end
    OF->>AF: add_frame(frame)
    AF-->>OF: done (ingest_duration measured)
    OF->>OF: update _last_ingest_time, _last_ingest_duration
    OF->>RR: log ingest_time_ms, map_size, frames_skipped
    alt publish_interval == 0
        OF->>PT: on_next(None)
        PT->>PGM: publish_global_map()
        PGM->>RR: log publish_time_ms
        PGM->>PGM: global_map.publish(pc)
    end
Last reviewed commit: f2ec462
dimos/mapping/voxels.py
Outdated
ZeroDivisionError if autoscale_min_frequency is 0
1.0 / self.config.autoscale_min_frequency will raise ZeroDivisionError at runtime if a user passes autoscale_min_frequency=0. There's no validation preventing this value. Consider adding a guard or a config validation:
Suggested change:
- max_interval = 1.0 / self.config.autoscale_min_frequency
+ max_interval = (1.0 / self.config.autoscale_min_frequency) if self.config.autoscale_min_frequency > 0 else float("inf")
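The guard suggested above can be isolated into a small helper. This is a minimal sketch, not the code in dimos/mapping/voxels.py: the function names `max_throttle_interval` and `throttle_interval` are hypothetical, and treating negative values the same as 0 is an assumption.

```python
import math


def max_throttle_interval(min_frequency_hz: float) -> float:
    """Longest allowed gap between ingested frames, in seconds.

    Hypothetical helper: a value of 0 means "no frequency floor".
    Treating negative values the same way is an assumption.
    """
    if min_frequency_hz > 0:
        return 1.0 / min_frequency_hz
    return math.inf


def throttle_interval(last_duration_s: float, min_frequency_hz: float) -> float:
    """Throttle by the last processing duration, capped by the floor."""
    return min(last_duration_s, max_throttle_interval(min_frequency_hz))
```

With a 1.0 Hz floor, a 2 s processing time is capped to a 1 s throttle interval; with the floor disabled, the throttle equals the processing time.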
t0 = time.monotonic()
self.add_frame(frame)
if self.config.publish_interval == 0:
ingest_duration = time.monotonic() - t0

self._last_ingest_time = t0
self._last_ingest_duration = ingest_duration
_last_ingest_time = t0 limits autoscale effectiveness
_last_ingest_time is set to t0 (the time before add_frame() runs). Since all transports (LCM, SharedMemory, ROS) deliver callbacks synchronously on a single thread, the next _on_frame call can only begin after the current one returns. This means elapsed_since_last = now - t0 will always be >= _last_ingest_duration, so the skip condition elapsed < throttle_interval is effectively never true in production.
The autoscale will only skip frames when _on_frame is called directly from a separate thread (not through the transport), or if _last_ingest_duration is artificially set higher than the actual processing time (as done in the test).
If you want skipping to work with synchronous transport delivery, consider setting _last_ingest_time to time.monotonic() (after processing completes), so the elapsed-time check measures the gap between frames exclusive of processing time:
Suggested change:
- t0 = time.monotonic()
- self.add_frame(frame)
- if self.config.publish_interval == 0:
- ingest_duration = time.monotonic() - t0
- self._last_ingest_time = t0
- self._last_ingest_duration = ingest_duration
+ t0 = time.monotonic()
+ self.add_frame(frame)
+ ingest_duration = time.monotonic() - t0
+ self._last_ingest_time = time.monotonic()
+ self._last_ingest_duration = ingest_duration
With this change, frames that arrive back-to-back (queued in LCM during processing) would be correctly identified as arriving faster than the processing rate.
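The behavior described in this comment can be sketched outside the mapper. The class below is a hypothetical stand-in, not the PR's implementation: `FrameThrottle`, its attribute names, and the injectable `clock` parameter are assumptions made so the logic can be exercised without a real lidar stream. The timestamp is taken after processing, so the skip check compares idle time against the last processing duration.

```python
import time
from typing import Callable


class FrameThrottle:
    """Hypothetical stand-in for the skip logic in _on_frame()."""

    def __init__(
        self,
        min_frequency_hz: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_frequency_hz = min_frequency_hz
        self.clock = clock  # injectable so tests can fake time
        self.last_ingest_time = 0.0
        self.last_ingest_duration = 0.0
        self.frames_processed = 0
        self.frames_skipped = 0

    def on_frame(self, process: Callable[[], None]) -> bool:
        """Run process() unless throttled. Returns True if the frame was processed."""
        if self.frames_processed > 0:
            # Idle time since the previous frame FINISHED processing.
            idle = self.clock() - self.last_ingest_time
            max_interval = (
                1.0 / self.min_frequency_hz
                if self.min_frequency_hz > 0
                else float("inf")
            )
            throttle = min(self.last_ingest_duration, max_interval)
            if idle < throttle:
                self.frames_skipped += 1
                return False

        t0 = self.clock()
        process()
        done = self.clock()
        self.last_ingest_duration = done - t0
        self.last_ingest_time = done  # set after processing, not before
        self.frames_processed += 1
        return True
```

A frame delivered immediately after a 500 ms ingest sees zero idle time and is skipped; one that arrives 600 ms later is processed.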
rr.log() was silently a no-op because VoxelGridMapper runs in a
separate Dask worker process from the rerun bridge. Now:
- _ensure_rr() calls rr.init('dimos', spawn=False) in the mapper process
- Uses rr.set_time() with sequence + timestamp for proper time series
- Graphs will appear in Rerun viewer under voxel_mapper/*
Two bugs flagged by Greptile:
1. ZeroDivisionError: autoscale_min_frequency=0 would crash. Fix: guard with > 0 check, treat 0 as 'no frequency floor'.
2. Skip logic was dead code: _last_ingest_time was set before add_frame(), so elapsed_since_last was always >= _last_ingest_duration on synchronous transport threads, and frames never got skipped. Fix: set _last_ingest_time after add_frame() completes, so it measures idle time between frames. If idle time < last processing duration, the machine is falling behind and the frame is skipped.

Also adds test_autoscale_min_frequency_zero_no_crash to cover the ZeroDivisionError case, and updates the saturation test to correctly set both _last_ingest_time and _last_ingest_duration.
Fixed both issues from Greptile review (3639cb6):
1. ZeroDivisionError on autoscale_min_frequency=0
2. Skip logic was dead code
Fix: set
Also added
@leshy this looks like slop to me. def concerned if the agent can't even do this
Summary
Closes #1270 — Auto-scaling global voxel grid mapper.
What
Adds adaptive frame skipping to VoxelGridMapper so it self-regulates on weaker machines instead of falling behind on lidar ingest.
How
When autoscale=True (default), _on_frame() skips frames if they arrive faster than the last add_frame() processing duration. The system self-regulates: fast machines process every frame, slow machines find their natural ceiling.
autoscale_min_frequency (default 1.0 Hz) prevents the map from going completely stale: even if processing takes 2s, we'll still ingest at least 1 frame per second.
Config
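The two options can be sketched as a small config object, using the field names and defaults given in the PR description. The class name `AutoscaleConfig` is hypothetical, and rejecting negative frequencies at construction time is an assumption, not something the PR is known to do.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleConfig:  # hypothetical name, for illustration only
    # Skip frames that arrive faster than the last add_frame() duration.
    autoscale: bool = True
    # Hz floor: ingest at least this often; 0 means no frequency floor.
    autoscale_min_frequency: float = 1.0

    def __post_init__(self) -> None:
        # Assumption: a negative floor is a configuration error.
        if self.autoscale_min_frequency < 0:
            raise ValueError("autoscale_min_frequency must be >= 0")
```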
Telemetry (Rerun)
Logs both ingest and publish timings separately to identify the real bottleneck:
- voxel_mapper/ingest_time_ms: add_frame() wall-clock cost
- voxel_mapper/publish_time_ms: get_global_pointcloud2() + publish cost
- voxel_mapper/map_size: total voxel count (correlates with publish cost growth)
- voxel_mapper/frames_skipped: cumulative frames dropped by autoscaler
Testing
4 new tests:
- test_autoscale_skips_when_saturated: simulates slow machine (fakes 500ms processing), verifies frames are skipped
- test_autoscale_disabled_processes_all: verifies autoscale=False processes every frame
- test_autoscale_min_frequency_respected: verifies high min_frequency prevents excessive skipping
- test_autoscale_rerun_logging: verifies rerun telemetry is logged correctly
All existing tests unchanged and passing.
How to Test
To observe autoscaling live with rerun on a real robot: