fix: slice numpy array values in custom_data per row in CSVSink #2199
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```diff
@@           Coverage Diff            @@
##           develop   #2199   +/-   ##
=======================================
  Coverage       78%      78%
=======================================
  Files           63       63
  Lines         7972     7979    +7
=======================================
+ Hits          6248     6257    +9
+ Misses        1724     1722    -2
```
Pull request overview
Fixes incorrect serialization of per-detection custom_data in CSVSink/JSONSink when users pass numpy arrays (previously the full array was written on every row), aligning output with expected “one value per detection row” behavior.
Changes:
- Update `CSVSink.parse_detection_data()` to slice `custom_data` numpy arrays per detection row.
- Update `JSONSink.parse_detection_data()` to slice `custom_data` numpy arrays per detection row.
- Add a unit test ensuring `CSVSink` slices numpy-array `custom_data` per row.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/supervision/detection/tools/csv_sink.py` | Slice numpy-array `custom_data` per detection row when producing CSV rows. |
| `src/supervision/detection/tools/json_sink.py` | Apply analogous per-row slicing for numpy-array `custom_data` when producing JSON rows. |
| `tests/detection/test_csv.py` | Add regression test covering numpy-array `custom_data` in `CSVSink`. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The else branch added by the original fix used hasattr(value, "__getitem__")
to decide whether to slice custom_data values per detection. This incorrectly
indexes dicts by integer 0, raising KeyError when custom_data contains dict
values (e.g. {"metadata": {"sensor_id": 101}}).
Non-ndarray types (dicts, scalars, lists used as a single value per detection)
should be written as-is. Only np.ndarray values require per-row indexing.
[resolve roboflow#1] /review finding by sw-engineer (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): CSVSink else branch KeyError:0 on dict custom_data at csv_sink.py:153
---
Co-authored-by: Claude Code <noreply@anthropic.com>
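The failure mode described above can be reproduced in a few lines. This is a minimal standalone sketch (not the actual sink code) contrasting the buggy `hasattr` predicate with the corrected `isinstance` check:

```python
import numpy as np

value = {"metadata": {"sensor_id": 101}}

# Buggy predicate: dicts do support __getitem__, so the old check passed...
assert hasattr(value, "__getitem__")

# ...but integer indexing a dict raises KeyError: 0.
try:
    value[0]
    indexable_by_row = True
except KeyError:
    indexable_by_row = False
assert not indexable_by_row

# Corrected predicate: only numpy arrays get per-row indexing;
# dicts, scalars, and lists are written as-is.
per_row = value[0] if isinstance(value, np.ndarray) else value
assert per_row is value
```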
Adds a parametrized case to `test_json_sink` verifying that `np.ndarray` values in `custom_data` are sliced per detection row (not written as the full array on every row). Mirrors the existing CSV counterpart added by the original PR.

[resolve roboflow#3] /review finding by qa-specialist (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): Missing JSONSink test for numpy array custom_data slicing

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Exercises all three dispatch branches of the custom_data handler simultaneously: np.ndarray values are sliced per detection row while scalar values (int, str, etc.) are broadcast as-is to every row.

[resolve roboflow#4] /review finding by qa-specialist (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): Missing test for mixed-type custom_data

---
Co-authored-by: Claude Code <noreply@anthropic.com>
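The mixed-type behavior being tested can be sketched with a hypothetical mirror of the per-row serialization loop (function name and structure here are illustrative, not the actual sink implementation): ndarray values yield one entry per detection index, while everything else is broadcast unchanged to every row.

```python
import numpy as np

def rows_from_custom_data(custom_data, n_detections):
    # Hypothetical per-row loop mirroring the sinks' dispatch.
    rows = []
    for i in range(n_detections):
        row = {}
        for key, value in custom_data.items():
            if isinstance(value, np.ndarray) and value.ndim > 0:
                row[key] = value[i]   # one entry per detection row
            else:
                row[key] = value      # scalar: broadcast as-is
        rows.append(row)
    return rows

rows = rows_from_custom_data(
    {"area": np.array([5.0, 7.5]), "frame_number": 42}, n_detections=2
)
# row 0 gets area 5.0, row 1 gets area 7.5; frame_number 42 appears on both
```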
Thanks for the fix @farukalamai! I've pushed 3 follow-up commits to your branch:
All pre-commit hooks pass and 15/15 tests pass.
Extract a private `_slice_value(value, i)` static method into both `CSVSink` and `JSONSink` that centralises the per-row ndarray dispatch: 0-d ndarray -> value as-is, n-d ndarray -> `value[i]`, anything else -> value as-is. Both `parse_detection_data` methods now call this helper for `detections.data` and `custom_data`, removing the duplicated isinstance chains and eliminating the dead `hasattr(__getitem__)` else-branch (`detections.data` values are always ndarrays via `convert_data`).

Drop the pure-ndarray-only `test_detections_array_custom_data.csv` case from `test_csv_sink`; the mixed test (ndarray + scalar together) is a strict superset that exercises all three dispatch paths simultaneously.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
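The three-branch dispatch the commit describes can be sketched as a standalone function (the real code is a private static method on each sink; this is only an illustration of the branches):

```python
import numpy as np

def _slice_value(value, i):
    # 0-d ndarray: a single scalar wrapped in an array -> written as-is.
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value
    # n-d ndarray: one entry per detection -> take row i.
    if isinstance(value, np.ndarray):
        return value[i]
    # Anything else (ints, strings, dicts, lists) -> written as-is.
    return value

assert _slice_value(np.array(7), 2).item() == 7        # 0-d: unchanged
assert _slice_value(np.array([1, 2, 3]), 2) == 3       # n-d: per-row slice
assert _slice_value("camera-1", 2) == "camera-1"       # scalar: broadcast
```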
Before submitting
Description
Fixes a bug in `CSVSink` and `JSONSink` where passing a numpy array as a `custom_data` value wrote the entire array on every row instead of the per-detection scalar value.
Type of Change
Motivation and Context
When users pass computed per-detection values like `detections.area` via `custom_data`, each row should receive its own scalar, not the whole array. The root cause was `row.update(custom_data)` inside the per-detection loop, which blindly wrote the whole value. The fix applies the same per-index slicing logic that `detections.data` already uses correctly.

Closes #1397
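The root cause can be demonstrated without the sinks at all; `dict.update` simply copies the array reference into the row, so every row carries the full array. A minimal sketch (the `areas` values are made up):

```python
import numpy as np

areas = np.array([150.0, 320.0, 95.0])  # e.g. what detections.area returns

# Before the fix: row.update(custom_data) ran inside the per-detection loop,
# so every row received the entire array, not its own element.
buggy_row = {"x_min": 0}
buggy_row.update({"area": areas})
assert isinstance(buggy_row["area"], np.ndarray)  # whole array on one row

# After the fix: row i receives the scalar areas[i].
fixed_rows = [{"area": areas[i]} for i in range(len(areas))]
assert fixed_rows[1]["area"] == 320.0
```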
Changes Made
- `src/supervision/detection/tools/csv_sink.py`: slice numpy array values in `custom_data` per detection row
- `src/supervision/detection/tools/json_sink.py`: same fix
- `tests/detection/test_csv.py`: added test case for numpy array in `custom_data`

Testing
Google Colab (optional)
Colab link:
Screenshots/Videos (optional)
Additional Notes
The fix is backward compatible: scalar values in `custom_data` (e.g. `{"frame_number": 42}`) continue to work as before, written as-is on every row.