
fix: slice numpy array values in custom_data per row in CSVSink #2199

Merged
Borda merged 7 commits into roboflow:develop from
farukalamai:fix/csv-json-sink-custom-data-array-slicing
Apr 13, 2026

Conversation

@farukalamai
Contributor

Before submitting
  • Self-reviewed the code
  • Updated documentation, follow Google-style
  • Added docs entry for autogeneration (if new functions/classes)
  • Added/updated tests
  • All tests pass locally

Description

Fixes a bug in CSVSink and JSONSink where passing a numpy array as a
custom_data value wrote the entire array on every row instead of the
per-detection scalar value.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)

Motivation and Context

When users pass computed per-detection values like detections.area via
custom_data, each row should receive its own scalar — not the whole array.

# Before (broken): every row got the full array
with sv.CSVSink("out.csv") as sink:
    sink.append(detections, custom_data={"area": detections.area})
# area column: [400.0, 400.0] on every row ❌

# After (fixed): each row gets its own value
# area column: 400.0, 400.0 ✅

The root cause was row.update(custom_data) inside the per-detection loop,
which blindly wrote the whole value. The fix applies the same per-index
slicing logic that detections.data already uses correctly.
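The per-index dispatch described above can be sketched in isolation (a simplified illustration based on this description, not the exact sink code; `build_rows` is a hypothetical name):

```python
import numpy as np

def build_rows(n_detections, custom_data):
    # Simplified sketch: one output row per detection. Non-empty ndarray
    # values contribute one element per row; anything else (scalars,
    # strings, dicts) is written as-is on every row.
    rows = []
    for i in range(n_detections):
        row = {}
        for key, value in custom_data.items():
            if isinstance(value, np.ndarray) and value.ndim > 0:
                row[key] = value[i]  # per-detection element
            else:
                row[key] = value     # broadcast unchanged
        rows.append(row)
    return rows

rows = build_rows(2, {"area": np.array([400.0, 625.0]), "frame": 7})
# each row gets its own area; frame is repeated on every row
```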

Closes #1397

Changes Made

  • src/supervision/detection/tools/csv_sink.py — slice numpy array values in custom_data per detection row
  • src/supervision/detection/tools/json_sink.py — same fix
  • tests/detection/test_csv.py — added test case for numpy array in custom_data

Testing

  • I have tested this code locally
  • I have added unit tests that prove my fix is effective or that my feature works
  • All new and existing tests pass

Google Colab (optional)

Colab link:

Screenshots/Videos (optional)

Additional Notes

The fix is backward compatible — scalar values in custom_data (e.g.
{"frame_number": 42}) continue to work as before, written as-is on every
row.

@farukalamai farukalamai requested a review from SkalskiP as a code owner April 3, 2026 20:38
@Borda Borda requested a review from Copilot April 8, 2026 12:27
@Borda Borda changed the title from "fix: slice numpy array values in custom_data per row in CSVSink and J…" to "fix: slice numpy array values in custom_data per row in CSVSink" Apr 8, 2026
@codecov

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78%. Comparing base (72fc49f) to head (43da77a).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2199   +/-   ##
=======================================
  Coverage       78%     78%           
=======================================
  Files           63      63           
  Lines         7972    7979    +7     
=======================================
+ Hits          6248    6257    +9     
+ Misses        1724    1722    -2     

Contributor

Copilot AI left a comment


Pull request overview

Fixes incorrect serialization of per-detection custom_data in CSVSink/JSONSink when users pass numpy arrays (previously the full array was written on every row), aligning output with expected “one value per detection row” behavior.

Changes:

  • Update CSVSink.parse_detection_data() to slice custom_data numpy arrays per detection row.
  • Update JSONSink.parse_detection_data() to slice custom_data numpy arrays per detection row.
  • Add a unit test ensuring CSVSink slices numpy-array custom_data per row.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/supervision/detection/tools/csv_sink.py Slice numpy-array custom_data per detection row when producing CSV rows.
src/supervision/detection/tools/json_sink.py Apply analogous per-row slicing for numpy-array custom_data when producing JSON rows.
tests/detection/test_csv.py Add regression test covering numpy-array custom_data in CSVSink.

@Borda Borda added waiting for author bug Something isn't working labels Apr 8, 2026
Borda and others added 5 commits April 13, 2026 11:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The else branch added by the original fix used hasattr(value, "__getitem__")
to decide whether to slice custom_data values per detection. This incorrectly
indexes dicts by integer 0, raising KeyError when custom_data contains dict
values (e.g. {"metadata": {"sensor_id": 101}}).

Non-ndarray types (dicts, scalars, lists used as a single value per detection)
should be written as-is. Only np.ndarray values require per-row indexing.
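The failure mode this commit fixes can be reproduced in isolation (an illustrative snippet, not the sink code): a dict passes a `hasattr(value, "__getitem__")` capability check, but indexing it with an integer raises `KeyError`, while an explicit ndarray check leaves it alone.

```python
import numpy as np

metadata = {"sensor_id": 101}

# Buggy predicate: dicts support __getitem__, so they were sent down
# the slicing path...
assert hasattr(metadata, "__getitem__")
try:
    metadata[0]  # ...and integer-indexing a dict raises KeyError
    raised = False
except KeyError:
    raised = True
assert raised

# Fixed predicate: only genuine numpy arrays are sliced per row.
def should_slice(value):
    return isinstance(value, np.ndarray) and value.ndim > 0

assert not should_slice(metadata)          # dict: written as-is
assert not should_slice(42)                # scalar: written as-is
assert should_slice(np.array([1.0, 2.0]))  # ndarray: sliced per row
```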

[resolve roboflow#1] /review finding by sw-engineer (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): CSVSink else branch KeyError:0 on dict custom_data at csv_sink.py:153

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Adds a parametrized case to test_json_sink verifying that np.ndarray
values in custom_data are sliced per detection row (not written as the
full array on every row). Mirrors the existing CSV counterpart added
by the original PR.

[resolve roboflow#3] /review finding by qa-specialist (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): Missing JSONSink test for numpy array custom_data slicing

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Exercises all three dispatch branches of the custom_data handler
simultaneously: np.ndarray values are sliced per detection row while
scalar values (int, str, etc.) are broadcast as-is to every row.

[resolve roboflow#4] /review finding by qa-specialist (report: .temp/output-review-fix-csv-json-sink-custom-data-array-slicing-2026-04-13.md): Missing test for mixed-type custom_data

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda
Member

Borda commented Apr 13, 2026

Thanks for the fix @farukalamai! I've pushed 3 follow-up commits to your branch:

  1. fix: restore non-ndarray custom_data passthrough in CSVSink — the new else branch used hasattr(value, "__getitem__") to decide whether to slice custom_data values per detection. This accidentally indexed dicts by integer 0, raising KeyError: 0 for the existing "Complex Data" test case (custom_data={"metadata": {"sensor_id": 101}} etc.). Non-ndarray types should be written as-is; only np.ndarray values need per-row indexing. Changed to row[key] = value in the else branch.

  2. test: add JSONSink test for numpy array custom_data slicing — symmetric test for JSONSink to match the CSV coverage you added.

  3. test: add mixed-type custom_data test (ndarray + scalar together) — exercises all three dispatch paths at once: ndarray sliced per row + scalar broadcast to every row.

All pre-commit hooks pass and 15/15 tests pass.

Extract a private _slice_value(value, i) static method into both
CSVSink and JSONSink that centralises the per-row ndarray dispatch:
0-d ndarray -> value as-is, n-d ndarray -> value[i], anything else
-> value as-is. Both parse_detection_data methods now call this helper
for detections.data and custom_data, removing the duplicated isinstance
chains and eliminating the dead `hasattr(__getitem__)` else-branch
(detections.data values are always ndarrays via convert_data).

Drop the pure-ndarray-only test_detections_array_custom_data.csv case
from test_csv_sink; the mixed test (ndarray + scalar together) is a
strict superset that exercises all three dispatch paths simultaneously.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda Borda merged commit e514142 into roboflow:develop Apr 13, 2026
24 checks passed
@Borda Borda mentioned this pull request Apr 17, 2026

Labels

bug Something isn't working waiting for author


Development

Successfully merging this pull request may close these issues.

Save detection area with CSVSink

3 participants