Description
In src/formattedcode/output_csv.py, the flatten_scan() function incorrectly assigns copyr['start_line'] to inf['end_line'] for copyrights, holders, and authors entries. This means the end_line column in CSV output always contains the same value as start_line, losing the actual end line information.
Affected lines
- Line 188 (copyrights): inf['end_line'] = copyr['start_line']
- Line 196 (holders): inf['end_line'] = copyr['start_line']
- Line 204 (authors): inf['end_line'] = copyr['start_line']
All three assignments should use copyr['end_line'] on the right-hand side instead.
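To make the effect concrete, here is a minimal sketch of the pattern. It is a simplified, hypothetical stand-in for the copyrights loop, not the actual flatten_scan() code; the function name, the fixed flag, and the 'copyright' key are assumptions made for illustration only.

```python
# Simplified, hypothetical stand-in for the copyrights loop in flatten_scan().
# This is NOT the real ScanCode code; names other than start_line/end_line,
# inf, and copyr are assumptions for illustration.
def flatten_copyrights(copyrights, fixed=True):
    rows = []
    for copyr in copyrights:
        inf = {}
        inf['copyright'] = copyr['copyright']
        inf['start_line'] = copyr['start_line']
        if fixed:
            # Proposed fix: read the end line from the detection itself.
            inf['end_line'] = copyr['end_line']
        else:
            # Reported bug: start_line is copied into end_line.
            inf['end_line'] = copyr['start_line']
        rows.append(inf)
    return rows


# A made-up multi-line detection spanning lines 3 to 5.
detections = [
    {'copyright': 'Copyright (c) Example Corp.', 'start_line': 3, 'end_line': 5},
]
buggy_rows = flatten_copyrights(detections, fixed=False)
fixed_rows = flatten_copyrights(detections, fixed=True)
```

With the buggy assignment the row reports the range 3 to 3; with the fix it reports 3 to 5, matching the detection.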
Evidence
The underlying data classes (CopyrightDetection, HolderDetection, and AuthorDetection in src/cluecode/copyrights.py) all define both start_line and end_line as attrs fields. The JSON output correctly includes both values. Only the CSV output has this issue.
Git blame traces this to commit ef8086d ("Update CSV output to latest copyright data format", ~2018), suggesting a copy-paste error in the original refactor.
Expected behavior
inf['end_line'] should be assigned copyr['end_line'] so that the CSV output reflects accurate line ranges for copyright, holder, and author detections.
How to reproduce
- Run a scan with copyright detection and CSV output: scancode -c --csv output.csv <input_path>
- Open output.csv and compare the start_line and end_line columns for copyright/holder/author rows
- Notice that end_line always equals start_line, even for multi-line copyright statements
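The comparison in the steps above can also be done with a short script rather than by eye. This is a rough sketch that assumes only the start_line and end_line column names mentioned in this report and integer-valued cells; the helper name count_line_ranges and the sample data are hypothetical. It counts every row that has both columns filled in, so rows for other detection types may be included if they also use these columns.

```python
import csv
import io


def count_line_ranges(csv_file):
    """Return (rows_with_lines, multi_line_rows) for an open CSV text file.

    Hypothetical helper: relies only on the start_line and end_line columns.
    """
    total = 0
    multi = 0
    for row in csv.DictReader(csv_file):
        start = (row.get('start_line') or '').strip()
        end = (row.get('end_line') or '').strip()
        if not start or not end:
            # Skip rows that carry no line information.
            continue
        total += 1
        if int(end) > int(start):
            multi += 1
    return total, multi


# Made-up sample data standing in for a real output.csv.
sample = io.StringIO(
    "Resource,start_line,end_line\n"
    "a.c,3,3\n"
    "a.c,10,12\n"
    "a.c,,\n"
)
sample_counts = count_line_ranges(sample)
```

Against a real scan, open the file with open('output.csv', newline='') and pass the file object to count_line_ranges(). If the second count is zero for a codebase known to contain multi-line copyright statements, that is consistent with the behavior described here.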