Bug: CSV output assigns start_line to end_line for copyrights, holders, and authors #4785

@codewithfourtix

Description

In src/formattedcode/output_csv.py, the flatten_scan() function incorrectly assigns copyr['start_line'] to inf['end_line'] for copyrights, holders, and authors entries. This means the end_line column in CSV output always contains the same value as start_line, losing the actual end line information.

Affected lines

  • Line 188 (copyrights): inf['end_line'] = copyr['start_line']
  • Line 196 (holders): inf['end_line'] = copyr['start_line']
  • Line 204 (authors): inf['end_line'] = copyr['start_line']

All three should be copyr['end_line'].
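A minimal sketch of the corrected assignment, for illustration only. It is not the actual flatten_scan() code: the helper name flatten_detections is hypothetical, and the per-detection keys ('copyright', 'holder', 'author') are assumed from the current JSON output format.

```python
def flatten_detections(path, scanned_file):
    """Yield one flat row per copyright, holder, and author detection.

    Hypothetical stand-in for the relevant part of flatten_scan().
    """
    # (key in the scan results, key and column name for a single detection)
    kinds = (
        ("copyrights", "copyright"),
        ("holders", "holder"),
        ("authors", "author"),
    )
    for plural, singular in kinds:
        for detection in scanned_file.get(plural, []):
            yield {
                "path": path,
                singular: detection[singular],
                "start_line": detection["start_line"],
                # The fix: read end_line, not start_line.
                "end_line": detection["end_line"],
            }
```

With this change, a detection spanning lines 3 to 5 produces a row whose end_line is 5 rather than 3.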

Evidence

The underlying detection classes (CopyrightDetection, HolderDetection, AuthorDetection in src/cluecode/copyrights.py) each define both start_line and end_line as attrs fields. The JSON output correctly includes both values. Only the CSV output has this issue.

Git blame traces this to commit ef8086d ("Update CSV output to latest copyright data format", ~2018), suggesting a copy-paste error in the original refactor.

Expected behavior

inf['end_line'] should be assigned copyr['end_line'] so that the CSV output reflects accurate line ranges for copyright, holder, and author detections.

How to reproduce

  1. Run a scan with copyright detection and CSV output: scancode -c --csv output.csv <input_path>
  2. Open output.csv and compare the start_line and end_line columns for copyright/holder/author rows
  3. Notice end_line always equals start_line, even for multi-line copyright statements
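The comparison in step 2 can also be done with a short script instead of by eye. This is a rough sketch: the function name summarize_line_ranges is hypothetical, and the column names (copyright, holder, author, start_line, end_line) are assumed to match the CSV header.

```python
import csv

# Columns assumed to mark a copyright, holder, or author row.
DETECTION_COLUMNS = ("copyright", "holder", "author")


def summarize_line_ranges(csv_file):
    """Return (detection rows, detection rows where end_line != start_line)."""
    total = 0
    multi_line = 0
    for row in csv.DictReader(csv_file):
        # Skip rows that carry no copyright, holder, or author value.
        if not any(row.get(col) for col in DETECTION_COLUMNS):
            continue
        total += 1
        if row.get("start_line") != row.get("end_line"):
            multi_line += 1
    return total, multi_line
```

Open output.csv with open("output.csv", newline="") and pass the file object to summarize_line_ranges(). If the scanned input contains multi-line copyright statements and the second number is still 0, the bug is present.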
