Bug: answer_position.split(',') breaks evaluation for sheet names containing commas #33

@twhjames

Description

In evaluation/evaluation.py, the compare_workbooks() function splits the answer_position string on commas to handle multiple sheet/range pairs:

# Line 169
sheet_cell_ranges = answer_position.split(',')

This assumes commas only appear as delimiters between separate sheet/range entries. However, Excel sheet names can legally contain commas (e.g. a sheet named b2b, sez, de), and when they do, answer_position wraps the name in single quotes per Excel convention:

'b2b, sez, de'!A5:V10

The naive .split(',') does not respect the quoting and tears the string apart incorrectly.

Steps to reproduce

  1. Run evaluation on task 130-9, which has:
    • answer_position: 'b2b, sez, de'!A5:V10
    • answer_sheet: b2b, sez, de
  2. Provide a correct output file (verified by cell-by-cell comparison against the golden file — all values, formats, and structure match perfectly).
  3. Evaluation returns False (score 0).

Root cause

"'b2b, sez, de'!A5:V10".split(',') produces:

["'b2b", " sez", " de'!A5:V10"]

The loop then processes the third fragment " de'!A5:V10", splits on !, strips quotes, and looks for a sheet named de — which does not exist. Result: False.

The first two fragments ('b2b and sez) have no !, so they fall through to the else branch on line 177 and get treated as cell ranges on the first sheet, which is also incorrect.
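
The behaviour described above can be reproduced in isolation. The snippet below is a hypothetical re-creation of the parsing steps as this report describes them, not a copy of the loop in evaluation/evaluation.py, so the variable names and the exact stripping logic are assumptions:

```python
# Hypothetical re-creation of the parsing steps described in this report.
# The real loop in evaluation/evaluation.py may differ in detail.
answer_position = "'b2b, sez, de'!A5:V10"

# The naive split ignores the single quotes around the sheet name.
fragments = answer_position.split(',')
print(fragments)

parsed = []
for fragment in fragments:
    if '!' in fragment:
        # Fragment looks like "sheet!range": separate the two and drop quotes.
        sheet_name, cell_range = fragment.split('!')
        sheet_name = sheet_name.strip().strip("'")
    else:
        # No "!" present: the fragment is treated as a bare cell range.
        sheet_name, cell_range = None, fragment.strip()
    parsed.append((sheet_name, cell_range))

print(parsed)
```

Only the last tuple carries a sheet name, and it is de rather than b2b, sez, de; the other two fragments end up as bogus cell ranges.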

Suggested fix

Replace line 169 with a regex that respects single-quoted sheet names:

import re
sheet_cell_ranges = re.findall(r"(?:'[^']*'![^,]+|[^,]+)", answer_position)

This matches either a single-quoted name followed by ! and a range, or a plain comma-free token — keeping 'b2b, sez, de'!A5:V10 intact as a single entry.
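
As a possible alternative, here is a minimal sketch of a quote-aware splitter that does not use a regex. The function name split_answer_position is hypothetical and not part of the repository. It assumes that Excel escapes a literal single quote inside a quoted sheet name by doubling it, and that entries may be separated by a comma followed by whitespace. The regex above appears to mis-split that second case, because the plain-token branch can start matching at the space before an opening quote.

```python
def split_answer_position(answer_position: str) -> list[str]:
    """Split on commas that are outside single-quoted sheet names.

    Hypothetical helper, not part of the repository.
    """
    parts = []
    current = []
    in_quotes = False
    for ch in answer_position:
        if ch == "'":
            # Toggle quote state. A doubled quote ('') toggles twice,
            # so the state is unchanged after an escaped quote.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # A comma outside quotes ends the current entry.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty entries caused by stray or trailing commas.
    return [p for p in parts if p]


print(split_answer_position("'b2b, sez, de'!A5:V10"))
print(split_answer_position("Sheet1!A1:B2, 'a, b'!C1:D4"))
```

Each returned entry still contains its quotes, so the existing split on ! and quote stripping could stay as they are.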

Impact

All tasks whose sheet names contain commas will silently score 0 even when the model output is fully correct. This affects the reliability of benchmark results without any visible error or warning.
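
To estimate how many tasks are affected, something like the following heuristic could be run over the dataset. This is a sketch: is_affected and the list-of-dicts layout are assumptions made for illustration, and only the first sample record comes from this report.

```python
import re

# Matches a single-quoted sheet name containing at least one comma,
# followed by the "!" that separates it from the cell range.
QUOTED_SHEET_WITH_COMMA = re.compile(r"'[^']*,[^']*'!")


def is_affected(answer_position: str) -> bool:
    """Heuristic: True if a quoted sheet name in the string has a comma."""
    return QUOTED_SHEET_WITH_COMMA.search(answer_position) is not None


# Illustrative records only. The first mirrors task 130-9 from this report;
# the second is made up to show a non-affected entry.
tasks = [
    {"id": "130-9", "answer_position": "'b2b, sez, de'!A5:V10"},
    {"id": "example-plain", "answer_position": "Sheet1!A1:B2,Sheet1!D1:D5"},
]

affected = [t["id"] for t in tasks if is_affected(t["answer_position"])]
print(affected)
```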
