Skip to content

fix(csv-stringify): quote fields fusing with a multi-character separator#489

Open
spokodev wants to merge 1 commit into
adaltas:masterfrom
spokodev:fix/stringify-multichar-delimiter-fusion
Open

fix(csv-stringify): quote fields fusing with a multi-character separator#489
spokodev wants to merge 1 commit into
adaltas:masterfrom
spokodev:fix/stringify-multichar-delimiter-fusion

Conversation

@spokodev

Copy link
Copy Markdown

When a field is serialized before a multi-character delimiter (or record_delimiter), and the field’s tail is a non-empty prefix of that separator, the appended separator fuses with the field tail. On parse, the separator is then matched inside the original field, silently corrupting the data:

import { stringify } from "csv-stringify/sync";
import { parse } from "csv-parse/sync";

const s = stringify([["a:", "b"]], { delimiter: "::" }); // "a:::b\n"
parse(s, { delimiter: "::" });                           // [["a", ":b"]]  <- not [["a:", "b"]]

The same happens for a multi-character record delimiter with the last field of a record:

stringify([["x", "a#"]], { record_delimiter: "##" });    // x,a###...  -> parses as a record boundary

The quoting decision used value.indexOf(separator) >= 0, which quotes a field that contains the whole separator but misses this boundary fusion. Forcing quoted_string: true already produces correct, round-trippable output, confirming the field simply needs quoting.

The fix routes both delimiter and record_delimiter through a small emits_separator(value, separator) helper:

const emits_separator = function (value, separator) {
  return (
    separator.length !== 0 &&
    (value.indexOf(separator) !== -1 ||
      (separator.length > 1 &&
        (value + separator).indexOf(separator) < value.length))
  );
};

The boundary check runs only for multi-character separators, so single-character delimiters and non-fusing fields are unaffected (no extra quoting) — e.g. delimiter ":x" with field "a:" stays unquoted, since ":x" never starts inside "a:".

Tests added in option.delimiter.js and option.record_delimiter.js (fusing field is quoted; non-fusing field is not). Full suite passes (the one unrelated pre-existing failure in api.callback.js, "catch error in end handler #386", is present on master as well). dist/ is left for the release build, matching #484.

A field whose tail is a non-empty prefix of a multi-character delimiter or
record delimiter must be quoted, otherwise the separator appended after it
re-tokenizes the boundary on parse and the value is silently corrupted:

  stringify([["a:", "b"]], { delimiter: "::" })            // a:::b
  parse("a:::b", { delimiter: "::" })                      // [["a", ":b"]]

The quoting predicate only checked `value.indexOf(separator) >= 0`, which
catches a field that contains the whole separator but misses this boundary
fusion. The same gap affected `record_delimiter`. Both now share an
`emits_separator` helper; single-character separators and non-fusing fields
are unaffected (no extra quoting). RFC 4180 §2.6, generalized to the library's
multi-character separators.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant