fix: wrap comma-containing field in quotes in v5.4 Field_Level CSV (Fixes #775, #744) #792

Open
rtmalikian wants to merge 1 commit into OHDSI:main from rtmalikian:fix/issue-775-csv-quoting
Fixes #775
Fixes #744

Problem

The OMOP_CDMv5.4_Field_Level.csv file has a CSV formatting error at the cdm_source.source_documentation_reference row. The userGuidance field value contains an unquoted comma:

Refers to a publication or web resource describing the source data, e.g. a data dictionary.

Standard CSV parsers (Python csv, csvlint.io, GitHub's web UI) interpret the comma as a field delimiter, splitting this row into 14 fields instead of the expected 13. GitHub even displays a warning banner when viewing the file.

This issue does not occur in the v5.3 or v6.0 files; only v5.4 is affected.

Solution

Wrap the userGuidance value in double quotes so that parsers treat the embedded comma as part of the field rather than as a delimiter:

-cdm_source,source_documentation_reference,No,varchar(255),Refers to a publication or web resource describing the source data, e.g. a data dictionary.,NA,...
+cdm_source,source_documentation_reference,No,varchar(255),"Refers to a publication or web resource describing the source data, e.g. a data dictionary.",NA,...
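The effect of the quoting can be reproduced with Python's csv module. This is a minimal sketch, not part of the PR: the two strings below are shortened, hypothetical versions of the row that stop after the first six columns (the real row has 13), and `field_count` is an illustrative helper name.

```python
import csv
import io

# Shortened, hypothetical copies of the affected row: only the first six
# columns are included here; the real file has 13 columns per row.
unquoted = (
    "cdm_source,source_documentation_reference,No,varchar(255),"
    "Refers to a publication or web resource describing the source data, "
    "e.g. a data dictionary.,NA\r\n"
)
quoted = (
    "cdm_source,source_documentation_reference,No,varchar(255),"
    '"Refers to a publication or web resource describing the source data, '
    'e.g. a data dictionary.",NA\r\n'
)


def field_count(line):
    """Parse a single CSV record and return how many fields it contains."""
    # newline="" keeps the CRLF ending intact so csv.reader handles it itself.
    return len(next(csv.reader(io.StringIO(line, newline=""))))


print(field_count(unquoted))  # 7: the comma in the guidance text splits the field
print(field_count(quoted))    # 6: the quoted guidance text stays one field
```

The unquoted form yields one field more than the quoted form, which matches the 14-versus-13 mismatch described above.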

Verification

  • csv.reader() now parses all 551 data rows with exactly 13 fields each (previously row 355 had 14)
  • git diff --cached confirms CRLF line endings preserved (1 line changed, not a multi-line rewrite)
  • Only inst/csv/OMOP_CDMv5.4_Field_Level.csv modified — no other files affected
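The first two checks above could be scripted roughly as follows. This is a sketch under stated assumptions rather than the script actually used for this PR: the helper names are hypothetical, and the demo runs on a small in-memory sample instead of the real file.

```python
import csv
import io


def rows_with_wrong_field_count(text, expected):
    """Return (line_number, field_count) for each record whose width differs."""
    reader = csv.reader(io.StringIO(text, newline=""))
    # reader.line_num is the number of source lines consumed so far, which
    # equals the record's line number when no field spans multiple lines.
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


def lines_without_crlf(raw):
    """Return 1-based numbers of lines in raw bytes that do not end in CRLF."""
    # Note: a final line with no terminator is also reported.
    return [
        number
        for number, line in enumerate(raw.splitlines(keepends=True), start=1)
        if not line.endswith(b"\r\n")
    ]


# Hypothetical three-column sample: the last record has an unquoted comma.
sample = b'a,b,c\r\n1,"x, y",3\r\n1,x, y,3\r\n'

print(rows_with_wrong_field_count(sample.decode("utf-8"), expected=3))  # [(3, 4)]
print(lines_without_crlf(sample))  # []
```

To run the same checks against the repository, the bytes of inst/csv/OMOP_CDMv5.4_Field_Level.csv would be read in binary mode and passed in with `expected=13`; both functions should then return empty lists if the file is well formed.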

Changelog

| Date | Change | Author |
| --- | --- | --- |
| 2026-06-19 | Wrap comma-containing userGuidance field in quotes in v5.4 Field_Level CSV | rtmalikian |

Files Changed

  • inst/csv/OMOP_CDMv5.4_Field_Level.csv — Added quotes around source_documentation_reference userGuidance field (line 399)

Verification

  • Python csv.reader() parses all rows correctly (13 fields each)
  • CRLF line endings preserved in the modified line

About the Author: Raphael Malikian — Clinical AI Solutions Architect. I specialise in building and fixing AI/ML systems for healthcare, including vector databases, RAG pipelines, and clinical NLP. If you need help with your project or think I can add value to your organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn: http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a


Disclosure: This code was developed with assistance from mimo-v2.5-pro (Xiaomi) via Hermes Agent (Nous Research). All changes were reviewed, tested against the actual codebase, and verified for correctness.

The userGuidance field for cdm_source.source_documentation_reference
contains a comma ('...source data, e.g. a data dictionary.') that was
not wrapped in quotes, causing CSV parsers to split it into 14 fields
instead of 13.

Adds double quotes around the field value to properly escape the comma.

Fixes OHDSI#775
Fixes OHDSI#744

Signed-off-by: Raphael Malikian <rtmalikian@gmail.com>

Development

Successfully merging this pull request may close these issues.

Missing quotes around text block in CDM definition (CSV file for v5.4) result in incorrect number of items for that row.
Fix CSV file formatting