Skip to content

ENH: Ingest ITKIOScanco into Modules/IO#6266

Open
hjmjohnson wants to merge 134 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:ingest-IOScanco
Open

ENH: Ingest ITKIOScanco into Modules/IO#6266
hjmjohnson wants to merge 134 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:ingest-IOScanco

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

Ingest the ITKIOScanco remote module into Modules/IO/IOScanco (Wave 2 of #6160), preserving upstream merge topology. Replaces the .remote.cmake reference and enables Module_IOScanco in the configure-ci pixi task.

What was ingested
Merge-topology preservation

The ingest was performed via --allow-unrelated-histories merge of the filter-repo'd upstream module history. All upstream merges from the original KitwareMedical/ITKIOScanco repository are preserved as two-parent merge commits; git blame walks back through the merge into the original upstream contributors.

git rev-list --count upstream/main..HEAD          = 126
git rev-list --count --merges upstream/main..HEAD = 25
Follow-on commits
  • COMP: Remove IOScanco .remote.cmake (in-tree) — drops the now-redundant remote-module manifest.
  • ENH: Enable Module_IOScanco in configure-ci pixi task — surfaces the in-tree module in the CI configure step.
  • ENH: Normalize C0004255.ISQ to CID content-link (now in ITKTestingData) — switches the test fixture from a .sha512 URL reference to an ExternalData CID link.
Test data (ITKTestingData)

The C0004255.ISQ test fixture has been published to ITKTestingData via PR InsightSoftwareConsortium/ITKTestingData#50 (merged). CID:

bafkreigcun2qy5mcnsyhpwjaspkds5wmanibtc2v5xwnnajgl3v2x62drm

The ingest tip references this CID directly via ExternalData content-link, so the test resolves once the ITKTestingData submodule reference picks up the merged PR.

hjmjohnson and others added 7 commits April 22, 2026 21:24
GCC's -Wstringop-truncation flagged 6 calls in itkScancoImageIO.cxx
where strncpy was invoked with a bound equal to the destination size.
When the source string is at least as long as the destination, strncpy
leaves the buffer without a NUL terminator, causing undefined behavior
in subsequent C-string accessors (GetVersion, GetPatientName, etc.).

The in-memory header buffers (m_Version[18], m_PatientName[42],
m_CreationDate[32], m_ModificationDate[32], m_RescaleUnits[18],
m_CalibrationData[66]) are explicitly sized two bytes wider than the
on-disk fixed-width fields (16, 40, 8/decoded-32, 32, 16, 64) so that a
NUL terminator always fits; itkISQHeaderIO.cxx:137 documents this with
an explicit '\\0' write.

Switch all six metadata-import sites and the five (six counting
ModificationDate) public Set*() inline accessors to
std::snprintf(dst, sizeof(dst), "%s", src), which always
NUL-terminates and silently truncates oversized input. Add <cstdio> to
both translation units.
The on-disk Scanco header has fixed-width text fields (16, 40, 64
bytes for Version/PatientName/CalibrationData and 16 for RescaleUnits;
8 bytes for the encoded VMS date) and the in-memory buffers are sized
two bytes wider so a NUL terminator always fits. Until now those
widths appeared as bare numeric literals throughout the
read/write/manipulate paths, with the disk-vs-buffer distinction only
implicit in pairings like 16/18, 40/42, 64/66.

Introduce a ScancoHeaderField namespace in itkScancoDataManipulation.h
holding constexpr widths for each field (DiskWidth + BufferSize pairs
plus EncodedDateDiskWidth and DateStringBufferSize), and use them at
all the read/write call sites in itkScancoImageIO, itkISQHeaderIO,
itkAIMHeaderIO, and itkScancoDataManipulation. No semantic change.
Brings IOScanco from a configure-time remote fetch into the ITK
source tree at Modules/IO/IOScanco/ using the v4 ingestion
pipeline (whitelist filter-repo + per-commit clang-format + black +
commit-prefix sanitization).

Upstream repo:  https://github.com/InsightSoftwareConsortium/ITKIOScanco.git
Upstream tip:   2a6ac86c5af5dbcdc61b1e28067737e5a912c3a9
Ingest date:    2026-05-14
Whitelist:      default.list

Per-commit transforms applied across all 122 commits:
  - filter-repo --paths-from-file (whitelist)
  - filter-repo --to-subdirectory-filter Modules/IO/IOScanco
  - clang-format -style=file (ITK main's .clang-format) for *.cxx/.h/.hxx/...
  - black for *.py
  - heuristic ITK prefix added to commit subjects without one

Merge topology preserved: 65 -> 24 merge(s).

Primary author: Matt McCormick <matt.mccormick@kitware.com>

Co-authored-by: Dženan Zukić <dzenan.zukic@kitware.com>
Co-authored-by: ebald19 <ebaldesarra19@gmail.com>
Co-authored-by: Erica Baldesarra <ebaldesarra19@gmail.com>
Co-authored-by: Erica Baldesarra <erica.baldesarra@ucalgary.ca>
Co-authored-by: Hans J. Johnson <hans-johnson@uiowa.edu>
Co-authored-by: Hans Johnson <hans-johnson@uiowa.edu>
Co-authored-by: Jon Haitz Legarreta Gorroño <jon.haitz.legarreta@gmail.com>
Co-authored-by: Mathew Seng <mathewseng@gmail.com>
Co-authored-by: Matt McCormick <matt@fideus.io>
Co-authored-by: Matt McCormick <matt@mmmccormick.com>
Co-authored-by: Michael Kuczynski <mkuczyns@ucalgary.ca>
Co-authored-by: Tom Birdsong <40648863+tbirdso@users.noreply.github.com>
Co-authored-by: Tom Birdsong <tom.birdsong@kitware.com>
IOScanco is now built from Modules/IO/IOScanco/ in the ITK source
tree; the configure-time remote fetch is no longer needed.
Replaces the legacy .sha512 content-link for the IOScanco C0004255.ISQ
test fixture (~107 MB) with a CIDv1-raw .cid pointer. Content is
byte-identical (sha512 9101d92b…999c654); only the pointer format
changes.
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Enhancement Improvement of existing methods or implementation area:Python wrapping Python bindings for a class type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:IO Issues affecting the IO module area:Remotes Issues affecting the Remote module labels May 14, 2026
@hjmjohnson hjmjohnson marked this pull request as ready for review May 14, 2026 11:51
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR ingests the ITKIOScanco remote module into Modules/IO/IOScanco/, adding ~4 300 lines of C++ for reading and writing Scanco microCT .isq, .aim, .rsq, and .rad files, and deletes the now-redundant .remote.cmake manifest.

Confidence Score: 3/5

Several defects in the new IO layer need fixes before this is safe to merge.

Memory leak, buffer overflow, zero-division, double-free, and silent rescaling bypass found across three source files.

itkAIMHeaderIO.cxx, itkISQHeaderIO.cxx, itkScancoImageIO.cxx

Reviews (2): Last reviewed commit: "COMP: Scope _CRT_SECURE_NO_WARNINGS to t..." | Re-trigger Greptile

Comment thread Modules/IO/IOScanco/src/itkScancoDataManipulation.cxx Outdated
Comment thread Modules/IO/IOScanco/src/itkScancoDataManipulation.cxx Outdated
Comment thread Modules/IO/IOScanco/src/itkScancoDataManipulation.cxx
Comment thread Modules/IO/IOScanco/src/itkScancoImageIO.cxx Outdated
Comment thread Modules/IO/IOScanco/src/itkScancoHeaderIO.cxx Outdated
Comment thread Modules/IO/IOScanco/CMakeLists.txt Outdated
The space-padding loop iterated length times unconditionally, on top of
the bytes already copied from source. PadString therefore wrote up to
2*length bytes into dest, overrunning the fixed-width header fields
that follow it on every ISQ/AIM write.
CheckVersion called strcmp on a fixed-width 16-byte buffer that is not
guaranteed to be null-terminated (CanReadFile reads 512 raw bytes from
the file and passes the pointer in). The comparison could walk past the
buffer's logical end into trailing header bytes, producing undefined
behavior on arbitrary inputs. Use strncmp bounded by VersionDiskWidth,
matching the existing CTDATA check.
…mats

The else branch unconditionally returned fileType = 2 (AIMDATA_V020)
for any header that failed both prior checks, so CanReadFile claimed
every readable file as a Scanco file and the IO factory would
self-nominate for unrelated inputs. Match AIMDATA_V020 explicitly via
strncmp and let fileType remain 0 when no magic string matches.
ReadImageInformation called SetPixelType(SCALAR) unconditionally after
the AIM branch had already called ParseAIMComponentType, which sets
VECTOR for multi-component AIM data types (e.g. 0x00120003, 0x00060003).
The trailing SetPixelType clobbered that decision, leaving downstream
pipelines to treat multi-component AIM volumes as scalar.
WriteHeader's open call passed the local filename parameter, which is
empty when the caller relied on a previously-set m_FileName (the case
the surrounding check explicitly accepts). Use m_FileName, matching
ReadHeader. Also correct "/n" to "\\n" in the error message.
add_definitions() leaks to every sibling target compiled in the
remaining directory scope. Replace with target_compile_definitions on
the ITKIOScanco library defined in src/CMakeLists.txt.
@hjmjohnson
Copy link
Copy Markdown
Member Author

Addressed all 5 P1 + 1 P2 greptile findings in 6 BUG:/COMP: fixup commits appended to preserve merge topology. ITKTestingData PR #50 just merged; ExternalData cache should resolve within ~5 min, so Pixi-Cxx CI should rerun green.

Commits
  • d4a96ec BUG: Fix PadString writing twice the expected byte count
  • 60517f0 BUG: Use bounded comparison for Scanco magic string check
  • 504267d BUG: Make CheckVersion return non-zero only for recognized Scanco formats
  • 877a779 BUG: Preserve VECTOR pixel type for multi-component AIM data
  • 3b267eb BUG: Use m_FileName in WriteHeader and fix newline escape typo
  • ba4ba5b COMP: Scope _CRT_SECURE_NO_WARNINGS to the ITKIOScanco target

pre-commit run --all-files: clean. Merge topology preserved (25 merges, 132 commits ahead of upstream/main).

@hjmjohnson
Copy link
Copy Markdown
Member Author

/azp run

@hjmjohnson
Copy link
Copy Markdown
Member Author

@greptileai review this draft — addressed all 6 findings in commits d4a96ec..ba4ba5b. Please confirm no regressions.

…b 100MB)

The IOScanco ISQ test input blob is 106,958,336 bytes, which exceeds
GitHub's 100 MB hard limit for blobs in repositories. The CID
content-link in ITKTestingData PR InsightSoftwareConsortium#50 produced only a zero-byte
marker file because the actual content could not be pushed.

Restore the original SHA512 content-link. ExternalData resolves this
file via the data.kitware.com SHA512 mirror, which hosts the full
blob at:

  https://data.kitware.com/api/v1/file/hashsum/sha512/<digest>/download

(HTTP 200, Content-Length 106958336 — verified 2026-05-14.)

The orphaned zero-byte CID marker on ITKTestingData gh-pages is
removed in a companion PR.
@hjmjohnson
Copy link
Copy Markdown
Member Author

Reverted Modules/IO/IOScanco/test/Input/C0004255.ISQ.cid back to the original .sha512 content-link in 191e28d. The blob is 106,958,336 bytes (> GitHub's 100 MB per-file limit), so the ITKTestingData CID upload landed only a zero-byte marker. ExternalData resolves the SHA512 form via the data.kitware.com mirror (HTTP 200, Content-Length: 106958336 — verified today).

Companion cleanup: InsightSoftwareConsortium/ITKTestingData#55 removes the orphaned zero-byte CID marker from gh-pages.

The Baseline C0004255.mha.cid (67 MB) is left as-is — under the limit and presumably already resolves.

The C0004255.ISQ fixture is a 107 MB file hosted on data.kitware.com
via SHA512 mirror (it exceeds GitHub's 100 MB single-file limit so
cannot live in ITKTestingData). Tests that require this fixture are
gated behind ITK_BUILD_BIG_DATA_TESTS (default OFF) and labelled with
BigIO;RUNS_LONG, matching the pattern used by itkLargeTIFFImageWriteRead*
tests in Modules/IO/TIFF. Default builds keep the smaller AIM (4 MB)
fixtures, exercising the reader without the multi-hundred-MB download.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:IO Issues affecting the IO module area:Python wrapping Python bindings for a class area:Remotes Issues affecting the Remote module type:Data Changes to testing data type:Enhancement Improvement of existing methods or implementation type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants