Fix/csv import authority separator parsing #544
Open
Hazel-0 wants to merge 2 commits into 4Science:main-cris from
- Enhanced `resolveValueAndAuthority()` to handle authorities containing `::`
- Fixes `NumberFormatException` when parsing values like: `value::will be referenced::ORCID::0000-0002-5474-1918::600`
- Properly handles 2-part, 3-part, and 4+ part formats
- Maintains backward compatibility with existing CSV imports
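For illustration, here is a minimal sketch of the failure mode. It assumes the pre-fix parser split the raw cell on `::` and read the confidence from the third part. `NaiveAuthorityParser` and `parseConfidence` are hypothetical names, not the actual `MetadataImport` code:

```java
class NaiveAuthorityParser {
    // Hypothetical pre-fix logic: assumes exactly value::authority::confidence
    static int parseConfidence(String raw) {
        String[] parts = raw.split("::");
        if (parts.length < 3) {
            return -1; // hypothetical: treat as "no confidence supplied"
        }
        // For an ORCID-style authority, parts[2] is "ORCID", not a number
        return Integer.parseInt(parts[2]);
    }

    public static void main(String[] args) {
        String raw = "Fischer, Frank::will be referenced::ORCID::0000-0002-5474-1918::600";
        try {
            System.out.println(parseConfidence(raw));
        } catch (NumberFormatException ex) {
            System.out.println("Import would fail: " + ex.getMessage());
        }
    }
}
```

A standard `value::authority::600` cell still parses under this assumption. That explains why the bug only surfaces with authorities that embed the separator.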
References
Description
Fixes a CSV metadata import failure when an authority-controlled metadata value contains the authority separator (`::`) within the authority string itself, which causes a `NumberFormatException` during parsing.
Instructions for Reviewers
This PR fixes a bug in the CSV metadata import. The import fails with a `NumberFormatException` when it processes metadata values whose authority string itself contains the authority separator (`::`). This commonly occurs with ORCID and ROR-ID authority references (e.g., `Fischer, Frank::will be referenced::ORCID::0000-0002-5474-1918::600`).

The `resolveValueAndAuthority()` method in `MetadataImport.java` incorrectly assumed that the format is always `value::authority::confidence` (exactly 3 parts) and that the authority never contains the separator itself. Splitting an authority like `will be referenced::ORCID::0000-0002-5474-1918` on `::` produces more than 3 parts. The parser then misidentifies the parts and throws a `NumberFormatException`.

List of changes in this PR:
- Enhanced the `resolveValueAndAuthority()` method to correctly handle authorities containing separators, by reconstructing the authority string from multiple parts
- Changed the part-count check from `< 3` to `< 2`, so the 2-part format (`value::authority`), which was previously ignored, is now handled
- Uses `CF_ACCEPTED` as the default confidence when none is given (consistent with existing behavior when an authority is provided)
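The changes above can be sketched as follows. This is a minimal, self-contained approximation, not the code in `MetadataImport.java`. `AuthorityValueParser`, `Parsed` and `parse` are hypothetical names. The constants are assumed to mirror DSpace's `Choices.CF_UNSET` (-1) and `Choices.CF_ACCEPTED` (600). A cell without `::` is assumed to carry no authority.

The parsing rule works like this. If there are at least 3 parts and the last one is an integer, that part is the confidence, and everything between the value and the confidence is rejoined with `::` as the authority. Otherwise, everything after the value is the authority, with `CF_ACCEPTED`.

```java
import java.util.Arrays;

class AuthorityValueParser {
    // Assumed to mirror org.dspace.content.authority.Choices.CF_UNSET / CF_ACCEPTED
    static final int CF_UNSET = -1;
    static final int CF_ACCEPTED = 600;
    static final String SEPARATOR = "::";

    /** Hypothetical result holder; not a DSpace type. */
    static final class Parsed {
        final String value;
        final String authority; // null when no authority was supplied
        final int confidence;

        Parsed(String value, String authority, int confidence) {
            this.value = value;
            this.authority = authority;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return "value=" + value + ", authority=" + authority + ", confidence=" + confidence;
        }
    }

    static Parsed parse(String raw) {
        // "::" has no regex metacharacters; limit -1 keeps trailing empty parts
        String[] parts = raw.split(SEPARATOR, -1);
        if (parts.length < 2) {
            // Plain value: no authority, confidence left unset (assumption)
            return new Parsed(raw, null, CF_UNSET);
        }
        String value = parts[0];
        int last = parts.length - 1;
        if (parts.length >= 3) {
            Integer confidence = tryParseInt(parts[last]);
            if (confidence != null) {
                // value::<authority, possibly containing ::>::confidence
                String authority = String.join(SEPARATOR, Arrays.copyOfRange(parts, 1, last));
                return new Parsed(value, authority, confidence);
            }
        }
        // value::authority, or a non-numeric last part: everything after the value is the authority
        String authority = String.join(SEPARATOR, Arrays.copyOfRange(parts, 1, parts.length));
        return new Parsed(value, authority, CF_ACCEPTED);
    }

    private static Integer tryParseInt(String s) {
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Fischer, Frank::will be referenced::ORCID::0000-0002-5474-1918::600"));
        System.out.println(parse("value::authority::with::separators"));
    }
}
```

This heuristic has two limitations. First, an authority that itself ends in a bare integer segment (e.g. `value::ROR::12345`) would be read as authority `ROR` with confidence `12345`. Second, a value that contains `::` is not supported. Neither case appears in the test cases listed in this PR.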
How to test this PR:
- Prepare a CSV file with authority-controlled metadata values containing separators in the authority:
- Run the CSV metadata import via the DSpace admin interface or command line:
- Verify the import succeeds without throwing a `NumberFormatException`.
- Verify the metadata values are correctly imported with:
  - the full authority (`will be referenced::ORCID::0000-0002-5474-1918`)
  - the confidence (`600`)

Test Cases Covered:
- `value::authority::600` (standard 3-part format)
- `value::authority` (now correctly sets the authority with `CF_ACCEPTED`)
- `Fischer, Frank::will be referenced::ORCID::0000-0002-5474-1918::600`
- `Chemnitz University of Technology::will be referenced::ROR-ID::https://ror.org/00a208s56::600`
- `value::authority::with::separators` (treats everything after the value as the authority)

Backward Compatibility:
Fully backward compatible: the fix keeps all existing behavior for the standard 3-part format and adds support for authorities that contain the separator.
Checklist
`main` branch of code (unless it is a backport or is fixing an issue specific to an older branch).