Warn instead of raise when user-provided data_files yields a subset (#8215)

Open
adityasingh2400 wants to merge 1 commit into huggingface:main from adityasingh2400:fix-non-matching-splits-with-explicit-data-files-7867
@adityasingh2400

Fixes #7867.

NonMatchingSplitsSizesError currently fires whenever the loaded split size differs from the expected size, including when the user explicitly passed data_files for a known subset of the dataset. The only user-side workaround is verification_mode='no_checks', which silences ALL checks rather than just the split-size one.
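
For context, the workaround described above might look like the following sketch. The wrapper name `load_subset_without_checks` and the repository and file names in the commented call are hypothetical and for illustration only; `load_dataset`, `data_files`, and `verification_mode` are the library's own.

```python
def load_subset_without_checks(repo_id: str, data_files: dict):
    """Load only the given files, skipping every verification step."""
    # Imported lazily so the sketch can be read without `datasets` installed.
    from datasets import load_dataset

    # "no_checks" disables all verification, not only the split-size check.
    return load_dataset(
        repo_id,
        data_files=data_files,
        verification_mode="no_checks",
    )


# Hypothetical repository and file names, shown for illustration only:
# ds = load_subset_without_checks(
#     "some-user/some-dataset",
#     {"train": "data/train-00000-of-00010.parquet"},
# )
```

Because `no_checks` turns off every check, a genuinely corrupted download would also go unnoticed, which is the motivation for a narrower fix.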

This PR downgrades the split-size mismatch to a UserWarning when data_files was explicitly provided by the caller. Other mismatch paths (corrupted download, wrong size for full download) remain hard errors.
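
The intended behavior can be sketched roughly as follows. This is a self-contained illustration, not the code in this PR's diff: the function `verify_split_sizes`, its `user_provided_data_files` flag, and the local `NonMatchingSplitsSizesError` stand-in are assumptions made for the example.

```python
import warnings
from typing import Mapping


class NonMatchingSplitsSizesError(ValueError):
    """Local stand-in for the library exception of the same name."""


def verify_split_sizes(
    expected: Mapping[str, int],
    recorded: Mapping[str, int],
    user_provided_data_files: bool = False,
) -> None:
    """Compare expected and recorded example counts per split.

    Hypothetical helper: warns when the caller explicitly passed
    data_files, raises otherwise.
    """
    # Collect every split whose recorded size differs from the expected one.
    mismatches = {
        name: (expected[name], recorded[name])
        for name in expected
        if name in recorded and expected[name] != recorded[name]
    }
    if not mismatches:
        return

    message = f"Split sizes differ from expected (expected, recorded): {mismatches}"
    if user_provided_data_files:
        # A subset is the likely cause, so report it without failing the load.
        warnings.warn(message, UserWarning, stacklevel=2)
        return
    # No explicit data_files: a mismatch still signals a bad download.
    raise NonMatchingSplitsSizesError(message)
```

With this shape, `verify_split_sizes({"train": 1000}, {"train": 250}, user_provided_data_files=True)` emits a `UserWarning`, while the same call without the flag raises.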

Development

Successfully merging this pull request may close these issues.

NonMatchingSplitsSizesError when loading partial dataset files