Add pre-flight BAM integrity check before scAbsolute calling #34
Merged
New validate_bams rule runs samtools quickcheck on every BAM in the
sample sheet before any qc / scale_scAbsolute job is scheduled. If any
file is truncated or otherwise unreadable, the workflow aborts up front
with the full list of bad files so the user can fix them all in one
pass (typically by re-copying from the source) rather than discovering
failures one cell at a time deep into a multi-hour run.
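The collect-everything-then-abort behaviour described above can be
sketched roughly as follows. This is a minimal illustration, not the
rule's actual code: the function names (quickcheck_ok, find_bad_bams,
validate_or_abort) are hypothetical, and the only external call assumed
is `samtools quickcheck <file>`, which exits non-zero when a file fails
its check.

```python
import subprocess
from typing import Callable, Iterable, List


def quickcheck_ok(bam_path: str) -> bool:
    """Return True if `samtools quickcheck` exits with status 0 for this file.

    Assumes samtools is on PATH; subprocess.run raises FileNotFoundError if not.
    """
    result = subprocess.run(
        ["samtools", "quickcheck", bam_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_bad_bams(
    bam_paths: Iterable[str],
    check: Callable[[str], bool] = quickcheck_ok,
) -> List[str]:
    """Check every BAM and return all that fail, in input order."""
    # No early exit: the point is to report every bad file in one pass.
    return [path for path in bam_paths if not check(path)]


def validate_or_abort(
    bam_paths: Iterable[str],
    check: Callable[[str], bool] = quickcheck_ok,
) -> None:
    """Abort with the full list of bad files, or return None if all pass."""
    bad = find_bad_bams(bam_paths, check)
    if bad:
        listing = "\n".join(f"  {path}" for path in bad)
        raise SystemExit(
            f"{len(bad)} BAM file(s) failed samtools quickcheck:\n{listing}"
        )
```

The check function is passed in as a parameter so the collection logic
can be exercised without samtools installed; in the workflow itself the
list of paths would come from the sample sheet.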
Failures are recorded in the same canonical CSV used by the downstream
combine / merge step:
results/<binSize>/<sampleName>_<binSize>_failed_cells.csv
A new failure_reason value, "truncated_bam", joins the existing
"missing_output" and "process_crash". One file, one schema, one place
to look regardless of which stage caught the failure.
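As an illustration of the single-file, single-schema idea, the sketch
below appends failures to a CSV at the path pattern given above. The
column names ("cell", "failure_reason") and the helper names are
assumptions made for this example; the real file may carry different
or additional columns. Only the three failure_reason values and the
path pattern come from this PR.

```python
import csv
from pathlib import Path
from typing import Iterable, Union

# The three failure_reason values named in this PR.
FAILURE_REASONS = {"truncated_bam", "missing_output", "process_crash"}

# Assumed column layout, for illustration only.
FIELDNAMES = ["cell", "failure_reason"]


def failed_cells_path(
    results_dir: Union[str, Path], sample_name: str, bin_size: Union[str, int]
) -> Path:
    """Build <results_dir>/<binSize>/<sampleName>_<binSize>_failed_cells.csv."""
    return Path(results_dir) / str(bin_size) / f"{sample_name}_{bin_size}_failed_cells.csv"


def append_failures(
    csv_path: Union[str, Path], cells: Iterable[str], reason: str
) -> None:
    """Append one row per failed cell, writing the header only for a new file."""
    if reason not in FAILURE_REASONS:
        raise ValueError(f"unknown failure_reason: {reason!r}")
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        for cell in cells:
            writer.writerow({"cell": cell, "failure_reason": reason})
```

Because every stage appends to the same file with the same header, a
pre-flight "truncated_bam" row and a later "process_crash" row end up
side by side and can be read back with a single csv.DictReader pass.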
Also updates README.md to document the failure_reason vocabulary and to
recommend running snakemake with --keep-going, so that single-cell
failures later in the run do not abort the entire batch.