Guard MannKendall against segments with <3 non-NA values#10
Open
shafighi wants to merge 1 commit into
Open
Conversation
The two cellcycle classification functions (cellcycleMetadata,
cellcycle_gctest) loop over genome segments and call
Kendall::MannKendall(co[idx]) on each segment's copy-number values.
The existing gates skip a segment if it has fewer than 20 bins
(segment_size_cutoff) or if all values are zero, but neither gate
accounts for NA values in co.
With the new mouse blacklist + pericentromeric + segmental duplication
masks, segments often clear the size gate but have most of their bins
NA-filtered. The non-zero gate also misfires because all(co == 0)
returns NA (not FALSE) when co contains NAs, so the gate's if-condition
errors -- which the outer tryCatch in run_scAbsolute.R catches as a
hard failure, producing an empty .rds.
Reproducibly observed on SLX-27548 mouse cells: A1_2 (65M reads) fails
with stop("length(x)<3") fired by Kendall internals when most of a
segment's bins are NA.
Fix:
1. all(co == 0) -> all(co == 0, na.rm=TRUE) so NA-only and zero+NA
segments are skipped instead of erroring.
2. New gate: sum(!is.na(co)) < 3 returns NA for that segment instead
of feeding a degenerate vector to MannKendall.
Same fix applied to both cellcycleMetadata (lines ~62-66) and
cellcycle_gctest (lines ~192-196).
There was a problem hiding this comment.
Pull request overview
This PR hardens the cell-cycle classification logic in cellcycleMetadata() and cellcycle_gctest() against genome segments where aggressive filtering leaves too few non-NA observations to safely run Kendall::MannKendall().
Changes:
- Make the “all zeros” gate
NA-safe viaall(co == 0, na.rm=TRUE)incellcycleMetadata(). - Add guards to skip segments with fewer than 3 usable (non-
NA) observations before callingKendall::MannKendall(). - Add explanatory comments describing why the new guard is needed in filtered mouse data.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+67
to
82
| # Mann-Kendall (and the partial-MK / partial-spearman calls below) all | ||
| # need >=3 jointly-non-NA values. With aggressive mouse filtering | ||
| # (blacklist + pericentromeric + segdup) and mm10-liftover gaps in | ||
| # the replication-timing track, a long segment can clear the size | ||
| # cutoff but still have <3 bins that are non-NA across co, time and | ||
| # covar simultaneously. base::sort(time, ...) drops NAs from time | ||
| # silently, so the resulting idx is shorter than the segment and | ||
| # co[idx] feeds a degenerate vector into Kendall::MannKendall, | ||
| # triggering stop("length(x)<3") inside the package. | ||
| jointly_valid = !is.na(co) & !is.na(time) & !is.na(covar) | ||
| if(sum(jointly_valid) < 3){ | ||
| return(list("pval"=NA, "tau"=NA, "tau_corrected"=NA, "cor_corrected"=NA)) | ||
| } | ||
| idx = base::sort(time, index.return=TRUE)$ix | ||
| # ggplot(data=dplyr::tibble(co=co[idx], covar=covar[idx], time=time[idx])) + geom_point(aes(x=time, y=co, color=covar)) | ||
| trend.kendall_test = Kendall::MannKendall(co[idx]) |
Comment on lines
+202
to
209
| # See cellcycleMetadata above for the same guard. base::sort(time, ...) | ||
| # silently drops NA time values, so the segment can clear the size | ||
| # gate but still feed a <3-length vector to Kendall::MannKendall. | ||
| if(sum(!is.na(co) & !is.na(time)) < 3){ | ||
| return(list("pval"=NA, "tau"=NA)) | ||
| } | ||
| idx = base::sort(time, index.return=TRUE)$ix | ||
| trend.kendall_test = Kendall::MannKendall(co[idx]) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The two cellcycle classification functions (cellcycleMetadata, cellcycle_gctest) loop over genome segments and call Kendall::MannKendall(co[idx]) on each segment's copy-number values. The existing gates skip a segment if it has fewer than 20 bins (segment_size_cutoff) or if all values are zero, but neither gate accounts for NA values in co.
With the new mouse blacklist + pericentromeric + segmental duplication masks, segments often clear the size gate but have most of their bins NA-filtered. The non-zero gate also misfires because all(co == 0) returns NA (not FALSE) when co contains NAs, so the gate's if-condition errors -- which the outer tryCatch in run_scAbsolute.R catches as a hard failure, producing an empty .rds.
Reproducibly observed on SLX-27548 mouse cells: A1_2 (65M reads) fails with stop("length(x)<3") fired by Kendall internals when most of a segment's bins are NA.
Fix:
Same fix applied to both cellcycleMetadata (lines ~62-66) and cellcycle_gctest (lines ~192-196).