Hello Modkit team,
Thank you for developing and maintaining Modkit.
I have a question regarding the behaviour of the "--single-code" option in differential methylation analysis.
Suppose a modBAM contains both 5mC ("m") and 5hmC ("h") calls from a Dorado model, and I run:
"modkit dmr pair --single-code h"
Could you please clarify how non-target modification calls are handled internally?
For example, at a CpG site with the following read-level composition:
- 10 reads called as 5hmC ("h")
- 80 reads called as 5mC ("m")
- 10 reads called as canonical C
When using "--single-code h":
- Are 5mC calls treated as canonical/unmodified bases?
- Are 5mC calls excluded from the coverage denominator?
- Are 5mC calls retained as an "other modification" category?
- How exactly is the modification proportion calculated for DMR testing?
Similarly, when using:
"modkit dmr pair --single-code m"
how are 5hmC calls handled?
My main biological question is whether "--single-code h" can be interpreted as a differential analysis of only 5hmC sites, independent of 5mC. In other words, if a CpG is called as 5mC in some reads and 5hmC in other reads, does the presence of 5mC contribute to the denominator or influence the estimated 5hmC frequency?
I am interested in generating separate 5mC and 5hmC landscapes from Nanopore data and would like to understand whether:
- "--single-code h" effectively measures 5hmC versus everything else, or
- it measures 5hmC while excluding 5mC calls from the analysis, or
- it uses some other counting model internally.
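To make these possibilities concrete, here is a small Python sketch of the arithmetic for the example site above (10 h, 80 m, 10 canonical). It only illustrates the candidate models I am asking about and is not meant to describe what Modkit actually does. The function names and the "c" key for canonical calls are my own, purely for illustration.

```python
# Read-level counts at the example CpG from this question.
# "c" is a made-up key for canonical C calls (illustration only).
counts = {"h": 10, "m": 80, "c": 10}


def frac_vs_everything(counts, target):
    """Candidate model A: target calls over all calls.

    Non-target modifications stay in the denominator, whether they are
    relabelled as canonical or kept as an "other modification" class.
    """
    return counts[target] / sum(counts.values())


def frac_excluding_other_mods(counts, target):
    """Candidate model B: target calls over target + canonical calls.

    Non-target modifications are dropped from the denominator entirely.
    """
    return counts[target] / (counts[target] + counts["c"])


for code in ("h", "m"):
    a = frac_vs_everything(counts, code)
    b = frac_excluding_other_mods(counts, code)
    print(f"--single-code {code}: model A = {a:.3f}, model B = {b:.3f}")
```

For "--single-code h" the two candidate models give 10/100 versus 10/20, so which denominator is used changes the interpretation of the 5hmC signal considerably.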
Any clarification on the underlying counting model, treatment of other modification classes, and denominator used for DMR testing would be greatly appreciated.
Thank you very much for your help.