~/temp_test > cat dorado.fastq
@924f595d-fb8d-473e-bb99-b554dfdd5ce9 st:Z:2024-09-11T03:04:06.675+00:00 RG:Z:41b0457e40e383b3df64af4d8e649576ca9a4668_dna_r10.4.1_e8.2_400bps_fast@v5.0.0
GAAGCGACAGCGTATGCGCGTGTTTAAGTTCGACTGGTTCTCTGCCACCGGTACCGCCATTCTTTTTGCTGCCCTGCTCTCGATTGTCTGGCTGAAGATGAAACCATCTGACGCTATCAGCGCCTTCGGCAGCACGCTGAAGGACTGGCTCTGCCTATCTACTCCATCGGTATGGTGCTGGCGTTCGCCTTTATCTCGAACTATTCCGGACTATCATCAACGCTGGCGCTGGCGCTCGCACACACCGGCCATGCATTCCGCCTTTTCTCTCTCGCCGTTCCTCGGCTGGCTTGGTGTCTTCCTGACCGGATCGGATACCTCATCTAACGCCCTGTTCGCCGCCCTGCAAGCCGCTGCAGCACAACAAATTGGCGTTTCTGACCTGTTGTTGGTTGCCGCCAACACCGCCGGTGGTGTCGCCGGTTAAGATGATCTCTTCCGCAATCTATCGCTATCACCTATGCGGGGATAGGCGTGGTAGGCAAAGAGTCAGATCTCTCGCTTTACCATCAAACGCAGCTAAATCTCACCTGTATGGTCGGCGTGATCGCCACGCTCAGGCTTATGTCTTAACGTGGATAATTTGCTAATGATTGTTTTACCCAGACGCCTGTCAGACAAGGTCCGATCGTGTGCGGGCGCTGATGGTGATG
+
89D>AABD:9:2106.-&)'))**4495@50'&'7.99=54166AH@A>=D>86789A=>=ADSDMIHB@>??BG?=<1<...1>8;9>656M@9869711A?<:422<<GFBC@>E<77<(((EI4116213321882/>=)0'62669)'(01/%%%&%&)+68B?@ABGCAB==>KHDCGB71246:DB64331427/2(%$%%%+,/.>>?>>:99@888B58))*=<@++*)()(((348:8;7/3//:'&&*,*46<64@33)7.@9;<HAAA66B0..@S546>(((*<603878F8<<C+)**0/,'7-,.5//0/3)*2579<;<...6>?96?A811/<546,**42+,--.,03./43-+,-1-/,(+-6>AS?ACD::6771)((-/75.1777.-')-6312596**,1571-*,,,,-420.1*'<<:**-655<,,-299&%&&:''&..6071*+/0.*)*;<76:7+*79.-.%$#($.+%'$&'(;7*&%$()1(76,*+))(%%*0.-*56:HIE??=77''-/-,.345+*+/-(/<';:;?@946@732;<ABEBA::>-.3++/..-26,,11))+>C8==G:66:534H>B5/,,02020---))),()**(.%),;'''.++(*&$&&)
~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado.fastq /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.622076s
2nd stage: 0.110107s
3rd stage: 0.0044s
Total : 0.736583s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 0
No. of unique counted k-mers : 0
Total no. of k-mers : 0
Total no. of reads : 1
Total no. of super-k-mers : 0
I realized that this is caused by tab characters in the header line (line 1) of the FASTQ record. After converting the tabs to spaces, the expected behaviour is restored:
~/temp_test > cat dorado.fastq | tr '\t' ' ' > dorado_notab.fastq
~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado_notab.fastq /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.645912s
2nd stage: 0.102941s
3rd stage: 0.002236s
Total : 0.751089s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 633
No. of unique counted k-mers : 633
Total no. of k-mers : 633
Total no. of reads : 1
Total no. of super-k-mers : 93
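A slightly narrower variant of the workaround above, as a sketch only: instead of running tr over the whole file, replace tabs in the header lines alone. This assumes strict 4-line FASTQ records (no wrapped sequence or quality lines). The function names fastq_detab_headers and fastq_count_tab_headers are made up for illustration and are not part of KMC or dorado.

```shell
# Hypothetical helper: replace tabs with spaces in FASTQ header lines only.
# Assumes strict 4-line records, so headers are lines 1, 5, 9, ...
fastq_detab_headers() {
    awk 'NR % 4 == 1 { gsub(/\t/, " ") } { print }' "$1"
}

# Hypothetical helper: count header lines that contain at least one tab.
# Prints 0 when no header in the file has a tab.
fastq_count_tab_headers() {
    awk 'NR % 4 == 1 && /\t/ { n++ } END { print n + 0 }' "$1"
}
```

For example, fastq_count_tab_headers dorado.fastq shows whether a file is affected, and fastq_detab_headers dorado.fastq > dorado_notab.fastq writes a cleaned copy that can be passed to kmc as in the second run above.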
Hello,
With FASTQ files generated by dorado demux --emit-fastq, kmc (I am using KMC 3.2.4) doesn't work correctly: for the single-read file shown in the first run above, it reports 0 k-mers.
Could you please fix this issue in a future release? Many thanks!
Bogdan