-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathextract_chromosomes.py
More file actions
executable file
·52 lines (41 loc) · 1.32 KB
/
extract_chromosomes.py
File metadata and controls
executable file
·52 lines (41 loc) · 1.32 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#!/usr/bin/env pypy3
"""
Script that extracts chromosome names from a SAM header.
This script reads SAM-format data from standard input and parses all @SQ lines
to extract chromosome names (i.e. reference sequence names). It stops reading
once the SAM header ends—that is, when the first alignment record is encountered.
Example input (SAM header lines):
@HD VN:1.6 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chrM LN:16569
@PG ID:samtools PN:samtools VN:1.10
Example output:
chr1
chr2
chrM
Output:
A list of chromosome names, one per line.
Usage example:
samtools view -H sample.bam | python extract_chromosomes.py
"""
import sys
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
def extract_chromosomes(file):
chromosomes = []
for line in file:
if line.startswith("@SQ"):
fields = line.strip().split("\t")
for field in fields:
if field.startswith("SN:"):
chrom = field[3:] # strip "SN:"
chromosomes.append(chrom)
elif not line.startswith("@"):
# Header ended
break
return chromosomes
if __name__ == "__main__":
chroms = extract_chromosomes(sys.stdin)
for chrom in chroms:
print(f"{chrom}")