
Commit 9586129

SanddhyaJ (Sanddhya Jayabalan) and georg-wolflein authored
Contributing cyvcf2 get mean allele length fork (#29)
* adding cyvcf2 fork
* Update references
* Rename task, make it a bit more complicated, and add tests

---------

Co-authored-by: Sanddhya Jayabalan <sanddhyajayabalan@Sanddhyas-MacBook-Air.local>
Co-authored-by: Georg Wölflein <georgw7777@gmail.com>
1 parent 10a4f6d commit 9586129

13 files changed

Lines changed: 1460 additions & 0 deletions

tasks/cyvcf2_count_alterations/__init__.py

Whitespace-only changes.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Add large files here that should not be committed to the repository

tasks/cyvcf2_count_alterations/data/SRR2058984_zc.vcf

Lines changed: 343 additions & 0 deletions
Large diffs are not rendered by default.

tasks/cyvcf2_count_alterations/data/SRR2058985_zc.vcf

Lines changed: 224 additions & 0 deletions
Large diffs are not rendered by default.

tasks/cyvcf2_count_alterations/data/SRR2058987_zc.vcf

Lines changed: 255 additions & 0 deletions
Large diffs are not rendered by default.

tasks/cyvcf2_count_alterations/data/SRR2058988_zc.vcf

Lines changed: 227 additions & 0 deletions
Large diffs are not rendered by default.

tasks/cyvcf2_count_alterations/data/SRR2058989_zc.vcf

Lines changed: 246 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+def cyvcf2_count_alterations(
+    input_vcf: str = "/mount/input/SRR2058984_zc.vcf",
+    reference_nucleotide: str = "A",
+    alternate_nucleotide: str = "C",
+) -> dict:
+    """
+    Use the cyvcf2 to parse through VCF file containing detected sequence variants to identify the number of single
+    nucleotide polymorphisms (SNPs) from a specific reference nucleotide to a specific alternate nucleotide.
+
+    Args:
+        input_vcf: Path to the input VCF file
+        reference_nucleotide: The reference nucleotide to compare against ("A", "C", "G", or "T")
+        alternate_nucleotide: The alternate nucleotide to compare against ("A", "C", "G", or "T")
+
+    Returns:
+        dict with the following structure:
+        {
+          'num_snps': int  # The number of SNPs that are altered from reference `reference_nucleotide` to
+                           `alternate_nucleotide`.
+        }
+    """
+    from cyvcf2 import VCF
+
+    # Initialize counters
+    num_snps = 0
+
+    # Iterate over each variant
+    for variant in VCF(input_vcf):
+        if (
+            variant.is_snp
+            and variant.REF == reference_nucleotide
+            and variant.ALT
+            and variant.ALT[0] == alternate_nucleotide
+        ):
+            num_snps += 1
+
+    return {"num_snps": num_snps}
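
Reviewer note: to make the counting rule above easier to check by hand, here is a minimal pure-Python sketch of the same filter applied to raw VCF text. It is not part of the commit and does not use cyvcf2. The function name `count_snp_alterations`, the `EXAMPLE_VCF` records, and the single-base check standing in for `variant.is_snp` are all assumptions made for illustration.

```python
def count_snp_alterations(vcf_text: str, ref: str, alt: str) -> int:
    """Count records whose REF equals `ref` and whose first ALT allele equals `alt`."""
    bases = {"A", "C", "G", "T"}
    count = 0
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines ("##" meta lines and the "#CHROM" line).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not enough columns to hold REF and ALT
        record_ref = fields[3]
        alts = fields[4].split(",")
        # Assumption: a rough stand-in for cyvcf2's `is_snp`, true when
        # REF and every ALT allele are single nucleotides.
        is_snp = record_ref in bases and all(a in bases for a in alts)
        # As in the committed function, only the first ALT allele is compared.
        if is_snp and record_ref == ref and alts[0] == alt:
            count += 1
    return count


# Hypothetical records, not taken from the SRR*_zc.vcf files in this commit.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tC\t50\tPASS\t.",    # A>C SNP
    "chr1\t200\t.\tA\tT\t50\tPASS\t.",    # A>T SNP
    "chr1\t300\t.\tAT\tA\t50\tPASS\t.",   # deletion, not a SNP
    "chr1\t400\t.\tA\tC,G\t50\tPASS\t.",  # multi-allelic, first ALT is C
])

print(count_snp_alterations(EXAMPLE_VCF, "A", "C"))
```

The multi-allelic record shows one consequence of comparing only the first ALT allele: a site with ALT `C,G` is counted for A to C but would not be counted for A to G.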
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+#! /bin/bash
+set -e
+
+git clone https://github.com/brentp/cyvcf2 /workspace/cyvcf2
+cd /workspace/cyvcf2 && git checkout main && git checkout 541ab16
+
+# Insert commands here to install dependencies and setup the environment...
+pip install cyvcf2
+pip install numpy
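
Reviewer note: the script checks out commit 541ab16 in /workspace/cyvcf2, but `pip install cyvcf2` takes the package from the configured package index, not from that checkout, so the installed version may differ from the pinned commit. A small sketch for reporting what actually got installed follows. The helper name `installed_version` is hypothetical and the snippet is not part of the commit.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages the install script requests.
for name in ("cyvcf2", "numpy"):
    print(name, installed_version(name))
```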
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+name: cyvcf2_count_alterations
+repo:
+  name: cyvcf2
+  url: "https://github.com/brentp/cyvcf2"
+  commit: 541ab16
+  branch: main
+  env: []
+papers: [pedersen2017cyvcf2]
+category: genomics_proteomics
+requires: cpu
+description: Use cyvcf2 to parse through VCF file containing detected sequence variants to identify the number of single nucleotide polymorphisms (SNPs) from a specific reference nucleotide to a specific alternate nucleotide.
+arguments:
+  - name: input_vcf
+    description: Path to the input VCF file
+    type: str
+  - name: reference_nucleotide
+    description: The reference nucleotide to compare against ("A", "C", "G", or "T")
+    type: str
+  - name: alternate_nucleotide
+    description: The alternate nucleotide to compare against ("A", "C", "G", or "T")
+    type: str
+returns:
+  - name: num_snps
+    description: The number of SNPs that are altered from reference `reference_nucleotide` to `alternate_nucleotide`.
+    type: int
+example:
+  arguments:
+    - name: input_vcf
+      value: /mount/input/SRR2058984_zc.vcf
+    - name: reference_nucleotide
+      value: "A"
+    - name: alternate_nucleotide
+      value: "C"
+  mount:
+    - source: SRR2058984_zc.vcf
+      target: SRR2058984_zc.vcf
+test_invocations:
+  - name: SRR2058985
+    arguments:
+      - name: input_vcf
+        value: /mount/input/SRR2058985_zc.vcf
+      - name: reference_nucleotide
+        value: "A"
+      - name: alternate_nucleotide
+        value: "T"
+    mount:
+      - source: SRR2058985_zc.vcf
+        target: SRR2058985_zc.vcf
+  - name: SRR2058987
+    arguments:
+      - name: input_vcf
+        value: /mount/input/SRR2058987_zc.vcf
+      - name: reference_nucleotide
+        value: "T"
+      - name: alternate_nucleotide
+        value: "C"
+    mount:
+      - source: SRR2058987_zc.vcf
+        target: SRR2058987_zc.vcf
+  - name: SRR2058988
+    arguments:
+      - name: input_vcf
+        value: /mount/input/SRR2058988_zc.vcf
+      - name: reference_nucleotide
+        value: "T"
+      - name: alternate_nucleotide
+        value: "A"
+    mount:
+      - source: SRR2058988_zc.vcf
+        target: SRR2058988_zc.vcf
+  - name: SRR2058989
+    arguments:
+      - name: input_vcf
+        value: /mount/input/SRR2058989_zc.vcf
+      - name: reference_nucleotide
+        value: "T"
+      - name: alternate_nucleotide
+        value: "G"
+    mount:
+      - source: SRR2058989_zc.vcf
+        target: SRR2058989_zc.vcf
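
Reviewer note: each `test_invocations` entry pairs a list of name/value arguments with a list of mounts. The sketch below shows one way such an entry could be turned into keyword arguments and checked for consistency. It is not part of the commit: the helpers `to_kwargs` and `unmounted_inputs` are hypothetical, the entry is transcribed by hand into a Python dict, and the rule that a mount `target` appears under `/mount/input` is an assumption inferred from the paths in the arguments.

```python
import posixpath

# The first test invocation above, transcribed by hand into plain Python.
INVOCATION = {
    "name": "SRR2058985",
    "arguments": [
        {"name": "input_vcf", "value": "/mount/input/SRR2058985_zc.vcf"},
        {"name": "reference_nucleotide", "value": "A"},
        {"name": "alternate_nucleotide", "value": "T"},
    ],
    "mount": [
        {"source": "SRR2058985_zc.vcf", "target": "SRR2058985_zc.vcf"},
    ],
}


def to_kwargs(invocation: dict) -> dict:
    """Collapse the name/value argument list into a keyword-argument dict."""
    return {arg["name"]: arg["value"] for arg in invocation["arguments"]}


def unmounted_inputs(invocation: dict, mount_root: str = "/mount/input") -> list:
    """List argument values under `mount_root` that no mount entry provides.

    Assumption: a mount `target` is made available at `mount_root/target`.
    """
    targets = {posixpath.join(mount_root, m["target"]) for m in invocation["mount"]}
    return [
        value
        for value in to_kwargs(invocation).values()
        if value.startswith(mount_root + "/") and value not in targets
    ]


print(to_kwargs(INVOCATION))
print(unmounted_inputs(INVOCATION))
```

Under that assumption, an empty list from `unmounted_inputs` means every input path passed to the function is backed by a mount entry.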
