Precision HLA typing from next-generation sequencing data
Authors: András Szolek, Benjamin Schubert, Christopher Mohr, Jonas Scheid
Version: 1.5.0
License: BSD-3-Clause
OptiType is a novel HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles.
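As a rough illustration of what selecting alleles simultaneously means, the sketch below scores every allele pair at a single locus by how many reads it explains and keeps the best one. It is a brute-force toy, not OptiType's integer linear program: the function name `best_pair`, the `read_hits` input, and the form of the `beta` penalty on heterozygous calls are all assumptions made for this example.

```python
from itertools import combinations_with_replacement


def best_pair(read_hits, beta=0.0):
    """Pick the allele pair that explains the most reads at one locus.

    read_hits maps a read ID to the set of alleles that read aligns to.
    beta is a hypothetical penalty that favours homozygous calls; it only
    mimics the role of OptiType's beta parameter, not its exact definition.
    """
    alleles = sorted(set().union(*read_hits.values()))
    best = None
    for pair in combinations_with_replacement(alleles, 2):
        chosen = set(pair)
        # A read counts as explained if it aligns to any chosen allele.
        explained = sum(1 for hits in read_hits.values() if hits & chosen)
        # Assumed penalty: applied only when two distinct alleles are chosen.
        penalty = beta * explained * (len(chosen) - 1)
        score = explained - penalty
        if best is None or score > best[1]:
            best = (pair, score)
    return best


# Made-up example data: four reads and the alleles each one aligns to.
hits = {
    "r1": {"A*02:01"},
    "r2": {"A*02:01", "A*03:01"},
    "r3": {"A*03:01"},
    "r4": {"A*24:02", "A*02:01"},
}
print(best_pair(hits))
print(best_pair(hits, beta=0.5))
```

With no penalty the heterozygous pair explains all four reads and wins; with a large enough penalty the homozygous call is preferred. A real ILP reaches the same kind of decision across all loci at once without enumerating every combination.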
# From PyPI
pip install optitype

# Or from source
git clone https://github.com/FRED-2/OptiType.git
cd OptiType
pip install -e .

OptiType requires external tools that cannot be installed via pip:
- RazerS3 - Read mapper
  # Via conda/bioconda
  conda install -c bioconda razers3
  # Or build from source: https://github.com/seqan/seqan
- ILP Solver - At least one of:
  - GLPK (open source)
    apt install glpk-utils                # Debian/Ubuntu
    conda install -c conda-forge glpk     # Conda
  - CBC (open source)
    apt install coinor-cbc                # Debian/Ubuntu
    conda install -c conda-forge coincbc  # Conda
  - CPLEX (commercial, free for academia)
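To see from Python which of these external tools are visible on `PATH`, something like the following sketch may help. It is not how `optitype check-deps` is implemented; the helper name `find_tools` is hypothetical, and the executable names (`razers3`, `glpsol`, `cbc`, `cplex`) are assumptions about how each tool is typically installed.

```python
import shutil

# Assumed executable names; adjust them if your installation differs.
MAPPER = "razers3"
SOLVERS = {"glpk": "glpsol", "cbc": "cbc", "cplex": "cplex"}


def find_tools(which=shutil.which):
    """Return (mapper_path_or_None, {solver_name: path}) for tools on PATH."""
    mapper = which(MAPPER)
    solvers = {}
    for name, exe in SOLVERS.items():
        path = which(exe)
        if path:
            solvers[name] = path
    return mapper, solvers


if __name__ == "__main__":
    mapper, solvers = find_tools()
    print("RazerS3:", mapper or "not found")
    print("ILP solvers:", ", ".join(solvers) or "none found")
```

The `which` argument exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.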
optitype check-deps

# DNA sequencing (paired-end)
optitype run -i reads_1.fastq -i reads_2.fastq --dna -o results/
# RNA sequencing (single-end)
optitype run -i sample.fastq --rna -o results/
# Re-analyze from BAM
optitype run -i mapped.bam --dna -o results/

from optitype import run_hla_typing, HLATypingConfig

result = run_hla_typing(
    fastq_files=["sample_1.fastq", "sample_2.fastq"],
    seq_type="dna",
    config=HLATypingConfig(solver="cbc", threads=4),
)
print(result.best_result)
# {'A1': 'A*02:01', 'A2': 'A*03:01', 'B1': 'B*07:02', ...}

optitype run --help
Usage: optitype run [OPTIONS]

  Run HLA typing analysis.

Options:
  -i, --input PATH           Input FASTQ or BAM files (use multiple times for paired-end)
  -r, --rna                  Input data is RNA sequencing
  -d, --dna                  Input data is DNA sequencing (default)
  -o, --outdir PATH          Output directory for results (required)
  -p, --prefix TEXT          Output filename prefix (default: timestamp)
  -b, --beta FLOAT           Homozygosity detection parameter (0.0-0.1)
  -e, --enumerate INT        Number of solutions to enumerate
  --solver [glpk|cbc|cplex]  ILP solver to use
  --razers3 PATH             Path to RazerS3 binary
  --threads INT              Number of threads for mapping
  -v, --verbose              Enable verbose output
  -c, --config PATH          Path to config.ini file
  --help                     Show this message and exit
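When launching runs from a script, the repeated `-i` option is easy to get wrong. The sketch below assembles the argument list from the options listed above; the helper name `build_run_command` and its defaults are hypothetical, and it only builds the command without executing it.

```python
import shlex


def build_run_command(inputs, seq_type="dna", outdir="results/", solver=None):
    """Build an `optitype run` argument list, with one -i per input file."""
    if seq_type not in ("dna", "rna"):
        raise ValueError("seq_type must be 'dna' or 'rna'")
    cmd = ["optitype", "run"]
    for path in inputs:
        cmd += ["-i", str(path)]  # one -i flag per file, never "-i a b"
    cmd.append(f"--{seq_type}")
    cmd += ["-o", str(outdir)]
    if solver is not None:
        cmd += ["--solver", solver]
    return cmd


cmd = build_run_command(["reads_1.fastq", "reads_2.fastq"])
print(shlex.join(cmd))
# optitype run -i reads_1.fastq -i reads_2.fastq --dna -o results/
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once OptiType and its dependencies are installed.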
# Check dependencies
optitype check-deps
# Generate config file
optitype init-config
# Show installation info
optitype info

Generate a config file with:
optitype init-config -o config.ini

Key settings:
[mapping]
threads=4 # Threads for read mapping
[ilp]
solver=glpk # ILP solver: glpk, cbc, or cplex
threads=1 # Threads for ILP solver
[behavior]
deletebam=true # Delete intermediate BAM files
unpaired_weight=0 # Weight for unpaired reads (0-1)

docker pull fred2/optitype
docker run -v /path/to/data:/data -t fred2/optitype \
  -i /data/reads_1.fastq -i /data/reads_2.fastq --dna -o /data/results/

# DNA (paired-end)
optitype run \
-i ./test/exome/NA11995_SRR766010_1_fished.fastq \
-i ./test/exome/NA11995_SRR766010_2_fished.fastq \
--dna -v -o ./test/exome/
# RNA (paired-end)
optitype run \
-i ./test/rna/CRC_81_N_1_fished.fastq \
-i ./test/rna/CRC_81_N_2_fished.fastq \
--rna -v -o ./test/rna/

OptiType produces:
- *_result.tsv - HLA typing results
- *_coverage_plot.pdf - Coverage visualization
Example output:
   A1       A2       B1       B2       C1       C2       Reads  Objective
0  A*02:01  A*03:01  B*07:02  B*44:02  C*07:02  C*05:01  1234   1156.5
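A result file of this shape can be read back with the standard library. The sketch below assumes the file is tab-delimited and that the first column is an unnamed row index, as the example output suggests; the function name `read_results` and the `SAMPLE` string are illustrative only.

```python
import csv
import io

# Stand-in for the contents of a *_result.tsv file (assumed tab-delimited).
SAMPLE = (
    "\tA1\tA2\tB1\tB2\tC1\tC2\tReads\tObjective\n"
    "0\tA*02:01\tA*03:01\tB*07:02\tB*44:02\tC*07:02\tC*05:01\t1234\t1156.5\n"
)


def read_results(handle):
    """Yield one dict per solution row, grouping the two alleles per locus."""
    for row in csv.DictReader(handle, delimiter="\t"):
        alleles = {locus: (row[f"{locus}1"], row[f"{locus}2"]) for locus in "ABC"}
        yield {
            "alleles": alleles,
            # Loci where both reported alleles are identical.
            "homozygous": [locus for locus, (a, b) in alleles.items() if a == b],
            "reads": int(float(row["Reads"])),
            "objective": float(row["Objective"]),
        }


rows = list(read_results(io.StringIO(SAMPLE)))
print(rows[0]["alleles"]["A"])
```

For a real file, pass an open handle instead of the `io.StringIO` wrapper, for example `open(path, newline="")`.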
Version 1.5 introduces a modernized CLI. Main changes:
- Install with pip install optitype
- Use optitype run instead of python OptiTypePipeline.py
- Multiple input files: use -i file1 -i file2 instead of -i file1 file2
- Data bundled with package (no need to set paths)
The core algorithm and output format remain unchanged.
- Python 3.10+
- External: RazerS3, ILP solver (GLPK/CBC/CPLEX)
Szolek, A, Schubert, B, Mohr, C, Sturm, M, Feldhahn, M, and Kohlbacher, O (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30(23):3310-6. doi:10.1093/bioinformatics/btu548
András Szolek <szolek@informatik.uni-tuebingen.de>, University of Tübingen, Applied Bioinformatics