This system was built and tested to predict the correct stoichiometry for targets released without stoichiometry information (Phase 0) in the 16th worldwide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), which concluded in December 2024.
Using this system, MULTICOM4 achieved remarkable success, ranking 1st in protein complex structure prediction for such targets.
This program implements both the template-based and the AlphaFold3-based stoichiometry prediction components shown in the diagram above.
- Operating System: Linux (Ubuntu 22.04 LTS recommended)
- Storage: Up to 1 TB (SSD recommended) for genetic databases
- GPU: NVIDIA GPU with Compute Capability 8.0+ (e.g., A100 80 GB, H100 80 GB)
- Supports inputs up to 5,120 tokens on a single A100 or H100 (80 GB)
- RAM: Minimum 64 GB (especially for long targets)
- Installation Time: Approximately 45 minutes to download and set up databases (without SSD).
- Average Runtime: Varies by target length, the stoichiometries to be tested, and GPU memory.
Example: Runtime of CASP16 target H0208
- Sequence lengths: Chain A: 328, Chain B: 318
- Number of models generated per stoichiometry: 25
- Device configuration for running PreStoi and AlphaFold3: CPU: AMD EPYC 7643 2.3 GHz, RAM: 512 GB, GPU: one NVIDIA A100 (80 GB)
- The average runtime for testing all 9 stoichiometry candidates (A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, A3B1, A3B2, A3B3) is 266 minutes.
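The nine candidates above are every combination of one to three copies of each subunit. A minimal Python sketch of that enumeration follows; the helper `enumerate_stoichiometries` is hypothetical and not part of PreStoi. Its comma-joined output has the same form as the `--stoichiometries` argument used later, and the last line gives a rough per-candidate time, assuming the 266 minutes is the total for all nine candidates.

```python
from itertools import product


def enumerate_stoichiometries(max_copies):
    """Return stoichiometry strings (e.g. 'A1B2') for every copy-number combination.

    max_copies maps a subunit letter to the largest copy number to try for it;
    each subunit is tried with 1..max copies.
    """
    chains = sorted(max_copies)
    ranges = [range(1, max_copies[c] + 1) for c in chains]
    return ["".join(f"{c}{n}" for c, n in zip(chains, counts)) for counts in product(*ranges)]


candidates = enumerate_stoichiometries({"A": 3, "B": 3})
print(",".join(candidates))  # A1B1,A1B2,A1B3,A2B1,A2B2,A2B3,A3B1,A3B2,A3B3

# Rough per-candidate share, assuming 266 minutes is the total for all nine candidates.
print(f"{266 / len(candidates):.1f} min per candidate")  # 29.6 min per candidate
```

The number of candidates grows multiplicatively with the number of subunits and copies, which is why narrowing the candidate list before running AlphaFold3 matters for runtime.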
git clone https://github.com/jianlin-cheng/prestoi
cd prestoi
You can create a virtual environment yourself, or install Mambaforge (which provides conda and mamba) and use a fresh conda environment, with the following commands:
wget "https://github.com/conda-forge/miniforge/releases/download/23.1.0-3/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
rm Mambaforge-$(uname)-$(uname -m).sh
source ~/.bashrc
Then run the following commands to install HMMER and the required Python packages:
mamba install -y -c bioconda hmmer
pip install pandas biopython requests
python download_template_database_and_tools.py
Install AlphaFold3 by following the official installation guide: https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md
Test whether the AlphaFold3 program is working properly.
Once you have installed AlphaFold3, you can test your setup using, e.g., the following input JSON file named fold_input.json:
{
"name": "2PV7",
"sequences": [
{
"protein": {
"id": ["A", "B"],
"sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
}
}
],
"modelSeeds": [1],
"dialect": "alphafold3",
"version": 1
}
Save this file as $HOME/af_input/fold_input.json, then run AlphaFold3 with the following command to check that it runs correctly:
docker run -it \
--volume $HOME/af_input:/root/af_input \
--volume $HOME/af_output:/root/af_output \
--volume <MODEL_PARAMETERS_DIR>:/root/models \
--volume <DATABASES_DIR>:/root/public_databases \
--gpus all \
alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--model_dir=/root/models \
--output_dir=/root/af_output
python configure.py --af3_program_path /path/to/alphafold3_program/ --af3_params_path /path/to/alphafold3_parameters/ --af3_db_path /path/to/alphafold3_databases/
This step creates a config.json file in the current working directory (prestoi) containing the following information:
{
"af3_program_path": "/path/to/alphafold3_program/",
"af3_params_path": "/path/to/alphafold3_parameters/",
"af3_db_path": "/path/to/alphafold3_databases/",
"uniref90_path": "/path/to/uniref90",
"hhsearch_binary": "/path/to/hhsearch",
"hhmake_binary": "/path/to/hhmake",
"hhdb_prefix": "/path/to/hhdb",
"stoichiometry_pdb_fasta_folder": "/path/to/stoichiometry_pdb_fasta_folder",
"pdb_stoichiometry_database": "/path/to/pdb_stoichiometry_database",
"template_atom_dir": "/path/to/template_atom_dir"
}
Note: This step only needs to be run once, but it can be rerun if any of the above paths change. Make sure all paths are valid.
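Since invalid paths are a common source of failures, a quick pre-flight check of config.json can save a long failed run. The sketch below is hypothetical and not part of PreStoi: the key list is copied from the example above, and treating `hhdb_prefix` as a filename prefix (rather than a file) is an assumption based on its name.

```python
import glob
import json
import os

# Keys shown in the example config.json produced by configure.py.
EXPECTED_KEYS = [
    "af3_program_path", "af3_params_path", "af3_db_path", "uniref90_path",
    "hhsearch_binary", "hhmake_binary", "hhdb_prefix",
    "stoichiometry_pdb_fasta_folder", "pdb_stoichiometry_database", "template_atom_dir",
]


def check_config(config_path="config.json"):
    """Return a list of human-readable problems found in config.json (empty if none)."""
    with open(config_path) as fh:
        config = json.load(fh)
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        path = config[key]
        if key == "hhdb_prefix":
            # Assumed to be a database prefix, not a file: look for files starting with it.
            found = bool(glob.glob(glob.escape(path) + "*"))
        else:
            found = os.path.exists(path)
        if not found:
            problems.append(f"path not found for {key}: {path}")
    return problems
```

Usage: run `print(check_config("config.json"))` from the prestoi directory; an empty list means every listed path exists.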
python template_based_prediction_v2.py
- Required
  - --input_fasta: target complex FASTA (one entry per subunit)
  - --output_path: output folder
- Optional (recommended for speed / reproducibility)
  - Provide precomputed MSAs/profiles per subunit to skip jackhmmer:
    - --a3m_map: JSON mapping subunit_id -> /path/to/subunit.a3m
    - --hmm_map: JSON mapping subunit_id -> /path/to/subunit.hmm
    - or --a3m_dir / --hmm_dir: directories containing files named <subunit>.a3m or <subunit>.hmm
Priority: HMM (if provided) > A3M (if provided) > jackhmmer (fallback).
python template_based_prediction_v2.py \
--input_fasta /path/to/target.fasta \
--output_path /path/to/output_dir \
--n_topn 10
This script estimates the possible number of copies of each subunit in the input FASTA file, generates stoichiometry candidates for AlphaFold3-based prediction, and provides template-based stoichiometry predictions when sufficient template evidence is available (e.g., a complex template that covers all subunits in the input FASTA).
If template search or MSA generation has already been performed externally, PreStoi can reuse per-subunit precomputed A3M and/or HMM files instead of rerunning jackhmmer. The keys in the JSON map, or the filenames in directory mode, must exactly match the FASTA record IDs.
For example, if the input FASTA contains:
>H0208_A
SEQUENCE_A
>H0208_B
SEQUENCE_B
then the corresponding map keys or filenames should be H0208_A and H0208_B.
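Because a mismatch between map keys and FASTA record IDs is easy to miss, a small pre-flight check can help. The sketch below is hypothetical (the helpers `fasta_ids` and `compare_map_keys` are not part of PreStoi) and assumes, as Biopython's `SeqIO` does, that a record ID is the first whitespace-delimited word of the header line.

```python
import json


def fasta_ids(fasta_path):
    """Return FASTA record IDs (first word after '>') in file order."""
    ids = []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def compare_map_keys(fasta_path, map_path):
    """Compare FASTA record IDs with the keys of an --a3m_map / --hmm_map JSON file.

    Returns (ids_without_entry, keys_without_record); both empty means an exact match.
    """
    with open(map_path) as fh:
        keys = set(json.load(fh))  # iterating a JSON object yields its keys
    ids = set(fasta_ids(fasta_path))
    return sorted(ids - keys), sorted(keys - ids)
```

For the H0208 example, `compare_map_keys("H0208.fasta", "a3m_map.json")` returning `([], [])` means every key matches a record ID exactly.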
python template_based_prediction_v2.py \
--input_fasta /path/to/H0208.fasta \
--output_path /path/to/output_dir \
--n_topn 10 \
--a3m_map /path/to/a3m_map.json \
--hmm_map /path/to/hmm_map.json
Example a3m_map.json:
{
"H0208_A": "/msas/H0208_A.a3m",
"H0208_B": "/msas/H0208_B.a3m"
}
Example hmm_map.json:
{
"H0208_A": "/hmms/H0208_A.hmm",
"H0208_B": "/hmms/H0208_B.hmm"
}
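Map files like the two examples above can also be generated from existing folders of precomputed files. A minimal sketch, assuming the files are named `<record ID><suffix>` as in directory mode; `build_map` and `write_map` are hypothetical helpers, not PreStoi functions.

```python
import json
import os


def build_map(record_ids, directory, suffix):
    """Map each record ID to <directory>/<ID><suffix>, skipping files that do not exist."""
    mapping = {}
    for rid in record_ids:
        path = os.path.join(directory, rid + suffix)
        if os.path.isfile(path):
            mapping[rid] = os.path.abspath(path)
        else:
            print(f"warning: no {suffix} file for {rid} at {path}")
    return mapping


def write_map(mapping, out_path):
    """Write the mapping as a JSON object like the a3m_map.json / hmm_map.json examples."""
    with open(out_path, "w") as fh:
        json.dump(mapping, fh, indent=2)
```

Usage: `write_map(build_map(["H0208_A", "H0208_B"], "/msas", ".a3m"), "a3m_map.json")`. Subunits without a file are left out of the map and only reported as a warning.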
python template_based_prediction_v2.py \
--input_fasta /path/to/H0208.fasta \
--output_path /path/to/output_dir \
--n_topn 10 \
--a3m_dir /msas \
--hmm_dir /hmms
This expects files named according to the FASTA record IDs, for example:
/msas/H0208_A.a3m
/msas/H0208_B.a3m
/hmms/H0208_A.hmm
/hmms/H0208_B.hmm
When both HMM and A3M files are provided, PreStoi uses them in the following priority order:
HMM > A3M > jackhmmer
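The priority rule can be illustrated with a short sketch. This is not PreStoi's code: `choose_profile_source` is a hypothetical helper, and resolving the rule independently for each subunit is an assumption. `H0208_C` is a made-up third subunit used to show the jackhmmer fallback.

```python
def choose_profile_source(subunit_id, hmm_paths, a3m_paths):
    """Pick the input for one subunit using the documented order HMM > A3M > jackhmmer.

    hmm_paths / a3m_paths map subunit IDs to files (e.g. the contents of --hmm_map and
    --a3m_map). ("jackhmmer", None) stands for "no precomputed file; search from scratch".
    """
    if subunit_id in hmm_paths:
        return "hmm", hmm_paths[subunit_id]
    if subunit_id in a3m_paths:
        return "a3m", a3m_paths[subunit_id]
    return "jackhmmer", None


hmm_paths = {"H0208_A": "/hmms/H0208_A.hmm"}
a3m_paths = {"H0208_A": "/msas/H0208_A.a3m", "H0208_B": "/msas/H0208_B.a3m"}
for sid in ["H0208_A", "H0208_B", "H0208_C"]:
    print(sid, choose_profile_source(sid, hmm_paths, a3m_paths))
```

Here H0208_A uses its HMM even though an A3M also exists, H0208_B falls back to its A3M, and H0208_C would need a jackhmmer search.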
Example output for H0208 (True stoichiometry: A1B1):
======================================================================
Ranked Copy Numbers Per Subunit (local per-template evidence)
======================================================================
▶ Subunit: H0208_B
Rank | Score (–log10(E)) | Copies
-----------------------------------------
1 | 218.975800 | 1
2 | 93.003826 | 2
▶ Subunit: H0208_A
Rank | Score (–log10(E)) | Copies
-----------------------------------------
1 | 162.277588 | 1
2 | 157.298376 | 2
======================================================================
Ranked Stoichiometry Candidates (before coverage constraints)
======================================================================
1. A1B1
2. A2B1
3. A1B2
4. A2B2
Saved: casp16_results/H0208/ranked_stoichiometry_candidates.csv
======================================================================
Templates covering ALL subunits
======================================================================
6UXU: stoichiometry=A2
4NUR: stoichiometry=A2
6V73: stoichiometry=A2
======================================================================
Templates covering MULTIPLE subunits
======================================================================
6UXU: subunits=A,B, stoichiometry=A2
4NUR: subunits=A,B, stoichiometry=A2
6V73: subunits=A,B, stoichiometry=A2
======================================================================
Stoichiometry Candidates AFTER coverage/multi-template constraints
======================================================================
=== Applying multi-template constraints to 3 templates ===
[1] Template 6UXU covers 2 subunits; stoich=A2; evidence_sum(-log10E)=67.75; minE=5.70e-35
- Boost H0208_B: chain A -> copies=1 (+6.78e+10)
- Boost H0208_A: chain A -> copies=1 (+6.78e+10)
[2] Template 6V73 covers 2 subunits; stoich=A2; evidence_sum(-log10E)=61.26; minE=1.50e-31
- Boost H0208_A: chain A -> copies=1 (+3.06e+10)
- Boost H0208_B: chain A -> copies=1 (+3.06e+10)
[3] Template 4NUR covers 2 subunits; stoich=A2; evidence_sum(-log10E)=59.44; minE=7.80e-32
- Boost H0208_A: chain A -> copies=1 (+1.98e+10)
- Boost H0208_B: chain A -> copies=1 (+1.98e+10)
1. A1B1: 646
2. A2B1: 974
3. A1B2: 964
4. A2B2: 1292
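This README does not spell out how the per-subunit copy-number scores are combined into the "before coverage constraints" ranking. As a purely illustrative assumption, summing each subunit's score (–log10(E)) and sorting in descending order reproduces the order shown for H0208; the sketch below does only that and does not model the multi-template constraint step or the scores listed after it. `rank_candidates` is a hypothetical helper.

```python
from itertools import product

# (copies -> score) per subunit, copied from the "Ranked Copy Numbers" table above.
# Subunit letter A is H0208_A and B is H0208_B.
copy_scores = {
    "A": {1: 162.277588, 2: 157.298376},
    "B": {1: 218.975800, 2: 93.003826},
}


def rank_candidates(copy_scores):
    """Rank stoichiometries by the sum of per-subunit scores, highest first (illustrative)."""
    chains = sorted(copy_scores)
    scored = []
    for combo in product(*(copy_scores[c].items() for c in chains)):
        name = "".join(f"{c}{copies}" for c, (copies, _) in zip(chains, combo))
        total = sum(score for _, score in combo)
        scored.append((total, name))
    scored.sort(reverse=True)
    return [name for _, name in scored]


print(rank_candidates(copy_scores))  # ['A1B1', 'A2B1', 'A1B2', 'A2B2']
```

For example, A2B1 (157.30 + 218.98 = 376.27) ranks above A1B2 (162.28 + 93.00 = 255.28) because subunit B's one-copy evidence is much stronger than its two-copy evidence.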
python alphafold3_stoichiometry_prediction.py
The script requires 4 arguments:
- --input_fasta: path to the target FASTA file
- --stoichiometries: comma-separated list of valid stoichiometries to test (e.g., A2,A3,A4)
- --output_path: output directory for the results
- --num_models: number of models to generate for each stoichiometry (a multiple of 5)
This script runs all the required steps and prints the results on completion. The output directory (/path/to/output_dir/) will also contain a subdirectory named after the input FASTA file, which contains:
- input_jsons/: the JSON files generated as inputs to AlphaFold3
- the AlphaFold3 outputs generated for the different stoichiometries
- stoichiometry_results.csv: a table of the maximum and average ranking scores for each stoichiometry
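To show how a stoichiometry string relates to an AlphaFold3 input like the fold_input.json test file above, here is a minimal sketch that expands e.g. "A2B1" into one chain ID per copy. The helper names are hypothetical, the sequences are placeholders, and the JSON files PreStoi actually writes to input_jsons/ may contain more fields (e.g. precomputed MSAs). It also assumes subunit letters follow the order of records in the input FASTA, and that, with AlphaFold3's default of five diffusion samples per seed, 25 models correspond to five seeds.

```python
import json
import re
import string


def parse_stoichiometry(stoich):
    """Split a stoichiometry string such as 'A2B1' into [('A', 2), ('B', 1)]."""
    parts = [(letter, int(n)) for letter, n in re.findall(r"([A-Z])(\d+)", stoich)]
    rebuilt = "".join(f"{letter}{n}" for letter, n in parts)
    if rebuilt != stoich or any(n < 1 for _, n in parts):
        raise ValueError(f"unrecognized stoichiometry: {stoich!r}")
    return parts


def build_af3_input(job_name, stoich, sequences, seeds=(1,)):
    """Build an AlphaFold3 input dict for one stoichiometry.

    sequences maps a subunit letter ('A', 'B', ...) to its amino-acid sequence.
    Every copy gets its own chain ID; this simple sketch supports at most 26 chains.
    """
    parts = parse_stoichiometry(stoich)
    if sum(n for _, n in parts) > len(string.ascii_uppercase):
        raise ValueError("more than 26 chains; extend the chain-ID scheme")
    chain_ids = iter(string.ascii_uppercase)
    entries = []
    for letter, copies in parts:
        ids = [next(chain_ids) for _ in range(copies)]
        entries.append({"protein": {"id": ids, "sequence": sequences[letter]}})
    return {
        "name": f"{job_name}_{stoich}",
        "sequences": entries,
        "modelSeeds": list(seeds),
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder sequences; 25 models / 5 samples per seed -> seeds 1..5 (assumption).
job = build_af3_input("H0208", "A2B1", {"A": "MKTAYIAK", "B": "GSHMLEV"}, seeds=range(1, 25 // 5 + 1))
print(json.dumps(job, indent=2))
```

For A2B1, the two copies of subunit A become chains A and B and the single copy of subunit B becomes chain C, mirroring how the test file lists `"id": ["A", "B"]` for a homodimer.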
python alphafold3_stoichiometry_prediction.py --input_fasta /path/to/T0270o.fasta --stoichiometries A2,A3,A4,A5,A6 --output_path /path/to/output_dir --num_models 25
Example output for T0270o (True stoichiometry: A3):
Stoichiometry results for : T0270o
Stoichiometry, Maximum ranking score, Average ranking score, Number of models
A2,0.2917254023268046,0.22109988348999923,25
A3,0.7356659644178217,0.6597546325500054,25
A4,0.4619621540111602,0.4311053457765267,25
A5,0.5574578147810352,0.47328416184068417,25
A6,0.5455593923883584,0.4632293540739206,25
!!!!!!!!!!Final Selection!!!!!!!!!!
Stoichiometry with the highest Maximum ranking score: A3
Stoichiometry with the highest Average ranking score: A3
Target T0270o
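The final selection above picks the stoichiometry with the highest maximum and the highest average ranking score. A minimal sketch of the same selection applied to stoichiometry_results.csv, assuming its header matches the printed table (spaces after commas are skipped); `select_stoichiometry` is a hypothetical helper, not the script's own code.

```python
import csv


def select_stoichiometry(lines):
    """Return (best by maximum ranking score, best by average ranking score).

    lines: any iterable of CSV text lines, e.g. an open stoichiometry_results.csv.
    """
    rows = list(csv.DictReader(lines, skipinitialspace=True))
    best_max = max(rows, key=lambda r: float(r["Maximum ranking score"]))
    best_avg = max(rows, key=lambda r: float(r["Average ranking score"]))
    return best_max["Stoichiometry"], best_avg["Stoichiometry"]


# Rows copied from the T0270o output above.
t0270o = [
    "Stoichiometry, Maximum ranking score, Average ranking score, Number of models",
    "A2,0.2917254023268046,0.22109988348999923,25",
    "A3,0.7356659644178217,0.6597546325500054,25",
    "A4,0.4619621540111602,0.4311053457765267,25",
    "A5,0.5574578147810352,0.47328416184068417,25",
    "A6,0.5455593923883584,0.4632293540739206,25",
]
print(select_stoichiometry(t0270o))  # ('A3', 'A3')
```

Usage on a real run (path assumed from the output layout described above): `with open("/path/to/output_dir/T0270o/stoichiometry_results.csv", newline="") as fh: print(select_stoichiometry(fh))`.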
python alphafold3_stoichiometry_prediction.py --input_fasta /path/to/H0208.fasta --stoichiometries A1B1,A1B2,A1B3,A2B1,A2B2,A2B3,A3B1,A3B2,A3B3 --output_path /path/to/output_dir --num_models 25
Example output for H0208 (True stoichiometry: A1B1):
Stoichiometry results for : H0208
Stoichiometry, Maximum ranking score, Average ranking score, Number of models
A1B1,0.9579082725495932,0.9449554292400113,25
A1B2,0.7371544130131423,0.5392889019139051,25
A1B3,0.3972130584449553,0.35706427566899235,25
A2B1,0.8581746709490917,0.7907662683364439,25
A2B2,0.8726062852868774,0.6886386376967493,25
A2B3,0.5666602198361169,-15.505672491186171,25
A3B1,0.7930352000013527,0.7656765671860422,25
A3B2,0.4979020342021519,0.40007180256585756,25
A3B3,0.4140460397693776,0.37972803742981975,25
!!!!!!!!!!Final Selection!!!!!!!!!!
Stoichiometry with the highest Maximum ranking score: A1B1
Stoichiometry with the highest Average ranking score: A1B1
Target H0208
The AlphaFold3 structural models used to determine the stoichiometry for Phase 0 targets in CASP16 can be found at: https://zenodo.org/records/14807606
Most parts of the PreStoi method used in the blind CASP16 experiment can be automated and have been implemented in this GitHub repository. The automated results can be reproduced according to the following steps: (1) download the CASP16 Phase 0 protein complex dataset from this repository; (2) run PreStoi to propose stoichiometry candidates (template_based_prediction.py), generate AlphaFold3 structural models for them, and rank and select stoichiometries (alphafold3_stoichiometry_prediction.py); (3) compare the stoichiometry prediction with the results in Table 1 in the manuscript describing PreStoi (https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3). You may also compare the AlphaFold3 structural models and their ranking scores that you obtain with the ones generated in the CASP16 experiment that are available at https://zenodo.org/records/14807606.
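For step (2), the AlphaFold3-based step can be scripted over many Phase 0 targets. The sketch below only assembles the documented alphafold3_stoichiometry_prediction.py command line; the helper names, the `<target>.fasta` naming inside one folder, and the example candidate list (taken from the T0270o command above) are assumptions. With `dry_run=True` it only prints the commands.

```python
import subprocess
from pathlib import Path


def af3_stoichiometry_command(fasta, output_dir, stoichiometries, num_models=25):
    """Build the alphafold3_stoichiometry_prediction.py command line for one target."""
    if num_models % 5 != 0:
        raise ValueError("num_models should be a multiple of 5")
    return [
        "python", "alphafold3_stoichiometry_prediction.py",
        "--input_fasta", str(fasta),
        "--stoichiometries", ",".join(stoichiometries),
        "--output_path", str(output_dir),
        "--num_models", str(num_models),
    ]


def run_batch(targets, fasta_dir, output_dir, dry_run=True):
    """targets maps a target name to its candidate stoichiometries."""
    for name, candidates in targets.items():
        cmd = af3_stoichiometry_command(Path(fasta_dir) / f"{name}.fasta", output_dir, candidates)
        if dry_run:
            print(" ".join(cmd))  # inspect the command before spending GPU time
        else:
            subprocess.run(cmd, check=True)


run_batch({"T0270o": ["A2", "A3", "A4", "A5", "A6"]}, "/path/to/casp16_phase0", "/path/to/output_dir")
```

Running targets one after another with `check=True` stops the batch at the first failure, which is usually preferable when each target occupies a GPU for hours.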
If you find this work useful, please cite:
Liu, J., Neupane, P., & Cheng, J. (2025). Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information. bioRxiv, 2025-01 (https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3)
@article {Liu2025.01.12.632663,
author = {Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
title = {Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information},
elocation-id = {2025.01.12.632663},
year = {2025},
doi = {10.1101/2025.01.12.632663},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3},
journal = {bioRxiv}
}