PreStoi

Predicting stoichiometry of protein complexes by integrating AlphaFold3 and templates.

This system was built and tested to predict the correct stoichiometry for targets released without stoichiometry information (Phase 0) in the 16th worldwide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), which concluded in December 2024.

Using this system, MULTICOM4 ranked 1st in protein complex structure prediction for such targets.

The workflow of the Stoichiometry Prediction system utilized by MULTICOM4 in CASP16

This program handles both the template-based and the AlphaFold3-based stoichiometry prediction parts of the above diagram.

Installation and Configuration

System Requirements

Operating System: Linux (Ubuntu 22.04 LTS recommended)
Storage: Up to 1 TB (SSD recommended) for genetic databases
GPU: NVIDIA GPU with Compute Capability 8.0+ (e.g., A100 80 GB, H100 80 GB)
- Supports inputs up to 5,120 tokens on a single A100 or H100 (80 GB)
RAM: Minimum 64 GB (especially for long targets)

Installation Time and Runtime Estimation

Installation Time: Approximately 45 minutes to download and set up databases (without SSD).
Average Runtime: Varies by target length, intended stoichiometries, and GPU memory. Inputs with up to 5,120 tokens fit on an NVIDIA A100 or H100 (80 GB).
Example: Runtime for CASP16 target H0208
- Sequence lengths: Chain A: 328, Chain B: 318
- Number of models generated per stoichiometry: 25
- Device configuration for running PreStoi and AlphaFold3: CPU: AMD EPYC 7643 2.3 GHz, RAM: 512 GB, GPU: one NVIDIA A100 (80 GB)
- Average runtime for testing all 9 stoichiometry candidates (A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, A3B1, A3B2, A3B3): 266 minutes
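
The nine candidates in this example are every combination of one to three copies of each chain. A minimal sketch of how such a list could be enumerated and joined into a comma-separated string; the function name and the max_copies parameter are illustrative assumptions and are not part of PreStoi:

```python
from itertools import product


def enumerate_stoichiometries(chain_labels, max_copies):
    """Return every combination of 1..max_copies copies per chain, e.g. 'A1B1'."""
    copy_ranges = [range(1, max_copies + 1)] * len(chain_labels)
    return [
        "".join(f"{label}{count}" for label, count in zip(chain_labels, counts))
        for counts in product(*copy_ranges)
    ]


# Two chains with up to three copies each gives 3 * 3 = 9 candidates.
candidates = enumerate_stoichiometries(["A", "B"], max_copies=3)
print(",".join(candidates))
```

The number of candidates grows as max_copies raised to the number of chains, so runtime grows quickly for complexes with many distinct subunits.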

The PreStoi installation requires five steps (step 4 can be skipped if AlphaFold3 is already installed):

1. Download PreStoi package

git clone https://github.com/jianlin-cheng/prestoi
cd prestoi

2. Install conda environment

You can create a virtual environment by yourself or install a fresh conda environment using the following commands:

wget "https://github.com/conda-forge/miniforge/releases/download/23.1.0-3/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh 
rm Mambaforge-$(uname)-$(uname -m).sh
source ~/.bashrc

Then run the following commands to install the required python packages:

mamba install -y -c bioconda hmmer
pip install pandas biopython requests
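
After these commands finish, a quick check can confirm that the tools are visible from the active environment. This is a sketch under assumptions: it only looks for the jackhmmer binary (shipped with the hmmer package) and the three Python packages named above, and the function name is hypothetical.

```python
import importlib.util
import shutil

REQUIRED_BINARIES = ["jackhmmer"]  # provided by the bioconda hmmer package
REQUIRED_MODULES = ["pandas", "Bio", "requests"]  # "Bio" is the import name of biopython


def check_environment():
    """Return a list of missing binaries and modules (empty when all are found)."""
    missing = []
    for binary in REQUIRED_BINARIES:
        if shutil.which(binary) is None:
            missing.append(f"binary:{binary}")
    for module in REQUIRED_MODULES:
        if importlib.util.find_spec(module) is None:
            missing.append(f"module:{module}")
    return missing


problems = check_environment()
print("environment OK" if not problems else f"missing: {problems}")
```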

3. Download databases and tools to make template-based predictions

python download_template_database_and_tools.py

4. AlphaFold3 installation. (Skip to step 5 if AlphaFold3 has already been installed)

Install the AlphaFold3 program by following the official installation guide:

https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md

Test whether AlphaFold3 is working properly. Once you have installed AlphaFold3, you can test your setup using, for example, the following input JSON file named fold_input.json:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

You can then run AlphaFold3 with the following command and check that it runs correctly:

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DATABASES_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output

5. Configure databases and tools for Stoichiometry Prediction

Run configure.py to create a config.json file:

python configure.py --af3_program_path /path/to/alphafold3_program/ --af3_params_path /path/to/alphafold3_parameters/ --af3_db_path /path/to/alphafold3_databases/

This step creates a config.json file in the current working directory (prestoi) with the following information:

{
  "af3_program_path": "/path/to/alphafold3_program/",
  "af3_params_path": "/path/to/alphafold3_parameters/",
  "af3_db_path": "/path/to/alphafold3_databases/",
  "uniref90_path": "/path/to/uniref90",
  "hhsearch_binary": "/path/to/hhsearch",
  "hhmake_binary": "/path/to/hhmake",
  "hhdb_prefix": "/path/to/hhdb",
  "stoichiometry_pdb_fasta_folder": "/path/to/stoichiometry_pdb_fasta_folder",
  "pdb_stoichiometry_database": "/path/to/pdb_stoichiometry_database",
  "template_atom_dir": "/path/to/template_atom_dir"
}

Note: This step only needs to be run once. Run it again if any of the above paths change, and make sure that all paths are valid.
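
One way to check that the paths are valid is to load config.json and test each entry on disk. The sketch below is not part of PreStoi; the function name is hypothetical, and it assumes that any key ending in "_prefix" (such as hhdb_prefix) names a family of files sharing that prefix rather than a single file.

```python
import glob
import json
import os


def find_invalid_paths(config_file):
    """Return the config keys whose paths do not exist on disk."""
    with open(config_file) as handle:
        config = json.load(handle)
    invalid = []
    for key, value in config.items():
        if key.endswith("_prefix"):
            # Assumption: a prefix is valid if at least one file starts with it.
            exists = bool(glob.glob(glob.escape(value) + "*"))
        else:
            exists = os.path.exists(value)
        if not exists:
            invalid.append(key)
    return invalid


if os.path.exists("config.json"):
    print("invalid entries:", find_invalid_paths("config.json"))
```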

Inference

Run the template-based stoichiometry prediction

python template_based_prediction_v2.py

Inputs

Required
- --input_fasta: target complex FASTA (one entry per subunit)
- --output_path: output folder
Optional (recommended for speed / reproducibility)
- Provide precomputed MSAs/profiles per subunit to skip jackhmmer:
  - --a3m_map: JSON mapping subunit_id -> /path/to/subunit.a3m
  - --hmm_map: JSON mapping subunit_id -> /path/to/subunit.hmm
  - or --a3m_dir / --hmm_dir: directories containing files named <subunit>.a3m or <subunit>.hmm

Priority: HMM (if provided) > A3M (if provided) > jackhmmer (fallback).

Command

python template_based_prediction_v2.py \
  --input_fasta /path/to/target.fasta \
  --output_path /path/to/output_dir \
  --n_topn 10

This script generates the possible number of copies for each subunit in the input FASTA file, generates stoichiometry candidates for AlphaFold3-based prediction, and provides template-based stoichiometry predictions when sufficient template evidence is available (e.g., a complex template that covers all subunits in the input FASTA).

Using per-subunit precomputed A3M/HMM (optional)

If template search or MSA generation has already been performed externally, PreStoi can reuse per-subunit precomputed A3M and/or HMM files instead of rerunning jackhmmer. The keys in the JSON map, or the filenames in directory mode, must exactly match the FASTA record IDs.

For example, if the input FASTA contains:

>H0208_A
SEQUENCE_A
>H0208_B
SEQUENCE_B

then the corresponding map keys or filenames should be H0208_A and H0208_B.

Option A: JSON maps

python template_based_prediction_v2.py \
  --input_fasta /path/to/H0208.fasta \
  --output_path /path/to/output_dir \
  --n_topn 10 \
  --a3m_map /path/to/a3m_map.json \
  --hmm_map /path/to/hmm_map.json

Example a3m_map.json:

{
  "H0208_A": "/msas/H0208_A.a3m",
  "H0208_B": "/msas/H0208_B.a3m"
}

Example hmm_map.json:

{
  "H0208_A": "/hmms/H0208_A.hmm",
  "H0208_B": "/hmms/H0208_B.hmm"
}
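
Because the map keys must match the FASTA record IDs exactly, it can help to build the map from the FASTA file itself. A minimal sketch, assuming the record ID is the first whitespace-delimited token after ">" (the Biopython convention); both helper names are hypothetical and not part of PreStoi:

```python
import json
from pathlib import Path


def read_fasta_ids(fasta_path):
    """Return record IDs: the first whitespace-delimited token after '>'."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids


def build_map(fasta_path, directory, suffix):
    """Map each FASTA record ID to <directory>/<id><suffix>; also list IDs with no file."""
    mapping, missing = {}, []
    for record_id in read_fasta_ids(fasta_path):
        candidate = Path(directory) / f"{record_id}{suffix}"
        if candidate.is_file():
            mapping[record_id] = str(candidate)
        else:
            missing.append(record_id)
    return mapping, missing


# Example usage (paths are placeholders):
# mapping, missing = build_map("/path/to/H0208.fasta", "/msas", ".a3m")
# Path("a3m_map.json").write_text(json.dumps(mapping, indent=2))
```

Subunits reported in missing have no precomputed file and would fall back to the next input source in the priority order.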

Option B: directory mode

python template_based_prediction_v2.py \
  --input_fasta /path/to/H0208.fasta \
  --output_path /path/to/output_dir \
  --n_topn 10 \
  --a3m_dir /msas \
  --hmm_dir /hmms

This expects files named according to the FASTA record IDs, for example:

/msas/H0208_A.a3m
/msas/H0208_B.a3m
/hmms/H0208_A.hmm
/hmms/H0208_B.hmm

When both are available, PreStoi uses inputs in the following priority order:

HMM > A3M > jackhmmer
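
The priority rule can be written as a small per-subunit lookup. This sketch only illustrates the stated order; the function name and return values are assumptions, not PreStoi's actual code.

```python
def choose_profile_source(subunit_id, hmm_paths, a3m_paths):
    """Pick the input for one subunit following HMM > A3M > jackhmmer."""
    if subunit_id in hmm_paths:
        return "hmm", hmm_paths[subunit_id]
    if subunit_id in a3m_paths:
        return "a3m", a3m_paths[subunit_id]
    # Nothing precomputed for this subunit: jackhmmer has to be run.
    return "jackhmmer", None


hmm_paths = {"H0208_A": "/hmms/H0208_A.hmm"}
a3m_paths = {"H0208_A": "/msas/H0208_A.a3m", "H0208_B": "/msas/H0208_B.a3m"}
for subunit in ["H0208_A", "H0208_B", "H0208_C"]:
    print(subunit, choose_profile_source(subunit, hmm_paths, a3m_paths))
```

The choice is made per subunit, so one target can mix all three sources.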

Outputs

Example output for H0208 (True stoichiometry: A1B1):

======================================================================
 Ranked Copy Numbers Per Subunit (local per-template evidence)
======================================================================

▶ Subunit: H0208_B
  Rank |   Score (–log10(E))     | Copies
  -----------------------------------------
   1   |   218.975800         | 1
   2   |   93.003826          | 2

▶ Subunit: H0208_A
  Rank |   Score (–log10(E))     | Copies
  -----------------------------------------
   1   |   162.277588         | 1
   2   |   157.298376         | 2

======================================================================
 Ranked Stoichiometry Candidates (before coverage constraints)
======================================================================
 1. A1B1
 2. A2B1
 3. A1B2
 4. A2B2
Saved: casp16_results/H0208/ranked_stoichiometry_candidates.csv

======================================================================
 Templates covering ALL subunits
======================================================================
  6UXU: stoichiometry=A2
  4NUR: stoichiometry=A2
  6V73: stoichiometry=A2

======================================================================
 Templates covering MULTIPLE subunits
======================================================================
  6UXU: subunits=A,B, stoichiometry=A2
  4NUR: subunits=A,B, stoichiometry=A2
  6V73: subunits=A,B, stoichiometry=A2

======================================================================
 Stoichiometry Candidates AFTER coverage/multi-template constraints
======================================================================

=== Applying multi-template constraints to 3 templates ===

[1] Template 6UXU covers 2 subunits; stoich=A2; evidence_sum(-log10E)=67.75; minE=5.70e-35
  - Boost H0208_B: chain A -> copies=1 (+6.78e+10)
  - Boost H0208_A: chain A -> copies=1 (+6.78e+10)

[2] Template 6V73 covers 2 subunits; stoich=A2; evidence_sum(-log10E)=61.26; minE=1.50e-31
  - Boost H0208_A: chain A -> copies=1 (+3.06e+10)
  - Boost H0208_B: chain A -> copies=1 (+3.06e+10)

[3] Template 4NUR covers 2 subunits; stoich=A2; evidence_sum(-log10E)=59.44; minE=7.80e-32
  - Boost H0208_A: chain A -> copies=1 (+1.98e+10)
  - Boost H0208_B: chain A -> copies=1 (+1.98e+10)
 1. A1B1: 646
 2. A2B1: 974
 3. A1B2: 964
 4. A2B2: 1292
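
The number printed after each candidate in the final list appears to be the total residue count of that assembly: for H0208 (chain A: 328 residues, chain B: 318 residues), A1B1 gives 328 + 318 = 646. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the parser assumes single-letter chain labels. For protein-only inputs, this count can be compared against the 5,120-token limit mentioned in the system requirements.

```python
import re


def parse_stoichiometry(stoichiometry):
    """Turn 'A2B1' into {'A': 2, 'B': 1} (single-letter chain labels assumed)."""
    return {label: int(count) for label, count in re.findall(r"([A-Z])(\d+)", stoichiometry)}


def total_residues(stoichiometry, chain_lengths):
    """Sum chain length times copy number over all chains."""
    copies = parse_stoichiometry(stoichiometry)
    return sum(chain_lengths[label] * count for label, count in copies.items())


chain_lengths = {"A": 328, "B": 318}  # H0208 sequence lengths
for candidate in ["A1B1", "A2B1", "A1B2", "A2B2"]:
    size = total_residues(candidate, chain_lengths)
    print(f"{candidate}: {size} residues, within 5,120 limit: {size <= 5120}")
```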

Run the AlphaFold3-based stoichiometry prediction

python alphafold3_stoichiometry_prediction.py

The script requires 4 arguments:

- --input_fasta: path to the target FASTA file
- --stoichiometries: comma-separated list of valid stoichiometries to test
- --output_path: output path for the results
- --num_models: number of models (a multiple of 5) to generate for each stoichiometry

This script runs all the required steps and prints the results upon completion. In addition, /path/to/output_dir/ will contain a directory named after the input FASTA file, which contains:

- input_jsons/: the JSON files generated as inputs to AlphaFold3
- AlphaFold3 outputs generated for the different stoichiometries
- stoichiometry_results.csv: a table of maximum and average ranking scores for each stoichiometry
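
To illustrate what an input JSON for one stoichiometry might look like, the sketch below expands a stoichiometry string into the AlphaFold3 input format shown in the installation test. It is not PreStoi's own generator: the function name and the job naming are hypothetical, chain IDs are assigned in alphabetical order (at most 26 chains), and one seed per five models is an assumption based on AlphaFold3 producing five samples per seed by default.

```python
import json
import re
from string import ascii_uppercase

SAMPLES_PER_SEED = 5  # assumption: AlphaFold3's default number of samples per seed


def build_af3_input(name, stoichiometry, sequences, num_models):
    """Build an AlphaFold3 input dict with one protein entry per distinct subunit."""
    if num_models % SAMPLES_PER_SEED != 0:
        raise ValueError("num_models must be a multiple of 5")
    chain_ids = iter(ascii_uppercase)  # supports up to 26 chains in total
    entries = []
    for label, count in re.findall(r"([A-Z])(\d+)", stoichiometry):
        ids = [next(chain_ids) for _ in range(int(count))]
        entries.append({"protein": {"id": ids, "sequence": sequences[label]}})
    return {
        "name": f"{name}_{stoichiometry}",
        "sequences": entries,
        "modelSeeds": list(range(1, num_models // SAMPLES_PER_SEED + 1)),
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder sequences; real inputs come from the target FASTA file.
job = build_af3_input("H0208", "A2B1", {"A": "MKV", "B": "GSH"}, num_models=25)
print(json.dumps(job, indent=2))
```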

Homomultimer Example

python alphafold3_stoichiometry_prediction.py \
  --input_fasta /path/to/T0270o.fasta \
  --stoichiometries A2,A3,A4,A5,A6 \
  --output_path /path/to/output_dir \
  --num_models 25

Example output for T0270o (True stoichiometry: A3):

Stoichiometry results for :  T0270o

Stoichiometry, Maximum ranking score, Average ranking score, Number of models
A2,0.2917254023268046,0.22109988348999923,25
A3,0.7356659644178217,0.6597546325500054,25
A4,0.4619621540111602,0.4311053457765267,25
A5,0.5574578147810352,0.47328416184068417,25
A6,0.5455593923883584,0.4632293540739206,25

!!!!!!!!!!Final Selection!!!!!!!!!!

Stoichiometry with the highest Maximum ranking score: A3
Stoichiometry with the highest Average ranking score: A3

Target T0270o

Stoichiometries	Maximum Ranking Score	Average Ranking Score
A2	0.2917254023268046	0.22109988348999923
A3	0.7356659644178217	0.6597546325500054
A4	0.4619621540111602	0.4311053457765267
A5	0.5574578147810352	0.47328416184068417
A6	0.5455593923883584	0.4632293540739206

Heteromultimer Example

python alphafold3_stoichiometry_prediction.py \
  --input_fasta /path/to/H0208.fasta \
  --stoichiometries A1B1,A1B2,A1B3,A2B1,A2B2,A2B3,A3B1,A3B2,A3B3 \
  --output_path /path/to/output_dir \
  --num_models 25

Example output for H0208 (True stoichiometry: A1B1):

Stoichiometry results for :  H0208

Stoichiometry, Maximum ranking score, Average ranking score, Number of models
A1B1,0.9579082725495932,0.9449554292400113,25
A1B2,0.7371544130131423,0.5392889019139051,25
A1B3,0.3972130584449553,0.35706427566899235,25
A2B1,0.8581746709490917,0.7907662683364439,25
A2B2,0.8726062852868774,0.6886386376967493,25
A2B3,0.5666602198361169,-15.505672491186171,25
A3B1,0.7930352000013527,0.7656765671860422,25
A3B2,0.4979020342021519,0.40007180256585756,25
A3B3,0.4140460397693776,0.37972803742981975,25

!!!!!!!!!!Final Selection!!!!!!!!!!

Stoichiometry with the highest Maximum ranking score: A1B1
Stoichiometry with the highest Average ranking score: A1B1

Target H0208

Stoichiometries	Maximum Ranking Score	Average Ranking Score
A1B1	0.9579082725495932	0.9449554292400113
A1B2	0.7371544130131423	0.5392889019139051
A1B3	0.3972130584449553	0.35706427566899235
A2B1	0.8581746709490917	0.7907662683364439
A2B2	0.8726062852868774	0.6886386376967493
A2B3	0.5666602198361169	-15.505672491186171
A3B1	0.7930352000013527	0.7656765671860422
A3B2	0.4979020342021519	0.40007180256585756
A3B3	0.4140460397693776	0.37972803742981975
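
The final selection can be reproduced from the results table with a few lines of Python. This sketch assumes that stoichiometry_results.csv has the same header and comma-separated layout as the printed output above; the function name is hypothetical, and only three of the H0208 rows are included to keep the example short.

```python
import csv
import io


def select_stoichiometry(csv_text):
    """Return the stoichiometries with the highest maximum and highest average score."""
    # skipinitialspace drops the blank that follows each comma in the header row.
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    best_max = max(rows, key=lambda row: float(row["Maximum ranking score"]))
    best_avg = max(rows, key=lambda row: float(row["Average ranking score"]))
    return best_max["Stoichiometry"], best_avg["Stoichiometry"]


results = """Stoichiometry, Maximum ranking score, Average ranking score, Number of models
A1B1,0.9579082725495932,0.9449554292400113,25
A2B2,0.8726062852868774,0.6886386376967493,25
A2B3,0.5666602198361169,-15.505672491186171,25
"""
by_max, by_avg = select_stoichiometry(results)
print("highest maximum:", by_max)
print("highest average:", by_avg)
```

The A2B3 row shows why both criteria are reported: an average far below the maximum means that at least one model received a strongly negative ranking score, so the average is sensitive to outliers in a way the maximum is not.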

CASP16 Phase 0 Structural Models Availability

The AlphaFold3 structural models used to determine the stoichiometry for Phase 0 targets in CASP16 can be found at: https://zenodo.org/records/14807606

Reproduction Instructions

Most parts of the PreStoi method used in the blind CASP16 experiment can be automated and have been implemented in this GitHub repository. The automated results can be reproduced with the following steps:

1. Download the CASP16 Phase 0 protein complex dataset from this repository.
2. Run PreStoi to propose stoichiometry candidates (template_based_prediction.py), generate AlphaFold3 structural models for them, and rank and select stoichiometries (alphafold3_stoichiometry_prediction.py).
3. Compare the stoichiometry predictions with the results in Table 1 of the manuscript describing PreStoi (https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3).

You may also compare the AlphaFold3 structural models and ranking scores that you obtain with the ones generated in the CASP16 experiment, which are available at https://zenodo.org/records/14807606.

Citing This Work

If you find this work useful, please cite:

Liu, J., Neupane, P., & Cheng, J. (2025). Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information. bioRxiv, 2025-01 (https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3)

@article {Liu2025.01.12.632663,
	author = {Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
	title = {Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information},
	elocation-id = {2025.01.12.632663},
	year = {2025},
	doi = {10.1101/2025.01.12.632663},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/10.1101/2025.01.12.632663v3},
	journal = {bioRxiv}
}
