Skip to content

Commit 8cbb609

Browse files
refactor: clean up BrownDye2 AMBER+APBS workflow
- Notebook: remove unnecessary complex.pdb assembly and validation step; protein and ligand are now loaded separately in tleap - Shell script: fix tleap atom-name mismatch by using pdb4amber + separate protein/ligand loading instead of combined PDB; add PBRadii mbondi3 so ParmEd writes correct radii; use inputgen Python API to work around pdb2pqr<=3.7.1 --istrng type bug; strip boilerplate checks and verbose output
1 parent 2e7d1ab commit 8cbb609

2 files changed

Lines changed: 64 additions & 161 deletions

File tree

examples/browndye/complex_pqr.ipynb

Lines changed: 8 additions & 95 deletions
Original file line numberDiff line numberDiff line change
@@ -3,130 +3,43 @@
33
{
44
"cell_type": "markdown",
55
"metadata": {},
6-
"source": [
7-
"# BrownDye2 preparation: protein-ligand complex\n",
8-
"\n",
9-
"0. Validate complex (chain IDs, SMILES)\n",
10-
"1. Fix protein with PDBFixer, assign ligand topology with RDKit\n",
11-
"2. Assemble complex PDB for tleap\n",
12-
"3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n",
13-
"4. ParmEd convert to complex.pqr\n",
14-
"5. APBS input generation using pdb2pqr inputgen\n",
15-
"6. Run APBS\n",
16-
"\n",
17-
"Steps 0-2 in this notebook, steps 3-6 via `scripts/browndye/run_amber_apbs.sh`."
18-
]
6+
"source": "# BrownDye2: complex PQR preparation\n\nGenerate a PQR file and APBS electrostatic potential map for a protein-ligand complex.\n\n**This notebook**: prepare protein and ligand inputs from a docked PDB.\n1. Fix protein with PDBFixer (add missing atoms, protonate at target pH)\n2. Assign ligand bond orders from SMILES, write SDF for antechamber\n\n**`run_amber_apbs.sh`**: parameterize and solve electrostatics.\n1. `pdb4amber` -- strip H/water, fix residue names for tleap\n2. `antechamber` + `parmchk2` -- GAFF2 atom types and AM1-BCC charges\n3. `tleap` -- combine protein + ligand into AMBER topology\n4. `ParmEd` -- convert prmtop/rst7 to PQR (with mbondi3 radii)\n5. `inputgen` -- generate APBS input from PQR dimensions\n6. `APBS` -- solve the linearized Poisson-Boltzmann equation"
197
},
208
{
219
"cell_type": "code",
2210
"execution_count": null,
2311
"metadata": {},
2412
"outputs": [],
25-
"source": "import copy\nfrom pathlib import Path\n\nfrom Bio.Data.IUPACData import protein_letters_3to1\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\n# User configuration\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True, parents=True)\n\n# Required: canonical SMILES of the ligand\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4"
13+
"source": "from pathlib import Path\n\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True)\n\n# Canonical SMILES of the ligand (used to assign bond orders)\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4"
2614
},
2715
{
2816
"cell_type": "markdown",
2917
"metadata": {},
30-
"source": [
31-
"## Step 0: Validate complex"
32-
]
18+
"source": "## Step 1: Fix protein\n\nExtract protein chain and add missing residues/atoms/hydrogens via PDBFixer."
3319
},
3420
{
3521
"cell_type": "code",
3622
"execution_count": null,
3723
"metadata": {},
3824
"outputs": [],
39-
"source": [
40-
"STANDARD_AA = {code.upper() for code in protein_letters_3to1}\n",
41-
"\n",
42-
"parser = PDBParser(QUIET=True)\n",
43-
"structure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\n",
44-
"model = structure[0]\n",
45-
"\n",
46-
"chains = {chain.id: chain for chain in model}\n",
47-
"assert PROTEIN_CHAIN_ID in chains, f\"Chain {PROTEIN_CHAIN_ID} not found. Available: {list(chains)}\"\n",
48-
"assert LIGAND_CHAIN_ID in chains, f\"Chain {LIGAND_CHAIN_ID} not found. Available: {list(chains)}\"\n",
49-
"\n",
50-
"# Summarize selected chains\n",
51-
"prot_chain = chains[PROTEIN_CHAIN_ID]\n",
52-
"prot_resnames = {\n",
53-
" res.get_resname().strip()\n",
54-
" for res in prot_chain.get_residues()\n",
55-
" if res.get_resname().strip() not in (\"HOH\", \"WAT\")\n",
56-
"}\n",
57-
"n_prot_res = sum(1 for _ in prot_chain.get_residues())\n",
58-
"non_standard = prot_resnames - STANDARD_AA\n",
59-
"if non_standard:\n",
60-
" print(f\"WARNING: chain {PROTEIN_CHAIN_ID} has non-standard residues: {non_standard}\")\n",
61-
"print(f\"Protein chain {PROTEIN_CHAIN_ID}: {n_prot_res} residues\")\n",
62-
"\n",
63-
"lig_chain = chains[LIGAND_CHAIN_ID]\n",
64-
"lig_resnames = {res.get_resname().strip() for res in lig_chain.get_residues()}\n",
65-
"assert len(lig_resnames) == 1, f\"Ligand chain {LIGAND_CHAIN_ID} has {len(lig_resnames)} residues\"\n",
66-
"LIG_RESNAME = next(iter(lig_resnames))\n",
67-
"n_lig_atoms = sum(1 for _ in lig_chain.get_atoms())\n",
68-
"print(f\"Ligand chain {LIGAND_CHAIN_ID}: residue(s) {lig_resnames}, {n_lig_atoms} atoms\")"
69-
]
25+
"source": "parser = PDBParser(QUIET=True)\nstructure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\nmodel = structure[0]\nchains = {chain.id: chain for chain in model}\n\npdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract and fix protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\n\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")"
7026
},
7127
{
7228
"cell_type": "markdown",
7329
"metadata": {},
74-
"source": [
75-
"## Step 1a: Fix protein"
76-
]
30+
"source": "## Step 2: Assign ligand topology\n\nExtract ligand chain, assign bond orders from a SMILES template, and write an SDF\nfor antechamber. The SMILES is needed because PDB files lack bond-order information."
7731
},
7832
{
7933
"cell_type": "code",
8034
"execution_count": null,
8135
"metadata": {},
8236
"outputs": [],
83-
"source": "pdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\nprint(f\"Extracted protein chain {PROTEIN_CHAIN_ID} -> {protein_pdb}\")\n\n# Fix protein (add missing residues, atoms, hydrogens)\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")"
37+
"source": "# Extract ligand chain to PDB for RDKit parsing\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\n\n# Assign bond orders from SMILES template\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"Ligand net charge: {ligand_net_charge}\")\n\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nmol_assigned = assign_topology(mol, template_mol)\n\n# Write SDF for antechamber\nlig_resnames = {res.get_resname().strip() for res in chains[LIGAND_CHAIN_ID].get_residues()}\nmol_assigned.SetProp(\"_Name\", next(iter(lig_resnames)))\n\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n w.write(mol_assigned)\nprint(f\"Ligand SDF -> {ligand_sdf}\")"
8438
},
8539
{
8640
"cell_type": "markdown",
87-
"metadata": {},
88-
"source": [
89-
"## Step 1b: Assign ligand topology"
90-
]
91-
},
92-
{
93-
"cell_type": "code",
94-
"execution_count": null,
95-
"metadata": {},
96-
"outputs": [],
97-
"source": "# Extract ligand chain\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\nprint(f\"Extracted ligand chain {LIGAND_CHAIN_ID} -> {ligand_pdb}\")\n\n# Validate SMILES and compute net charge\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nassert template_mol is not None, f\"Invalid SMILES: {LIGAND_SMILES}\"\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"SMILES: {Chem.MolToSmiles(template_mol)}\")\nprint(f\"Net charge: {ligand_net_charge}\")\n\n# Assign bond orders from SMILES template\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nassert mol is not None, f\"RDKit failed to parse {ligand_pdb}\"\nmol_assigned = assign_topology(mol, template_mol)\n\n# Set molecule name\nmol_assigned.SetProp(\"_Name\", LIG_RESNAME)\n\n# Write SDF for antechamber\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n w.write(mol_assigned)\nprint(f\"Wrote topology-assigned ligand -> {ligand_sdf}\")"
98-
},
99-
{
100-
"cell_type": "markdown",
101-
"metadata": {},
102-
"source": [
103-
"## Step 2: Assemble complex PDB for tleap"
104-
]
105-
},
106-
{
107-
"cell_type": "code",
108-
"execution_count": null,
109-
"metadata": {},
110-
"outputs": [],
111-
"source": "# Re-parse fixed protein and graft the ligand chain onto it\nfixed_struct = parser.get_structure(\"fixed\", str(protein_fixed_pdb))\nfixed_model = fixed_struct[0]\n\n# Add ligand chain from original structure\nlig_chain_copy = copy.deepcopy(chains[LIGAND_CHAIN_ID])\nfixed_model.add(lig_chain_copy)\n\n# Write combined complex\ncomplex_pdb = WORKDIR / \"complex.pdb\"\npdb_io.set_structure(fixed_struct)\npdb_io.save(str(complex_pdb), ChainSelect([PROTEIN_CHAIN_ID, LIGAND_CHAIN_ID]))\nprint(f\"Assembled complex -> {complex_pdb}\")"
112-
},
113-
{
114-
"cell_type": "markdown",
115-
"metadata": {},
116-
"source": [
117-
"## Steps 3-6: AmberTools, PQR, APBS\n",
118-
"\n",
119-
"Edit the constants at the top of `scripts/browndye/run_amber_apbs.sh`, then run:\n",
120-
"\n",
121-
"```bash\n",
122-
"bash scripts/browndye/run_amber_apbs.sh\n",
123-
"```\n",
124-
"\n",
125-
"3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n",
126-
"4. ParmEd convert to complex.pqr\n",
127-
"5. APBS input generation using pdb2pqr inputgen\n",
128-
"6. Run APBS"
129-
]
41+
"source": "## Next: run_amber_apbs.sh\n\nThe notebook produced `tmp/protein_fixed.pdb` and `tmp/ligand.sdf`. The shell script\npicks up from here -- it loads the protein and ligand **separately** into tleap\n(via `pdb4amber` and `antechamber`), combines them, and runs APBS.\n\n```bash\nconda activate ambertools\ncd examples/browndye && bash run_amber_apbs.sh\n```\n\nOutputs in `tmp/`:\n\n| File | Description |\n|------|-------------|\n| `complex.prmtop` | AMBER topology |\n| `complex.rst7` | AMBER coordinates |\n| `complex.pqr` | PQR with AM1-BCC charges and mbondi3 radii |\n| `complex.in` | APBS input (mg-auto, LPBE) |\n| `complex.dx` | Electrostatic potential map (OpenDX) |",
42+
"metadata": {}
13043
}
13144
],
13245
"metadata": {
Lines changed: 56 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -1,108 +1,98 @@
11
#!/usr/bin/env bash
2-
# run_amber_apbs.sh - Steps 3-6 of BrownDye2 complex PQR preparation
2+
# run_amber_apbs.sh - AMBER parameterization + APBS electrostatics
33
#
4-
# Expects in WORKDIR: complex.pdb, ligand.sdf, ligand.pdb
5-
# Produces: complex.prmtop, complex.rst7, complex.pqr, complex.in, complex.dx
4+
# Prerequisite: run complex_pqr.ipynb first to produce protein_fixed.pdb and ligand.sdf
5+
# in WORKDIR.
6+
#
7+
# Pipeline:
8+
# 1. pdb4amber - strip H and water, fix residue names for tleap
9+
# 2. antechamber - assign GAFF2 atom types and AM1-BCC charges to ligand
10+
# 3. tleap - combine protein + ligand, write AMBER topology
11+
# 4. ParmEd - convert prmtop/rst7 to PQR (charges + mbondi3 radii)
12+
# 5. inputgen - generate APBS input from PQR dimensions
13+
# 6. APBS - solve linearized Poisson-Boltzmann equation
14+
#
15+
# Usage:
16+
# conda activate ambertools
17+
# cd examples/browndye && bash run_amber_apbs.sh
618

719
set -euo pipefail
820

9-
# Configurations
21+
# ── Configuration ───────────────────────────────────────────────────────────
1022
WORKDIR="tmp"
1123
LIG_RESNAME="LIG"
1224
NET_CHARGE="-1"
1325
IONIC_STRENGTH="0.150"
1426
PROTEIN_FF="leaprc.protein.ff19SB"
1527
LIGAND_FF="leaprc.gaff2"
16-
17-
# Check required commands
18-
for cmd in obabel antechamber parmchk2 tleap python3 inputgen apbs; do
19-
if ! command -v "$cmd" >/dev/null 2>&1; then
20-
echo "ERROR: $cmd not found" >&2
21-
exit 1
22-
fi
23-
done
28+
PB_RADII="mbondi3"
2429

2530
cd "$WORKDIR"
2631

27-
for f in complex.pdb ligand.sdf ligand.pdb; do
28-
if [[ ! -f "$f" ]]; then
29-
echo "ERROR: required file not found: $WORKDIR/$f" >&2
30-
exit 1
31-
fi
32-
done
32+
# ── 1. pdb4amber ────────────────────────────────────────────────────────────
33+
echo "=== 1. pdb4amber ==="
34+
pdb4amber -i protein_fixed.pdb -o protein_amber.pdb -y -d --no-conect
3335

34-
# Step 3: AmberTools parameterization
35-
echo "=== Step 3: AmberTools parameterization ==="
36-
37-
echo "--- antechamber ---"
36+
# ── 2. Ligand parameterization ──────────────────────────────────────────────
37+
echo "=== 2. antechamber + parmchk2 ==="
3838
obabel ligand.sdf -O ligand_seed.mol2
39+
sed -i "s/UNL1/${LIG_RESNAME}/g" ligand_seed.mol2
3940

4041
antechamber \
41-
-i ligand_seed.mol2 \
42-
-fi mol2 \
43-
-o ligand_amber.mol2 \
44-
-fo mol2 \
45-
-c bcc \
46-
-s 2 \
47-
-at gaff2 \
48-
-nc "$NET_CHARGE" \
49-
-rn "$LIG_RESNAME" \
50-
-an n \
51-
-a ligand.pdb \
52-
-fa pdb \
53-
-ao name
42+
-i ligand_seed.mol2 -fi mol2 \
43+
-o ligand_amber.mol2 -fo mol2 \
44+
-c bcc -s 2 -at gaff2 \
45+
-nc "$NET_CHARGE" -rn "$LIG_RESNAME"
5446

55-
echo "--- parmchk2 ---"
5647
parmchk2 -i ligand_amber.mol2 -f mol2 -o ligand.frcmod
5748

58-
echo "--- tleap ---"
49+
# ── 3. tleap ────────────────────────────────────────────────────────────────
50+
echo "=== 3. tleap ==="
5951
cat >tleap.in <<EOF
6052
source $PROTEIN_FF
6153
source $LIGAND_FF
6254
6355
$LIG_RESNAME = loadmol2 ligand_amber.mol2
6456
loadamberparams ligand.frcmod
57+
protein = loadpdb protein_amber.pdb
58+
complex = combine {protein $LIG_RESNAME}
6559
66-
complex = loadpdb complex.pdb
67-
check complex
60+
set default PBRadii $PB_RADII
6861
saveamberparm complex complex.prmtop complex.rst7
69-
savepdb complex complex_from_tleap.pdb
7062
quit
7163
EOF
72-
7364
tleap -f tleap.in
7465

75-
# Step 4: ParmEd prmtop/rst7 -> PQR
76-
echo "=== Step 4: ParmEd -> PQR ==="
77-
66+
# ── 4. ParmEd -> PQR ───────────────────────────────────────────────────────
67+
echo "=== 4. ParmEd -> PQR ==="
7868
python3 -c "
7969
import parmed as pmd
8070
parm = pmd.load_file('complex.prmtop', xyz='complex.rst7')
8171
parm.save('complex.pqr', overwrite=True)
8272
"
8373

84-
# Step 5: APBS input generation via pdb2pqr inputgen
85-
echo "=== Step 5: APBS input generation ==="
86-
87-
inputgen "--istrng=${IONIC_STRENGTH}" --potdx complex.pqr
88-
echo "Generated APBS input from complex.pqr"
89-
90-
# inputgen writes <stem>.in next to the PQR
91-
APBS_IN="complex.in"
92-
if [[ ! -f "$APBS_IN" ]]; then
93-
# Fall back to newest .in file in case of different naming
94-
APBS_IN="$(ls -1t ./*.in 2>/dev/null | head -n 1 || true)"
95-
if [[ -z "$APBS_IN" || ! -f "$APBS_IN" ]]; then
96-
echo "ERROR: inputgen did not produce an APBS input file" >&2
97-
exit 1
98-
fi
99-
fi
100-
echo "Using APBS input: $APBS_IN"
101-
102-
# Step 6: Run APBS
103-
echo "=== Step 6: Run APBS ==="
74+
# ── 5. APBS input generation ───────────────────────────────────────────────
75+
echo "=== 5. inputgen ==="
76+
# inputgen CLI has a bug in pdb2pqr<=3.7.1: --istrng is parsed as str, not float.
77+
# Use the Python API directly.
78+
python3 -c "
79+
from pdb2pqr.inputgen import Input
80+
from pdb2pqr.psize import Psize
81+
82+
size = Psize()
83+
size.run_psize('complex.pqr')
84+
inp = Input('complex.pqr', size, method='mg-auto', asyncflag=False,
85+
istrng=${IONIC_STRENGTH}, potdx=True)
86+
inp.print_input_files('complex.in')
87+
"
88+
# Fix DX output stem: inputgen writes 'write pot dx complex.pqr' -> APBS would
89+
# produce complex.pqr.dx; change to 'complex' so output is complex-PE0.dx.
90+
sed -i 's|write pot dx complex\.pqr|write pot dx complex|' complex.in
10491

105-
apbs "$APBS_IN"
92+
# ── 6. APBS ────────────────────────────────────────────────────────────────
93+
echo "=== 6. APBS ==="
94+
apbs complex.in 2>&1 | tee apbs.log
95+
mv complex-PE0.dx complex.dx
10696

10797
echo "=== Done ==="
108-
ls -lh complex.prmtop complex.rst7 complex.pqr "$APBS_IN" complex*.dx 2>/dev/null || true
98+
ls -lh complex.prmtop complex.rst7 complex.pqr complex.in complex.dx

0 commit comments

Comments
 (0)