refactor: clean up BrownDye2 AMBER+APBS workflow

FridrichMethod · FridrichMethod · commit 8cbb609bdea3 · 2026-04-04T05:07:29.000-07:00
- Notebook: remove unnecessary complex.pdb assembly and validation
  step; protein and ligand are now loaded separately in tleap
- Shell script: fix tleap atom-name mismatch by using pdb4amber +
  separate protein/ligand loading instead of combined PDB; add
  PBRadii mbondi3 so ParmEd writes correct radii; use inputgen
  Python API to work around pdb2pqr<=3.7.1 --istrng type bug;
  strip boilerplate checks and verbose output
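
A quick sanity check on the resulting `complex.pqr` can confirm that charges and radii were written as expected. The sketch below is not part of this commit: `pqr_summary` is a hypothetical helper, and it assumes the whitespace-delimited PQR layout in which the last two fields of each ATOM/HETATM record are charge and radius.

```python
from pathlib import Path


def pqr_summary(text: str) -> tuple[int, float, float]:
    """Return (atom count, total charge, smallest radius) for PQR text.

    Assumption: records are whitespace-delimited and the last two
    fields of each ATOM/HETATM line are charge and radius.
    """
    n_atoms = 0
    total_charge = 0.0
    min_radius = float("inf")
    for line in text.splitlines():
        # Skip REMARK, TER, END and any other non-atom records.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        fields = line.split()
        charge, radius = float(fields[-2]), float(fields[-1])
        n_atoms += 1
        total_charge += charge
        min_radius = min(min_radius, radius)
    return n_atoms, total_charge, min_radius


if __name__ == "__main__":
    pqr = Path("tmp/complex.pqr")  # output location used by run_amber_apbs.sh
    if pqr.exists():
        n, q, r = pqr_summary(pqr.read_text())
        print(f"{n} atoms, net charge {q:+.3f}, min radius {r:.3f}")
```

The total charge should be close to an integer (the protein charge plus the ligand `NET_CHARGE`), and a radius of 0.0 on any atom would be worth investigating before running APBS.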
diff --git a/examples/browndye/complex_pqr.ipynb b/examples/browndye/complex_pqr.ipynb
@@ -3,130 +3,43 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "# BrownDye2 preparation: protein-ligand complex\n",
-    "\n",
-    "0. Validate complex (chain IDs, SMILES)\n",
-    "1. Fix protein with PDBFixer, assign ligand topology with RDKit\n",
-    "2. Assemble complex PDB for tleap\n",
-    "3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n",
-    "4. ParmEd convert to complex.pqr\n",
-    "5. APBS input generation using pdb2pqr inputgen\n",
-    "6. Run APBS\n",
-    "\n",
-    "Steps 0-2 in this notebook, steps 3-6 via `scripts/browndye/run_amber_apbs.sh`."
-   ]
+   "source": "# BrownDye2: complex PQR preparation\n\nGenerate a PQR file and APBS electrostatic potential map for a protein-ligand complex.\n\n**This notebook**: prepare protein and ligand inputs from a docked PDB.\n1. Fix protein with PDBFixer (add missing atoms, protonate at target pH)\n2. Assign ligand bond orders from SMILES, write SDF for antechamber\n\n**`run_amber_apbs.sh`**: parameterize and solve electrostatics.\n1. `pdb4amber` -- strip H/water, fix residue names for tleap\n2. `antechamber` + `parmchk2` -- GAFF2 atom types and AM1-BCC charges\n3. `tleap` -- combine protein + ligand into AMBER topology\n4. `ParmEd` -- convert prmtop/rst7 to PQR (with mbondi3 radii)\n5. `inputgen` -- generate APBS input from PQR dimensions\n6. `APBS` -- solve the linearized Poisson-Boltzmann equation"
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "import copy\nfrom pathlib import Path\n\nfrom Bio.Data.IUPACData import protein_letters_3to1\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\n# User configuration\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True, parents=True)\n\n# Required: canonical SMILES of the ligand\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4"
+   "source": "from pathlib import Path\n\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True)\n\n# Canonical SMILES of the ligand (used to assign bond orders)\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4"
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Step 0: Validate complex"
-   ]
+   "source": "## Step 1: Fix protein\n\nExtract protein chain and add missing residues/atoms/hydrogens via PDBFixer."
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": [
-    "STANDARD_AA = {code.upper() for code in protein_letters_3to1}\n",
-    "\n",
-    "parser = PDBParser(QUIET=True)\n",
-    "structure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\n",
-    "model = structure[0]\n",
-    "\n",
-    "chains = {chain.id: chain for chain in model}\n",
-    "assert PROTEIN_CHAIN_ID in chains, f\"Chain {PROTEIN_CHAIN_ID} not found. Available: {list(chains)}\"\n",
-    "assert LIGAND_CHAIN_ID in chains, f\"Chain {LIGAND_CHAIN_ID} not found. Available: {list(chains)}\"\n",
-    "\n",
-    "# Summarize selected chains\n",
-    "prot_chain = chains[PROTEIN_CHAIN_ID]\n",
-    "prot_resnames = {\n",
-    "    res.get_resname().strip()\n",
-    "    for res in prot_chain.get_residues()\n",
-    "    if res.get_resname().strip() not in (\"HOH\", \"WAT\")\n",
-    "}\n",
-    "n_prot_res = sum(1 for _ in prot_chain.get_residues())\n",
-    "non_standard = prot_resnames - STANDARD_AA\n",
-    "if non_standard:\n",
-    "    print(f\"WARNING: chain {PROTEIN_CHAIN_ID} has non-standard residues: {non_standard}\")\n",
-    "print(f\"Protein chain {PROTEIN_CHAIN_ID}: {n_prot_res} residues\")\n",
-    "\n",
-    "lig_chain = chains[LIGAND_CHAIN_ID]\n",
-    "lig_resnames = {res.get_resname().strip() for res in lig_chain.get_residues()}\n",
-    "assert len(lig_resnames) == 1, f\"Ligand chain {LIGAND_CHAIN_ID} has {len(lig_resnames)} residues\"\n",
-    "LIG_RESNAME = next(iter(lig_resnames))\n",
-    "n_lig_atoms = sum(1 for _ in lig_chain.get_atoms())\n",
-    "print(f\"Ligand chain {LIGAND_CHAIN_ID}: residue(s) {lig_resnames}, {n_lig_atoms} atoms\")"
-   ]
+   "source": "parser = PDBParser(QUIET=True)\nstructure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\nmodel = structure[0]\nchains = {chain.id: chain for chain in model}\n\npdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract and fix protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\n\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")"
   },
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Step 1a: Fix protein"
-   ]
+   "source": "## Step 2: Assign ligand topology\n\nExtract ligand chain, assign bond orders from a SMILES template, and write an SDF\nfor antechamber. The SMILES is needed because PDB files lack bond-order information."
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "pdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\nprint(f\"Extracted protein chain {PROTEIN_CHAIN_ID} -> {protein_pdb}\")\n\n# Fix protein (add missing residues, atoms, hydrogens)\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")"
+   "source": "# Extract ligand chain to PDB for RDKit parsing\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\n\n# Assign bond orders from SMILES template\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"Ligand net charge: {ligand_net_charge}\")\n\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nmol_assigned = assign_topology(mol, template_mol)\n\n# Write SDF for antechamber\nlig_resnames = {res.get_resname().strip() for res in chains[LIGAND_CHAIN_ID].get_residues()}\nmol_assigned.SetProp(\"_Name\", next(iter(lig_resnames)))\n\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n    w.write(mol_assigned)\nprint(f\"Ligand SDF -> {ligand_sdf}\")"
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Step 1b: Assign ligand topology"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": "# Extract ligand chain\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\nprint(f\"Extracted ligand chain {LIGAND_CHAIN_ID} -> {ligand_pdb}\")\n\n# Validate SMILES and compute net charge\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nassert template_mol is not None, f\"Invalid SMILES: {LIGAND_SMILES}\"\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"SMILES: {Chem.MolToSmiles(template_mol)}\")\nprint(f\"Net charge: {ligand_net_charge}\")\n\n# Assign bond orders from SMILES template\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nassert mol is not None, f\"RDKit failed to parse {ligand_pdb}\"\nmol_assigned = assign_topology(mol, template_mol)\n\n# Set molecule name\nmol_assigned.SetProp(\"_Name\", LIG_RESNAME)\n\n# Write SDF for antechamber\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n    w.write(mol_assigned)\nprint(f\"Wrote topology-assigned ligand -> {ligand_sdf}\")"
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Step 2: Assemble complex PDB for tleap"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": "# Re-parse fixed protein and graft the ligand chain onto it\nfixed_struct = parser.get_structure(\"fixed\", str(protein_fixed_pdb))\nfixed_model = fixed_struct[0]\n\n# Add ligand chain from original structure\nlig_chain_copy = copy.deepcopy(chains[LIGAND_CHAIN_ID])\nfixed_model.add(lig_chain_copy)\n\n# Write combined complex\ncomplex_pdb = WORKDIR / \"complex.pdb\"\npdb_io.set_structure(fixed_struct)\npdb_io.save(str(complex_pdb), ChainSelect([PROTEIN_CHAIN_ID, LIGAND_CHAIN_ID]))\nprint(f\"Assembled complex -> {complex_pdb}\")"
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Steps 3-6: AmberTools, PQR, APBS\n",
-    "\n",
-    "Edit the constants at the top of `scripts/browndye/run_amber_apbs.sh`, then run:\n",
-    "\n",
-    "```bash\n",
-    "bash scripts/browndye/run_amber_apbs.sh\n",
-    "```\n",
-    "\n",
-    "3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n",
-    "4. ParmEd convert to complex.pqr\n",
-    "5. APBS input generation using pdb2pqr inputgen\n",
-    "6. Run APBS"
-   ]
+   "source": "## Next: run_amber_apbs.sh\n\nThe notebook produced `tmp/protein_fixed.pdb` and `tmp/ligand.sdf`. The shell script\npicks up from here -- it loads the protein and ligand **separately** into tleap\n(via `pdb4amber` and `antechamber`), combines them, and runs APBS.\n\n```bash\nconda activate ambertools\ncd examples/browndye && bash run_amber_apbs.sh\n```\n\nOutputs in `tmp/`:\n\n| File | Description |\n|------|-------------|\n| `complex.prmtop` | AMBER topology |\n| `complex.rst7` | AMBER coordinates |\n| `complex.pqr` | PQR with AM1-BCC charges and mbondi3 radii |\n| `complex.in` | APBS input (mg-auto, LPBE) |\n| `complex.dx` | Electrostatic potential map (OpenDX) |",
+   "metadata": {}
   }
  ],
  "metadata": {
diff --git a/examples/browndye/run_amber_apbs.sh b/examples/browndye/run_amber_apbs.sh
@@ -1,108 +1,98 @@
 #!/usr/bin/env bash
-# run_amber_apbs.sh - Steps 3-6 of BrownDye2 complex PQR preparation
+# run_amber_apbs.sh - AMBER parameterization + APBS electrostatics
 #
-# Expects in WORKDIR: complex.pdb, ligand.sdf, ligand.pdb
-# Produces: complex.prmtop, complex.rst7, complex.pqr, complex.in, complex.dx
+# Prerequisite: run complex_pqr.ipynb first to produce protein_fixed.pdb and ligand.sdf
+# in WORKDIR.
+#
+# Pipeline:
+#   1. pdb4amber   - strip H and water, fix residue names for tleap
+#   2. antechamber - assign GAFF2 atom types and AM1-BCC charges to ligand
+#   3. tleap       - combine protein + ligand, write AMBER topology
+#   4. ParmEd      - convert prmtop/rst7 to PQR (charges + mbondi3 radii)
+#   5. inputgen    - generate APBS input from PQR dimensions
+#   6. APBS        - solve linearized Poisson-Boltzmann equation
+#
+# Usage:
+#   conda activate ambertools
+#   cd examples/browndye && bash run_amber_apbs.sh
 
 set -euo pipefail
 
-# Configurations
+# ── Configuration ───────────────────────────────────────────────────────────
 WORKDIR="tmp"
 LIG_RESNAME="LIG"
 NET_CHARGE="-1"
 IONIC_STRENGTH="0.150"
 PROTEIN_FF="leaprc.protein.ff19SB"
 LIGAND_FF="leaprc.gaff2"
-
-# Check required commands
-for cmd in obabel antechamber parmchk2 tleap python3 inputgen apbs; do
-    if ! command -v "$cmd" >/dev/null 2>&1; then
-        echo "ERROR: $cmd not found" >&2
-        exit 1
-    fi
-done
+PB_RADII="mbondi3"
 
 cd "$WORKDIR"
 
-for f in complex.pdb ligand.sdf ligand.pdb; do
-    if [[ ! -f "$f" ]]; then
-        echo "ERROR: required file not found: $WORKDIR/$f" >&2
-        exit 1
-    fi
-done
+# ── 1. pdb4amber ────────────────────────────────────────────────────────────
+echo "=== 1. pdb4amber ==="
+pdb4amber -i protein_fixed.pdb -o protein_amber.pdb -y -d --no-conect
 
-# Step 3: AmberTools parameterization
-echo "=== Step 3: AmberTools parameterization ==="
-
-echo "--- antechamber ---"
+# ── 2. Ligand parameterization ──────────────────────────────────────────────
+echo "=== 2. antechamber + parmchk2 ==="
 obabel ligand.sdf -O ligand_seed.mol2
+sed -i "s/UNL1/${LIG_RESNAME}/g" ligand_seed.mol2
 
 antechamber \
-    -i ligand_seed.mol2 \
-    -fi mol2 \
-    -o ligand_amber.mol2 \
-    -fo mol2 \
-    -c bcc \
-    -s 2 \
-    -at gaff2 \
-    -nc "$NET_CHARGE" \
-    -rn "$LIG_RESNAME" \
-    -an n \
-    -a ligand.pdb \
-    -fa pdb \
-    -ao name
+    -i ligand_seed.mol2 -fi mol2 \
+    -o ligand_amber.mol2 -fo mol2 \
+    -c bcc -s 2 -at gaff2 \
+    -nc "$NET_CHARGE" -rn "$LIG_RESNAME"
 
-echo "--- parmchk2 ---"
 parmchk2 -i ligand_amber.mol2 -f mol2 -o ligand.frcmod
 
-echo "--- tleap ---"
+# ── 3. tleap ────────────────────────────────────────────────────────────────
+echo "=== 3. tleap ==="
 cat >tleap.in <<EOF
 source $PROTEIN_FF
 source $LIGAND_FF
 
 $LIG_RESNAME = loadmol2 ligand_amber.mol2
 loadamberparams ligand.frcmod
+protein = loadpdb protein_amber.pdb
+complex = combine {protein $LIG_RESNAME}
 
-complex = loadpdb complex.pdb
-check complex
+set default PBRadii $PB_RADII
 saveamberparm complex complex.prmtop complex.rst7
-savepdb complex complex_from_tleap.pdb
 quit
 EOF
-
 tleap -f tleap.in
 
-# Step 4: ParmEd prmtop/rst7 -> PQR
-echo "=== Step 4: ParmEd -> PQR ==="
-
+# ── 4. ParmEd -> PQR ───────────────────────────────────────────────────────
+echo "=== 4. ParmEd -> PQR ==="
 python3 -c "
 import parmed as pmd
 parm = pmd.load_file('complex.prmtop', xyz='complex.rst7')
 parm.save('complex.pqr', overwrite=True)
 "
 
-# Step 5: APBS input generation via pdb2pqr inputgen
-echo "=== Step 5: APBS input generation ==="
-
-inputgen "--istrng=${IONIC_STRENGTH}" --potdx complex.pqr
-echo "Generated APBS input from complex.pqr"
-
-# inputgen writes <stem>.in next to the PQR
-APBS_IN="complex.in"
-if [[ ! -f "$APBS_IN" ]]; then
-    # Fall back to newest .in file in case of different naming
-    APBS_IN="$(ls -1t ./*.in 2>/dev/null | head -n 1 || true)"
-    if [[ -z "$APBS_IN" || ! -f "$APBS_IN" ]]; then
-        echo "ERROR: inputgen did not produce an APBS input file" >&2
-        exit 1
-    fi
-fi
-echo "Using APBS input: $APBS_IN"
-
-# Step 6: Run APBS
-echo "=== Step 6: Run APBS ==="
+# ── 5. APBS input generation ───────────────────────────────────────────────
+echo "=== 5. inputgen ==="
+# inputgen CLI has a bug in pdb2pqr<=3.7.1: --istrng is parsed as str, not float.
+# Use the Python API directly.
+python3 -c "
+from pdb2pqr.inputgen import Input
+from pdb2pqr.psize import Psize
+
+size = Psize()
+size.run_psize('complex.pqr')
+inp = Input('complex.pqr', size, method='mg-auto', asyncflag=False,
+            istrng=${IONIC_STRENGTH}, potdx=True)
+inp.print_input_files('complex.in')
+"
+# Fix DX output stem: inputgen writes 'write pot dx complex.pqr' -> APBS would
+# produce complex.pqr.dx; change to 'complex' so output is complex-PE0.dx.
+sed -i 's|write pot dx complex\.pqr|write pot dx complex|' complex.in
 
-apbs "$APBS_IN"
+# ── 6. APBS ────────────────────────────────────────────────────────────────
+echo "=== 6. APBS ==="
+apbs complex.in 2>&1 | tee apbs.log
+mv complex-PE0.dx complex.dx
 
 echo "=== Done ==="
-ls -lh complex.prmtop complex.rst7 complex.pqr "$APBS_IN" complex*.dx 2>/dev/null || true
+ls -lh complex.prmtop complex.rst7 complex.pqr complex.in complex.dx