3 | 3 | { |
4 | 4 | "cell_type": "markdown", |
5 | 5 | "metadata": {}, |
6 | | - "source": [ |
7 | | - "# BrownDye2 preparation: protein-ligand complex\n", |
8 | | - "\n", |
9 | | - "0. Validate complex (chain IDs, SMILES)\n", |
10 | | - "1. Fix protein with PDBFixer, assign ligand topology with RDKit\n", |
11 | | - "2. Assemble complex PDB for tleap\n", |
12 | | - "3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n", |
13 | | - "4. ParmEd convert to complex.pqr\n", |
14 | | - "5. APBS input generation using pdb2pqr inputgen\n", |
15 | | - "6. Run APBS\n", |
16 | | - "\n", |
17 | | - "Steps 0-2 in this notebook, steps 3-6 via `scripts/browndye/run_amber_apbs.sh`." |
18 | | - ] |
| 6 | + "source": "# BrownDye2: complex PQR preparation\n\nGenerate a PQR file and APBS electrostatic potential map for a protein-ligand complex.\n\n**This notebook**: prepare protein and ligand inputs from a docked PDB.\n1. Fix protein with PDBFixer (add missing atoms, protonate at target pH)\n2. Assign ligand bond orders from SMILES, write SDF for antechamber\n\n**`run_amber_apbs.sh`**: parameterize and solve electrostatics.\n1. `pdb4amber` -- strip H/water, fix residue names for tleap\n2. `antechamber` + `parmchk2` -- GAFF2 atom types and AM1-BCC charges\n3. `tleap` -- combine protein + ligand into AMBER topology\n4. `ParmEd` -- convert prmtop/rst7 to PQR (with mbondi3 radii)\n5. `inputgen` -- generate APBS input from PQR dimensions\n6. `APBS` -- solve the linearized Poisson-Boltzmann equation" |
19 | 7 | }, |
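The `tleap` step of `run_amber_apbs.sh` is not shown in this diff. The sketch below is one way it could combine the protein and ligand and store mbondi3 radii in the prmtop. It assumes ff14SB for the protein and GAFF2 for the ligand. The file names `protein_amber.pdb`, `ligand.mol2` and `ligand.frcmod` are hypothetical stand-ins for the outputs of `pdb4amber`, `antechamber` and `parmchk2`.

```python
def tleap_input(
    protein_pdb: str,
    ligand_mol2: str,
    ligand_frcmod: str,
    prefix: str = "complex",
    protein_ff: str = "leaprc.protein.ff14SB",
) -> str:
    """Build a tleap script that combines a protein and a GAFF2 ligand.

    Assumption: the force fields and file names are illustrative; the
    actual choices in run_amber_apbs.sh are not shown in this diff.
    """
    lines = [
        f"source {protein_ff}",
        "source leaprc.gaff2",
        # PB radii are written into the prmtop at save time, so set them first
        "set default PBradii mbondi3",
        f"LIG = loadmol2 {ligand_mol2}",
        f"loadamberparams {ligand_frcmod}",
        f"PROT = loadpdb {protein_pdb}",
        "COMPLEX = combine { PROT LIG }",
        f"saveamberparm COMPLEX {prefix}.prmtop {prefix}.rst7",
        "quit",
    ]
    return "\n".join(lines) + "\n"


print(tleap_input("tmp/protein_amber.pdb", "tmp/ligand.mol2", "tmp/ligand.frcmod"))
```

Write the result to a file, for example with `Path("tmp/tleap.in").write_text(...)`, and run it with `tleap -f tmp/tleap.in`.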
20 | 8 | { |
21 | 9 | "cell_type": "code", |
22 | 10 | "execution_count": null, |
23 | 11 | "metadata": {}, |
24 | 12 | "outputs": [], |
25 | | - "source": "import copy\nfrom pathlib import Path\n\nfrom Bio.Data.IUPACData import protein_letters_3to1\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\n# User configuration\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True, parents=True)\n\n# Required: canonical SMILES of the ligand\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4" |
| 13 | + "source": "from pathlib import Path\n\nfrom Bio.PDB import PDBIO, PDBParser\nfrom rdkit import Chem\n\nfrom mdpp.prep import ChainSelect, assign_topology, fix_pdb\n\nCOMPLEX_PDB = Path(\"TurboID-bioAMP_model_0.pdb\")\nWORKDIR = Path(\"tmp\")\nWORKDIR.mkdir(exist_ok=True)\n\n# Canonical SMILES of the ligand (used to assign bond orders)\nLIGAND_SMILES = r\"Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P]([O-])(=O)OC(=O)CCCC[C@@H]4SC[C@@H]5NC(=O)N[C@H]45)[C@@H](O)[C@H]3O\"\nPROTEIN_CHAIN_ID = \"A\"\nLIGAND_CHAIN_ID = \"B\"\nPH = 7.4" |
26 | 14 | }, |
27 | 15 | { |
28 | 16 | "cell_type": "markdown", |
29 | 17 | "metadata": {}, |
30 | | - "source": [ |
31 | | - "## Step 0: Validate complex" |
32 | | - ] |
| 18 | + "source": "## Step 1: Fix protein\n\nExtract protein chain and add missing residues/atoms/hydrogens via PDBFixer." |
33 | 19 | }, |
34 | 20 | { |
35 | 21 | "cell_type": "code", |
36 | 22 | "execution_count": null, |
37 | 23 | "metadata": {}, |
38 | 24 | "outputs": [], |
39 | | - "source": [ |
40 | | - "STANDARD_AA = {code.upper() for code in protein_letters_3to1}\n", |
41 | | - "\n", |
42 | | - "parser = PDBParser(QUIET=True)\n", |
43 | | - "structure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\n", |
44 | | - "model = structure[0]\n", |
45 | | - "\n", |
46 | | - "chains = {chain.id: chain for chain in model}\n", |
47 | | - "assert PROTEIN_CHAIN_ID in chains, f\"Chain {PROTEIN_CHAIN_ID} not found. Available: {list(chains)}\"\n", |
48 | | - "assert LIGAND_CHAIN_ID in chains, f\"Chain {LIGAND_CHAIN_ID} not found. Available: {list(chains)}\"\n", |
49 | | - "\n", |
50 | | - "# Summarize selected chains\n", |
51 | | - "prot_chain = chains[PROTEIN_CHAIN_ID]\n", |
52 | | - "prot_resnames = {\n", |
53 | | - " res.get_resname().strip()\n", |
54 | | - " for res in prot_chain.get_residues()\n", |
55 | | - " if res.get_resname().strip() not in (\"HOH\", \"WAT\")\n", |
56 | | - "}\n", |
57 | | - "n_prot_res = sum(1 for _ in prot_chain.get_residues())\n", |
58 | | - "non_standard = prot_resnames - STANDARD_AA\n", |
59 | | - "if non_standard:\n", |
60 | | - " print(f\"WARNING: chain {PROTEIN_CHAIN_ID} has non-standard residues: {non_standard}\")\n", |
61 | | - "print(f\"Protein chain {PROTEIN_CHAIN_ID}: {n_prot_res} residues\")\n", |
62 | | - "\n", |
63 | | - "lig_chain = chains[LIGAND_CHAIN_ID]\n", |
64 | | - "lig_resnames = {res.get_resname().strip() for res in lig_chain.get_residues()}\n", |
65 | | - "assert len(lig_resnames) == 1, f\"Ligand chain {LIGAND_CHAIN_ID} has {len(lig_resnames)} residues\"\n", |
66 | | - "LIG_RESNAME = next(iter(lig_resnames))\n", |
67 | | - "n_lig_atoms = sum(1 for _ in lig_chain.get_atoms())\n", |
68 | | - "print(f\"Ligand chain {LIGAND_CHAIN_ID}: residue(s) {lig_resnames}, {n_lig_atoms} atoms\")" |
69 | | - ] |
| 25 | + "source": "parser = PDBParser(QUIET=True)\nstructure = parser.get_structure(\"complex\", str(COMPLEX_PDB))\nmodel = structure[0]\nchains = {chain.id: chain for chain in model}\n\npdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract and fix protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\n\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")" |
70 | 26 | }, |
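This diff removes the old Step 0 chain checks. A wrong chain ID will therefore probably surface later, as an empty `protein.pdb` or a PDBFixer error. For a cheap guard before extraction, here is a stdlib-only sketch. It reads the fixed-column ATOM/HETATM records: residue name in columns 18-20 and chain ID in column 22. `summarize_chains` and `check_complex` are hypothetical helpers, not part of `mdpp.prep`.

```python
from collections import defaultdict


def summarize_chains(pdb_text: str) -> dict[str, set[str]]:
    """Map chain ID -> set of residue names found in ATOM/HETATM records."""
    chains: dict[str, set[str]] = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()  # columns 18-20 (1-based)
            chain_id = line[21:22]         # column 22; slicing avoids IndexError on short lines
            chains[chain_id].add(resname)
    return dict(chains)


def check_complex(pdb_text: str, protein_chain: str, ligand_chain: str) -> str:
    """Raise ValueError on a missing chain or a multi-residue ligand chain.

    Returns the ligand residue name.
    """
    chains = summarize_chains(pdb_text)
    for chain_id in (protein_chain, ligand_chain):
        if chain_id not in chains:
            raise ValueError(f"Chain {chain_id!r} not found. Available: {sorted(chains)}")
    ligand_resnames = chains[ligand_chain]
    if len(ligand_resnames) != 1:
        raise ValueError(
            f"Ligand chain {ligand_chain!r} has residue names {sorted(ligand_resnames)}"
        )
    return next(iter(ligand_resnames))
```

For example, `check_complex(COMPLEX_PDB.read_text(), PROTEIN_CHAIN_ID, LIGAND_CHAIN_ID)` returns the ligand residue name, which the ligand cell otherwise reads through Biopython.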
71 | 27 | { |
72 | 28 | "cell_type": "markdown", |
73 | 29 | "metadata": {}, |
74 | | - "source": [ |
75 | | - "## Step 1a: Fix protein" |
76 | | - ] |
| 30 | + "source": "## Step 2: Assign ligand topology\n\nExtract ligand chain, assign bond orders from a SMILES template, and write an SDF\nfor antechamber. The SMILES is needed because PDB files lack bond-order information." |
77 | 31 | }, |
78 | 32 | { |
79 | 33 | "cell_type": "code", |
80 | 34 | "execution_count": null, |
81 | 35 | "metadata": {}, |
82 | 36 | "outputs": [], |
83 | | - "source": "pdb_io = PDBIO()\npdb_io.set_structure(structure)\n\n# Extract protein chain\nprotein_pdb = WORKDIR / \"protein.pdb\"\npdb_io.save(str(protein_pdb), ChainSelect(PROTEIN_CHAIN_ID))\nprint(f\"Extracted protein chain {PROTEIN_CHAIN_ID} -> {protein_pdb}\")\n\n# Fix protein (add missing residues, atoms, hydrogens)\nprotein_fixed_pdb = WORKDIR / \"protein_fixed.pdb\"\nfix_pdb(protein_pdb, protein_fixed_pdb, pH=PH)\nprint(f\"Fixed protein -> {protein_fixed_pdb}\")" |
| 37 | + "source": "# Extract ligand chain to PDB for RDKit parsing\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\n\n# Assign bond orders from SMILES template (MolFromSmiles returns None on invalid input)\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nassert template_mol is not None, f\"Invalid SMILES: {LIGAND_SMILES}\"\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"Ligand net charge: {ligand_net_charge}\")\n\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nassert mol is not None, f\"RDKit failed to parse {ligand_pdb}\"\nmol_assigned = assign_topology(mol, template_mol)\n\n# Write SDF for antechamber, named after the single ligand residue\nlig_resnames = {res.get_resname().strip() for res in chains[LIGAND_CHAIN_ID].get_residues()}\nassert len(lig_resnames) == 1, f\"Ligand chain {LIGAND_CHAIN_ID} has residues {lig_resnames}\"\nmol_assigned.SetProp(\"_Name\", next(iter(lig_resnames)))\n\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n    w.write(mol_assigned)\nprint(f\"Ligand SDF -> {ligand_sdf}\")" |
84 | 38 | }, |
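The ligand net charge matters downstream because AM1-BCC needs it explicitly. In SMILES, formal charges appear only inside bracket atoms such as `[O-]`, `[NH3+]` and `[Fe+2]`. Summing them gives a dependency-free cross-check of `Chem.GetFormalCharge`, and the result can be handed to antechamber. The helper names and the `tmp/ligand.mol2` / `tmp/ligand.frcmod` paths are hypothetical. The flags are standard antechamber/parmchk2 options, not the confirmed invocation in `run_amber_apbs.sh`. If your antechamber rejects `-fi sdf`, try the MDL format code `mdl` instead.

```python
import re

# Formal charges live only inside bracket atoms: [O-], [NH3+], [Fe+2], [Fe++]
BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# The charge ends the bracket body: a sign followed by a count or repeated signs
CHARGE = re.compile(r"([+-])(\d+|[+-]*)$")


def smiles_net_charge(smiles: str) -> int:
    """Sum the formal charges written in SMILES bracket atoms."""
    total = 0
    for body in BRACKET_ATOM.findall(smiles):
        body = body.split(":", 1)[0]  # drop an atom-map class such as [O-:1]
        match = CHARGE.search(body)
        if match is None:
            continue
        sign = 1 if match.group(1) == "+" else -1
        rest = match.group(2)
        magnitude = int(rest) if rest.isdigit() else 1 + len(rest)
        total += sign * magnitude
    return total


def antechamber_cmds(sdf: str, net_charge: int, resname: str, workdir: str = "tmp"):
    """Return (antechamber, parmchk2) argument lists for GAFF2 types + AM1-BCC charges."""
    mol2 = f"{workdir}/ligand.mol2"      # hypothetical output names
    frcmod = f"{workdir}/ligand.frcmod"
    antechamber = [
        "antechamber", "-i", sdf, "-fi", "sdf", "-o", mol2, "-fo", "mol2",
        "-c", "bcc", "-nc", str(net_charge), "-at", "gaff2", "-rn", resname,
    ]
    parmchk2 = ["parmchk2", "-i", mol2, "-f", "mol2", "-o", frcmod, "-s", "gaff2"]
    return antechamber, parmchk2
```

Run each list with `subprocess.run(cmd, check=True)` inside the `ambertools` environment. For this ligand, both charge routes should give -1, from the single `[O-]` on the phosphate.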
85 | 39 | { |
86 | 40 | "cell_type": "markdown", |
87 | | - "metadata": {}, |
88 | | - "source": [ |
89 | | - "## Step 1b: Assign ligand topology" |
90 | | - ] |
91 | | - }, |
92 | | - { |
93 | | - "cell_type": "code", |
94 | | - "execution_count": null, |
95 | | - "metadata": {}, |
96 | | - "outputs": [], |
97 | | - "source": "# Extract ligand chain\nligand_pdb = WORKDIR / \"ligand.pdb\"\npdb_io.save(str(ligand_pdb), ChainSelect(LIGAND_CHAIN_ID))\nprint(f\"Extracted ligand chain {LIGAND_CHAIN_ID} -> {ligand_pdb}\")\n\n# Validate SMILES and compute net charge\ntemplate_mol = Chem.MolFromSmiles(LIGAND_SMILES, sanitize=True)\nassert template_mol is not None, f\"Invalid SMILES: {LIGAND_SMILES}\"\nligand_net_charge = Chem.GetFormalCharge(template_mol)\nprint(f\"SMILES: {Chem.MolToSmiles(template_mol)}\")\nprint(f\"Net charge: {ligand_net_charge}\")\n\n# Assign bond orders from SMILES template\nmol = Chem.MolFromPDBFile(str(ligand_pdb), sanitize=True, removeHs=True)\nassert mol is not None, f\"RDKit failed to parse {ligand_pdb}\"\nmol_assigned = assign_topology(mol, template_mol)\n\n# Set molecule name\nmol_assigned.SetProp(\"_Name\", LIG_RESNAME)\n\n# Write SDF for antechamber\nligand_sdf = WORKDIR / \"ligand.sdf\"\nwith Chem.SDWriter(str(ligand_sdf)) as w:\n w.write(mol_assigned)\nprint(f\"Wrote topology-assigned ligand -> {ligand_sdf}\")" |
98 | | - }, |
99 | | - { |
100 | | - "cell_type": "markdown", |
101 | | - "metadata": {}, |
102 | | - "source": [ |
103 | | - "## Step 2: Assemble complex PDB for tleap" |
104 | | - ] |
105 | | - }, |
106 | | - { |
107 | | - "cell_type": "code", |
108 | | - "execution_count": null, |
109 | | - "metadata": {}, |
110 | | - "outputs": [], |
111 | | - "source": "# Re-parse fixed protein and graft the ligand chain onto it\nfixed_struct = parser.get_structure(\"fixed\", str(protein_fixed_pdb))\nfixed_model = fixed_struct[0]\n\n# Add ligand chain from original structure\nlig_chain_copy = copy.deepcopy(chains[LIGAND_CHAIN_ID])\nfixed_model.add(lig_chain_copy)\n\n# Write combined complex\ncomplex_pdb = WORKDIR / \"complex.pdb\"\npdb_io.set_structure(fixed_struct)\npdb_io.save(str(complex_pdb), ChainSelect([PROTEIN_CHAIN_ID, LIGAND_CHAIN_ID]))\nprint(f\"Assembled complex -> {complex_pdb}\")" |
112 | | - }, |
113 | | - { |
114 | | - "cell_type": "markdown", |
115 | | - "metadata": {}, |
116 | | - "source": [ |
117 | | - "## Steps 3-6: AmberTools, PQR, APBS\n", |
118 | | - "\n", |
119 | | - "Edit the constants at the top of `scripts/browndye/run_amber_apbs.sh`, then run:\n", |
120 | | - "\n", |
121 | | - "```bash\n", |
122 | | - "bash scripts/browndye/run_amber_apbs.sh\n", |
123 | | - "```\n", |
124 | | - "\n", |
125 | | - "3. AmberTools parameterization (antechamber, parmchk2, tleap) → prmtop/rst7\n", |
126 | | - "4. ParmEd convert to complex.pqr\n", |
127 | | - "5. APBS input generation using pdb2pqr inputgen\n", |
128 | | - "6. Run APBS" |
129 | | - ] |
| 41 | + "source": "## Next: run_amber_apbs.sh\n\nThe notebook produced `tmp/protein_fixed.pdb` and `tmp/ligand.sdf`. The shell script\npicks up from here -- it loads the protein and ligand **separately** into tleap\n(via `pdb4amber` and `antechamber`), combines them, and runs APBS.\n\n```bash\nconda activate ambertools\ncd examples/browndye && bash run_amber_apbs.sh\n```\n\nOutputs in `tmp/`:\n\n| File | Description |\n|------|-------------|\n| `complex.prmtop` | AMBER topology |\n| `complex.rst7` | AMBER coordinates |\n| `complex.pqr` | PQR with AM1-BCC charges and mbondi3 radii |\n| `complex.in` | APBS input (mg-auto, LPBE) |\n| `complex.dx` | Electrostatic potential map (OpenDX) |", |
| 42 | + "metadata": {} |
130 | 43 | } |
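Two checks on the script's outputs are easy to automate. First, the PQR charges should sum to an integer: the protein charge plus the ligand's -1. Second, each APBS multigrid `dime` must have the form `c * 2**(nlev + 1) + 1`. The sketch below assumes a whitespace-delimited PQR and 4 multigrid levels. The 20 Å padding and 0.5 Å spacing are illustrative values, and `inputgen` applies its own sizing rules.

```python
import math


def read_pqr(text: str):
    """Return (coords, charges, radii) from ATOM/HETATM records of a PQR file.

    PQR is whitespace-delimited and the chain ID is optional, so the last
    five fields (x, y, z, charge, radius) are read from the end of each line.
    """
    coords, charges, radii = [], [], []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            x, y, z, q, r = map(float, line.split()[-5:])
            coords.append((x, y, z))
            charges.append(q)
            radii.append(r)
    return coords, charges, radii


def charge_residual(charges):
    """Return (nearest integer, deviation) for the summed charges."""
    total = sum(charges)
    nearest = round(total)
    return nearest, total - nearest


def valid_dime(n_points: int, nlev: int = 4) -> int:
    """Round up to the nearest APBS multigrid size c * 2**(nlev + 1) + 1."""
    step = 2 ** (nlev + 1)
    c = max(1, math.ceil((n_points - 1) / step))
    return c * step + 1


def fine_grid(coords, fadd: float = 20.0, spacing: float = 0.5):
    """Per-axis fine-grid length (extent + fadd) and a valid dime at ~spacing Å."""
    lengths, dimes = [], []
    for axis in range(3):
        values = [c[axis] for c in coords]
        length = (max(values) - min(values)) + fadd
        lengths.append(length)
        dimes.append(valid_dime(math.ceil(length / spacing) + 1))
    return lengths, dimes
```

For example, `coords, q, _ = read_pqr(Path("tmp/complex.pqr").read_text())` followed by `charge_residual(q)`. A deviation far from zero points at a charge problem upstream, in the protonation step or the ligand charge.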
131 | 44 | ], |
132 | 45 | "metadata": { |