
ryanjosephkamp/the-lock-and-key

The Lock and Key — Rigid-Body Molecular Docking Engine

Week 21, Project 1 · Biophysics Portfolio · CS Research Self-Study

A physics-based rigid-body molecular docking engine that predicts how small-molecule drug candidates bind to protein targets. Implements a five-term scoring function calibrated against AutoDock4, Monte Carlo simulated annealing with sigmoidal cooling and multi-restart sampling, soft-core van der Waals potentials, binding-pocket carving with complementary pocket-lining atoms, binding-site-only scoring, redocking validation across six benchmark systems, and interactive Streamlit visualization.


Overview

| Feature | Description |
|---|---|
| Scoring Function | AutoDock4-calibrated 5-term physics-based scorer |
| van der Waals | Soft-core Lennard-Jones with Bondi radii + AMBER ff99 ε |
| Electrostatics | Coulomb with distance-dependent dielectric (ε(r) = 4r) |
| Desolvation | Eisenberg-McLachlan solvation model |
| Hydrogen Bonds | 10-12 potential for directional H-bonds |
| Torsional Penalty | Entropy cost per rotatable bond |
| Monte Carlo | Sigmoidal annealing with multi-restart Metropolis sampling |
| Grid Search | Systematic translational + rotational enumeration |
| Pocket Modeling | Binding-pocket carving with complementary lining atoms |
| Scoring Filter | Ligand-centric proximity filter eliminates distant noise |
| Redocking | RMSD < 2.0 Å validation against native poses |
| Preset Systems | 6 well-characterized protein-ligand complexes |
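
As an illustration of the Electrostatics row, here is a minimal sketch of a Coulomb term with the distance-dependent dielectric ε(r) = 4r. It is not taken from `src/docking_engine.py`: the function name `screened_coulomb` is hypothetical, the constant 332.0636 is the usual conversion factor for charges in units of e, distances in Å, and energies in kcal/mol, and the 1.5 Å clamp is the value quoted in the Theory section.

```python
import numpy as np

COULOMB_CONSTANT = 332.0636  # kcal*Å/(mol*e^2), standard conversion factor
MIN_DISTANCE = 1.5           # Å, clamp distance quoted in the Theory section


def screened_coulomb(q_i, q_j, r):
    """Coulomb energy (kcal/mol) with dielectric eps(r) = 4r (hypothetical helper)."""
    # Clamp short distances so the energy stays finite as r -> 0.
    r = np.maximum(np.asarray(r, dtype=float), MIN_DISTANCE)
    dielectric = 4.0 * r
    # q_i * q_j / (eps(r) * r) reduces to q_i * q_j / (4 r^2).
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
```

Because the dielectric grows with distance, the interaction decays as 1/r² rather than 1/r, which damps long-range charge-charge contributions.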

Benchmark Systems

| System | Protein | Ligand | Rotatable Bonds | Difficulty |
|---|---|---|---|---|
| Trypsin-Benzamidine | Serine protease | Small inhibitor | 1 | Easy |
| HIV Protease-Indinavir | Retroviral protease | Peptidomimetic | 7 | Medium |
| CDK2-Staurosporine | Kinase | Natural product | 2 | Easy |
| Thrombin-PPACK | Serine protease | Tripeptide | 5 | Medium |
| Lysozyme-NAG3 | Glycoside hydrolase | Trisaccharide | 4 | Medium |
| Carbonic Anhydrase-Dorzolamide | Metalloenzyme | Sulfonamide | 4 | Hard |

Key Results

  • Physics-based scoring with five calibrated energy terms following AutoDock4 conventions
  • Pocket carving with complementary lining atoms for realistic binding-site geometry
  • Soft-core van der Waals potential eliminates singularities and improves sampling through tight pockets
  • Sigmoidal cooling schedule maintains exploration temperature before rapid convergence
  • Multi-restart Monte Carlo with 8 independent trajectories for diverse pose generation
  • Ligand-centric scoring filter evaluates only protein atoms near the current ligand pose, reducing noise and computation
  • Optimized grid search with pre-computed rotation matrices, donor-acceptor masks, and vectorized pose analysis
  • Dual sampling strategies — Monte Carlo simulated annealing and grid-based search
  • Redocking validation across six diverse protein-ligand benchmark systems
  • Interactive exploration of score landscapes, energy decomposition, and binding site geometry
  • Scoring function lab — adjust weights in real time, see how scoring affects pose ranking
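
The ligand-centric scoring filter listed above can be sketched with NumPy broadcasting. This is an illustrative version under stated assumptions, not the project's code: `binding_site_mask` is a hypothetical name, coordinates are assumed to be `(n_atoms, 3)` arrays in Å, and the 6.0 Å default is the cutoff quoted in the Theory section.

```python
import numpy as np


def binding_site_mask(protein_xyz, ligand_xyz, cutoff=6.0):
    """Boolean mask of protein atoms within `cutoff` Å of any ligand atom."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Pairwise distance matrix of shape (n_protein, n_ligand).
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep a protein atom if its closest ligand atom is inside the cutoff.
    return dist.min(axis=1) <= cutoff


protein = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
ligand = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
mask = binding_site_mask(protein, ligand)  # third atom is 18 Å away and is dropped
```

Only the atoms selected by the mask would then enter the pairwise energy terms, so the cost per pose scales with the pocket size rather than the whole protein.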

Project Structure

week_21_project_1/
├── app.py                          # Streamlit dashboard (8 pages)
├── main.py                         # CLI entry point (4 modes)
├── requirements.txt
├── .gitignore
├── README.md
├── week_21_project_1_outline.md    # Project specification
├── src/
│   ├── __init__.py                 # Package re-exports
│   ├── docking_engine.py           # Scoring, sampling, presets (~2080 lines)
│   ├── analysis.py                 # Analysis pipelines (~830 lines)
│   └── visualization.py            # Plotly + Matplotlib renderers (~1150 lines)
├── tests/
│   └── test_lock_and_key.py        # 26 test classes, 118 tests (~1120 lines)
├── docs/
│   ├── scientific_report.md
│   ├── w21p1_lock_and_key_ieee.tex
│   └── w21p1_lock_and_key_ieee.pdf
└── figures/                        # Generated, gitignored

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Run the CLI

python main.py                                    # Default docking
python main.py --dock --system "CDK2-Staurosporine" --save
python main.py --redock --steps 500
python main.py --compare --save
python main.py --gallery --verbose

3. Launch the Streamlit Dashboard

streamlit run app.py

4. Run Tests

pytest tests/ -v

Theory — Molecular Docking in Brief

The docking score approximates the binding free energy:

$$S = w_{\text{vdW}} V_{\text{LJ}}^{\text{sc}} + w_{\text{elec}} V_{\text{elec}} + w_{\text{desolv}} V_{\text{desolv}} + w_{\text{hbond}} V_{\text{hbond}} + V_{\text{torsion}}$$

where $V_{\text{LJ}}^{\text{sc}}$ is a soft-core Lennard-Jones potential that stays finite at short distances, $V_{\text{elec}}$ is a screened Coulomb interaction whose interatomic distance is clamped at 1.5 Å to prevent singularities, $V_{\text{desolv}}$ models the cost of burying hydrated surfaces, $V_{\text{hbond}}$ captures directional hydrogen bonds, and $V_{\text{torsion}}$ penalizes ligand flexibility. The weights are the calibrated values from Morris et al. (2009).

Monte Carlo sampling uses a sigmoidal cooling schedule $T(t) = T_0 / (1 + e^{k(t - t_m)})$, where $t_m$ is the step at which the temperature has fallen to $T_0/2$ and $k$ controls how sharp the drop is. Sampling is repeated over 8 independent restarts for diverse pose generation. Grid-based search systematically evaluates every combination of the sampled translations and rotations, with pre-computed rotation matrices and vectorized donor-acceptor mask operations for efficient pose analysis.

A ligand-centric scoring filter restricts evaluation to protein atoms within 6.0 Å of any ligand atom. Pocket-lining charges are dampened to 20% complementary strength so that electrostatics does not dominate the van der Waals equilibrium.
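
The cooling schedule and the Metropolis acceptance rule can be sketched in a few lines. This is a minimal illustration, not the engine's sampler: the function names are hypothetical, the values of `k` and `t_mid` in the usage lines are made up, and the temperature is assumed to be expressed in the same units as the score (that is, as kT), so no Boltzmann constant appears.

```python
import math
import random


def sigmoidal_temperature(step, t0, k, t_mid):
    """T(t) = T0 / (1 + exp(k * (t - t_m))); stays near T0, then drops around t_m."""
    # Cap the exponent to avoid OverflowError for very late steps.
    exponent = min(k * (step - t_mid), 700.0)
    return t0 / (1.0 + math.exp(exponent))


def metropolis_accept(delta_score, temperature, rng=random):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_score <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_score / temperature)


# Hypothetical schedule: 300 steps, midpoint at step 150, steepness 0.05.
start = sigmoidal_temperature(0, t0=300.0, k=0.05, t_mid=150)
middle = sigmoidal_temperature(150, t0=300.0, k=0.05, t_mid=150)
end = sigmoidal_temperature(300, t0=300.0, k=0.05, t_mid=150)
```

With these illustrative parameters the temperature stays close to `t0` early on, equals `t0 / 2` at the midpoint, and is close to zero by the last step, which matches the "explore first, then converge" behaviour described above.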


CLI Options

| Flag | Default | Description |
|---|---|---|
| `--dock` | | Run docking (default mode) |
| `--redock` | | Redocking challenge with RMSD validation |
| `--compare` | | Cross-system comparison |
| `--gallery` | | Show all preset systems |
| `--system` | `Trypsin-Benzamidine` | Preset system name |
| `--method` | `monte_carlo` | Sampling method (`monte_carlo` / `grid`) |
| `--steps` | `300` | Number of sampling steps |
| `--temperature` | `300.0` | MC temperature (K) |
| `--seed` | `42` | Random seed |
| `--save` | | Save figures to `figures/` |
| `--verbose` | | Show additional output |

Streamlit Dashboard Pages

  1. 🏠 Home — Project overview, scoring function table, system gallery
  2. 🔑 Dock a Ligand — Run docking with configurable parameters, 3D pose overlay
  3. 🎯 Redocking Challenge — Validate native pose recovery (RMSD < 2.0 Å)
  4. 📊 Score Explorer — Score-vs-rank, convergence, contact analysis, heatmaps
  5. 🏗️ Binding Site Inspector — Pocket geometry, residue composition, 3D map
  6. ⚖️ Cross-System Comparison — Redocking performance across benchmarks
  7. 🧪 Scoring Function Lab — Energy decomposition, custom weight explorer
  8. 📚 Theory & Mathematics — Mathematical foundations with LaTeX equations
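
The RMSD < 2.0 Å criterion used by the Redocking Challenge can be sketched as follows. This is an assumed, simplified version: `pose_rmsd` and `is_successful_redock` are hypothetical names, both poses are taken to list the same atoms in the same order, and no superposition or symmetry correction is applied, since a redocked pose is compared with the native pose in the receptor's frame.

```python
import numpy as np


def pose_rmsd(predicted_xyz, native_xyz):
    """Root-mean-square deviation (Å) between two poses with matching atom order."""
    predicted_xyz = np.asarray(predicted_xyz, dtype=float)
    native_xyz = np.asarray(native_xyz, dtype=float)
    if predicted_xyz.shape != native_xyz.shape:
        raise ValueError("poses must have the same number of atoms")
    # Squared displacement per atom, averaged over atoms, then square-rooted.
    squared = np.sum((predicted_xyz - native_xyz) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))


def is_successful_redock(predicted_xyz, native_xyz, threshold=2.0):
    """True when the predicted pose lies within `threshold` Å RMSD of the native pose."""
    return pose_rmsd(predicted_xyz, native_xyz) < threshold


native = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
shifted = native + np.array([1.0, 0.0, 0.0])  # rigid 1 Å translation
```

A rigid 1 Å translation moves every atom by the same amount, so the RMSD is exactly 1 Å and the pose counts as a successful redock.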

Scoring Function Weights

| Component | Weight | Physics |
|---|---|---|
| van der Waals | 0.1662 | Shape complementarity (Lennard-Jones 6-12) |
| Electrostatic | 0.1209 | Charge-charge interactions (Coulomb) |
| Desolvation | 0.1406 | Solvation penalty (Eisenberg-McLachlan) |
| Hydrogen Bond | 0.0585 | Directional H-bonds (10-12 potential) |
| Torsion | 0.2983 | Conformational entropy per rotatable bond |
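
To show how these weights combine, here is a small sketch of the weighted sum. It makes one assumption that the README does not spell out: the torsional term is taken to be the torsion weight multiplied by the number of rotatable bonds, following the "per rotatable bond" description. The names `WEIGHTS`, `TORSION_WEIGHT`, and `total_score` are hypothetical, and the raw term values in the usage lines are invented for illustration.

```python
# Weights copied from the table above.
WEIGHTS = {"vdw": 0.1662, "elec": 0.1209, "desolv": 0.1406, "hbond": 0.0585}
TORSION_WEIGHT = 0.2983  # assumed to apply once per rotatable bond


def total_score(raw_terms, n_rotatable_bonds):
    """Weighted sum of the four pairwise terms plus the torsional penalty."""
    weighted = sum(WEIGHTS[name] * raw_terms[name] for name in WEIGHTS)
    return weighted + TORSION_WEIGHT * n_rotatable_bonds


# Invented raw energies for a ligand with one rotatable bond.
raw = {"vdw": -10.0, "elec": -2.0, "desolv": 1.0, "hbond": -3.0}
score = total_score(raw, n_rotatable_bonds=1)
```

Under this assumption the torsional penalty is independent of the pose, so it shifts the score of every pose of a given ligand by the same amount and only matters when comparing different ligands.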

Testing

The test suite has 26 test classes and 118 test methods covering:

  • Atom/Molecule dataclass construction and properties
  • Scoring functions — vdW (hard-core + soft-core), electrostatic, desolvation, H-bond energy
  • Pose scoring — total score and breakdown with binding-site filtering
  • RMSD calculation — identity, known values, multi-atom
  • Transforms — identity, translation, rotation distance preservation
  • Sampling — Monte Carlo (sigmoidal + linear cooling, multi-restart) and grid search
  • Pocket carving — clearance filtering, lining atom placement, atom count reduction
  • Preset builders — all 6 systems build correctly with pocket carving
  • Analysis pipelines — docking, redocking, scoring, comparison, binding site
  • Plotly renderer — all 11 static methods return go.Figure
  • Matplotlib renderer — all 8 static methods return plt.Figure
  • CLI parsing — all modes and flags
  • Integration — full build → dock → analyze → visualize pipeline
  • Edge cases — empty inputs, unknown systems, boundary values, soft-core potentials
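
As an example of the soft-core edge cases in this list, the sketch below defines one common soft-core Lennard-Jones form and a pytest-style check that it stays finite at zero separation. The functional form, the `alpha` parameter, and both function names are assumptions for illustration; the project's own potential and tests may differ.

```python
import numpy as np


def soft_core_lj(r, epsilon, sigma, alpha=0.5):
    """Soft-core 12-6 Lennard-Jones: r^6 is shifted by alpha * sigma^6.

    With alpha = 0 this reduces to the standard 4*eps*[(s/r)^12 - (s/r)^6].
    """
    r = np.asarray(r, dtype=float)
    s6 = sigma**6 / (r**6 + alpha * sigma**6)
    return 4.0 * epsilon * (s6**2 - s6)


def test_soft_core_is_finite_at_zero_distance():
    # At r = 0, s6 = 1/alpha, so the energy is 4*eps*(1/alpha^2 - 1/alpha).
    energy = float(soft_core_lj(0.0, epsilon=0.2, sigma=3.4, alpha=0.5))
    assert np.isfinite(energy)
    assert abs(energy - 1.6) < 1e-9
```

The finite value at r = 0 is what lets a Monte Carlo move pass through a mild steric overlap instead of being rejected by an effectively infinite energy.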

Dependencies

  • numpy>=1.24
  • scipy>=1.10
  • matplotlib>=3.7
  • plotly>=5.14
  • streamlit>=1.28
  • pandas>=2.0
  • pytest>=7.3

Author

Ryan Kamp
Department of Computer Science, University of Cincinnati
kamprj@mail.uc.edu | GitHub

About

Week 21 Project 1: Physics-based rigid-body molecular docking engine with an AutoDock4-calibrated five-term scoring function, soft-core Lennard-Jones potentials, binding-pocket carving, Monte Carlo simulated annealing with sigmoidal cooling, and interactive Streamlit visualization; validates pose prediction across 6 protein-ligand benchmark systems
