Add support for N-terminal, protonated, disulfide-bonded cysteine #2194

Merged
j-wags merged 6 commits into openforcefield:fix-nterm-disulfide-cys from joelaforet:fix-nterm-disulfide-cys
Jun 8, 2026

@joelaforet joelaforet commented Jun 5, 2026

Contributor

Fixes #2191.

This PR adds support for loading N-terminal, protonated, disulfide-bonded cysteine residues from PDB files.

Specifically, this covers a CYS residue with:

  • N-terminal NH3+
  • neutral degree-2 side-chain SG
  • no HG
  • external SG-SG disulfide connectivity
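For readers who want to check whether a PDB file contains this chemistry before loading it, the atom-name criteria above can be sketched as a small standalone scan. This is not code from the PR. The function names are hypothetical, the `H1`/`H2`/`H3` names for the NH3+ hydrogens are an assumed naming convention that real files may not follow, and the sketch does not check the SG-SG connectivity itself.

```python
from collections import defaultdict

# Assumption: the three N-terminal ammonium hydrogens are named H1, H2, H3.
NTERM_HYDROGENS = {"H1", "H2", "H3"}


def cys_atom_names(pdb_text):
    """Map (chain ID, residue number) to the set of atom names of each CYS."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width PDB columns: residue name 18-20, chain 22, number 23-26.
        if line[17:20] != "CYS":
            continue
        key = (line[21], int(line[22:26]))
        # Atom name lives in columns 13-16.
        residues[key].add(line[12:16].strip())
    return dict(residues)


def looks_like_nterm_disulfide_cys(atom_names):
    """True if NH3+ hydrogens and SG are present and HG is absent."""
    return (
        NTERM_HYDROGENS <= atom_names
        and "SG" in atom_names
        and "HG" not in atom_names
    )
```

A residue that passes this check is only a candidate: confirming the disulfide still requires looking at the bonding partner of SG, for example through the file's connectivity records.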

Changes made:

  • Adds missing CYS substructures in utilities/make_substructure_dict/_cif_to_substructure_dict.py.
  • Regenerates:
    • aa_residues_substructures_with_caps.json
    • aa_residues_substructures_explicit_bond_orders_with_caps.json
    • aa_residues_substructures_explicit_bond_orders_with_caps_explicit_connectivity.json
  • Adds a minimal regression PDB fixture for this chemistry.
  • Adds a regression test covering ingestion with Molecule.from_polymer_pdb.
  • Updates docs/releasehistory.md.
  • Updates the substructure-generator CIF download URL from ftp.wwpdb.org to files.wwpdb.org, since ftp.wwpdb.org did not resolve locally but the same wwPDB file was available from files.wwpdb.org.
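The PR simply swaps one host for the other. As a variation on that last change, and not what the PR implements, a download step could try several hosts in order so that an unresolvable one does not block regeneration. The sketch below is a minimal illustration under stated assumptions: `fetch_first_reachable` and its parameters are hypothetical names, and it uses the standard library's `urllib` rather than the `requests` call the generator script uses.

```python
import urllib.request

# Path taken from the URL discussed in this PR; host order is an assumption.
CIF_PATH = "/pub/pdb/data/monomers/aa-variants-v1.cif"
CANDIDATE_HOSTS = ("files.wwpdb.org", "ftp.wwpdb.org")


def _http_get(url):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_first_reachable(hosts=CANDIDATE_HOSTS, path=CIF_PATH, get=_http_get):
    """Return (url, body) from the first host that answers, else raise."""
    errors = []
    for host in hosts:
        url = f"https://{host}{path}"
        try:
            return url, get(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("no reachable endpoint:\n" + "\n".join(errors))
```

Passing the download function in as `get` keeps the fallback logic testable without network access.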

Validation run locally:

mamba run -n openff-toolkit-ncy pytest openff/toolkit/_tests/test_molecule.py -k "cys_disulfide or mainchain_cyx_dipeptide or mainchain_cys_dipeptide or mainchain_cym_dipeptide"

Result: 4 passed

Additional checks:

  • The regenerated explicit-connectivity SMARTS library validates successfully.
  • The motivating protein, PDB ID 4CHA, now loads with Topology.from_pdb, giving 3519 atoms and 1 molecule.

@j-wags j-wags left a comment

Member


Thanks @joelaforet! This looks quite good to me, but while this is just a fork the tests won't run. I'm going to retarget this PR to put the changes from your fork into a branch, then merge it, then I'll open a second PR to merge the changes from that branch into main (which will actually run the tests).

Sorry for the runaround in this process, the tests-fail-on-forks issue is downstream of us needing to test against OpenEye.

import requests

-r = requests.get("https://ftp.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif")
+r = requests.get("https://files.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif")

Member


(not blocking) Oh nice, good catch!

@j-wags j-wags merged commit 71d4f4d into openforcefield:fix-nterm-disulfide-cys Jun 8, 2026
5 of 22 checks passed
@joelaforet

Contributor Author

Thank you @j-wags !

j-wags added a commit that referenced this pull request Jun 8, 2026
) (#2195)

* fix(proteins): generate n-terminal disulfide cysteine templates

* chore(proteins): update cysteine substructure data

* test(proteins): cover n-terminal cysteine disulfide ingestion

* docs: note n-terminal cysteine disulfide support

* fix(utils): use reachable wwPDB CIF endpoint

* chore(proteins): align cysteine data with full regeneration

Co-authored-by: Joe Laforet Jr. <76799675+joelaforet@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Adding parameters for CYS residues that are N-terminal, protonated, and participate in a disulfide bridge

2 participants