
Added capsid (CA) variant reporting for gag gene #131

Open
rbaldwin-bugseq wants to merge 7 commits into PoonLab:master from rbaldwin-bugseq:capsid


@rbaldwin-bugseq

This PR follows up on a previous PR (#109). I tested it by taking the subtype C sequence AB254155.1 and adding an artificial triple-mutant sequence carrying three lenacapavir resistance mutations in the capsid (CA) region of gag:

M66I (Methionine → Isoleucine at position 66) - Major, score: 60
Q67H (Glutamine → Histidine at position 67) - Major, score: 30
K70S (Lysine → Serine at position 70) - Major, score: 30
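As a rough illustration of how the three test mutations above could be parsed and tallied, here is a minimal sketch. The helper names (`parse_mutation`, `total_score`) are hypothetical and not part of this PR, and the total assumes the listed scores simply add up with no combination rules, which may not match the actual scoring algorithm.

```python
import re

# Mutation strings and scores exactly as listed in this PR description.
MUTATION_SCORES = {"M66I": 60, "Q67H": 30, "K70S": 30}

# Simple substitution: reference residue, position, variant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(text):
    """Split a substitution such as 'M66I' into (ref, position, variant)."""
    match = MUTATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def total_score(mutations, scores=MUTATION_SCORES):
    """Sum per-mutation scores (assumption: purely additive, no combination rules)."""
    return sum(scores[m] for m in mutations)


if __name__ == "__main__":
    for name in MUTATION_SCORES:
        print(name, parse_mutation(name))
    print("additive total:", total_score(MUTATION_SCORES))
```

Under the additive assumption, the triple mutant totals 60 + 30 + 30 = 120.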

See the attached input sequence and results
ca_results.zip

A minimum overlap of 23 aa was selected based on the size of the CA protein region in the gag gene (231 aa) and the fact that resistance variants are distributed throughout the region. The IN region (288 aa) uses a 30 aa overlap, or ~10% coverage, so ~10% coverage for CA seemed appropriate as well.
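The arithmetic behind that choice can be checked with a short sketch. The function names are hypothetical, and rounding down to a whole residue is an assumption about how the 10% figure was turned into 23.

```python
def coverage_fraction(min_overlap, region_length):
    """Fraction of the region that the minimum overlap represents."""
    return min_overlap / region_length


def overlap_for_percent(region_length, percent=10):
    """Minimum overlap for a target percentage, rounded down (assumption)."""
    return region_length * percent // 100


if __name__ == "__main__":
    # Lengths and overlaps quoted in the PR description.
    print(f"IN: {coverage_fraction(30, 288):.3f}")
    print(f"CA: {coverage_fraction(23, 231):.3f}")
    print("10% of CA:", overlap_for_percent(231))
```

Both regions land close to 10% coverage: 30/288 is slightly above it and 23/231 slightly below.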

Unresolved question: the existing code adds +1 for the pol gene, but gag already appeared to be indexed correctly without it, so I did not apply the offset to gag. Why do we need +1 for pol and not gag?
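To make the indexing convention explicit, here is a minimal sketch of the coordinate arithmetic using the CA range (1186-1878) and gag_start (790) from the commit notes. It assumes both are 1-based, inclusive nucleotide positions; the function names are hypothetical and do not mirror the code in this PR. It does not settle the +1 question, it only shows what the numbers imply under that assumption.

```python
GAG_START = 790                 # first nucleotide of gag (assumed 1-based)
CA_START, CA_END = 1186, 1878   # CA nucleotide range (assumed 1-based, inclusive)


def codon_count(start, end):
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def aa_position(nuc_pos, region_start):
    """1-based amino acid index of the codon containing nuc_pos."""
    return (nuc_pos - region_start) // 3 + 1


def ca_to_gag(ca_pos):
    """Convert a 1-based CA position to a 1-based Gag position."""
    return ca_pos + aa_position(CA_START, GAG_START) - 1


if __name__ == "__main__":
    print("CA length (aa):", codon_count(CA_START, CA_END))
    print("CA residue 1 in Gag numbering:", aa_position(CA_START, GAG_START))
    print("CA 66 in Gag numbering:", ca_to_gag(66))
```

Under these assumptions the CA range spans 231 codons, matching the 231 aa quoted above, and CA position 1 corresponds to Gag position 133.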

rbaldwin-bugseq and others added 7 commits May 22, 2026 18:05
- Add is_capsid_resistance() method to identify CA gene mutations with CAI drug class
- Add isCapsidResistance boolean flag to mutation output
- Capsid resistance mutations include Major and Accessory types for lenacapavir
- Add CA gene nucleotide coordinates (1186-1878) from Gag region
- Add gag_start (790) reference point for CA amino acid position calculations
- Rename pol_nuc_map to gene_nuc_map to support multiple gene regions
- Update create_gene_map() to handle CA's Gag reference vs Pol reference for other genes
- Add CA to min_overlap dictionary requiring 60 AA minimum coverage
- Enables processing of CA gene mutations and LEN (lenacapavir) resistance scoring
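The flag described in the commit notes above could look roughly like the following sketch. Only the names `is_capsid_resistance`, `isCapsidResistance`, the CA gene, the CAI drug class, and the Major/Accessory types come from the commit message; the `Mutation` container, its fields, and the function signature are assumptions made for illustration and may differ from the actual implementation.

```python
from dataclasses import dataclass

# Mutation types counted as capsid resistance, per the commit message.
CAPSID_TYPES = {"Major", "Accessory"}


@dataclass
class Mutation:
    """Hypothetical container; the real data model may differ."""
    gene: str
    text: str
    drug_class: str
    mutation_type: str


def is_capsid_resistance(mutation):
    """True for CA mutations in the CAI drug class of a qualifying type."""
    return (
        mutation.gene == "CA"
        and mutation.drug_class == "CAI"
        and mutation.mutation_type in CAPSID_TYPES
    )


def to_output(mutation):
    """Attach the boolean flag to a per-mutation output record."""
    return {
        "text": mutation.text,
        "isCapsidResistance": is_capsid_resistance(mutation),
    }


if __name__ == "__main__":
    print(to_output(Mutation("CA", "M66I", "CAI", "Major")))
```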

Tested: CA gene now appears in alignedGeneSequences and drugResistance sections

Documents the working CA gene support including:
- Feature overview and verified functionality
- Example JSON output format
- Unit test results showing correct mutation identification
- Next steps for cascade pipeline integration
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

CA mutations are scored using standard HIVDB resistance system.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@ArtPoon
Contributor

ArtPoon commented May 29, 2026

Thanks for contributing this PR! I am travelling between conferences right now but will read over the changes once things have settled down.

@immasushirolll

Hi @rbaldwin-bugseq! I was hoping to also help with this. Was wondering if there's anything else to add to this PR other than to answer this question: Why do we need +1 for pol and not gag?

Do you think it would be better to investigate this after this PR is merged and, if a fix is needed, handle it in a new PR?

Kindly let me know when you have the time, thanks!
