Pangenome index generation by d-goryslavets · Pull Request #6 · sysbio-vo/pannotator

d-goryslavets · 2026-04-04T08:47:21Z

Implemented an interface that saves the CDS annotations produced by a completed pipeline run in a shareable format (a JSON file). The generated index can be reused in subsequent runs to bypass Bakta annotation and speed up the computation. The pangenome index can therefore be published and shared between users annotating large datasets of related genomes.
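For readers who want a concrete picture of the save-and-reuse idea, here is a minimal Python sketch. It is not the code from this PR: the JSON layout, the choice to key entries by a hash of the protein sequence, and every function name (`sequence_key`, `save_index`, `load_index`, `split_by_index`) are assumptions made for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sequence_key(protein_seq: str) -> str:
    """Hash a protein sequence so identical proteins share one index entry.

    Keying by sequence hash is an assumption of this sketch.
    """
    normalized = protein_seq.strip().upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def save_index(annotations: dict, path: Path) -> None:
    """Write a {protein sequence: annotation} mapping to JSON, keyed by hash."""
    index = {sequence_key(seq): ann for seq, ann in annotations.items()}
    path.write_text(json.dumps(index, indent=2, sort_keys=True))


def load_index(path: Path) -> dict:
    """Read a previously saved index back into a dictionary."""
    return json.loads(path.read_text())


def split_by_index(proteins: dict, index: dict):
    """Split proteins into those already in the index and those still missing.

    `proteins` maps a protein ID to its sequence. Returns (found, missing),
    where `found` maps ID to the stored annotation and `missing` maps ID to
    the sequence that would still need annotation.
    """
    found, missing = {}, {}
    for protein_id, seq in proteins.items():
        annotation = index.get(sequence_key(seq))
        if annotation is None:
            missing[protein_id] = seq
        else:
            found[protein_id] = annotation
    return found, missing
```

In this sketch, only the `missing` proteins would be passed on to the expensive annotation step; if `missing` is empty, that step could be skipped entirely.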


…y DB; don't run Bakta annotation if all proteins were found in auxiliary DB

Some hypothetical proteins might be filtered out during the final stage of the pipeline. To avoid computationally expensive alignment of such proteins, add all CDS features to the auxiliary database, whether or not they were marked as pseudogenes. Proteins that overlap with RNA features are added to the auxiliary database for the same reason.

Filter the channel of proteins not found in the auxiliary DB by the number of FASTA entries: if it is empty, don't run the Bakta annotation processes.
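The FASTA entry count check described above can be illustrated in Python. This is only a sketch of the logic, assuming the check amounts to counting header lines; in the PR itself the filtering happens on a pipeline channel, and the function names `count_fasta_entries` and `should_run_bakta` are hypothetical.

```python
from typing import Iterable


def count_fasta_entries(lines: Iterable[str]) -> int:
    """Count FASTA records: each record starts with a header line beginning with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def should_run_bakta(missing_fasta_lines: Iterable[str]) -> bool:
    """Return True only if at least one protein was not found in the auxiliary DB."""
    return count_fasta_entries(missing_fasta_lines) > 0
```

Counting entries rather than checking for file existence matters here because an output file with zero records can still exist; a file-level check would let the annotation step run on empty input.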

d-goryslavets added 5 commits March 23, 2026 13:46

add skeleton for auxiliary database usage

089c5d0

add auxiliary database support for storing and reusing annotated prot…

64dc542

…eins

fix and refactor auxiliary database .py and .nf scripts

7a020e9

add extend_auxdb parameter

6d39df3

d-goryslavets added the enhancement label (New feature or request) Apr 6, 2026

d-goryslavets wants to merge 5 commits into main from db-extension-for-annotation-reuse
