
Pangenome index generation #6

Open
d-goryslavets wants to merge 5 commits into main from db-extension-for-annotation-reuse

@d-goryslavets

Implemented an interface to save the CDS annotations obtained on pipeline completion in a shareable format (a JSON file). The generated index can then be reused in subsequent runs to bypass Bakta annotation and speed up computation. As a result, the pangenome index can be published and shared among users annotating large datasets of related genomes.
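The PR does not spell out the JSON layout, so the following is only a minimal sketch of how such a reusable index could work, under stated assumptions. It assumes the index is a JSON object keyed by a SHA-256 hash of each protein sequence, mapping to that protein's annotation record. `seq_key`, `build_index`, `save_index`, `load_index` and `split_by_index` are hypothetical helpers for illustration, not the pipeline's actual code.

```python
import hashlib
import json


def seq_key(protein: str) -> str:
    """Hypothetical index key: SHA-256 of the upper-cased amino-acid
    sequence with any trailing stop symbol ('*') stripped."""
    return hashlib.sha256(protein.upper().rstrip("*").encode("ascii")).hexdigest()


def build_index(annotations: dict[str, dict]) -> dict[str, dict]:
    """Turn {protein_sequence: annotation_record} into {hash: annotation_record}."""
    return {seq_key(seq): record for seq, record in annotations.items()}


def save_index(index: dict[str, dict], path: str) -> None:
    """Write the index as a shareable JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh, indent=2, sort_keys=True)


def load_index(path: str) -> dict[str, dict]:
    """Read a previously published index."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def split_by_index(proteins: dict[str, str], index: dict[str, dict]):
    """Split {protein_id: sequence} into annotations reused from the index
    and proteins that still need annotation (e.g. by Bakta)."""
    reused: dict[str, dict] = {}
    missing: dict[str, str] = {}
    for pid, seq in proteins.items():
        record = index.get(seq_key(seq))
        if record is not None:
            reused[pid] = record
        else:
            missing[pid] = seq
    return reused, missing
```

Hashing the sequence instead of using it directly as the key keeps keys short and fixed-length. Only the `missing` set would need to go through the expensive annotation step on a later run.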

…y DB; don't run Bakta annotation if all proteins were found in auxiliary DB

Some hypothetical proteins may be filtered out during the final stage of the pipeline. To avoid re-running computationally expensive alignment for such proteins in later runs, all CDS features are added to the auxiliary database, whether or not they were marked as pseudogenes. Proteins that overlap with RNA features are added to the auxiliary database for the same reason. Finally, the channel holding proteins not found in the auxiliary DB is filtered by its number of FASTA entries: if it is empty, the Bakta annotation processes are not run.
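The two rules in this commit can be illustrated with a small Python sketch: keep every CDS in the auxiliary database, and skip Bakta when the unmatched-protein FASTA has no entries. The pipeline itself implements this with its own channels and processes. Here, the `CDS` record with `pseudo` and `overlaps_rna` flags and the helpers `aux_db_entries`, `to_fasta`, `count_fasta_entries` and `should_run_bakta` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    """Hypothetical CDS record; the flags mirror the cases named above."""
    locus_tag: str
    protein: str
    pseudo: bool = False
    overlaps_rna: bool = False


def aux_db_entries(features: list[CDS]) -> dict[str, str]:
    """Collect proteins for the auxiliary DB. Pseudogenes and RNA-overlapping
    CDS are deliberately NOT filtered out, so a later run never has to
    re-align them; only features without a translation are skipped."""
    return {f.locus_tag: f.protein for f in features if f.protein}


def to_fasta(proteins: dict[str, str]) -> str:
    """Serialise {id: sequence} as single-line FASTA records."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in proteins.items())


def count_fasta_entries(fasta_text: str) -> int:
    """Count records by their '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def should_run_bakta(unmatched_fasta: str) -> bool:
    """Run annotation only if some proteins were not found in the aux DB."""
    return count_fasta_entries(unmatched_fasta) > 0
```

Counting `>` headers is cheaper than parsing full records. It is enough for the gate, because the only question is whether at least one unmatched protein exists.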
d-goryslavets added the enhancement (New feature or request) label on Apr 6, 2026