Need MPI Chunking to Support Large K Values

While generating the sequence-to-k-mers matrix, **A**, each process keeps track of the unique k-mers local to it. To generate **S** from these, the processes have to exchange their local sets to determine the global set of unique k-mers.

The code currently does this as follows:
1. Every process creates a boolean array of size |Alph|<sup>k</sup>, where |Alph| is the size of the alphabet. For proteins, that is 25<sup>k</sup>.
2. This array serves as the process-local unique k-mer ID list.
3. Once each process has found its list of k-mers, it participates in an `MPI_Allreduce` using `MPI_LOR`.
4. This results in a globally unique k-mer list.
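To make the effect of step 3 concrete, here is a minimal serial sketch of what a logical-OR reduction computes. It is not the project's code: `or_combine` is a hypothetical name, and the outer vector merely stands in for the per-process arrays that MPI would combine across ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial stand-in for MPI_Allreduce with MPI_LOR (illustration only).
// Entry i of the result is true if any process marked k-mer ID i
// in its local array. All local arrays must have the same length.
std::vector<bool> or_combine(const std::vector<std::vector<bool>>& local_arrays) {
    if (local_arrays.empty()) return {};
    std::vector<bool> global(local_arrays.front().size(), false);
    for (const auto& local : local_arrays) {
        assert(local.size() == global.size());
        for (std::size_t i = 0; i < global.size(); ++i) {
            if (local[i]) global[i] = true;  // logical OR into the result
        }
    }
    return global;
}
```

After the real reduction, every process holds the same combined array, so the set indices identify the globally unique k-mers.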

Currently, the code relies on a single `MPI_Allreduce`, whose `count` argument is a C `int`, so the boolean array must have fewer than 2<sup>31</sup> elements. This works for `k=6` with proteins (25<sup>6</sup> ≈ 2.4 × 10<sup>8</sup>) but fails for any larger `k`, since 25<sup>7</sup> ≈ 6.1 × 10<sup>9</sup> already exceeds that limit.
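The arithmetic behind that limit can be checked with a small sketch. The function names `kmer_space_size` and `fits_single_allreduce` are hypothetical and not taken from the code base; the only assumption is that the element count passed to `MPI_Allreduce` must fit in a C `int`.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Number of entries in the k-mer presence array: alphabet_size^k.
// Returns UINT64_MAX if the value would overflow 64 bits.
std::uint64_t kmer_space_size(std::uint64_t alphabet_size, unsigned k) {
    assert(alphabet_size > 0);
    std::uint64_t n = 1;
    for (unsigned i = 0; i < k; ++i) {
        if (n > UINT64_MAX / alphabet_size) return UINT64_MAX;  // would overflow
        n *= alphabet_size;
    }
    return n;
}

// True when the whole array can be reduced in one MPI_Allreduce call,
// i.e. when its length fits in the int-typed count argument.
bool fits_single_allreduce(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(INT_MAX);
}
```

For a 25-letter alphabet, `kmer_space_size(25, 6)` is 244140625, which fits, while `kmer_space_size(25, 7)` is 6103515625, which does not.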

The solution to this would be to use multiple `MPI_Allreduce` calls over parts of the boolean array.
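One possible shape for the chunking loop is sketched below. This is an assumption about how it could be structured, not the eventual implementation: `chunked_reduce` and its parameters are hypothetical, and the reduction itself is passed in as a callable so that the loop stays independent of MPI.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: walks `buf` (n elements) in pieces of at most
// `max_chunk` elements and calls reduce(ptr, count) once per piece.
// Each count fits in an int, as MPI_Allreduce requires.
// Returns the number of reduce calls made.
template <typename T, typename ReduceFn>
std::size_t chunked_reduce(T* buf, std::size_t n, std::size_t max_chunk,
                           ReduceFn reduce) {
    assert(max_chunk > 0 && max_chunk <= static_cast<std::size_t>(INT_MAX));
    std::size_t calls = 0;
    for (std::size_t offset = 0; offset < n; offset += max_chunk) {
        const std::size_t len = std::min(max_chunk, n - offset);
        reduce(buf + offset, static_cast<int>(len));
        ++calls;
    }
    return calls;
}
```

With MPI, `reduce` could be a lambda that calls `MPI_Allreduce(MPI_IN_PLACE, ptr, count, datatype, MPI_LOR, comm)`, where `datatype` matches the array's element type (for example `MPI_CXX_BOOL` for a C++ `bool` array). Because `MPI_Allreduce` is a collective, every process would need to use the same `n` and `max_chunk` so that all ranks issue the same sequence of calls.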
