
Need MPI Chunking to Support Large K Values #4

@esaliya

Description

During generation of the sequence-by-k-mer matrix, A, each process keeps track of the unique k-mers local to it. To generate S from these, the processes have to communicate their local sets to determine the global set of unique k-mers.

The code currently does this as follows.

  1. Every process creates a boolean array of size |Alph|^k, where |Alph| is the size of the alphabet. For proteins, that is 25^k.
  2. This array serves as the process-local unique k-mer ID list.
  3. Once each process has found its list of k-mers, it participates in an MPI_Allreduce using MPI_LOR.
  4. This results in a globally unique k-mer list.
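
The steps above can be sketched in plain Python, with the processes simulated as lists rather than real MPI ranks. This is only an illustration: the names `kmer_id`, `local_presence`, and `allreduce_lor` are hypothetical, the base-|Alph| k-mer encoding is an assumption about how IDs are assigned, and the four-letter toy alphabet is used only to keep the array small.

```python
from typing import List, Sequence

ALPHABET = "ACGT"  # toy alphabet for illustration; proteins would use 25 symbols


def kmer_id(kmer: str, alphabet: str = ALPHABET) -> int:
    """Map a k-mer to an index in [0, |Alph|^k), read as a base-|Alph| number (assumed encoding)."""
    idx = 0
    for ch in kmer:
        idx = idx * len(alphabet) + alphabet.index(ch)
    return idx


def local_presence(seqs: Sequence[str], k: int, alphabet: str = ALPHABET) -> List[bool]:
    """Steps 1 and 2: a boolean array of size |Alph|^k marking the k-mers seen locally."""
    present = [False] * (len(alphabet) ** k)
    for s in seqs:
        for i in range(len(s) - k + 1):
            present[kmer_id(s[i:i + k], alphabet)] = True
    return present


def allreduce_lor(per_process: Sequence[Sequence[bool]]) -> List[bool]:
    """Steps 3 and 4: stand-in for MPI_Allreduce with MPI_LOR (element-wise OR over processes)."""
    return [any(column) for column in zip(*per_process)]


# Two simulated processes, each holding one sequence.
rank0 = local_presence(["ACGT"], k=2)  # sees AC, CG, GT
rank1 = local_presence(["CGTA"], k=2)  # sees CG, GT, TA
global_presence = allreduce_lor([rank0, rank1])
global_ids = [i for i, seen in enumerate(global_presence) if seen]
```

After the reduction every process would hold the same `global_presence` array, so each can derive the same global k-mer ID list without further communication.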

Currently, the code relies on a single MPI_Allreduce, whose element count is an int, so the boolean array must have fewer than 2^31 elements. This works for proteins with k=6 (25^6 = 244,140,625) but fails for any larger k (25^7 = 6,103,515,625), as the alphabet size is 25.
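
The limit can be checked with a few lines of arithmetic. The sketch below assumes a 32-bit signed `int` for the count, which is the case on common platforms; the helper names `array_size` and `fits_single_allreduce` are hypothetical.

```python
INT_MAX = 2**31 - 1  # largest 32-bit signed int (assumed width of the MPI count argument)


def array_size(alphabet_size: int, k: int) -> int:
    """Number of elements in the boolean k-mer array: |Alph|^k."""
    return alphabet_size ** k


def fits_single_allreduce(alphabet_size: int, k: int) -> bool:
    """True if the whole array can be passed as one count to a single reduction call."""
    return array_size(alphabet_size, k) <= INT_MAX


for k in (6, 7):
    print(k, array_size(25, k), fits_single_allreduce(25, k))
```

For the protein alphabet this reports that k=6 fits and k=7 does not.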

The solution to this would be to use multiple MPI_Allreduce calls over parts of the boolean array.
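
A minimal sketch of that chunking idea, again simulated in plain Python rather than written against MPI: `chunk_ranges` and `chunked_allreduce_lor` are hypothetical names, and the inner loop stands in for what would be one MPI_Allreduce call with MPI_LOR per chunk in the real code.

```python
from typing import Iterator, List, Sequence, Tuple

INT_MAX = 2**31 - 1  # assumed upper bound on the count passed to one reduction call


def chunk_ranges(total: int, max_count: int = INT_MAX) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs that cover [0, total) with every count <= max_count."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    offset = 0
    while offset < total:
        count = min(max_count, total - offset)
        yield offset, count
        offset += count


def chunked_allreduce_lor(per_process: Sequence[Sequence[bool]],
                          max_count: int = INT_MAX) -> List[bool]:
    """Element-wise OR across simulated processes, performed one chunk at a time."""
    total = len(per_process[0])
    result = [False] * total
    for offset, count in chunk_ranges(total, max_count):
        # In real MPI code, this loop body would be a single reduction call over
        # `count` elements starting at `offset`, using MPI_LOR as the operation.
        for i in range(offset, offset + count):
            result[i] = any(p[i] for p in per_process)
    return result


# Small demonstration with an artificially low chunk size.
ranks = [[True, False, False, False, False],
         [False, False, True, False, False]]
reduced = chunked_allreduce_lor(ranks, max_count=2)

# Chunk plan for proteins with k=7 under the assumed 32-bit limit.
protein_k7_chunks = list(chunk_ranges(25 ** 7))
```

Because logical OR is applied element by element, reducing the array in pieces gives the same result as one reduction over the whole array; with the assumed limit, k=7 for proteins would need three calls.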
