Load Imbalance in Substitute Matrix #1

@esaliya

The substitute matrix, S, shows a very high load imbalance: 113.14 in the log below, compared with 3.12 for A and 2.57 for AS. Fixing this may require keeping a randomized mapping of k-mers to k-mer IDs.
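
As an illustration of the randomized-mapping idea, here is a minimal sketch, not the project's implementation. It assumes k-mer IDs are the base-|alphabet| encoding of the k-mer, that the matrix is split into equal blocks over a p x p grid, and that load imbalance means the largest per-block nonzero count divided by the mean. The helper names (`kmer_to_id`, `make_permutation`, `load_imbalance`) and the toy nonzero pattern are hypothetical.

```python
import random
from collections import Counter


def kmer_to_id(kmer, alphabet):
    """Encode a k-mer as a base-len(alphabet) integer (its 'natural' ID)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    kid = 0
    for ch in kmer:
        kid = kid * len(alphabet) + index[ch]
    return kid


def make_permutation(n, seed):
    """Return a seeded pseudo-random permutation of 0..n-1."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def load_imbalance(nonzeros, n, p):
    """Largest nonzero count in any block of a p x p grid, over the mean per block.

    Assumed definition; the program that produced the log may compute it differently.
    """
    block = -(-n // p)  # ceil(n / p): rows/columns owned by one grid row/column
    counts = Counter((r // block, c // block) for r, c in nonzeros)
    return max(counts.values()) * p * p / len(nonzeros)


# Toy stand-in for a clustered matrix: every ID is linked to its successor,
# so with natural IDs almost all nonzeros fall in the diagonal blocks.
n, p = 100, 4
natural = [(i, (i + 1) % n) for i in range(n)]

# Relabel rows and columns with the same random permutation of the IDs.
perm = make_permutation(n, seed=0)
shuffled = [(perm[r], perm[c]) for r, c in natural]

print("natural IDs:   ", load_imbalance(natural, n, p))
print("randomized IDs:", load_imbalance(shuffled, n, p))
```

A real k-mer would be relabeled as `perm[kmer_to_id(kmer, alphabet)]`. One design consideration: an explicit permutation table over all 244140625 possible IDs takes roughly 1 GB with 4-byte entries, so a computed bijection might be preferable to a stored table.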

See the email thread "About Large Runs" from 12/29/2019. Here is a log file from a run that shows this effect and then fails with std::bad_alloc.

Process Grid (p x p x t): 68 x 68 x 2

INFO: Program started on Sat Dec 28 20:04:12 2019

INFO: Job ID knl_fa_shuff_subs25/knl_fa_shuff_subs25_c61ed871-1547-4285-8082-f05137282334
Parameters...
  Input file (-i):                     /global/cscratch1/sd/esaliya/data/isolates/archaea/sanitized_2728834_impure_2729008_len_lte_2000_in_shuffled_isolates_proteins_archaea.fasta
  Original sequence count (-c):        2728834
  Kmer length (k):                     6
  Kmer stride (s):                     1
  Overlap in bytes (-O):               10000
  Max seed count (--sc):               1
  Gap open penalty (-g):               -11
  Gap extension penalty (-e):          -2
  Overlap file (--of):                 None
  Alignment file (--af):               knl_fa_shuff_subs25/knl_fa_shuff_subs25_align.txt
  Alignment write frequency (--afreq): 100000
  No align (--na):                     False
  Full align (--fa):                   True
  Xdrop align (--xa):                  False
  Banded align (--ba):                 False
  Index map (--idxmap):                knl_fa_shuff_subs25_archaea_idx_map.txt
  Alphabet (--alph):                   0
  Use substitute kmers (--subs):       True | sub kmers: 25
Creating fileknl_fa_shuff_subs25_archaea_idx_map.txt with 41438932 bytes
File knl_fa_shuff_subs25_archaea_idx_map.txt is actually 41438932 bytes seen from process 4623

INFO: Modfied sequence count
  Final sequence count: 2728822 (0.000440% removed)
Matrix A: 
Load imbalance: 3.118424
As a whole: 2728822 rows and 244140625 columns and 718716196 nonzeros
Matrix At: As a whole: 244140625 rows and 2728822 columns and 718716196 nonzeros
Matrix S: 
Load imbalance: 113.142021
As a whole: 244140625 rows and 244140625 columns and 723834658 nonzeros
Matrix AS: 
Load imbalance: 2.567925
As a whole: 2728822 rows and 244140625 columns and 10751320837 nonzeros
terminate called after throwing an instance of 'std::bad_alloc'
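
To put the reported numbers in perspective, the following back-of-the-envelope sketch uses only values from the log above. It assumes that the 68 x 68 grid corresponds to 4624 processes (consistent with the log mentioning process 4623) and that load imbalance is the maximum per-process nonzero count divided by the mean; neither assumption is confirmed by the log itself. The alphabet size of 25 is inferred from the column count, not stated in the parameters.

```python
# Values copied from the log above.
p = 68
kmer_length = 6
kmer_space = 244140625          # rows/columns of S
nnz_S, imbalance_S = 723834658, 113.142021
nnz_AS, imbalance_AS = 10751320837, 2.567925

# Inferred, not stated in the log: 25 ** 6 equals the k-mer space size.
alphabet_size = 25
assert alphabet_size ** kmer_length == kmer_space

processes = p * p               # assumption: one block per process on a p x p grid

# Assumed definition: imbalance = max per-process nnz / mean per-process nnz.
mean_S = nnz_S / processes
max_S = imbalance_S * mean_S
mean_AS = nnz_AS / processes
max_AS = imbalance_AS * mean_AS

print(f"processes:             {processes}")
print(f"S  mean nnz/process:   {mean_S:,.0f}")
print(f"S  max  nnz/process:   {max_S:,.0f}")
print(f"AS mean nnz/process:   {mean_AS:,.0f}")
print(f"AS max  nnz/process:   {max_AS:,.0f}")
```

Under these assumptions, the most loaded process holds about 113 times as many nonzeros of S as the average process.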
