DDP and batch loss weighting are likely biased by local valid-residue counts #79

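Each rank computes a masked mean BCE loss over its local batch, and DDP then averages gradients across ranks with equal weight. This matches a global valid-residue-weighted loss only when every rank sees the same number of valid residues. When counts differ across ranks or batches, residues on sparsely labeled ranks get more weight per residue than residues on densely labeled ranks.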
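
To make the mismatch concrete, here is a short derivation. The notation is introduced only for illustration and does not come from the codebase: $R$ ranks, per-residue loss $\ell_{r,i}$ and mask $m_{r,i} \in \{0, 1\}$ on rank $r$, local valid count $n_r = \sum_i m_{r,i}$, and global valid count $N = \sum_r n_r$.

$$
\begin{aligned}
L_r &= \frac{1}{n_r} \sum_i m_{r,i}\, \ell_{r,i} \\
L_{\text{DDP}} &= \frac{1}{R} \sum_{r=1}^{R} L_r \\
L_{\text{global}} &= \frac{1}{N} \sum_{r=1}^{R} \sum_i m_{r,i}\, \ell_{r,i} = \sum_{r=1}^{R} \frac{n_r}{N}\, L_r
\end{aligned}
$$

Because gradient averaging is linear, DDP effectively optimizes $L_{\text{DDP}}$, which weights every rank by $1/R$. The global loss weights rank $r$ by $n_r / N$. The two coincide when $n_r = N / R$ for every rank.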

Evidence:

- `src/pepseqpred/core/train/trainer.py`
  - Loss is `(loss_raw * mask).sum() / mask.sum()` per local batch.
  - DDP then averages gradients across ranks equally.
- `src/pepseqpred/apps/train_ffnn_cli.py`
  - IDs are partitioned across ranks by estimated embedding file size, not by valid positive/negative residue count.

Why this can hurt:

- A rank with few valid residues can contribute the same gradient weight as a rank with many valid residues.
- Long proteins, label sparsity, and source- or pathogen-specific label density can widen the spread of valid-residue counts across ranks and batches, which amplifies the imbalance.
- Multi-pathogen data is especially vulnerable if some groups have sparse labels or many uncertain residues.
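
A toy calculation illustrates the first point. All numbers are made up: rank 0 has 2 valid residues (plus 2 masked ones) and rank 1 has 8. The sketch compares the equal-weight average of per-rank masked means with the mean over all valid residues.

```python
def masked_sum_and_count(losses, mask):
    """Return (sum of losses over valid residues, number of valid residues)."""
    num = sum(loss * m for loss, m in zip(losses, mask))
    return num, sum(mask)


# Hypothetical per-residue BCE values and masks for two ranks.
ranks = [
    ([2.0, 2.0, 9.0, 9.0], [1, 1, 0, 0]),  # rank 0: 2 valid residues, 2 masked
    ([0.5] * 8, [1] * 8),                  # rank 1: 8 valid residues
]

stats = [masked_sum_and_count(losses, mask) for losses, mask in ranks]

# What equal-weight averaging across ranks corresponds to.
rank_averaged = sum(num / den for num, den in stats) / len(stats)

# What a global valid-residue-weighted mean would give.
globally_weighted = sum(num for num, _ in stats) / sum(den for _, den in stats)

print(rank_averaged)      # 1.25
print(globally_weighted)  # 0.8
```

In this example each valid residue on rank 0 carries a weight of 1/4 in the rank-averaged loss, while each valid residue on rank 1 carries 1/16, a 4x difference that comes only from how the data was partitioned.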

Planning direction:

- Log valid residue count per batch and rank.
- Consider a globally normalized loss: sum the masked loss numerator and the valid-residue denominator across ranks, then divide once.
- Alternatively, ensure that per-rank partitioning balances valid residues and positive residues, not just embedding file size.
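
A minimal sketch of the globally normalized option, not the trainer's actual code. It assumes the model outputs logits and that BCE is computed with `binary_cross_entropy_with_logits`; the function name `globally_normalized_bce` is hypothetical. The local numerator is multiplied by the world size so that DDP's division by world size during gradient averaging cancels out, leaving the gradient of the global masked mean. The returned local count can also be used for the per-rank logging suggested above.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def globally_normalized_bce(logits, targets, mask):
    """Masked BCE scaled so that DDP gradient averaging yields the global masked mean.

    Returns (loss, local_valid_count). Hypothetical helper, assuming logits as input.
    """
    mask = mask.to(logits.dtype)
    loss_raw = F.binary_cross_entropy_with_logits(
        logits, targets.to(logits.dtype), reduction="none"
    )
    local_num = (loss_raw * mask).sum()  # carries gradients
    local_den = mask.sum().detach()      # valid residues on this rank

    if dist.is_available() and dist.is_initialized():
        world_size = dist.get_world_size()
        global_den = local_den.clone()
        # Collective call: every rank must reach this on every step.
        dist.all_reduce(global_den, op=dist.ReduceOp.SUM)
    else:
        # Single-process fallback: reduces to the usual masked mean.
        world_size = 1
        global_den = local_den

    # Guard against a step where no rank has any valid residue.
    global_den = global_den.clamp(min=1.0)
    return local_num * world_size / global_den, local_den
```

Without an initialized process group this returns the same value as `(loss_raw * mask).sum() / mask.sum()`, so it can be checked on one device before being used under DDP. Note that the per-rank loss value it returns under DDP is not itself the global mean; only the averaged gradient is.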
