feat: speed up first_fit in sequence packing with a segment tree #15563

Open
fangwei123456 wants to merge 2 commits into NVIDIA-NeMo:main from fangwei123456:feat/find_first_bin_that_fits_segment_tree

fangwei123456 commented Mar 30, 2026

Speed up first_fit in sequence packing with a segment tree

What does this PR do?

Replaces the O(n) linear scan in find_first_bin_that_fits with an O(log n) segment tree, reducing the overall first_fit packing complexity from O(n^2) to O(n log n). This significantly speeds up sequence packing for large datasets.
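For context, the linear-scan behavior being replaced can be sketched as follows. This is a minimal illustration, not code copied from the repository: the names `find_first_bin_linear` and `first_fit_linear` are hypothetical, and the sketch assumes a bin fits a sequence when the bin's current total plus the sequence length does not exceed the pack size. Each of the n sequences may scan every open bin, which is where the O(n^2) total comes from.

```python
from typing import List


def find_first_bin_linear(bins: List[List[int]], s: int, bin_size: int) -> int:
    """Return the index of the first bin that can still hold s, or -1 if none can."""
    for i, b in enumerate(bins):
        # Linear scan: in the worst case every open bin is inspected.
        if sum(b) + s <= bin_size:
            return i
    return -1


def first_fit_linear(seqlens: List[int], pack_size: int) -> List[List[int]]:
    """Pack sequence lengths with first fit, using the linear scan above."""
    bins: List[List[int]] = []
    for s in seqlens:
        idx = find_first_bin_linear(bins, s, pack_size)
        if idx == -1:
            bins.append([s])  # no open bin fits, so open a new one
        else:
            bins[idx].append(s)
    return bins


# Small worked example: 5 and 4 share bin 0, 3 and 2 share bin 1, 6 opens bin 2.
print(first_fit_linear([5, 4, 3, 2, 6], pack_size=10))  # [[5, 4], [3, 2], [6]]
```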

A backend parameter is added to first_fit with two options:

  • "segment_tree" (default) — uses a segment tree for O(log n) per-query lookup
  • "naive" — uses the original O(n) linear scan

The function signature and return type remain backward-compatible. Downstream callers (first_fit_decreasing, first_fit_shuffle, create_packing_strategy, fill_packing_strategy) require no changes.

This change matters most for large datasets: in our own experiments, the time required to process 50 GB of data dropped from two and a half hours to one minute.

Changes

  • nemo/utils/sequence_packing_utils.py

    • Added _SegmentTree class: a 1-indexed flat-array segment tree that stores per-bin remaining capacity, with internal nodes tracking the max of their children. Supports open_bin, query (leftmost bin with capacity >= s), and update in O(log n).
    • Added backend parameter ("segment_tree" | "naive") to first_fit.
    • Extracted _first_fit_naive and _first_fit_segment_tree as the two backend implementations.
    • Kept find_first_bin_that_fits with a deprecation note for backward compatibility.
  • tests/utils/test_first_fit_backends.py (new)

    • 13 parametrized cases + 1 large random test (5000 sequences) verifying both backends produce identical output.
    • 1 test for invalid backend error.
    • 1 performance benchmark asserting segment tree is faster than naive.
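The segment tree described in the Changes list above could look roughly like the sketch below. The method names (`open_bin`, `query`, `update`) follow the PR description, but everything else is an assumption made for illustration: the class name `MaxSegmentTree`, the function name `first_fit_segment_tree_sketch`, the use of -1 as the sentinel for bins that are not open yet, and the requirement that sequence lengths are non-negative. It is not the PR's actual `_SegmentTree` implementation.

```python
from typing import List


class MaxSegmentTree:
    """Max segment tree over per-bin remaining capacity (illustrative sketch)."""

    def __init__(self, max_bins: int):
        # Round the leaf count up to a power of two so node i has children 2i and 2i+1.
        self.size = 1
        while self.size < max(1, max_bins):
            self.size *= 2
        # Index 0 is unused (1-indexed layout); -1 marks a bin that is not open yet.
        self.tree = [-1] * (2 * self.size)
        self.num_bins = 0

    def update(self, bin_idx: int, remaining: int) -> None:
        """Set one bin's remaining capacity and refresh the max along its ancestors."""
        node = self.size + bin_idx
        self.tree[node] = remaining
        node //= 2
        while node >= 1:
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def open_bin(self, remaining: int) -> int:
        """Open the next bin with the given remaining capacity and return its index."""
        idx = self.num_bins
        self.update(idx, remaining)
        self.num_bins += 1
        return idx

    def query(self, s: int) -> int:
        """Return the leftmost bin with remaining capacity >= s, or -1 if none exists."""
        if self.tree[1] < s:
            return -1
        node = 1
        while node < self.size:
            left = 2 * node
            # Prefer the left child whenever it contains a fitting bin.
            node = left if self.tree[left] >= s else left + 1
        return node - self.size


def first_fit_segment_tree_sketch(seqlens: List[int], pack_size: int) -> List[List[int]]:
    """First fit with O(log n) lookup per sequence; assumes non-negative lengths."""
    tree = MaxSegmentTree(len(seqlens))  # at most one bin per sequence
    bins: List[List[int]] = []
    remaining: List[int] = []
    for s in seqlens:
        idx = tree.query(s)
        if idx == -1:
            tree.open_bin(pack_size - s)
            bins.append([s])
            remaining.append(pack_size - s)
        else:
            remaining[idx] -= s
            tree.update(idx, remaining[idx])
            bins[idx].append(s)
    return bins


print(first_fit_segment_tree_sketch([5, 4, 3, 2, 6], pack_size=10))  # [[5, 4], [3, 2], [6]]
```

Because each internal node stores the maximum of its children, a query can discard a whole subtree in one comparison, and always trying the left child first is what makes the result the leftmost fitting bin, which is the same bin a linear scan would return.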

Performance

Benchmarked on 10,000 random sequences (lengths 1–500, pack_size=1024):

| Backend      | Time    |
|--------------|---------|
| naive        | 0.612 s |
| segment_tree | 0.024 s |
| Speedup      | 25x     |

Tests

pytest tests/utils/test_first_fit_backends.py -v --noconftest

All 16 tests pass, covering correctness (both backends produce identical output), the invalid-backend error, and performance (segment tree more than 2x faster than naive).

Signed-off-by: Wei Fang wei.fang@miromind.ai

Signed-off-by: wei.fang <wei.fang@miromind.ai>
fangwei123456 changed the title from "Speed up first_fit in sequence packing with a segment tree" to "feat: speed up first_fit in sequence packing with a segment tree" on Mar 30, 2026