Skip to content

perf!: delta-aware BnbMetric via SelectionView/compute_view#53

Draft
evanlinjin wants to merge 4 commits into
bitcoindevkit:masterfrom
evanlinjin:perf/delta-aware-metric
Draft

perf!: delta-aware BnbMetric via SelectionView/compute_view#53
evanlinjin wants to merge 4 commits into
bitcoindevkit:masterfrom
evanlinjin:perf/delta-aware-metric

Conversation

@evanlinjin

Copy link
Copy Markdown
Member

Summary

Makes the branch-and-bound metric evaluator delta-aware, eliminating the dominant cost in run_bnb. A flamegraph showed ~65% of run_bnb_lowest_fee time was spent in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) — both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain.

Now BnB maintains a per-branch SelectionCache (running aggregates: value_sum, weight_sum, input_count, segwit_count, candidate_count). Inclusion expansions call cache.add(c)O(1). Metrics receive a &SelectionView<'_> (a handle over &CoinSelector + Cow<SelectionCache>) whose read methods (selected_value, input_weight, excess, is_target_met, drain, …) are all O(1).

Outside the hot path, cs.compute_view() builds a fresh cache once on demand, collapsing the ~15 duplicated O(|selected|) read methods that previously lived on CoinSelector into a single entry point.

Benchmarks

Criterion median; delta-aware vs the bitset baseline, vs the pre-bitset master:

bench master bitset delta-aware total speedup
clone/4096 5.68 µs 58 ns 52 ns 109×
run_bnb_lowest_fee/20 470 µs 417 µs 158 µs 3.0×
run_bnb_lowest_fee/50 26.0 ms 10.3 ms 5.4 ms 4.8×
run_bnb_lowest_fee/100 73.2 ms 40.2 ms 14.2 ms 5.2×
run_bnb_lowest_fee/200 1.27 s 1.10 s 196 ms 6.5×

The gain is largest at large n, where selected_value/input_weight recomputation dominated.

Commits

  1. perf!: delta-aware BnbMetric via SelectionView/compute_view — the core change. Introduces SelectionCache/SelectionView, moves the read methods off CoinSelector behind compute_view(), and makes the BnbMetric trait take &SelectionView. Adapted to the current trait API (target passed as a parameter; metric owns its change decision via drain(); Changeless<M>).
  2. bench: extend pool sizes to wallet (1k) and exchange (10M) scale
  3. perf: skip unselected scan in BnbIter via per-branch cursor — each Branch carries a cursor into candidate_order, so insert_new_branches jumps straight to the next undecided candidate instead of re-scanning selected/banned.
  4. docs: fix unresolved rustdoc links + rustfmt reflow

API changes (breaking)

  • New public SelectionView<'_>; SelectionCache is internal.
  • BnbMetric::{score,bound,drain} now take &SelectionView<'_> instead of &CoinSelector<'_>.
  • The ~15 read methods (selected_value, input_weight, excess, is_target_met, drain, fee, …) move from CoinSelector to SelectionView; reach them via cs.compute_view().

Testing

cargo fmt --check, cargo clippy --all-targets, RUSTDOCFLAGS="-D warnings" cargo doc --no-deps, and the full suite (33 lib+integration tests + 2 doctests) all pass. A proptest in src/selection_view.rs checks the cache invariant (cache.value_sum == cs.compute_view().selected_value()) against a from-scratch rebuild under random insert/deselect sequences.

🤖 Generated with Claude Code

@evanlinjin evanlinjin force-pushed the perf/delta-aware-metric branch 2 times, most recently from e94e9a1 to 89b4255 Compare July 2, 2026 17:16
evanlinjin and others added 4 commits July 2, 2026 17:21
Squash of the two-commit perf series (cd1017a "delta-aware BnbMetric via
SelectionView/SelectionCache" + 536a03a "replace duplicated CoinSelector
read methods with compute_view()"), rebased onto the new BnbMetric API
(target passed as a parameter; metric decides its own change output).

Squashed because the first commit's intermediate state (src/selection_cache.rs,
SelectionView-by-value) is fully superseded by the second (src/selection_view.rs,
Cow-backed cache, &SelectionView, compute_view()); replaying both would mean
resolving the same BnbMetric merge against master twice.

The flamegraph on fix/better-memory showed ~65% of run_bnb_lowest_fee time
was in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) -- both
O(|selected|) walks recomputed many times per branch via
excess/is_target_met/drain.

This makes the metric evaluator delta-aware. BnB maintains a SelectionCache
(running aggregates over the selection: value_sum, weight_sum, input_count,
segwit_count, candidate_count) per Branch; inclusion expansions call
cache.add(c), which is O(1). The metric trait now takes a
`&SelectionView<'_>` -- a handle over (&CoinSelector, Cow<SelectionCache>)
-- with O(1) versions of every read method (selected_value, input_weight,
excess, is_target_met, drain, ...).

CoinSelector's ~15 duplicated O(|selected|) read methods collapse into a
single `cs.compute_view()` entry point (fresh cache built once on demand);
the BnB hot path borrows the per-branch cache instead (zero clone).

Adapted to the master-side API changes:
- score/bound/drain take `target: Target` as a parameter (metrics no longer
  store target); BnbIter threads its `target` through to the metric.
- LowestFee owns the change decision (dust_relay_feerate + drain_weights,
  no change_policy) and implements `drain`; its delta-aware not-target-met
  bound loop is preserved.
- Changeless<M> wraps an inner metric, using &SelectionView + target.

All lib, integration and doc tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The original benches capped at 4k candidates for `clone` and 200 for
`run_bnb_lowest_fee`. Real callers span a much wider range -- a typical
wallet has ~1k UTXOs, a large exchange ~10M.

For the O(n)-ish operations (`new`, `clone`, `compute_view`) extend the
parameter list to 64 / 1k / 16k / 256k / 1M / 10M. These all scale
roughly linearly:

  clone/64        51 ns
  clone/1024      52 ns
  clone/16384    260 ns
  clone/262144   2.0 us
  clone/1048576  6.3 us
  clone/10000000  98 us

At 10M UTXOs the `Candidate` slice itself is ~320MB and the selector's
`candidate_order` Vec is ~80MB -- commented as a heads-up for memory-
constrained hosts.

Add new groups:

- `new`: cost of `CoinSelector::new(candidates)` -- allocations grow with
  pool size.
- `compute_view`: cost of building a SelectionView. Scales with
  |selected| rather than |pool|; benched against a fixed sparse
  selection of ~100 candidates regardless of pool size, matching how
  wallets actually use selection.

For `run_bnb_lowest_fee` add n=500 and n=1000. BnB's search space is
exponential but `max_rounds=100k` caps per-sample work, so these end up
measuring "first 100k rounds" -- per-round cost grows roughly linearly:

  run_bnb_lowest_fee/200   193 ms
  run_bnb_lowest_fee/500   211 ms
  run_bnb_lowest_fee/1000  377 ms

10M-scale BnB is intentionally not benchmarked: it's impractical at any
finite round budget, and real callers pre-filter / pre-group at that
scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flamegraph after the delta-aware refactor showed ~32% of run_bnb
time spent walking candidate_order and checking
Bitset::contains(selected) || Bitset::contains(banned) per element,
inlined into insert_new_branches's `cs.unselected().next()`. As BnB
descends and more candidates get selected/banned, each .next() call
scans further before finding the next viable candidate.

But BnB never re-considers a position: each branch's exploration only
moves forward in candidate_order. Inclusion advances by 1; exclusion
advances past every consecutive same-(value, weight) candidate. So we
can store a per-Branch cursor and avoid the scan entirely.

Add `Branch::cursor: usize` (the position the branch will expand on
next). The init branch starts at 0; insert_new_branches advances past
any pre-selected/pre-banned positions on demand, then expands at the
located cursor and hands children their new cursors directly.

Bench (run_bnb_lowest_fee, n = pool size):

  n=20     166 us -> 147 us  (11%)
  n=50    5.3 ms -> 4.5 ms   (16%)
  n=100   12.6 ms -> 11.5 ms  (9%)
  n=500   200 ms -> 171 ms   (14%)
  n=1000  365 ms -> 248 ms   (32%)

Largest win at large n where the unselected scan was burning the most
time. Flamegraph confirms the unselected-scan hot spot is gone; new top
is LowestFee::bound itself (the metric's float math + lookahead).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three rustdoc warnings:
- `SelectionView` referenced the pub(crate) `crate::bnb::BnbIter` in
  prose. Reworded to describe the behaviour without naming the internal
  type.
- `Drain` linked to `CoinSelector::drain`, which moved to
  `SelectionView::drain`. Retargeted.
- `metrics` module docs used `[CoinSelector::run_bnb]` / `bnb_solutions`
  with no `use` in module scope. Switched to fully-qualified
  `crate::CoinSelector::...` paths.

Also picks up rustfmt reflows across the changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@evanlinjin evanlinjin force-pushed the perf/delta-aware-metric branch from 89b4255 to ec6a713 Compare July 2, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant