perf!: delta-aware BnbMetric via SelectionView/compute_view#53
Draft
evanlinjin wants to merge 4 commits into
Draft
perf!: delta-aware BnbMetric via SelectionView/compute_view#53evanlinjin wants to merge 4 commits into
evanlinjin wants to merge 4 commits into
Conversation
e94e9a1 to
89b4255
Compare
Squash of the two-commit perf series (cd1017a "delta-aware BnbMetric via SelectionView/SelectionCache" + 536a03a "replace duplicated CoinSelector read methods with compute_view()"), rebased onto the new BnbMetric API (target passed as a parameter; metric decides its own change output). Squashed because the first commit's intermediate state (src/selection_cache.rs, SelectionView-by-value) is fully superseded by the second (src/selection_view.rs, Cow-backed cache, &SelectionView, compute_view()); replaying both would mean resolving the same BnbMetric merge against master twice. The flamegraph on fix/better-memory showed ~65% of run_bnb_lowest_fee time was in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) -- both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain. This makes the metric evaluator delta-aware. BnB maintains a SelectionCache (running aggregates over the selection: value_sum, weight_sum, input_count, segwit_count, candidate_count) per Branch; inclusion expansions call cache.add(c), which is O(1). The metric trait now takes a `&SelectionView<'_>` -- a handle over (&CoinSelector, Cow<SelectionCache>) -- with O(1) versions of every read method (selected_value, input_weight, excess, is_target_met, drain, ...). CoinSelector's ~15 duplicated O(|selected|) read methods collapse into a single `cs.compute_view()` entry point (fresh cache built once on demand); the BnB hot path borrows the per-branch cache instead (zero clone). Adapted to the master-side API changes: - score/bound/drain take `target: Target` as a parameter (metrics no longer store target); BnbIter threads its `target` through to the metric. - LowestFee owns the change decision (dust_relay_feerate + drain_weights, no change_policy) and implements `drain`; its delta-aware not-target-met bound loop is preserved. - Changeless<M> wraps an inner metric, using &SelectionView + target. All lib, integration and doc tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The original benches capped at 4k candidates for `clone` and 200 for `run_bnb_lowest_fee`. Real callers span a much wider range -- a typical wallet has ~1k UTXOs, a large exchange ~10M. For the O(n)-ish operations (`new`, `clone`, `compute_view`) extend the parameter list to 64 / 1k / 16k / 256k / 1M / 10M. These all scale roughly linearly: clone/64 51 ns clone/1024 52 ns clone/16384 260 ns clone/262144 2.0 us clone/1048576 6.3 us clone/10000000 98 us At 10M UTXOs the `Candidate` slice itself is ~320MB and the selector's `candidate_order` Vec is ~80MB -- commented as a heads-up for memory- constrained hosts. Add new groups: - `new`: cost of `CoinSelector::new(candidates)` -- allocations grow with pool size. - `compute_view`: cost of building a SelectionView. Scales with |selected| rather than |pool|; benched against a fixed sparse selection of ~100 candidates regardless of pool size, matching how wallets actually use selection. For `run_bnb_lowest_fee` add n=500 and n=1000. BnB's search space is exponential but `max_rounds=100k` caps per-sample work, so these end up measuring "first 100k rounds" -- per-round cost grows roughly linearly: run_bnb_lowest_fee/200 193 ms run_bnb_lowest_fee/500 211 ms run_bnb_lowest_fee/1000 377 ms 10M-scale BnB is intentionally not benchmarked: it's impractical at any finite round budget, and real callers pre-filter / pre-group at that scale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flamegraph after the delta-aware refactor showed ~32% of run_bnb time spent walking candidate_order and checking Bitset::contains(selected) || Bitset::contains(banned) per element, inlined into insert_new_branches's `cs.unselected().next()`. As BnB descends and more candidates get selected/banned, each .next() call scans further before finding the next viable candidate. But BnB never re-considers a position: each branch's exploration only moves forward in candidate_order. Inclusion advances by 1; exclusion advances past every consecutive same-(value, weight) candidate. So we can store a per-Branch cursor and avoid the scan entirely. Add `Branch::cursor: usize` (the position the branch will expand on next). The init branch starts at 0; insert_new_branches advances past any pre-selected/pre-banned positions on demand, then expands at the located cursor and hands children their new cursors directly. Bench (run_bnb_lowest_fee, n = pool size): n=20 166 us -> 147 us (11%) n=50 5.3 ms -> 4.5 ms (16%) n=100 12.6 ms -> 11.5 ms (9%) n=500 200 ms -> 171 ms (14%) n=1000 365 ms -> 248 ms (32%) Largest win at large n where the unselected scan was burning the most time. Flamegraph confirms the unselected-scan hot spot is gone; new top is LowestFee::bound itself (the metric's float math + lookahead). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three rustdoc warnings: - `SelectionView` referenced the pub(crate) `crate::bnb::BnbIter` in prose. Reworded to describe the behaviour without naming the internal type. - `Drain` linked to `CoinSelector::drain`, which moved to `SelectionView::drain`. Retargeted. - `metrics` module docs used `[CoinSelector::run_bnb]` / `bnb_solutions` with no `use` in module scope. Switched to fully-qualified `crate::CoinSelector::...` paths. Also picks up rustfmt reflows across the changed files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
89b4255 to
ec6a713
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes the branch-and-bound metric evaluator delta-aware, eliminating the dominant cost in
run_bnb. A flamegraph showed ~65% ofrun_bnb_lowest_feetime was spent incs.selected_value()(47.5%) andcs.input_weight()(17.3%) — bothO(|selected|)walks recomputed many times per branch viaexcess/is_target_met/drain.Now BnB maintains a per-branch
SelectionCache(running aggregates:value_sum,weight_sum,input_count,segwit_count,candidate_count). Inclusion expansions callcache.add(c)—O(1). Metrics receive a&SelectionView<'_>(a handle over&CoinSelector+Cow<SelectionCache>) whose read methods (selected_value,input_weight,excess,is_target_met,drain, …) are allO(1).Outside the hot path,
cs.compute_view()builds a fresh cache once on demand, collapsing the ~15 duplicatedO(|selected|)read methods that previously lived onCoinSelectorinto a single entry point.Benchmarks
Criterion median; delta-aware vs the bitset baseline, vs the pre-bitset master:
The gain is largest at large
n, whereselected_value/input_weightrecomputation dominated.Commits
perf!: delta-aware BnbMetric via SelectionView/compute_view— the core change. IntroducesSelectionCache/SelectionView, moves the read methods offCoinSelectorbehindcompute_view(), and makes theBnbMetrictrait take&SelectionView. Adapted to the current trait API (targetpassed as a parameter; metric owns its change decision viadrain();Changeless<M>).bench: extend pool sizes to wallet (1k) and exchange (10M) scaleperf: skip unselected scan in BnbIter via per-branch cursor— eachBranchcarries a cursor intocandidate_order, soinsert_new_branchesjumps straight to the next undecided candidate instead of re-scanningselected/banned.docs: fix unresolved rustdoc links + rustfmt reflowAPI changes (breaking)
SelectionView<'_>;SelectionCacheis internal.BnbMetric::{score,bound,drain}now take&SelectionView<'_>instead of&CoinSelector<'_>.selected_value,input_weight,excess,is_target_met,drain,fee, …) move fromCoinSelectortoSelectionView; reach them viacs.compute_view().Testing
cargo fmt --check,cargo clippy --all-targets,RUSTDOCFLAGS="-D warnings" cargo doc --no-deps, and the full suite (33 lib+integration tests + 2 doctests) all pass. A proptest insrc/selection_view.rschecks the cache invariant (cache.value_sum == cs.compute_view().selected_value()) against a from-scratch rebuild under random insert/deselect sequences.🤖 Generated with Claude Code