feat(chain)!: taint-aware CanonicalView::balance via canonical ancestor walk#2235
feat(chain)!: taint-aware CanonicalView::balance via canonical ancestor walk#2235evanlinjin wants to merge 6 commits into
CanonicalView::balance via canonical ancestor walk#2235Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2235 +/- ##
==========================================
+ Coverage 78.65% 78.86% +0.20%
==========================================
Files 30 31 +1
Lines 5909 6076 +167
Branches 279 285 +6
==========================================
+ Hits 4648 4792 +144
- Misses 1185 1209 +24
+ Partials 76 75 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
1fb7787 to
0a9fa74
Compare
Add `Canonical::ancestors`, returning a `CanonicalAncestors` iterator that walks the canonical ancestors of a set of seed transactions. - Ancestors are yielded in reverse topological order (descendants before the ancestors they spend) and each transaction is visited exactly once. - Accumulation is a fold over the ancestor DAG: `map` computes each tx's own contribution from its `CanonicalTx`, and a tx's final accumulator is its contribution `Merge`d with the accumulators of every in-set descendant. - `should_walk` prunes the walk per-tx, with the chain position available so callers can decide cutoffs (e.g. stop at confirmed). Seeds are given as txids (looked up in the canonical set) and seed the accumulation without being yielded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a variant of `Canonical::ancestors` that also yields the seed transactions (each with its own final accumulator), not just their ancestors. Pruning, deduplication and the reverse-topological order are unchanged; the seeds are emitted alongside the walked ancestors. Useful when the per-seed contribution must be folded uniformly with the ancestry (e.g. a seed that taints itself), avoiding a separate pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace `balance`'s `trust_predicate` and `min_confirmations` parameters with two closures: * `does_taint(CanonicalTx) -> bool` decides whether a transaction taints its descendants. A pending output is `untrusted_pending` if it, or any of its unsettled ancestors, taints; otherwise `trusted_pending`. The unsettled ancestry of each pending output is walked via `ancestors_inclusive` (deduped across outputs, stopping at settled transactions). This lets callers demote unconfirmed coins received from (or chained on top of) a third party. * `is_settled(&ChainPosition) -> bool` decides the confirmed/pending boundary, generalizing the old numeric `min_confirmations` (e.g. it can treat shallowly-confirmed outputs as pending and taintable). `does_taint` replaces the per-spk `trust_predicate`: trust is now derived from ancestry rather than a per-script heuristic. Call sites and tests are updated; the old per-keychain trust test is recomputed for the new model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The balance "confirmed" bucket is now driven by the caller-supplied `is_settled` predicate (transactions we are confident will not be replaced), not strictly by confirmation status. Rename the field to match the concept and update `Display`, `Add`, `total`, `trusted_spendable` and all call sites. This is a breaking change: the serde key changes from "confirmed" to "settled" (no alias). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0a9fa74 to
7737e30
Compare
Add `CanonicalView::classify_outpoints`, which pairs each unspent output with a chain-level `Eligibility` (`Settled` / `Immature` / `TrustedPending` / `UntrustedPending`) computed from `is_settled` and the `does_taint` ancestry walk. This is the primitive for coin selection / coin control: the caller decides how to aggregate, and can layer wallet-specific categories (e.g. "locked") on top — rather than being constrained to the fixed `Balance` buckets. `CanonicalView::balance` is re-expressed as a thin fold over `classify_outpoints`, adding each output's value to the `Balance` bucket that matches its `Eligibility`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
a286ab4 to
ab0c5a2
Compare
… walk Close review-identified coverage gaps: - `test_balance_taint_stops_at_settled_ancestor`: a settled ancestor that `does_taint` would flag must not taint its unsettled descendant — isolates the `!is_settled` guard in the taint walk. - `test_classify_immature_and_settled`: `classify_outpoints` returns `Immature` for an immature coinbase and `Settled` for a mature output. - `ancestors_inclusive_yields_seeds`: the inclusive walk yields the seeds (with their own accumulators) and `len()` accounts for them. - `ancestors_multiple_seeds_dedup_shared_ancestor`: an ancestor shared by two seeds is visited once. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c9c6f24 to
71c35d9
Compare
|
@oleonardolima It will be ideal if this can reuse logic from #2219. |
Dmenec
left a comment
There was a problem hiding this comment.
Hey @evanlinjin thank you for pushing this.
As there are many things involved, I tried to go point-by-point on some thoughts I have in here.
Related to your questions:
The longer-term idea is that classify_outpoints is the real API and Balance is just one possible fold — wallets that want categories like "locked"/"reserved" aggregate themselves. This PR keeps Balance (as a fold) for now; a follow-up could move it down into bdk_wallet and drop it from bdk_chain.
Agree, classify_outpoints + Eligibility is a good primitive, Balance is just a convenience fold on top so it makes sense to me. I think moving Balance out of bdk_chain would be a breaking change for consumers calling it directly, so maybe the migration needs care.
Is replacing trust_predicate with ancestry-based does_taint the right call, or should both coexist? (does_taint can't express per-output trust, but per-tx ancestry is arguably the more principled model.)
As said in one comment below, trust_predicate had this identifier that provided specific assumptions per-output. I don't think that's a problem though, that policy was always wallet-specific. It makes more sense for bdk_wallet to handle it externally.
bdk_walletwill need its balance call updated (out of this repo's tree).
This is the part I'm least sure about, so I'll just lay out what I found and leave it open.
While looking into the walk I tried a different approach: a per-UTXO memoized DFS (one bool cached per tx, shared across UTXOs so each ancestor is visited once) instead of the topological accumulator pass. Same does_taint/is_settled contract, and I checked it gives the same Eligibility as classify_outpoints on a diamond. Benchmarked both across a few topologies:
| topology | size | memoized | classify_outpoints | speedup |
|---|---|---|---|---|
| fan-in | 10w×3d | 9.0 µs | 21.9 µs | 2.4x |
| fan-in | 50w×5d | 77.7 µs | 192.9 µs | 2.5x |
| fan-in | 100w×10d | 300 µs | 878 µs | 2.9x |
| fan-in | 500w×20d | 3.3 ms | 14.5 ms | 4.4x |
| disjoint | 10w×3d | 9.9 µs | 24.4 µs | 2.5x |
| disjoint | 50w×5d | 83.9 µs | 194.6 µs | 2.3x |
| disjoint | 100w×10d | 311 µs | 842 µs | 2.7x |
| disjoint | 500w×20d | 3.3 ms | 14.0 ms | 4.2x |
| untrusted-fan-in | 10w×3d | 8.1 µs | 24.2 µs | 3.0x |
| untrusted-fan-in | 50w×5d | 73.2 µs | 209.7 µs | 2.9x |
| untrusted-fan-in | 100w×10d | 295 µs | 940 µs | 3.2x |
| untrusted-fan-in | 500w×20d | 3.2 ms | 16.3 ms | 5.1x |
| diamond | 5s×20u | 13.5 µs | 39.2 µs | 2.9x |
| diamond | 10s×50u | 49.2 µs | 169.1 µs | 3.4x |
| diamond | 20s×100u | 158 µs | 627 µs | 4.0x |
| diamond | 50s×500u | 1.8 ms | 8.3 ms | 4.6x |
(w = number of parallel chains, d = depth of each chain; for diamond s = shared ancestor txs, u = UTXOs spending all of them.)
Both walk the same O(U+S) txs but the BTreeSet accumulator grows with the number of downstream UTXOs, while the memoized walk just stores one bool per tx and stops early once does_taint fires.
What I genuinely can't decide is where this belongs. It doesn't use CanonicalAncestors at all, just the public tx/txout, so it feels a bit out of place sitting next to classify_outpoints in bdk_chain, not really taking advantage of the ancestor-walk machinery this PR adds. It could just as easily live in bdk_wallet. But then direct bdk_chain consumers of balance() wouldn't benefit.
I'm not sure if it makes sense here, wallet-side, or not at all, some thoughts?
Code + bench: https://github.com/Dmenec/bdk/tree/bench/trust-classification
| /// Confirmed and immediately spendable balance | ||
| pub confirmed: Amount, | ||
| /// Settled balance: outputs whose transaction we are confident will not be replaced (e.g. | ||
| /// confirmed deeply enough), as determined by the balance query's `is_settled` predicate. | ||
| pub settled: Amount, |
There was a problem hiding this comment.
Agree, settled better reflects that the threshold is a wallet policy decision.
| /// Pending (not settled), and neither it nor any of its unsettled ancestors taints it. | ||
| TrustedPending, | ||
| /// Pending (not settled), and it or one of its unsettled ancestors taints it. | ||
| UntrustedPending, |
There was a problem hiding this comment.
I like the taint concept more than saying some previous transaction is untrusted, what's actually untrusted is the pending UTXO, not the ancestor.
| pub fn classify_outpoints( | ||
| &self, | ||
| outpoints: impl IntoIterator<Item = OutPoint>, | ||
| mut does_taint: impl FnMut(CanonicalTx<ChainPosition<A>>) -> bool, |
There was a problem hiding this comment.
trust_predicate let callers express per-output trust by keychain. does_taint only sees the ancestor transaction, which does not allow for a "blind trust assumption" (i.e. this output belongs to my keychain so trust it).
Not sure if this policy now has to be layered on top of the Eligibility results.
| for (descendants, c_tx) in self.ancestors_inclusive::<BTreeSet<Txid>, _, _>( | ||
| seeds.iter().copied(), | ||
| |c_tx| core::iter::once(c_tx.txid).collect(), | ||
| |c_tx| !is_settled(&c_tx.pos), | ||
| ) { |
There was a problem hiding this comment.
does_taint is called eagerly for every unsettled ancestor here and before the returned iterator is built. So if a caller provides an expensive closure, we'd have to evaluate it against every walked ancestor upfront, even if we only end up needing a small part of the UTXOs.
| /// Builds a diamond of confirmed transactions and returns the chain, graph and the four txs. | ||
| /// | ||
| /// ```text | ||
| /// tx_a (2 outputs) | ||
| /// / \ | ||
| /// tx_b tx_c | ||
| /// \ / | ||
| /// tx_d (root we walk ancestors from) | ||
| /// ``` |
There was a problem hiding this comment.
Maybe checking with different topolgies would be useful to cover.
For instance: Disjoint seeds with no shared ancestry or a seed that is itself an ancestor of another seed.
|
@Dmenec Thank you for the detailed review, analysis and experimentation on top. Would you like to take over this work? I agree that |
Warning
This PR — including this description — is currently entirely AI-generated.
It still needs proper human review (by me, @evanlinjin) before it should be taken seriously. Treat everything below as a draft proposal, not a vetted design.
Description
This is an alternative direction to #2221. Where #2221 makes
CanonicalView::balanceless restrictive by removing themin_confirmationsparameter, this PR instead tries to make it more useful — handing control to the caller via closures, deriving trust from ancestry rather than a per-script heuristic, and exposing a per-output spend-eligibility classifier that's the building block for coin control.The core new primitive is
CanonicalView::classify_outpoints, built on a reusable, sans-TxGraphcanonical ancestor walk.balancebecomes a thin fold over it.What's here (commit by commit)
1. Add
CanonicalAncestorsreverse-topological walk.Canonical::ancestors(seeds, map, should_walk)walks the canonical ancestors of a set of seed txids backwards (the seeds are the leaf-most txs, closer to the leaves of the ancestry DAG than its roots). Each tx is visited once in reverse-topological order, folding aMergeable accumulator from descendants into ancestors (diamonds merge correctly).should_walkprunes per-tx with theChainPositionavailable; the iterator isExactSizeIterator.2. Add
Canonical::ancestors_inclusive. A variant that also yields the seeds (each with its own final accumulator), so a per-seed contribution can be folded uniformly with the ancestry.3. Taint-aware
CanonicalView::balance(breaking). Replacestrust_predicate+min_confirmationswith two closures:does_taint(CanonicalTx) -> bool: a pending output is untrusted if it, or any of its unsettled ancestors, taints. The unsettled ancestry is walked once (deduped across outputs, stopping at settled txs) viaancestors_inclusive. This lets callers demote unconfirmed coins received from — or chained on top of — a third party. Trust is now derived from ancestry instead of a per-script heuristic.is_settled(&ChainPosition) -> bool: generalizes the numericmin_confirmationsinto a caller-defined "confident this won't be replaced" boundary, and is the sole authority on it — a settled output is never dropped or tainted (even an unconfirmed output a caller chooses to treat as settled is counted, not lost).balancealso drops itsOidentifier generic and now takes bareOutPoints.4. Rename
Balance::confirmedtoBalance::settled(breaking). The bucket is driven byis_settled(confidence a tx won't be replaced), not strictly confirmation status, so the field is renamed to match. Breaking serde key change (confirmed→settled, no alias).5. Add
classify_outpointsspend-eligibility classifier.CanonicalView::classify_outpointspairs each unspent output with a chain-levelEligibility(Settled/Immature/TrustedPending/UntrustedPending) fromis_settled+ thedoes_taintwalk. This is the primitive for coin selection / coin control (e.g. "prefer settled coins, fall back to trusted-pending"): the caller decides how to aggregate and can layer wallet-specific categories like "locked" on top, instead of being constrained to the fixedBalancebuckets.balanceis re-expressed as a thin fold over it.Direction / open questions for human review
classify_outpointsis the real API andBalanceis just one possible fold — wallets that want categories like "locked"/"reserved" aggregate themselves. This PR keepsBalance(as a fold) for now; a follow-up could move it down intobdk_walletand drop it frombdk_chain.trust_predicatewith ancestry-baseddoes_taintthe right call, or should both coexist? (does_taintcan't express per-output trust, but per-tx ancestry is arguably the more principled model.)Balance::confirmed→Balance::settled: worth the rename / serde break, or keep the conventional name?bdk_walletwill need itsbalancecall updated (out of this repo's tree).Notes to reviewers
classify_outpointsancestry-taint test, and a regression test that a settled-everythingis_settlednever drops an unconfirmed output. Existing balance tests were migrated (the old per-keychain trust test is recomputed for the new model, since none of its txs actually spend third-party coins).cargo fmt,clippy --all-features --all-targets -p bdk_chain, andRUSTDOCFLAGS="-D warnings" cargo docare clean; thebdk_chainsuite + doctests pass.Changelog notice
Changed
CanonicalView::balancenow takesdoes_taintandis_settledclosures instead oftrust_predicateandmin_confirmations, and accepts bareOutPoints.Balance::confirmedtoBalance::settled.Added
CanonicalView::classify_outpoints+Eligibility: per-output spend-eligibility classification (the building block for coin control).Canonical::ancestors/Canonical::ancestors_inclusive: reverse-topological canonical ancestor walk with aMergeable accumulator.🤖 Generated with Claude Code