htlcswitch+channeldb: batch fwd pkg loads at startup#10826
Conversation
In this commit, we extend `GlobalFwdPkgReader` (and its `SwitchPackager` implementation) with a new `LoadChannelFwdPkgsSet` method that loads fwd pkgs for a set of channels within a single read transaction. The existing `LoadChannelFwdPkgs` is per-channel, so callers that want to read fwd pkgs for the entire active channel set end up opening one read tx per channel. On nodes with 1k+ channels, this gets expensive fast, especially on remote-tx backends (etcd, postgres) where each tx carries network round-trip overhead. The new query takes a `[]lnwire.ShortChannelID`, walks the `fwd-packages` root bucket once, and returns a flat `[]*FwdPkg` across the requested set. Each `FwdPkg` already carries its originating channel via its `Source` field, so callers that only need to operate on the pkgs (e.g. the switch's reforward path) don't need a keyed map. Channels with no fwd pkgs on disk are silently skipped.
In this commit, we add a pair of benchmarks that exercise the prior
per-channel-tx path against the new batched `LoadChannelFwdPkgsSet`
query, across a matrix of channel-count + pkgs-per-channel scenarios
(up to 2k channels). The bench names are structured so they can be
paired up under a common name and fed to benchstat.
On a local bolt backend the per-iteration time is roughly at parity
for both variants (bolt read tx setup is cheap, so amortizing it
across channels doesn't move the needle much), while the batched path
consistently drops ~10% of allocs/op and ~4% of B/op across the
matrix. The bigger win lands on the remote-tx backends (etcd,
postgres) where the per-tx network round-trip dominates and the
batched query collapses N round-trips into one.
Sample benchstat output on darwin/arm64 (M4 Max), `-benchtime=2s
-count=6`:
```
│ old (per-tx) │ new (batched) │
│ sec/op │ sec/op vs base │
LoadChannelFwdPkgs/channels=100/pkgs=1-16 352.4µ ± 5% 341.9µ ± 3% ~
LoadChannelFwdPkgs/channels=500/pkgs=1-16 1.954m ± 3% 1.959m ± 3% ~
LoadChannelFwdPkgs/channels=1000/pkgs=1-16 4.036m ± 6% 3.990m ± 7% ~
LoadChannelFwdPkgs/channels=2000/pkgs=1-16 8.819m ± 18% 8.120m ± 44% ~
LoadChannelFwdPkgs/channels=1000/pkgs=4-16 15.86m ± 4% 17.11m ± 32% ~
geomean 3.296m 3.265m -0.93%
│ B/op │ B/op vs base │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16 11.57Mi ± 0% 11.04Mi ± 0% -4.61%
geomean 9.478Mi 9.107Mi -3.91%
│ allocs/op │ allocs/op vs base │
LoadChannelFwdPkgs/channels=1000/pkgs=1-16 160.4k ± 0% 144.4k ± 0% -9.97%
geomean 123.3k 112.5k -8.74%
```
In this commit, we rework `reforwardResponses` to load fwd pkgs for every active channel in a single read transaction via the new `SwitchPackager.LoadChannelFwdPkgsSet` query, instead of opening one read tx per channel. The prior loop opened a fresh `kvdb.View` for each channel returned by `FetchAllChannels` so that it could call `LoadChannelFwdPkgs(source)`. On nodes with 1k+ channels this adds up to 1k+ read txs at startup, which is particularly painful on remote-tx backends (etcd, postgres) where each tx is a network round-trip. We now collect the set of non-pending, non-local sources up front and hand the whole set to `LoadChannelFwdPkgsSet`, which walks the `fwd-packages` bucket once and returns a flat slice of fwd pkgs across all channels. `reforwardSettleFails` already keys off `fwdPkg.Source` for its ack/forward routing, so flattening the result doesn't lose any information: each pkg still knows the channel it came from.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request optimizes the startup process of the Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a batched method for loading forwarding packages, LoadChannelFwdPkgsSet, designed to improve performance for nodes with a large number of channels by reducing database transaction overhead. The htlcswitch has been updated to utilize this new method during the re-forwarding of responses on startup, and comprehensive benchmarks and unit tests have been added to verify the implementation. Feedback focuses on optimizing the batched load logic by reducing redundant bucket lookups, utilizing the standard library for integer-to-string conversion in tests, and ensuring cross-platform compatibility for file paths.
| } | ||
|
|
||
| for _, height := range heights { | ||
| fwdPkg, err := loadFwdPkg(fwdPkgBkt, source, height) |
There was a problem hiding this comment.
The current implementation of loadChannelFwdPkgsSet calls loadFwdPkg for each height, which internally re-fetches the sourceBkt from fwdPkgBkt using source. Since sourceBkt is already retrieved in the outer loop (line 656), this results in redundant bucket lookups for every forwarding package. For nodes with many channels and packages, this can add up, especially on remote backends where even in-transaction lookups may have overhead. Consider refactoring loadFwdPkg to accept the sourceBkt directly to avoid this redundancy.
| func itoa(n int) string { | ||
| if n == 0 { | ||
| return "0" | ||
| } | ||
|
|
||
| var buf [20]byte | ||
| i := len(buf) | ||
| for n > 0 { | ||
| i-- | ||
| buf[i] = byte('0' + n%10) | ||
| n /= 10 | ||
| } | ||
|
|
||
| return string(buf[i:]) | ||
| } |
There was a problem hiding this comment.
Reimplementing itoa is unnecessary as the standard library provides strconv.Itoa. Using the standard library is more idiomatic, robust (e.g., handling negative numbers), and easier to maintain. Since this is a test file, the overhead of an additional import is negligible and preferred over custom implementations of basic utilities.
| func makeFwdPkgBenchDB(b *testing.B) kvdb.Backend { | ||
| b.Helper() | ||
|
|
||
| path := b.TempDir() + "/fwdpkg.db" |
There was a problem hiding this comment.
🔴 PR Severity: CRITICAL
🔴 Critical (2 files)
⬛ Excluded (tests) (2 files)
AnalysisBoth non-test files in this PR touch CRITICAL packages:
The PR modifies two distinct critical packages simultaneously, which satisfies the multi-critical-package bump rule (though already at the ceiling). This warrants expert review to ensure the interaction between the forwarding package persistence changes and the switch logic remains consistent and safe. To override, add a |
In this PR, we rework the switch's startup reforward path so that all
forwarding packages across the active channel set are loaded in a single
read transaction, instead of opening one read tx per channel. For nodes
with 1k+ channels, the old per-channel-tx approach can take a while at
startup, and the cost gets dramatically worse on remote-tx backends
(etcd, postgres) where each tx is a network round-trip.
The shape of the change is small. We extend the
GlobalFwdPkgReaderinterface (and its
SwitchPackagerimplementation) with a newLoadChannelFwdPkgsSet(tx, sources)query that walks thefwd-packagesroot bucket once and returns a flat[]*FwdPkgacrossthe requested set. Each
FwdPkgalready carries its originatingchannel via its
Sourcefield, andreforwardSettleFailsalreadykeys off
fwdPkg.Sourcefor ack/forward routing, so a flat slice isall the switch needs.
Switch.reforwardResponsesnow collects the set of non-pending,non-local channels up front and hands the whole set to the batched
query, replacing the prior loop that called
loadChannelFwdPkgs(andopened a fresh
kvdb.View) per channel.Benchmarks
A new pair of benchmarks in
channeldbexercises the priorper-channel-tx path against the new batched query across a matrix of
channel-count + pkgs-per-channel scenarios, so we can compare both via
benchstat. On a local bolt backend the per-iteration time is roughly
at parity for both variants (bolt read tx setup is cheap, so
amortizing it across channels doesn't move the needle much), while the
batched path consistently drops ~10% of allocs/op and ~4% of B/op
across the matrix. The bigger win lands on the remote-tx backends
where the per-tx network round-trip dominates.
Sample benchstat output on darwin/arm64 (M4 Max),
-benchtime=2s -count=6:See each commit message for a detailed description w.r.t the
incremental changes.
Test plan
go test ./channeldb/...).TestSwitchPackagerLoadChannelFwdPkgsSetcovers correctness:batched output matches the per-channel concatenation, channels with
no fwd pkgs on disk are silently skipped, each pkg's
Sourcetracksits originating channel, and an empty input is a no-op.
htlcswitchbuilds and existing reforward path is exercised bythe integration of
reforwardResponseswith the new query.go test -run NONE -bench BenchmarkLoadChannelFwdPkgs ./channeldb/.