Skip to content

multi: harden Go module supply chain, internalize small deps#10836

Open
Roasbeef wants to merge 10 commits into
lightningnetwork:masterfrom
Roasbeef:module-hardening
Open

multi: harden Go module supply chain, internalize small deps#10836
Roasbeef wants to merge 10 commits into
lightningnetwork:masterfrom
Roasbeef:module-hardening

Conversation

@Roasbeef
Copy link
Copy Markdown
Member

In this PR, we tighten the Go module supply chain around the lnd build and start chipping away at the long tail of small external deps that the daemon was carrying for one or two call sites each. The motivation is the recent wave of supply-chain incidents in the broader Go ecosystem (proxy desync, deleted-then-republished modules, malicious toolchain directives in transitives, etc, etc), plus the realization that a fair number of our direct deps are single-maintainer modules of a few hundred lines that we'd rather just own inside the tree.

See each commit message for the detailed per-change writeup w.r.t. the incremental changes. The overall shape is: one commit on the build/CI guardrails, one on shrinking what the daemon's default build pulls in, and then five commits that either vendor a small dep into internal/ or swap it for a stdlib equivalent.

Build + CI lockdown

The first commit (build+ci: ...) is the highest-leverage low-risk change in the PR. We set GOFLAGS=-mod=readonly in both the Makefile and the top-level workflow env block, which means any go build / go test / go vet that wants to mutate go.mod or go.sum now fails loudly; the only sanctioned path for module updates is make tidy-module, which explicitly clears GOFLAGS before invoking tidy. We also set GOTOOLCHAIN=local everywhere so a malicious or accidental toolchain directive in any transitive can't cause the go command to silently download and execute a different Go binary.

On the CI side, we drop the ,direct fallback from GOPROXY (so CI can never bypass the checksum-immortalized Google proxy and fall through to a raw VCS fetch), and we add two new static-check steps: go mod verify re-hashes the local module cache against go.sum to catch a tampered cache, and govulncheck ./... (pinned to v1.1.4, not floating on @latest) walks the call graph and flags known CVEs that actually reach our compiled code. No source changes in this commit.

Postgres fixture gating

The second commit moves the dockertest-based postgres fixtures behind the test_db_postgres build tag. The fixtures were previously only excluded on a handful of unusual GOOS/GOARCH combinations, which meant the standard Linux/macOS/Windows builds unconditionally compiled them and pulled in the full docker SDK tail (docker/docker, docker/cli, moby/*, opencontainers/*, containerd/continuity, sirupsen/logrus, go-viper/mapstructure, xeipuuv/gojsonschema, Microsoft/go-winio, and a long list of indirects) into the lnd daemon's dependency graph even though no production code path reached any of them.

We ship companion postgres_fixture_disabled.go stubs under the inverse build tag so the cross-package test consumers in invoices/, batch/, payments/db/, and graph/db/ that call sqldb.NewTestPgFixture unconditionally keep compiling. The stub NewTestPgFixture is a no-op constructor returning an empty fixture, so parametrized tests that have both a sqlite and a postgres branch keep running the sqlite branch unaffected; only NewTestPostgresDB / NewTestPostgresDBWithVersion call t.Skip, which means just the postgres sub-test ends up skipped. After this change, go list -deps ./cmd/lnd no longer reports any of the docker SDK packages or their tail.

Internalized deps

Commits 3-6 each take an external dep that lnd was using for a tiny call surface and move it into internal/ so we own the code outright. The criteria: small (under ~250 LoC), few call sites in our tree, single-maintainer upstream, and the upstream wire format / API is effectively frozen so there's nothing meaningful to "stay current with":

  • tv42/zbase32 (commit 3) becomes internal/zbase32. Two call sites in rpcserver.go, used by SignMessage / VerifyMessage.
  • kkdai/bstream (commit 4) becomes aezeed/internal/bstream. Two call sites in aezeed/cipherseed.go, used for the 11-bit-per-word bit packing of the cipher seed.
  • coreos/go-systemd v0 (commit 5) becomes signal/internal/sdnotify. Four call sites in signal/signal.go, used to talk to systemd under Type=notify. The v22 module path stays as an indirect via etcd; this commit only kills the unmaintained v0 namespace.
  • jackpal/gateway + jackpal/go-nat-pmp (commit 6) become nat/internal/pmp. Five call sites in nat/pmp.go, used by lnd --nat for IGD port mapping over the RFC 6886 wire protocol.

In each case the port is a faithful copy of the upstream code (all MIT-licensed, attribution preserved in package doc) reduced to the API surface we actually call. None of these touches a wire format anyone else sees: aezeed mnemonics, signed-message envelopes, and NAT-PMP packets all serialize byte-for-byte the same as before.

Brotli, swapped for stdlib gzip

The final commit drops andybalholm/brotli and switches the encryptdebugpackage / decryptdebugpackage flow in lncli to the standard library's compress/gzip. brotli was lncli's only consumer of the module, used to shrink the highly-repetitive JSON payload (peer IDs, channel state, hex-encoded fields) before the ECIES envelope. gzip at BestCompression compresses that payload nearly as well on this kind of data, so we get to drop ~25k lines of pure-Go brotli tables in exchange for a marginally larger encrypted blob.

There is one user-visible change worth calling out: the bytes inside the ECIES envelope are now a gzip stream rather than a brotli stream, so a debug package produced by an older lncli cannot be decrypted by this lncli. decryptdebugpackage sniffs the first two bytes against the RFC 1952 gzip magic (0x1f 0x8b) and surfaces a specific error pointing the user at an older lncli build when it sees a brotli prefix. Debug packages are short-lived support artifacts shared and consumed by the same lncli binary, not stored state, so the impact window is small.

Testing approach

For each vendored package we ship both golden vectors (lifted directly from the upstream test suite, e.g. the canonical tv42/zbase32 "foobar" -> "c3zs6aubqe" pair) and pgregory.net/rapid property tests for the round-trip invariants. The golden vectors guard against bit-level divergence from upstream (which would silently break user-visible state like signed messages or aezeed mnemonics); the rapid tests give us coverage across the input space rather than just hand-picked examples.

The NAT-PMP port additionally includes a fake server bound to 127.0.0.1 on an ephemeral port that lets each test dictate the reply bytes, so we cover the RFC-mandated response-opcode check (request|0x80), the non-zero result-code rejection, and the retry-loop timeout when the server drops every packet. The sdnotify port has end-to-end tests that bind a real AF_UNIX/SOCK_DGRAM listener, point NOTIFY_SOCKET at it, and read the datagram off the socket to confirm exact byte equality.

The existing aezeed mnemonic golden tests in cipherseed_test.go continue to lock in the wire format end-to-end and would catch any unintended bit-packing behavior change the new microtests might miss.

Aggregate impact

go.mod loses six direct deps (tv42/zbase32, kkdai/bstream, coreos/go-systemd v0, jackpal/gateway, jackpal/go-nat-pmp, andybalholm/brotli). go list -deps ./cmd/lnd no longer reports the docker SDK tail (around 25 packages) or any of the four vendored modules. CI gains go mod verify, govulncheck, and the new module-mutation guardrails.

Roasbeef added 7 commits May 22, 2026 15:14
In this commit, we tighten the guardrails around how the Go module
graph can change during a build. Today the per-module hashes in
go.sum are well-pinned, but `go mod tidy` (and any plain `go build`
that fetches a new transitive) can silently rewrite the pin set in
the same PR diff, which has been the entry vector for a handful of
recent supply-chain incidents in the Go ecosystem.

We add four overlapping mitigations.

First, we set `GOFLAGS=-mod=readonly` in both the Makefile and the
top-level workflow env block. Any `go build` / `go test` / `go vet`
that wants to mutate go.mod or go.sum now fails loudly; the only
sanctioned path for module updates is `make tidy-module`, which
explicitly clears GOFLAGS before invoking tidy. `make vendor` does
the same so the vendor-directory regeneration path still works.

Second, we set `GOTOOLCHAIN=local` in both the Makefile and CI.
Go's auto-toolchain behavior means that a malicious or accidental
`toolchain` directive in any transitive go.mod can cause the
running `go` command to silently download and execute a different
Go binary. Pinning to `local` makes the system-installed toolchain
the only one we will ever run, regardless of what go.mod claims.

Third, we drop the `,direct` fallback from CI's GOPROXY. With
`GOPROXY=https://proxy.golang.org` (no fallback) CI cannot bypass
the Google-maintained, checksum-immortalized module proxy and pull
directly from VCS - exactly the path that a few recent incidents
used to substitute different bytes for a published version.
Developer envs are unaffected; this is a CI-only restriction.

Fourth, we add a `go mod verify` step and a pinned-version
`govulncheck ./...` step to the static-checks job. The first
re-hashes every module in the local cache against go.sum and
catches a tampered cache that slipped past the proxy; the second
walks the call graph and reports known CVEs that actually reach
our compiled code paths. govulncheck is pinned to v1.1.4 so the
install itself is reproducible.

None of these changes touch source code, and the only behavioral
difference for contributors is that `go build` outside the tidy
target will now refuse to silently add a new dep - the workflow
is to run `make tidy-module` explicitly, then re-build.
In this commit, we move the two dockertest-based postgres test
fixtures (sqldb/postgres_fixture.go and sqldb/v2/postgres_fixture.go)
behind the `test_db_postgres` build tag so the daemon's default
build no longer drags the docker SDK into its dependency graph.

The fixtures were previously only excluded on a handful of unusual
GOOS/GOARCH combinations - the standard Linux/macOS/Windows builds
unconditionally compiled them, which meant the entire docker tail
(github.com/ory/dockertest/v3, docker/docker, docker/cli, moby/*,
opencontainers/{go-digest,image-spec,runc}, containerd/continuity,
sirupsen/logrus, go-viper/mapstructure, dario.cat/mergo,
xeipuuv/gojsonschema, Microsoft/go-winio, and a long tail of
indirects) ended up linked into the `lnd` daemon graph even though
no production code path reached them.

To avoid breaking the cross-package test consumers in invoices/,
batch/, payments/db/, and graph/db/ that call sqldb.NewTestPgFixture
unconditionally, we ship companion stub files
(postgres_fixture_disabled.go in both v1 and v2) under the inverse
build tag. The stubs expose the same exported surface
(TestPgFixture, NewTestPgFixture, NewTestPostgresDB, GetConfig,
TearDown, RandomDBName, PostgresTag) but make every postgres-
requiring helper call t.Skip with a clear "rerun with
-tags=test_db_postgres" hint. NewTestPgFixture itself remains a
no-op constructor returning an empty fixture, so parametrized
tests that have both a sqlite and a postgres branch keep running
their sqlite branch unaffected; only the postgres sub-test ends
up skipped.

We also add the `test_db_postgres` tag to the two
fixture-internal test files (sqldb/migrations_test.go and
sqldb/v2/postgres_fixture_test.go) that reach into the now-gated
helpers directly. Those tests are postgres-specific anyway and
the CI matrix already runs them with the right tag.

After this change `go list -deps ./cmd/lnd` no longer reports any
of the docker SDK packages or their transitive tail.
In this commit, we replace the external github.com/tv42/zbase32
dependency with a small internal fork at internal/zbase32. lnd
uses exactly two functions from that module - EncodeToString and
DecodeString, both inside rpcserver.go for the SignMessage /
VerifyMessage signature envelope - and the upstream module is a
straightforward ~150 LoC implementation of z-base-32 (Zooko's
human-friendly variant of base32, defined in Phil Zimmermann's
"human oriented base-32 encoding" memo).

Carrying a small dependency for two call sites was the textbook
case for in-treeing: there is essentially nothing to "stay current
with" upstream (the spec has been frozen for decades), the
upstream module sees one commit every few years, and a future
compromise of the tv42/zbase32 tag history would silently change
how every signed-message envelope on the wire is encoded.

The port is a faithful copy of the upstream code (MIT-licensed,
copyright Tommi Virtanen) reduced to the API surface lnd actually
uses. We drop the `Bits` variants (which lnd never calls) and the
fuzz harness, and we tighten one or two doc comments to point at
lnd's usage. The alphabet table and the encode/decode bit-shifting
arithmetic are byte-for-byte identical, which is important because
any divergence would silently re-encode every existing signed-
message string into a different (but still valid-looking) value.

To lock that property in, the new test file includes:

  - Golden vectors lifted directly from the canonical
    tv42/zbase32 test suite (empty, "f", "fo", "foo", "foob",
    "fooba", "foobar"). Encoding any of these must produce the
    exact same z-base-32 string the upstream module produces.

  - A rapid property test asserting DecodeString reverses
    EncodeToString for arbitrary byte slices up to 256 bytes,
    matching the round-trip invariant the SignMessage call path
    relies on.

  - An alphabet-table paranoia check that pins the 32-character
    alphabet to its known value, so a future "tidy up" PR that
    reorders the table would fail loudly instead of silently
    invalidating user signatures.

  - A negative test verifying that the decoder surfaces a
    CorruptInputError (with the correct offset) when a transcribed
    character is not in the alphabet, since '0' / 'o' confusion
    is the most common user-input failure mode here.

After this commit the daemon's go.mod no longer lists
github.com/tv42/zbase32 in any form.
In this commit, we replace the external github.com/kkdai/bstream
dependency with a small internal fork at aezeed/internal/bstream.
The aezeed cipher seed format uses bit-level packing to fit 24
mnemonic words (each an 11-bit index into the wordlist) into a
33-byte enciphered seed, and that bit-packing was the only thing
aezeed used the bstream module for - exactly two call sites
(NewBStreamReader and NewBStreamWriter) plus ReadBits / WriteBits
/ Bytes on the returned object.

We move the file into the aezeed namespace under internal/ so it
is only importable from within aezeed itself. The package becomes
a sibling of the cipher seed implementation rather than an
arms-length module that downstream tooling has to track.

The port preserves the upstream API (MIT-licensed, by Mark
Chiang) verbatim for the names lnd actually calls and trims
WriteBit / WriteOneByte / ReadBit / ReadByte to unexported
helpers since they are internal implementation details of
WriteBits / ReadBits. Crucially, the bit-shifting arithmetic is
byte-for-byte identical to upstream: the aezeed wire format is
consensus-frozen across every 24-word mnemonic users have ever
generated, and any divergence in bit ordering or padding would
silently invalidate every existing seed.

To guard against that, the new test file exercises:

  - The byte-aligned WriteBits(b, 8) / ReadBits(8) round-trip,
    since 8-bit chunks are the path WriteBits / ReadBits fall
    through to for the high-order parts of an 11-bit word.

  - A two-word 11-bit packing round-trip on a hand-picked pair
    of values that hits both the aligned and the unaligned code
    paths inside encode / decode.

  - A rapid property test packing arbitrary 11-bit words (the
    exact aezeed use case) and verifying readback matches.

  - A separate rapid property test for the pure-byte stream
    pattern, since the same primitive is used in both modes.

  - A read-past-end check verifying the reader surfaces io.EOF
    rather than silently returning zero bytes.

The existing aezeed mnemonic golden vectors (in cipherseed_test.go)
remain unchanged and continue to lock in the wire format end-to-end;
they would catch any unintended behavior change in the port that
the new microtests miss.

After this commit github.com/kkdai/bstream drops out of the direct
`require` block in go.mod; it remains as a `// indirect` because
btcsuite/btcwallet still pulls it.
In this commit, we replace the external github.com/coreos/go-systemd
v0 dependency with a small in-tree implementation at
signal/internal/sdnotify. The signal package called exactly one
function out of that module - daemon.SdNotify, used to send
"READY=1" / "STOPPING=1" / "STATUS=..." messages to a systemd
service supervisor when lnd is run under a "Type=notify" unit.

The sd_notify protocol itself is trivial: write the state string
to the AF_UNIX/SOCK_DGRAM socket whose path is in the NOTIFY_SOCKET
environment variable, and treat an unset NOTIFY_SOCKET as a silent
no-op (the common case outside of systemd). The whole production
surface lnd needs is around thirty lines, which is hard to justify
as an external supply-chain liability - especially since the v0
module path coreos/go-systemd has not seen real maintenance in
years (active development moved to coreos/go-systemd/v22, which
lnd still pulls in transitively via etcd).

The new package exposes the same three names the signal package
was using (SdNotify, SdNotifyReady, SdNotifyStopping) with the
same call signature so signal.go's call sites change only in
their import path. We preserve the unsetEnvironment semantics
from upstream so child processes forked after a notify call do
not inherit the socket path, matching the upstream contract.

The accompanying test exercises:

  - The no-NOTIFY_SOCKET path: returns (false, nil) the way
    callers expect, so unattended dev builds keep working.

  - The happy path: binds a real AF_UNIX/SOCK_DGRAM listener,
    points NOTIFY_SOCKET at it, calls SdNotify, and reads the
    datagram back off the socket to confirm exact byte equality.

  - The unsetEnvironment flag: confirms NOTIFY_SOCKET is cleared
    after the call so a subsequent fork cannot inherit it.

  - A bad-path test pointing NOTIFY_SOCKET at a nonexistent
    socket file and asserting the error is surfaced rather than
    swallowed - signal.go's caller uses that error to log a hint
    about systemd configuration, so silent swallowing would
    regress operator UX.

After this commit github.com/coreos/go-systemd drops out of
go.mod entirely. The v22 module path remains as an indirect via
etcd; this commit only kills the unmaintained v0 namespace.
In this commit, we replace the two external jackpal modules used
by the NAT package (github.com/jackpal/go-nat-pmp and
github.com/jackpal/gateway) with a single internal fork at
nat/internal/pmp. The nat package previously used these for IGD
port mapping when the user runs `lnd --nat`: gateway discovery
finds the default router, and go-nat-pmp speaks the small RFC 6886
wire protocol against it.

Both upstream modules are tiny, lightly maintained, and authored
by the same maintainer outside of any organization. nat/pmp.go
calls into them at exactly four points (DiscoverGateway plus
NewClientWithTimeout / GetExternalAddress / AddPortMapping). The
NAT-PMP wire format is twelve bytes per request, the result codes
are defined inline in the RFC, and the gateway discovery is just
a per-OS parser over `route -n get default`, `ip route show`, or
`netstat -rn` output. In-treeing trades roughly 250 lines of code
for cutting two external dependencies and the supply-chain risk
that each carries.

The port preserves the upstream public API verbatim for the names
nat/pmp.go calls: Client, NewClientWithTimeout, GetExternalAddress,
GetExternalAddressResult, AddPortMapping, AddPortMappingResult,
DiscoverGateway. The retry schedule from RFC 6886 section 3.1
(nine doubling retries starting at 250ms, capped by the caller's
overall timeout) is preserved exactly; the response validation
(version byte, opcode|0x80, result code) is preserved exactly; the
section 3.2 requirement to ignore replies whose source IP does not
match the configured gateway is preserved exactly.

Gateway discovery is split across five files (one per OS family)
so the dead-code analysis cleanly drops per-platform branches and
so the BSD/Solaris and Windows specifics live next to their per-
OS exec.Command invocations. We add a sixth `unimplemented` file
that returns a clear "not implemented on $GOOS" error for any
target outside the supported set, mirroring upstream behavior.

The accompanying tests exercise:

  - Each per-OS route-table parser against a representative
    output sample taken from real machines (Linux iproute2,
    Linux legacy `route -n`, macOS `route -n get`, FreeBSD/
    Solaris `netstat -rn`, Windows `route print`).

  - A negative path for every parser asserting errNoGateway is
    surfaced rather than returning a zero IP, so a future
    iproute2 output-format change fails loudly.

  - A NAT-PMP fake server bound to 127.0.0.1 on an ephemeral
    port that lets each test dictate the reply bytes. We cover
    the happy GetExternalAddress path, the RFC-mandated opcode
    flip check (request|0x80), the non-zero result-code rejection,
    and the retry-loop timeout when the server drops every packet.

After this commit github.com/jackpal/gateway and
github.com/jackpal/go-nat-pmp drop out of go.mod entirely.
In this commit, we drop the external github.com/andybalholm/brotli
dependency from lncli's encryptdebugpackage / decryptdebugpackage
flow and switch to the standard library's compress/gzip. brotli
was lncli's only consumer of the module, used to shrink the
highly-repetitive JSON payload (peer IDs, channel state, hex-
encoded fields) before the payload was ECIES-encrypted to a
developer's pubkey for support bug reports.

gzip at BestCompression compresses the lncli debug JSON nearly as
well as brotli on this kind of payload, and it ships with the Go
runtime, so we get to drop ~25k lines of pure-Go brotli tables
and one more external maintainer from the lncli supply chain
without giving up any meaningful compression ratio.

The wire format of an encrypted debug package changes as a side
effect: the bytes inside the ECIES envelope are now a gzip stream
instead of a brotli stream. To give users running an older lncli
a clear error path instead of a generic "unable to decompress"
panic, decryptdebugpackage now sniffs the first two bytes of the
decrypted payload against the RFC 1952 gzip magic (0x1f 0x8b). A
mismatch surfaces a specific error pointing the user at an older
lncli build, since older brotli-encoded packages cannot be read
by this lncli.

This is a one-time format break confined to support packages
generated and consumed by the same lncli binary; debug packages
are short-lived artifacts shared in support flows, not stored
state, so the impact window is small.

The new cmd_debug_test.go covers:

  - A gzip round-trip on a representative repetitive payload,
    asserting the compressed output begins with the gzip magic
    that the decrypt path sniffs for.

  - The negative path - a synthetic non-gzip prefix - to lock in
    the "this is likely an older brotli-encoded package" error
    branch decryptdebugpackage now relies on.

After this commit github.com/andybalholm/brotli drops out of
go.mod entirely.
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on hardening the lnd build pipeline and reducing the project's external dependency surface. By internalizing small, stable utility modules and switching to standard library alternatives where possible, the project reduces its exposure to supply-chain vulnerabilities. Additionally, the PR introduces stricter build-time and CI-time constraints to ensure reproducible and secure builds.

Highlights

  • Supply Chain Security: Implemented strict build guardrails including -mod=readonly and GOTOOLCHAIN=local, and added CI checks like go mod verify and govulncheck to mitigate supply-chain risks.
  • Dependency Reduction: Internalized several small, single-maintainer dependencies (zbase32, bstream, sdnotify, nat-pmp) and replaced the brotli dependency with the standard library's gzip.
  • Build Optimization: Moved heavy Postgres test fixtures behind a build tag to significantly reduce the daemon's dependency footprint.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/main.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label May 22, 2026
@github-actions
Copy link
Copy Markdown

Severity CRITICAL - rpcserver.go (core server coordination). sqldb/* files (HIGH). aezeed/, cmd/, nat/, signal/, internal/zbase32/* (MEDIUM). .github/*, Makefile (LOW). 23 non-test files and ~1169 lines changed (both exceed bump thresholds but already at CRITICAL). <!-- pr-severity-bot -->

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reduces the project's dependency footprint by inlining internal forks of several utility packages, including bstream, zbase32, pmp, and sdnotify. It also transitions debug package compression from brotli to gzip, updates the Makefile to enforce read-only module behavior, and introduces build tags to make Postgres test fixtures optional. Feedback identifies a logic error in the NAT-PMP retry loop, potential undefined behavior in bit shifting, and an optimization opportunity to avoid repeated allocations in the zbase32 decoder. Additionally, several style guide violations were noted regarding function comment formatting and missing spacing in switch statements.

Comment on lines +153 to +154
for tries := uint(0); (tries < natRetries && finalDeadline.IsZero()) ||
time.Now().Before(finalDeadline); {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The retry loop condition logic is flawed when a finalDeadline is provided. The || operator causes the tries < natRetries check to be ignored if a deadline is set. This could lead to an excessive number of retries and potential overflow of the backoff shift if the deadline is far in the future. It should use && to ensure both constraints are respected.

Suggested change
for tries := uint(0); (tries < natRetries && finalDeadline.IsZero()) ||
time.Now().Before(finalDeadline); {
for tries := uint(0); tries < natRetries && (finalDeadline.IsZero() ||
time.Now().Before(finalDeadline)); {

// WriteBits appends the low `count` bits of `data` to the stream, MSB
// first.
func (b *BStream) WriteBits(data uint64, count int) {
data <<= uint(64 - count)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential undefined behavior or zero shift if count is 0. While aezeed currently uses fixed bit counts, adding a guard for count > 0 would make this internal utility more robust against future usage or invalid inputs.

Suggested change
data <<= uint(64 - count)
if count <= 0 {
return
}
data <<= uint(64 - count)

// lookup table maps the number of input characters in the
// current quantum to the number of output bytes produced.
if bits < 0 {
lookup := []int{0, 1, 1, 2, 2, 3, 4, 4, 5}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The lookup slice is allocated on every iteration of the loop. Moving this to a package-level constant or variable would improve efficiency by avoiding unnecessary allocations.

var decodeLookup = []int{0, 1, 1, 2, 2, 3, 4, 4, 5}

// decode is the inverse of encode. When bits < 0 the routine consumes the
// full src buffer; otherwise it decodes exactly `bits` bits.
func decode(dst, src []byte, bits int) (int, error) {
	olen := len(src)
	off := 0
	for len(src) > 0 {
		// ... existing code ...
		if bits < 0 {
			off += decodeLookup[j]
			continue
		}
		// ... existing code ...
	}
	return off, nil
}

Comment on lines +81 to +88
switch protocol {
case "udp":
opcode = 1
case "tcp":
opcode = 2
default:
return nil, fmt.Errorf("unknown protocol %q", protocol)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The switch statement stanzas should be separated by newlines to adhere to the repository style guide.

References
  1. Use spacing between case and select stanzas. (link)

// big-endian helpers; using encoding/binary would pull a tiny extra
// abstraction layer for two-byte and four-byte reads where the
// hand-rolled version is clearer.
func beUint16(b []byte) uint16 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Every function must have a comment beginning with the function name to adhere to the repository style guide.

References
  1. Function comments must begin with the function name. (link)

// best available route-table tool. The same fallback ladder is preserved
// from upstream jackpal/gateway: `route -n` first (universally available
// on older distros), then iproute2 `ip route show`, then `ip route get`.
func DiscoverGateway() (net.IP, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The DiscoverGateway function is missing a comment describing its purpose and assumptions, which is required by the repository style guide.

References
  1. Every function must be commented with its purpose and assumptions. (link)

"os/exec"
)

// DiscoverGateway returns the IPv4 default gateway on macOS by parsing
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function comment for DiscoverGateway must begin with the function name to adhere to the repository style guide.

References
  1. Function comments must begin with the function name. (link)

"syscall"
)

// DiscoverGateway returns the IPv4 default gateway on Windows by parsing
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function comment for DiscoverGateway must begin with the function name to adhere to the repository style guide.

References
  1. Function comments must begin with the function name. (link)

"os/exec"
)

// DiscoverGateway returns the IPv4 default gateway on the BSD family +
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function comment for DiscoverGateway must begin with the function name to adhere to the repository style guide.

References
  1. Function comments must begin with the function name. (link)

"runtime"
)

// DiscoverGateway returns an error on platforms where lnd does not know
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function comment for DiscoverGateway must begin with the function name to adhere to the repository style guide.

References
  1. Function comments must begin with the function name. (link)

Roasbeef added 2 commits May 22, 2026 16:36
In this commit, we remove the last external consumer of
github.com/ltcsuite/ltcd from the lnd source tree. The only call
site was inside channeldb/migration_01_to_11/migration_11_invoices.go
(plus its sibling test), where the historical "version 11" invoice
migration iterated over `litecoinCfg.{MainNet,SimNet,RegressionNet,
TestNet4}Params` for the sole purpose of extracting their four
`Bech32HRPSegwit` strings ("ltc", "sltc", "bcrt", "tltc"). It then
fed each one into a hand-built `bitcoinCfg.Params` so the legacy
zpay32 decoder could try-decode old invoice payment requests
against both bitcoin and litecoin HRPs.

lnd dropped litecoin support years ago, but the migration itself
still has to run for anyone upgrading from a pre-version-11
channeldb (otherwise their old litecoin invoices would fail to
deserialize on startup). The dependency, however, was carrying its
whole tail along for the ride: ltcsuite/ltcd pulled in ltcsuite/
ltcutil, btcsuite/golangcrypto, btcsuite/snappy-go, and
btcsuite/goleveldb, none of which we actually used.

We replace the chaincfg lookup with an inlined
`litecoinBech32HRPs` string slice containing the same four HRPs in
the same iteration order, including the duplicated "bcrt" entry
(litecoin regtest used the same HRP as bitcoin regtest, so the
original code was effectively trying that one twice; we preserve
the behavior verbatim to avoid any chance of changing legacy
migration outcomes). The migration test gets the same treatment:
where it previously read `litecoinCfg.MainNetParams.Bech32HRPSegwit`
to build a payment-request fixture, it now uses the literal "ltc"
with an inline comment explaining why.

The existing migration unit tests pass unchanged. The dropped
deps are: github.com/ltcsuite/ltcd, github.com/ltcsuite/ltcutil
(indirect), github.com/btcsuite/golangcrypto (indirect),
github.com/btcsuite/snappy-go (indirect), and
github.com/btcsuite/goleveldb (indirect, was pulled by ltcd via
its own block-storage stack).
In this commit, we drop the github.com/jedib0t/go-pretty/v6
dependency and the two indirect Unicode-width modules it pulls
along with it (github.com/mattn/go-runewidth and
github.com/rivo/uniseg) by hand-rolling a tiny in-tree ASCII
table renderer under cmd/commands/internal/asciitable. lncli had
exactly one consumer of go-pretty - the `trackpayment` live
payment table in cmd_payments.go - and the upstream module is
~6000 LoC across alternate renderers (HTML, CSV, markdown),
styles (bold, double, rounded, ColoredBright, etc.), ANSI
coloring, runewidth-aware width handling, sorting, captions,
footers, and page-break logic, almost none of which we use.

The replacement is ~200 LoC and exposes the same call surface
that cmd_payments.go was using: Writer, Row, ColumnConfig,
NewWriter, AppendHeader, AppendRow, SetColumnConfigs,
SetOutputMirror, Render, plus AlignLeft / AlignRight /
AlignDefault constants. The auto-alignment heuristic mirrors
go-pretty's StyleDefault (right-align numeric kinds via
reflect.Kind, left-align everything else) so the trackpayment
table renders byte-for-byte the same way as before.

The test suite includes a byte-exact assertion against the
expected go-pretty StyleDefault output for a representative
HTLC_STATE / ATTEMPT_TIME / CHAN_OUT layout, plus narrower
tests for the no-output-mirror no-op, numeric vs. string auto-
alignment, short-row degenerate cases, and fmt.Stringer cell
formatting (lnrpc proto enums satisfy Stringer and the live
table renders their human-readable names).

We deliberately drop everything we don't need: ANSI color
output, Unicode width counting via go-runewidth + uniseg
(trackpayment cells are pure ASCII - HTLC states, formatted
decimals, integer chan IDs, hex pubkey aliases joined by "->"),
HTML / CSV / Markdown alternate renderers, automatic wrapping,
captions, footers, page separators, sorting. utf8.RuneCountInString
is sufficient for our column sizing.

After this commit github.com/jedib0t/go-pretty/v6,
github.com/mattn/go-runewidth, and github.com/rivo/uniseg all
drop out of go.mod entirely.
@starius
Copy link
Copy Markdown
Collaborator

starius commented May 23, 2026

I happened to work in the same direction at the same time :)
https://github.com/starius/lnd/tree/cut-test-deps

The last commit in the LND branch also makes use of similar shrinking in BTCD and neutrino:
https://github.com/starius/neutrino/tree/cut-test-deps
https://github.com/starius/btcd/tree/cut-test-deps

I put both branches to AI and compared them. Hopefully some of my changes can be re-applied here. Below is the comparison of the branches written by AI.


Package graph counts:

PR #10836:
  lnd:   924 packages
  lncli: 933 packages

cut-test-deps:
  lnd:   760 packages
  lncli: 770 packages

Package families still present in the PR #10836 lnd and lncli package graphs with the tag set above:

testing
github.com/stretchr/testify/{assert,require,mock,...}
pgregory.net/rapid
github.com/lightningnetwork/lnd/lntest/{mock,channels,wait}
github.com/fergusstrange/embedded-postgres
go.etcd.io/etcd/server/v3/...
go.etcd.io/etcd/raft/v3/...

Concrete compiled import paths found in PR #10836:

cmd/lnd -> lnd -> kvdb -> kvdb/postgres -> github.com/fergusstrange/embedded-postgres

cmd/lnd -> lnd -> cluster -> kvdb/etcd -> go.etcd.io/etcd/server/v3/embed

cmd/lnd -> lnd -> lnwire -> pgregory.net/rapid

cmd/lnd -> lnd -> channeldb -> github.com/stretchr/testify/require

cmd/lnd -> lnd -> contractcourt -> github.com/lightningnetwork/lnd/lntest/mock

cmd/lnd -> lnd -> lightninglabs/neutrino/headerfs -> github.com/stretchr/testify/mock

github.com/kkdai/bstream is still present in the compiled lnd and lncli package graphs in PR #10836:

cmd/lnd -> lnd -> btcwallet/chain -> btcd/btcutil/gcs -> github.com/kkdai/bstream

cmd/lncli -> cmd/commands -> lnd -> btcwallet/chain -> btcd/btcutil/gcs -> github.com/kkdai/bstream

github.com/kkdai/bstream is also still present in the module graph through btcd, btcd/btcutil, btcwallet, and neutrino.

github.com/coreos/go-systemd/v22/journal is still present in the compiled lnd and lncli package graphs when kvdb_etcd is enabled:

cmd/lnd -> lnd -> cluster -> go.etcd.io/etcd/client/v3 -> go.etcd.io/etcd/client/pkg/v3/logutil -> github.com/coreos/go-systemd/v22/journal

cmd/lncli -> cmd/commands -> lnd -> cluster -> go.etcd.io/etcd/client/v3 -> go.etcd.io/etcd/client/pkg/v3/logutil -> github.com/coreos/go-systemd/v22/journal

Package families not present in the compiled lnd/lncli graphs after PR #10836:

github.com/andybalholm/brotli
github.com/tv42/zbase32
github.com/jackpal/gateway
github.com/jackpal/go-nat-pmp
github.com/coreos/go-systemd   // old v0 module path
github.com/ory/dockertest/v3 and Docker/OpenContainers/containerd package tail

Module graph entries still present in PR #10836 even when not linked into the binaries:

github.com/andybalholm/brotli
  via github.com/golang-migrate/migrate/v4

github.com/ory/dockertest/v3 and Docker/OpenContainers/containerd tail
  via root go.mod indirects and lnd submodules such as kvdb/sqldb

github.com/kkdai/bstream
  via btcd, btcd/btcutil, btcwallet, and neutrino

github.com/coreos/go-systemd/v22
  via etcd modules and opencontainers/runc

Improvement vectors:

  • Gate or split kvdb/postgres/fixture.go so github.com/fergusstrange/embedded-postgres is not part of production kvdb_postgres builds.
  • Gate or split kvdb/etcd/embed.go so go.etcd.io/etcd/server/v3/... and go.etcd.io/etcd/raft/v3/... are not part of production kvdb_etcd builds.
  • Gate lnwire rapid/test generator files behind a dev/test build tag so pgregory.net/rapid, testing, and testify are not part of the production lnwire package.
  • Move DB helper code that imports testing or testify/require out of production files in channeldb, graph/db, payments/db, invoices, kvdb, and related packages.
  • Gate local mock files that import testify/mock behind a dev/test build tag.
  • Gate upstream mock files in btcd and neutrino, or consume updated upstream versions once released:
btcd/mempool/mocks.go
neutrino/headerfs/store_mock.go
neutrino/chainimport/{file_source,http_source,iter}_mock.go
  • Check whether kkdai/bstream can be removed from production binary graphs by addressing the btcd/btcutil/gcs path used through btcwallet/chain, btcd, and neutrino.
  • Check whether the coreos/go-systemd/v22/journal path through etcd logging can be avoided for production kvdb_etcd builds.
  • After each change, compare go list -deps for both ./cmd/lnd and ./cmd/lncli with the release-style tag set above and scan for testing, testify, rapid, lntest, Docker/OpenContainers/containerd, embedded Postgres, and embedded etcd server packages.

In this commit, we remove the github.com/miekg/dns direct
dependency by replacing the two SRV-query call sites in lnd
(tor.LookupSRV and discovery.DNSSeedBootstrapper.fallBackSRVLookup)
with a small in-tree package built on golang.org/x/net/dns/dnsmessage.
miekg/dns is a ~50k LoC general-purpose DNS toolkit that supports
DNSSEC, zone transfers, EDNS, dynamic updates, TSIG, and a long
list of other facilities none of which lnd touches; we used it
purely for "send one TCP-framed SRV query, parse SRV records out
of the answer section, check the RCode."

dnsmessage already lives in the indirect dep graph (we depend on
golang.org/x/net for many other things) so this swap promotes
x/net from indirect to direct and removes miekg/dns entirely. No
new module enters the graph.

The new package lives at tor/dnsclient inside the tor submodule
since both consumers reach DNS through tor's dialer (LookupSRV is
defined in package tor; bootstrapper.go in the main lnd module
imports tor for its Net abstraction). Hosting dnsclient there lets
tor.LookupSRV use it as an intra-module relative import and lets
discovery import it cross-module via the existing
github.com/lightningnetwork/lnd/tor dependency. A temporary
`replace github.com/lightningnetwork/lnd/tor => ./tor` lands in the
main go.mod alongside the existing kvdb/sqldb/queue replaces;
remove it once a new tor module version containing the dnsclient
package ships.

The package exposes one function:

	func QuerySRV(conn net.Conn, name string) (
		[]*net.SRV, dnsmessage.RCode, error)

plus an RCodeText helper that preserves the spelled-out RCode
strings lnd's bootstrap logs have historically used ("non-existent
domain" rather than dnsmessage's "NXDOMAIN" mnemonic). The TCP
framing follows RFC 1035 §4.2.2 (2-byte big-endian length prefix
+ message bytes). Transaction IDs are pulled from crypto/rand
rather than math/rand since the SOCKS-exit-to-DNS-server hop is
the only path where an off-circuit observer could spoof responses
and the ID is the only entropy preventing that.

We deliberately keep the package minimal: no UDP path, no EDNS,
no DNSSEC, no recursion, no caching, no AXFR/IXFR. Callers are
expected to dial the destination DNS server themselves (almost
always through a SOCKS proxy) and hand QuerySRV the resulting
net.Conn.

The test suite stands up a fake TCP DNS server on 127.0.0.1 that
hand-crafts response bytes, then exercises: a 2-record SRV
round-trip with arrival-order preservation, NXDOMAIN handling, the
non-SRV-record skip branch (CNAMEs mixed into the answer section
by some resolvers), name-validation rejection without any wire
I/O, and the RCodeText map + dnsmessage fallback. The discovery
package and full tor module tests pass unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

severity-critical Requires expert review - security/consensus critical

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants