txmgr: enable spamoor TxBatcher in NewWalletPoolByPrivkey#171

Merged
pk910 merged 1 commit into master from research/walletpool-init-batcher
May 11, 2026

@pk910 pk910 commented May 11, 2026

Summary

  • NewWalletPoolByPrivkey creates the RootWallet but never calls InitTxBatcher, so spamoor's processFundingRequests falls back to one tx per recipient with sequential nonces.
  • Funding 100 child wallets (the eoa-transactions-test default) therefore emits 100 sequential-nonce txs from the root wallet.
  • Recent EL txpool changes, notably Erigon's MaxNonceGap=64 zombie eviction that landed in release/3.4 via erigontech/erigon#19449 (commit 93e5c77719), reject ~33 of those txs as nonce-gapped zombies when they arrive out of order on parallel TCP connections (INTERNAL_ERROR: nonce gap too large).
  • One-line fix: rootWallet.InitTxBatcher(ctx, s.txpool) — matches the pattern in both spamoor entry points (cmd/spamoor/main.go:223, cmd/spamoor-daemon/main.go:258). All 100 fundings now ride in a single outer batcher-contract tx, so the root wallet only uses 1–2 nonces total.
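
The nonce arithmetic behind the last bullet can be sketched as follows. This is a minimal illustration, not spamoor's code: `root_nonces_used` and its `batch_size` parameter are hypothetical names introduced here, and the real batcher's grouping rules may differ.

```python
import math


def root_nonces_used(recipients, batch_size=None):
    """Count root-wallet nonces consumed by one funding round.

    batch_size=None models the fallback path (one tx per recipient).
    Otherwise recipients are grouped into batcher txs of up to
    batch_size recipients each. batch_size is a hypothetical knob used
    only for this illustration, not a real spamoor setting.
    """
    if batch_size is None:
        return recipients
    return math.ceil(recipients / batch_size)


# Fallback path: 100 recipients consume 100 sequential root nonces.
fallback = root_nonces_used(100)
# Batched path: one outer tx covers all 100 recipients.
batched = root_nonces_used(100, batch_size=100)
```

With one or two outer txs, the highest root-wallet nonce in flight stays within a couple of positions of the on-chain nonce, far from the 64-nonce gap limit described above.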

Why this manifested now in ethpandaops/assertoor-test

The CI Run scheduled test workflow has been failing daily since 2026-04-29 on Erigon-paired matrix entries (latest-releases-1, -7, -9). The post-failure Claude analysis in those runs flagged Erigon's txpool log [txpool] stat pending=0 baseFee=0 queued=44 and eth_sendRawTransaction err="INTERNAL_ERROR: nonce gap too large" as the immediate signal, but the upstream reason that signal fires is the missing InitTxBatcher call here.

Note that the 5–10 tx/slot emitted during the test phase is well below MaxNonceGap=64. The 64-nonce gap is exceeded only during the funding burst, and only because the batcher was never initialized.

Reproduction (Erigon v3.4.1, shuffled-order burst)

```
$ python3 repro_nonce_gap.py --rpc http://127.0.0.1:32894 --count 100 --shuffle
Funder addr 0x8943545177806ED17B9F23F0a21ee5948eCaa776
On-chain nonce: 100
Submitting 100 txs to http://127.0.0.1:32894 in parallel
Submitted in 0.39s
Accepted: 67 Rejected: 33

Rejection reasons:
33 {'code': -32000, 'message': "INTERNAL_ERROR: nonce gap too large..."}
```
Matches the production failure logs exactly.
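
The rejection pattern can be approximated offline with a toy pool model. This is a sketch under stated assumptions, not Erigon's implementation: it assumes a tx is accepted if it extends the contiguous run from the on-chain nonce, queued if it is gapped but at most `MAX_NONCE_GAP` ahead of the on-chain nonce, and rejected otherwise. The function names are hypothetical.

```python
import random

MAX_NONCE_GAP = 64  # value quoted in this PR for Erigon's MaxNonceGap


def simulate_burst(arrival_order, onchain_nonce, max_gap=MAX_NONCE_GAP):
    """Return the nonces rejected by a simplified txpool model.

    Assumes arrival_order holds unique nonces >= onchain_nonce and that
    no block is mined while the burst arrives (both are simplifications).
    """
    next_pending = onchain_nonce  # lowest nonce not yet in the contiguous run
    queued = set()                # accepted but still nonce-gapped
    rejected = []
    for nonce in arrival_order:
        if nonce == next_pending:
            next_pending += 1
            # Promote queued txs that have now become contiguous.
            while next_pending in queued:
                queued.remove(next_pending)
                next_pending += 1
        elif nonce - onchain_nonce > max_gap:
            # Gapped and too far ahead of the on-chain nonce.
            rejected.append(nonce)
        else:
            queued.add(nonce)
    return rejected


def shuffled_burst(count, onchain_nonce, seed):
    """Build a shuffled list of `count` consecutive nonces (deterministic per seed)."""
    order = list(range(onchain_nonce, onchain_nonce + count))
    random.Random(seed).shuffle(order)
    return order


in_order = simulate_burst(list(range(100, 200)), onchain_nonce=100)
shuffled = simulate_burst(shuffled_burst(100, 100, seed=0), onchain_nonce=100)
```

In this model an in-order burst is never rejected, while a shuffled burst of 100 can lose up to 35 txs (the nonces more than 64 ahead of the on-chain nonce), which is the same order of magnitude as the 33 rejections in the reproduction.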

End-to-end validation

2-pair Kurtosis devnet (reth+lighthouse, erigon+lighthouse), running the same eoa-transactions-test playbook the CI uses, with the patched image:

| | Failing CI run (2026-05-11 / latest-releases-7) | Local devnet w/ patch |
|---|---|---|
| eoa-transactions-test outcome | failure (timeout after 67m) | success (all 35 sub-tasks) |
| nonce gap too large rejections | 75+ in one funding burst | 0 |
| Erigon-built blocks with txs > 0 | 2 / 85 | 8 / 10 |
| Final [txpool] stat | pending=0 queued=44 | pending=0 queued=0 |

The Erigon-specific sub-tests now pass:

  • Wait for block proposal with >= 5 transactions from 2-erigon-lighthouse
  • Check if legacy EOA transactions can be sent via 2-erigon-lighthouse
  • Check if dynfee EOA transactions can be sent via 2-erigon-lighthouse

Test plan

  • go build ./... clean
  • 2-pair Kurtosis devnet (reth+lighthouse, erigon+lighthouse): eoa-transactions-test passes end-to-end (35/35 sub-tasks).
  • Erigon log shows zero nonce gap too large rejections over the full run.
  • CI run against ethpandaops/assertoor-test's latest-releases-* matrix once the assertoor image is rebuilt.

Follow-ups (not blocking, separate issues)

  • Erigon's MaxNonceGap eviction in the addTxns (RPC-submit) path is over-aggressive even outside this test pattern: a legitimate burst of out-of-order parallel consecutive-nonce submissions will still trip it. The eviction makes sense on real on-chain state changes (addTxnsOnNewBlock) but should be deferred while a burst is still arriving. Worth a separate upstream PR.
  • Lighthouse / Prysm don't yet recognise the execution_payload_available beacon-event topic — assertoor's subscription returns BAD_REQUEST: unable to parse query every 10s. Non-blocking warning, but noisy.

The recent eoa-transactions-test failures on Erigon-paired runs in
ethpandaops/assertoor-test (latest-releases-1, -7, -9 since 2026-04-29)
trace back to here: NewWalletPoolByPrivkey creates the RootWallet but
never calls InitTxBatcher, so processFundingRequests in spamoor's
walletpool.go takes the fall-back path of one EOA tx per recipient with
sequential nonces.

When funding 100 child wallets (the eoa-transactions-test default),
that produces 100 sequential-nonce txs from the root wallet that arrive
at Erigon's HTTP server in random order on parallel TCP connections.
Erigon v3.4.1 (PR #19449, commit 93e5c77719) immediately evicts any
queued tx whose nonce is more than MaxNonceGap (default 64) ahead of the
on-chain nonce, rejecting ~33 of them with "INTERNAL_ERROR: nonce gap
too large".
Funding stalls, child wallets stay at 0 ETH, the 30-min foreground
task times out.

The batcher contract path bundles all funding recipients into one
outer tx — funding 100 wallets uses 1-2 root-wallet nonces total,
well below any per-account threshold on any EL.

Both spamoor entry points (cmd/spamoor/main.go:223,
cmd/spamoor-daemon/main.go:258) already call InitTxBatcher right after
InitRootWallet. This makes assertoor consistent with that pattern.
@pk910 pk910 merged commit a348243 into master May 11, 2026
6 checks passed
@pk910 pk910 deleted the research/walletpool-init-batcher branch May 11, 2026 14:21