feat(inprocess): in-process N-validator harness#3642
RegisterLocalServices constructs the EVM HTTP/WS listeners in detached goroutines that panic on a bind failure. An in-process host running N apps in one process needs (a) the listener handles to Stop() at teardown and (b) a single node's bind failure to be a reportable error, not a process-wide panic that kills all N.

Add evmHTTPServer/evmWSServer handles (EVMHTTPServer/EVMWebSocketServer getters), and SetEVMServeErr to redirect Start() failures to a buffered channel. With no channel set (production seid), behavior is unchanged: the listener still panics on bind failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
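A minimal sketch of the redirect described in this commit, under stated assumptions: `starter`, `failingServer`, and `serve` are hypothetical names for illustration, not the identifiers used in app.go. It shows a serve goroutine that panics when no channel is set and otherwise reports the Start() failure on a buffered channel without blocking.

```go
package main

import (
	"errors"
	"fmt"
)

// starter is a hypothetical stand-in for an EVM HTTP/WS server handle.
type starter interface {
	Start() error
}

// failingServer simulates a listener whose bind fails.
type failingServer struct{}

func (failingServer) Start() error {
	return errors.New("bind: address already in use")
}

// serve starts srv in a detached goroutine. With a nil errCh it panics on
// failure (the unchanged default); with a channel set it reports the error
// there instead, never blocking the goroutine.
func serve(srv starter, errCh chan<- error) {
	go func() {
		if err := srv.Start(); err != nil {
			if errCh == nil {
				panic(err)
			}
			select {
			case errCh <- err:
			default: // buffer already full: drop rather than block
			}
		}
	}()
}

func main() {
	errCh := make(chan error, 1)
	serve(failingServer{}, errCh)
	fmt.Println("serve error:", <-errCh)
}
```

The channel is buffered so the goroutine can exit even when nobody is reading yet; a host running N nodes would hold one such channel per node.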
Stands up N sei-chain validators in one Go process reaching real CometBFT consensus, each serving its own RPC stack (Tendermint RPC + EVM JSON-RPC HTTP/WS + gRPC), with deterministic teardown. Gated behind the inprocess build tag so the heavy bring-up never enters a normal seid build.

The load-bearing recipe (vs testutil/network): empty genesis valset (derive from InitChain), full P2P mesh, EVM enabled on per-node loopback ports, metrics off, raised conn-tracker ceiling for the loopback burst.

Productionization: fresh per-run chain-id (no cross-run genesis collision), partial-startup cleanup, per-node EVM serve-error channel, idempotent Close.

Handle methods mirror the SDK sei.NodeHandle signatures by name so a future adapter satisfies the interface structurally, without importing the SDK (its module graph + grpc replace conflict would break the seid build).

Test: TestInProcessNetwork stands up N=4, asserts each node serves its RPC stack, and round-trips a tx (broadcast on node0, observed on node1).

go test -tags inprocess -run TestInProcessNetwork -v -timeout 300s ./inprocess/

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bugbot run
PR Summary: Medium Risk
Overview: The integration YAML runner gains an
Reviewed by Cursor Bugbot for commit 4e143fc. Bugbot is set up for automated code reviews on this repo.
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit a869e15.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3642 +/- ##
==========================================
- Coverage 59.12% 58.15% -0.97%
==========================================
Files 2259 2176 -83
Lines 186489 176898 -9591
==========================================
- Hits 110255 102871 -7384
+ Misses 66353 64935 -1418
+ Partials 9881 9092 -789
The harness never starts a cosmos gRPC listener (servergrpc.StartGRPCServer is only on the seid start path), so enabling GRPC in app.toml and exposing Node.GRPC() advertised a port nothing binds. Remove the gRPC surface entirely: harness serves TM RPC + EVM (HTTP/WS) only. REST stays an honest "" parity stub.

Also:
- move the harness-only app.App accessors (SetEVMServeErr, EVMHTTPServer, EVMWebSocketServer) behind //go:build inprocess in app/app_inprocess.go so production App's public surface stays unchanged;
- remove the dead wireMesh path (collectGentxs is authoritative for persistent-peers);
- correct serve-error wording to listener-start (construct-time bind is still fail-fast);
- state the metrics-off and 0.0.0.0-bind invariants as standing conditions;
- stripScheme via strings.CutPrefix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
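For the last item, a sketch of what a CutPrefix-based stripScheme could look like. The function body and the list of schemes are assumptions; only the name and the use of strings.CutPrefix (Go 1.20+) come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme removes a leading URL scheme from a listen address.
// The set of schemes handled here is an assumption for illustration.
func stripScheme(addr string) string {
	for _, scheme := range []string{"tcp://", "http://", "ws://"} {
		if rest, ok := strings.CutPrefix(addr, scheme); ok {
			return rest
		}
	}
	return addr // no known scheme: return unchanged
}

func main() {
	fmt.Println(stripScheme("tcp://127.0.0.1:26657"))
	fmt.Println(stripScheme("127.0.0.1:8545"))
}
```

CutPrefix reports whether the prefix was present, which avoids the HasPrefix-then-TrimPrefix pair.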
bugbot run
Explicitly set GRPC.Enable/GRPCWeb.Enable=false so app.toml matches the "gRPC stays off" comment and can't collide on the fixed default port if the standard start path is ever wired. Scope doc.go recipe #5's bare "listeners" to consensus/RPC, and note on EVMRPC/EVMWS that the URL dials loopback while the listener binds 0.0.0.0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Reviewed by Cursor Bugbot for commit a1ae06c.
…unner execer (C2)

Wire the integration_test/runner to drive a real bank query/tx suite against the C1 inprocess.Network, with no docker.

Runner seam: extract execCmd into an `execer` interface. The docker-exec arm stays the zero-value default (existing yaml_integration runs unaffected). A new build-tagged in-process arm (runner_inprocess.go, tag `inprocess`) runs each command on the host against a `seid` it builds once, redirected to a node via a PATH shim that prepends `--home "$SEID_HOME"`, so opaque sourced helpers that call bare `seid` land on the right node without rewriting the commands.

Harness bridge: keyring moves into the node home (so host `seid --home` resolves it), each home gets a client.toml pinning test keyring + chain-id + that node's loopback RPC, and Options.ExtraKeys genesis-funds non-validator signing keys (admin on node 0) mirroring the docker localnode topology the suites sign as.

bank_module/send_funds_test.yaml is GREEN in-memory (N=3, the min topology that leaves block-sync and forms consensus): a real admin->bank-test send plus historical balance queries at distinct heights, all four verifiers passing.

go test -tags inprocess -run TestInProcessBankModule ./integration_test/runner/

Out of scope (process/binary boundary): upgrade + statesync suites.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
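The runner seam could be sketched roughly as below. This is an illustration under assumptions, not the runner's code: the method signature, the `runner` struct, the `dockerExecer`/`hostExecer` types, and the container name `sei-node-0` are all hypothetical. It shows the shape of the idea: a nil execer falls back to the docker-exec arm, and a host arm runs the command locally with SEID_HOME set so a shim can pick the node.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// execer abstracts where a suite command runs (signature is an assumption).
type execer interface {
	execCmd(cmd string) ([]byte, error)
}

// dockerExecer is the default arm: run inside a container.
type dockerExecer struct{ container string }

func (d dockerExecer) execCmd(cmd string) ([]byte, error) {
	return exec.Command("docker", "exec", d.container, "/bin/sh", "-c", cmd).CombinedOutput()
}

// hostExecer runs on the host and exports SEID_HOME for a PATH shim to read.
type hostExecer struct{ home string }

func (h hostExecer) execCmd(cmd string) ([]byte, error) {
	c := exec.Command("sh", "-c", cmd)
	c.Env = append(os.Environ(), "SEID_HOME="+h.home)
	return c.CombinedOutput()
}

// runner uses the docker arm when no execer is set (zero-value default).
type runner struct{ ex execer }

func (r runner) run(cmd string) ([]byte, error) {
	if r.ex == nil {
		return dockerExecer{container: "sei-node-0"}.execCmd(cmd) // hypothetical name
	}
	return r.ex.execCmd(cmd)
}

func main() {
	r := runner{ex: hostExecer{home: "/tmp/node0"}}
	out, err := r.run(`printf %s "$SEID_HOME"`)
	fmt.Println(string(out), err)
}
```

Keeping the default in the zero value means existing callers that never set the field keep their behavior.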
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Reviewed by Cursor Bugbot for commit 3a6be99.
…geting + cleanup

The N-floor was documented as a >2/3 voting-power quorum (N<3 stay in block-sync). That is wrong. The real constraint is CometBFT's block-sync handoff, verified against sei-tendermint and empirically:
- N=1 produces blocks as solo proposer IF onlyValidatorIsUs fires, which needs state.Validators.Size()==1 at the blockSync decision. Recipe #1's empty genesis valset leaves size 0 there (decision precedes InitChain), so the solo node fell into block-sync and hung at height 1. Fixed by pinning the single validator into genesis for N=1.
- N=2 deadlocks: each node has exactly 1 peer and BlockPool.IsCaughtUp requires >1. Start now rejects N=2 loudly instead of hanging.
- N>=3 works (>=2 peers each). Bank suite stays at N=3.

Corrected the false call-site comment + Options doc; added a doc.go recipe entry. Guard test now asserts N=2 rejected.

Hardening:
- F2: shim injects --node (client subcommands only; --node is not root-persistent, so keys/* would break) so RPC targeting is explicit, not client.toml-only. writeClientConfig returns its error (keyring-backend=test resolves only from it). Fixed the stale "injects same values defensively" comment.
- F5: t.Cleanup removes the temp build dir holding seid.real + shim.
- repoRoot surfaces the Getwd error instead of degrading to ".".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
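The validator-count rule above can be condensed into a small decision function. This is a sketch only: `planValidators`, `genesisMode`, and the error wording are hypothetical names, not the harness's API. It encodes the three cases from the commit message: N=1 pins the sole validator into genesis, N=2 is rejected, and N>=3 uses the empty genesis valset.

```go
package main

import (
	"errors"
	"fmt"
)

// genesisMode says how the genesis validator set is populated.
type genesisMode int

const (
	pinSoleValidator genesisMode = iota // N=1: size must be 1 at the block-sync decision
	emptyValset                         // N>=3: derive the valset from InitChain
)

// planValidators applies the validator-count rule for n nodes.
func planValidators(n int) (genesisMode, error) {
	switch {
	case n < 1:
		return 0, errors.New("need at least one validator")
	case n == 1:
		return pinSoleValidator, nil
	case n == 2:
		// Each node would have exactly one peer, and the block-sync
		// handoff needs more than one, so the network would hang.
		return 0, errors.New("N=2 is rejected: block-sync handoff needs more than 1 peer per node")
	default:
		return emptyValset, nil
	}
}

func main() {
	for _, n := range []int{1, 2, 3} {
		mode, err := planValidators(n)
		fmt.Println(n, mode, err)
	}
}
```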
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Reviewed by Cursor Bugbot for commit 8403b1e.
waitEVMServing now selects on the node's serveErr channel alongside the poll tick, so a reported EVM listener-start failure short-circuits with the real error instead of polling eth_blockNumber until the ctx deadline and masking it as a generic timeout. Consumption is non-destructive: the received error is re-sent (non-blocking, slot just freed) so Node.ServeErr() still observes it after WaitReady returns. Production seid (nil channel -> panic in app.go) is untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bugbot run
…leanup

The harness's P2P mesh is derived implicitly: collectGentxs mutates each node's tmCfg.P2P.PersistentPeers in place. Correct the doc to describe that mechanism (it never set PersistentPeers itself) and add a post-collectGentxs guard that fails loudly for N>=2 if the wiring didn't land, turning a fragile silent dependency into a fast failure.

Replace the rot-prone "recipe #N" taxonomy with self-describing named invariants referenced at point-of-use (empty-valset, gentx-derived peer mesh, EVM-enable injection, metrics-off constraint, loopback bind scope / 0.0.0.0 EVM caveat, loopback conn-tracker ceiling, validator-count rule).

Also: nolint:gosec on the seid build exec (consistency with siblings); drop the F5 step-tag comment; probeInterval var -> const; document that ServeErr() must be read after WaitReady, not concurrently; add a test asserting metrics stay off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
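The post-collectGentxs guard might look something like the following sketch. `nodeCfg` and `checkMesh` are hypothetical names; the real guard reads each node's tmCfg.P2P.PersistentPeers, which is modeled here as a plain string field.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeCfg is a hypothetical, reduced view of one node's config.
type nodeCfg struct {
	Name            string
	PersistentPeers string // comma-separated nodeID@host:port entries
}

// checkMesh fails fast if any node in an N>=2 network has no persistent
// peers, i.e. the implicit wiring step did not land.
func checkMesh(nodes []nodeCfg) error {
	if len(nodes) < 2 {
		return nil // a single node has no peers to wire
	}
	for _, n := range nodes {
		if strings.TrimSpace(n.PersistentPeers) == "" {
			return fmt.Errorf("node %s: persistent peers not wired", n.Name)
		}
	}
	return nil
}

func main() {
	nodes := []nodeCfg{
		{Name: "node0", PersistentPeers: "id1@127.0.0.1:26656"},
		{Name: "node1"},
	}
	fmt.Println(checkMesh(nodes))
}
```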
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 30459ac.
The diversion (route a listener Start() failure to a channel instead of panicking) only softened a rare EVM port-bind collision. For a test harness a loud panic on a rare event is fine, and the diversion's production footprint is not worth it. Production app.go reverts to the original bare panic(err) serve goroutines; the sole retained production change is keeping the constructed EVM HTTP/WS handles so the harness can Stop() them at teardown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 4e143fc.
Consolidate the package doc (lead with the validator-count rule, centralize the N=1 mechanism, distill the invariant prose) and strip work-item provenance from code comments: "productionizes the spike", "C2 end-to-end proof", "proven live by", "the point of this demo". Collapse the N-count re-derivation in the runner test to a cross-reference; the canonical statement lives in the package doc. No constraint dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

in-process N-validator harness
Stands up N sei-chain validators in a single Go process, reaching real CometBFT consensus and each serving its own RPC stack (Tendermint RPC + EVM JSON-RPC HTTP/WS), with deterministic teardown. The in-process provisioning foundation for the SDK "local" provider.
Gated behind the `inprocess` build tag: the heavy sei-tendermint/sei-cosmos bring-up never enters a normal `seid` build (verified: `go build ./cmd/seid` is unaffected). The harness-only `app.App` accessors live in `app/app_inprocess.go` behind the same tag, so production `app.App`'s public surface does not widen.

What's here
- `inprocess/` package: `Start(ctx, Options) (*Network, error)`, per-node `Node` handles, `WaitReady`, idempotent `Close`.
- `app/app.go`: EVM listener `Stop()` handles + a redirectable serve-error channel; `app/app_inprocess.go` (build-tagged): the `SetEVMServeErr`/`EVMHTTPServer`/`EVMWebSocketServer` accessors. Production seid behavior is unchanged when no channel is set (still panics on a listener-start failure).

Served surface
TM RPC + EVM JSON-RPC HTTP/WS. No gRPC: the harness never calls `servergrpc.StartGRPCServer`, so the cosmos gRPC server stays off (enabling it would advertise a port nothing binds). REST is an honest "" parity stub (part of the SDK handle shape; not started by the harness).

The load-bearing recipe (vs `testutil/network`)
- `genDoc.Validators = nil`: derive the valset from `InitChain`. `testutil/network` pins `[]{self}`, which fails consensus replay for N>1.
- Full mesh of `nodeID@127.0.0.1:p2pPort` persistent-peers across all N (wired via the gentx memos in `collectGentxs`); without the mesh, nodes never gossip and consensus never forms for N>1.
- EVM enabled on per-node loopback ports; without this, `TestAppOpts` hard-disables the listeners and no node serves EVM.
- `Instrumentation.Prometheus = false`: metrics off avoids the dup-registry panic from the process-wide registries. Invariant: metrics must stay off until the evmrpc/EVM-keeper metrics are de-globalized; re-enabling Prometheus without that reintroduces the panic.
- `MaxIncomingConnectionAttempts` raised for the loopback conn-tracker burst; without the raise, the burst trips the per-IP cap and peers are rejected.

Productionization beyond the spike
- … construct-time bind (`NewEVM*Server`) is still synchronous fail-fast: it panics and kills all N.
- Handle methods mirror the SDK `sei.NodeHandle`/`NetworkHandle` signatures by name (`Name`, `EVMRPC`, `TendermintRPC`, `REST`, `WaitReady(ctx)`, `Object`) so a future thin adapter satisfies the interface structurally, without importing the SDK (its module graph + toolchain skew would break the seid build).

Test
`TestInProcessNetwork` stands up N=4, asserts each node serves TM RPC + EVM, and round-trips a tx (broadcast on node0, observed on node1's independent RPC). Plus `TestStartRejectsZeroValidators` and `TestFreshChainIDPerRun`.

All three pass; full suite ~17s.
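The fresh-per-run chain-id property that `TestFreshChainIDPerRun` checks exists to avoid cross-run genesis collisions. One way to get it is sketched below; the harness's actual scheme is not shown in this PR text, so `freshChainID`, the `inprocess` prefix, and the random-suffix approach are all assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// freshChainID returns prefix plus a random 8-hex-digit suffix, so two runs
// are very unlikely to share a chain-id. Hypothetical helper.
func freshChainID(prefix string) (string, error) {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := freshChainID("inprocess")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```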
For reviewers
- … `Close` deliberately does NOT close it: the `sync.Once` never re-fires, so a second `Start` in the same process would inherit a closed pool. De-globalizing it in `evmrpc` is the proper fix if repeated Start/Close in one process is needed. Today's tests run one network per process.
- … `Close`d before its EVM start-signal fires, its 2 serve goroutines park (blocked on the start-signal receive) until process exit. Bounded under `go test`; un-defer if the harness is embedded in a long-lived process.
- … (`readiness.go`) duplicate the SDK's `WaitHeightAdvances`/`WaitEVMServing` stdlib-only, marked for a mechanical swap once the SDK toolchain skew is resolved.

Draft, no reviewers: full Coral review-gate (idiomatic + sei-network + systems) + Bugbot before a human reviewer is added.
🤖 Generated with Claude Code