Skip to content

feat: pluto create cluster#305

Open
varex83 wants to merge 2 commits intomainfrom
bohdan/cmd-create-cluster
Open

feat: pluto create cluster#305
varex83 wants to merge 2 commits intomainfrom
bohdan/cmd-create-cluster

Conversation

@varex83
Copy link
Copy Markdown
Collaborator

@varex83 varex83 commented Mar 31, 2026

Summary

Implements the initial pluto create cluster command and wires it into the CLI.

This adds local cluster artifact generation, including cluster definitions/locks, validator key material, deposit data, and node-specific setup, along with supporting validation and error handling for network, threshold, keymanager, and withdrawal address configuration.

Testing

To keep PR scope smaller, tests will come in a separate PR.

@iamquang95
Copy link
Copy Markdown
Collaborator

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @iamquang95's task in 3m 16s —— View job


Code Review: feat: pluto create cluster

  • Read changed files
  • Review create_cluster.rs
  • Review error.rs
  • Review create_dkg.rs and CLI wiring
  • Review cluster helpers changes
  • Post review feedback

Overall this is a solid port of the Go create cluster command. The structure is clean, error types are well-organized, and the happy path looks correct. A number of issues are worth addressing before merge, ranging from a memory leak to logic bugs.


Bugs / Correctness Issues

1. Double signature + hash verification when using --definition-file

load_definition already calls verify_signatures + verify_hashes (lines 1216-1217), but validate_definition repeats both (lines 1045-1047). Beyond wasted work, this means two RPC calls to the execution client for every definition-file run.

Fix this →


2. detect_node_dirs is skipped when using --definition-file

validate_create_config (line 1080) calls detect_node_dirs(args.nodes.unwrap_or(0)). When only --definition-file is supplied, args.nodes is None, so 0 is passed and no directories are checked. The real node count is only set at line 426 from def.operators.len() — after the config validation has already run. This means the "node directory already exists" pre-flight check is silently skipped for definition-file usage.


3. validate_threshold uses args.nodes.unwrap_or(0) as node count

If a user passes --threshold 5 --definition-file ./def.json (but no --nodes), number_of_nodes evaluates to 0, so threshold (5) > number_of_nodes (0) is true and the command fails with ThresholdTooHigh. The threshold check should either be deferred until after the definition is loaded, or be guarded when no node count is available.

Fix this →


4. Misleading error variant for u64 → usize cast failure

validate_addresses (line 1237) uses CreateClusterError::ValueExceedsU8 when num_validators doesn't fit in usize. The value being checked has nothing to do with u8; the error name and message will confuse users on 32-bit targets (the only realistic trigger). Consider a dedicated ValueExceedsUsize variant or using TryFrom with a custom message.


Memory Issues

5. Box::leak in From<TestnetConfig> for network::Network (create_cluster.rs:336–354)

Box::leak causes permanent memory allocation that cannot be freed. If this conversion is ever called more than once (e.g., in tests or future interactive usage), memory accumulates. Consider using Arc<str>, Cow<'static, str>, or changing network::Network to own String fields.

Fix this →


Code Quality / Design

6. Duplicate agg_sign function

create_cluster.rs:1349 defines a private fn agg_sign(share_sets, message) -> Result<Vec<u8>>.
cluster/helpers.rs:173 defines pub fn agg_sign(secrets, message) -> Result<Signature> with the identical algorithm.
The CLI version wraps the result in Vec<u8> but otherwise does the same thing. The cluster helper should be reused directly (calling .to_vec() on the result), eliminating the duplication.

Fix this →


7. Redundant double check in create_deposit_datas / sign_deposit_datas

create_deposit_datas checks secrets.len() != withdrawal_addresses.len() at line 808, then immediately calls sign_deposit_datas which repeats the identical check at line 831. The inner check is unreachable and should be removed.


8. Broken keymanager URL check — message says "https", check requires "http"

create_cluster.rs:1100:

if keymanager_url.scheme() != HTTP_SCHEME {
    return Err(CreateClusterError::InvalidKeymanagerUrlScheme { addr: addr.clone() });
}

The error message reads "Keymanager URL does not use https protocol" (error.rs:173), but the check enforces plain HTTP — HTTPS is rejected. The code comment at line 169 already acknowledges this as a possible Go-parity bug. This should be clarified and the error message fixed regardless. Fix this →


9. No rollback on partial keymanager import failure

write_keys_to_keymanager verifies all connections first (lines 748-756) and then imports per-node. If import succeeds for node 0 but fails for node 1, the cluster ends up in an inconsistent state (some nodes have keys, others don't) with no cleanup or error recovery. At minimum this should be documented as a known limitation, and ideally a warning should be emitted identifying which nodes were successfully imported.


10. Module coupling — validate_withdrawal_addrs belongs in create_dkg.rs

create_cluster.rs:52 imports validate_withdrawal_addrs from create_dkg. This creates an awkward coupling where create_cluster depends on create_dkg. Since both commands use the same validation, it should live in a shared location (e.g., a validation.rs module, or moved into create_cluster with create_dkg importing from there if ordering allows).


Minor / Nits

11. Stale "charon" branding in user-facing strings

Several strings still say "charon" where "pluto" is appropriate:

  • create_cluster.rs:1409: "Created charon cluster:""Created pluto cluster:"
  • create_cluster.rs:1421: "charon-enr-private-key\tCharon networking..." → use "pluto"
  • create_cluster.rs:1464–1468: split-keys warning refers to "charon cluster"
  • cli.rs:142–143: description strings reference "local charon cluster" and "charon p2p keys"

Fix this →


12. write_lock_to_api passes lock.clone() unnecessarily

publish_lock(lock.clone()) at line 615 clones the entire lock struct. If publish_lock accepts a &Lock or Lock by value, a borrow would be sufficient. Minor allocation concern.


Summary table

# Severity Area
1 Medium Double verify in definition-file path
2 Medium Node dir check skipped for definition-file
3 Medium validate_threshold wrong when --nodes absent
4 Low Misleading ValueExceedsU8 error variant
5 Medium Box::leak memory leak in From<TestnetConfig>
6 Low Duplicate agg_sign
7 Low Redundant double check
8 Medium HTTP/HTTPS check vs error message mismatch
9 Low No rollback on partial keymanager import
10 Low Module coupling
11 Low Stale "charon" branding
12 Trivial Unnecessary clone

@emlautarom1 emlautarom1 linked an issue Apr 3, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement charon create cluster

2 participants