Single-node cluster expansion (FTT1/FTT2) by schmidt-scaled · Pull Request #1097 · simplyblock/sbcli

schmidt-scaled · 2026-06-12T12:59:26Z

Summary

Adds single-node cluster expansion: integrate a newcomer storage node and rotate sec/tert LVS roles across the cluster, reusing the existing restart/activate primitives.

Feature

--expansion CLI flag + donor-side helpers; orchestrator triggers expansion migration after integrate_new_node_into_cluster.
Planner derives role moves from host topology (compute_role_diff_topology, _host_rotation_layout, _validate_topology); sec/tert slot naming aligned to lvstore_stack_secondary/tertiary + tertiary_node_id.
Executor (SpdkMoveExecutor) drives create-primary / create-sec/-tert / re-home moves; donor cleanup via teardown_non_leader_lvstore.
Offline protocol-simulation harness: expands the planner's move list into the per-move RPC sequence and asserts structural invariants, emitting reviewable RPC-by-RPC traces — no FoundationDB/executor required.

Shared-path integration & hardening

Expansion reuses recreate_lvstore_on_non_leader / create_lvstore (called with activation_mode=True), so a few changes land on the shared restart/activate path:

recreate_lvstore_on_non_leader: per-LVS lvstore_ports rebuild + early connect_to_hublvol (the only hublvol connect that runs under activation_mode).
create_lvstore: always create the primary hublvol (newcomer gains secondaries post-creation via rotation).
Hardening: None-guard the hublvol.nvmf_port derefs in the port rebuild (legacy/single-node nodes with no hublvol previously raised AttributeError — soft-failing secondary recreation on restart, hard-failing cluster activate). Removed a dead duplicate elif STATUS_DOWN branch in the monitor.

Known residual risk (reviewers please note)

lvstore_ports source-of-truth changed (rebuilt from node's own ports + back-refs vs copied from primary). Expected equivalent for single-secondary restart; worth a restart/activate smoke test.
connect_to_hublvol fires unconditionally during activation_mode; harmless (try/except) during activate but premature — clean fix would be a dedicated param to separate activate-defer vs expansion-connect-now.

Test

Local pytest suite running; CI runs the full suite + lint + cli-gen check against this PR.

🤖 Generated with Claude Code

        argument = subcommand.add_argument('--spdk-sys-mem', help='System memory reserved for non-SPDK use (e.g. 2G, 4096M). Overrides the auto-calculated minimum_sys_memory. If not set, the value is derived from the node configuration.', type=str, dest='spdk_sys_mem')
        if self.developer_mode:
            argument = subcommand.add_argument('--spdk-proxy-image', help='The SPDK proxy image URI.', type=str, dest='spdk_proxy_image')
+        argument = subcommand.add_argument('--expansion', help='After adding the node, integrate it into the cluster by re-homing existing secondary/tertiary roles. Use this to add a single node to an FTT2 cluster instead of staging FTT+1 nodes at once.', default=False, dest='expansion', action='store_true')


Hamdy-khader

Looks good, approved.

mxsrc

I couldn't do a thorough review yet, but superficially there are a few things that should be changed before merging this. In addition to the comments on specific files the tests that are introduced do not succeed and their execution takes >2h, making them prohibitively expensive for any reasonable checks during development. The long execution time may be a result of the failure, but if it's not, we should introduce some sort of multi-step testing to make test driven development feasible, relegating long running tests to the last step. The failing unittests are probably a false-positive, I assume the general feature works as intended, still, this should be addressed before integration.

The actual logic for the feature seems to not use the established task mechanism that other asynchronous tasks use, like storage node addition. Is that a deliberate choice? Generally, I think the code should be placed in a shared module, e.g. simplyblock_core/controllers/cluster_expansion/... instead of at the top-level of simplyblock_core. I'll do a full review of the logic tomorrow, right now this is just a high-level comment.

mxsrc · 2026-06-13T17:43:57Z

+SIMPLY_BLOCK_DOCKER_IMAGE=public.ecr.aws/simply-block/simplyblock:feature-single-node-expansion
+SIMPLY_BLOCK_SPDK_ULTRA_IMAGE=public.ecr.aws/simply-block/ultra:main-latest


This should probably not be part of this PR.

mxsrc · 2026-06-13T17:45:47Z

@@ -0,0 +1,177 @@
+# Single-node cluster expansion (FTT2) — implementation plan


This file should not be added to the repo. I think it may make sense to document the changes, but that should be placed in docs/, and not take the form of a planning document.

mxsrc · 2026-06-13T17:46:10Z

@@ -0,0 +1,163 @@
+# Step 6 design — final


This should not be introduced to the repository.

mxsrc · 2026-06-13T17:47:57Z

+            "trsvcid": str(trsvcid),
+            "trtype": trtype,
+        }
+        return self._request2("bdev_nvme_remove_trid", params)


New functions should utilize the _request3 helper which uses exception-based error handling.

mxsrc · 2026-06-14T14:53:49Z

+class ExpandPlanError(Exception):
+    """Raised by ``execute_expand_plan`` when a fatal precondition is not
+    met (e.g., trying to start a fresh plan while one is already in
+    progress, or resuming an incompatible-schema state). Distinct from the
+    underlying executor exception that triggers an abort, which is
+    re-raised as-is after the abort is persisted."""


There is a generic PreconditionError in simplyblock_core/exceptions that fits this purpoes.

…nsion - CLI: --expansion on `storage-node add`, integrates newcomer via cluster_expand_executor - Cluster model: expand_state cursor for resumability - RPC: bdev_nvme_remove_trid for surgical path removal - storage_node_ops: teardown_non_leader_lvstore, reattach_sibling_failover Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…o_cluster Move the FN_NEW_DEV_MIG queueing out of add_node's tail and into the post-integration step in clibase. The OLD path queues per-device migration inside add_node, which (a) runs before the newcomer has any lvstore_stack and (b) relies on cluster status flipping back to ACTIVE later for the runner to pick it up. With --expansion, queue post-integration so tasks are scheduled against the post-rotation lvstore_stack and an ACTIVE cluster. - add_node: new expansion=False kwarg; skips the implicit migration loop when set (caller's responsibility). - clibase: pass expansion=expansion through; after integrate_new_node_into_cluster succeeds, iterate the newcomer's online nvme_devices and call add_new_device_mig_task per device. Scope matches the OLD trigger (newcomer's devs only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: rename secondary node attributes and update lvstore handling for tertiary nodes * fix: streamline hublvol creation and error handling in cluster expansion

…xpansion

Planner: add compute_role_diff_topology / _host_rotation_layout / _validate_topology to derive role moves from host topology; align sec/tert slot naming (lvstore_stack_secondary/tertiary, tertiary_node_id) across executor, planner, and teardown. Tests/tooling: add an offline protocol generator (protocol_generator.py, extract_rpc_timing.py) that expands the planner's move list into the per-move RPC sequence the executor would issue, plus a pytest harness (test_expansion_protocols.py) asserting structural invariants and emitting reviewable RPC-by-RPC protocol traces (protocols/*.txt). No FoundationDB/executor required. Expand planner/executor unit coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

recreate_lvstore_on_non_leader: None-guard the hublvol.nvmf_port derefs when building lvstore_ports. A node with an lvstore but no hublvol (legacy/upgrade or single-node clusters) previously raised AttributeError here — soft-failing secondary/tertiary recreation on node restart and hard-failing cluster activate (cluster_ops only catches LVSRestartRequiredError). Falls back to 0, matching get_hublvol_port()'s sentinel. storage_node_monitor.check_node: drop the dead `elif STATUS_DOWN` branch. The preceding `if status in (UNREACHABLE, DOWN)` already handles DOWN, so the elif was unreachable duplicate (rebase artifact: main and the feature branch independently implemented DOWN->ONLINE recovery). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The --expansion flag was added to the generated cli.py and clibase.py logic but never to cli-reference.yaml (the generator's source of truth), so `generate.sh` stripped it and the cli-up-to-date check failed. Add the flag to the YAML and regenerate cli.py (now emits default=False). Also clear ruff F401/F841: drop unused imports across the expansion test suite and the now-unused RPCClient import in storage_node_ops (teardown switched to donor_node.rpc_client()); remove an unused local in the planner's host-rotation layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…restart Two regressions surfaced by CI on the shared restart/activate path: 1. recreate_lvstore_on_non_leader connected the hublvol unconditionally near the top AND again in the port-blocked retry block (the latter under `if not activation_mode`). On a normal restart both fired, so the hublvol was attached twice (test_lvs_role_assignment / test_secondary_promotion: connect_to_hublvol called 2x, expected 1x). Gate the early connect under `if activation_mode` so expansion/activate (which skip the retry block) still connect once, while restart connects only in the retry block — restoring main's single-connect behavior. 2. Restore the RPCClient import in storage_node_ops (kept as a test patch target; flagged unused by ruff after teardown switched to donor_node.rpc_client()). Its removal broke 6 teardown tests that patch simplyblock_core.storage_node_ops.RPCClient. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The executor calls storage_node_ops.recreate_lvstore_on_non_leader, but test_spdk_move_executor patched the unrelated recreate_lvstore_on_sec, so the real (DB-touching) function ran and failed on the unmocked FDB. Repoint the 8 patches and update the two assertions to the real call signature (node, primary, primary, activation_mode=True). test_nvmeof_security: the lvstore_ports rebuild now writes the restarting node, so patch sec_node.write_to_db too (previously only primary_node was patched), avoiding the FDB-write SystemExit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rtifacts) Address review comments on the cluster-expansion PR: - Replace custom ExpandPlanError with the generic PreconditionError from simplyblock_core.exceptions (now available after rebasing onto main). - bdev_nvme_remove_trid: use the exception-based _request3 helper instead of _request2. Caller already wraps the call in try/except and ignores the return value, so behavior is unchanged. - Drop the per-branch dev image tag change to simplyblock_core/env_var (restore main's values) — a dev artifact that shouldn't ship in the PR. - Remove the planning documents single_node_expansion_plan.md and step6_design.md from the repo root. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Address review feedback to use the established task mechanism instead of running the multi-minute role-rebalance inline in the CLI handler. - Add FN_CLUSTER_EXPAND task type + add_cluster_expand_task() wrapper and get_active_cluster_expand_task() dedup (one expansion per cluster). - New services/tasks_runner_cluster_expand.py drives integrate_new_node_into_cluster() in the background, mapping orchestrator outcomes to task done/suspended. Retry-by-resume: an aborted plan is re-armed (planner.expand_state_rearm) so the orchestrator resumes from the persisted cursor rather than recomputing a diff against a half-mutated topology. - clibase --expansion now queues the task and returns immediately; the post-integration new-device-migration trigger moves into the runner's success path (must run after the rotation lands). - Register TasksRunnerClusterExpand in docker-compose-swarm.yml and the collect_logs service lists. Testing: - Fast unit tier (no FDB/SPDK): test_tasks_runner_cluster_expand.py mocks integrate + DB to verify the task lifecycle in milliseconds; expand_state_rearm covered in test_cluster_expand_planner.py. - Sleep-injection seam: an autouse fast_sleep fixture in tests/expansion_sim/conftest.py no-ops time.sleep so the real-FDB sim suite is a logic check (ms) instead of the prior multi-hour wall-clock. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The repo-wide tests/conftest.py installs a minimal `fdb` stub (no api_version) so the unit suite runs without FoundationDB. Because db_controller does a top-level `import fdb`, that stub shadowed the real client and the expansion_sim suite could never connect — the reason the introduced FDB-backed tests failed regardless of a running FDB. _ensure_real_fdb() pops the stub, re-imports the genuine client, and rebinds it on already-imported simplyblock modules. _require_fdb() checks for the cluster file first (so CI without FDB still skips cleanly) and skips with a clear message if the real client is unavailable. The suite must run as its own pytest invocation, separate from the stubbed unit tier — the intended multi-step layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-code-quality Bot found potential problems Jun 12, 2026

View reviewed changes

schmidt-scaled requested review from Hamdy-khader and mxsrc June 12, 2026 13:13

github-code-quality Bot found potential problems Jun 12, 2026

View reviewed changes

Comment thread simplyblock_core/storage_node_ops.py Dismissed

Hamdy-khader approved these changes Jun 13, 2026

View reviewed changes

mxsrc requested changes Jun 13, 2026

View reviewed changes

mxsrc reviewed Jun 14, 2026

View reviewed changes

michixs and others added 12 commits June 15, 2026 22:03

fix: update Docker image for single node expansion feature

2f5657b

Feature single node expansion hmdy (#1023)

06b9cc9

* fix: rename secondary node attributes and update lvstore handling for tertiary nodes * fix: streamline hublvol creation and error handling in cluster expansion

fix: handle empty lvstore case in storage node operations

9eb0b10

fix: add error handling for reattaching sibling failover in cluster e…

3247dbb

…xpansion

schmidt-scaled force-pushed the feature-single-node-expansion branch from 1c7dda9 to da41ba7 Compare June 15, 2026 20:15

schmidt-scaled and others added 2 commits June 15, 2026 23:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Single-node cluster expansion (FTT1/FTT2)#1097

Single-node cluster expansion (FTT1/FTT2)#1097
schmidt-scaled wants to merge 14 commits into
mainfrom
feature-single-node-expansion

schmidt-scaled commented Jun 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hamdy-khader left a comment

Uh oh!

mxsrc left a comment

Uh oh!

mxsrc Jun 13, 2026

Uh oh!

mxsrc Jun 13, 2026

Uh oh!

mxsrc Jun 13, 2026

Uh oh!

mxsrc Jun 13, 2026

Uh oh!

mxsrc Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		SIMPLY_BLOCK_DOCKER_IMAGE=public.ecr.aws/simply-block/simplyblock:feature-single-node-expansion
		SIMPLY_BLOCK_SPDK_ULTRA_IMAGE=public.ecr.aws/simply-block/ultra:main-latest

		@@ -0,0 +1,177 @@
		# Single-node cluster expansion (FTT2) — implementation plan

Conversation

schmidt-scaled commented Jun 12, 2026

Summary

Feature

Shared-path integration & hardening

Known residual risk (reviewers please note)

Test

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Hamdy-khader left a comment

Choose a reason for hiding this comment

Uh oh!

mxsrc left a comment

Choose a reason for hiding this comment

Uh oh!

mxsrc Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

mxsrc Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

mxsrc Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

mxsrc Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

mxsrc Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants