Lvol migration fresh #1098
Draft
EbiRider wants to merge 243 commits into
…hots on secondary before the lvol exist
…fore the migration
…his should be reverted before merging to main
…vol final migration function
* snapshot: fix delete race that produced stuck snapshots

Three independent fixes that together close the "Cannot remove snapshot because it is open" / EBUSY (-16) state where the snapshot ends up with non-zero open_ref but no clone entries and can only be cleared by restarting the host node.

1. Bump random VUID space from 10k to 1M and dedupe against existing CLN_/LVOL_/SNAP_ bdev-name numeric suffixes. With ~10k lvols+snaps the legacy 10k range hit ~50% birthday-collision probability, producing repeated SPDK "lvol with name already exists" rejections that triggered the async-delete-then-reuse sequence below.
2. snapshot_controller.add and .clone reject ops on a target that is in pending deletion (lvol STATUS_IN_DELETION; snapshot STATUS_IN_DELETION or deleted=True). Closes the window between an async delete being issued and a fresh create slipping through against the same blob, which left snapshot parent metadata partially overwritten by the new clone's lineage.
3. snapshot_controller.delete blocks the snapshot's hard-delete while any clone's SPDK-side delete is still in flight. Previously any IN_DELETION clone was treated as "already gone" and the snap delete proceeded to call SPDK, which returned EBUSY because the clone's bdev was still open. Now a clone counts as gone only when its deletion_status field has been set (i.e. the leader's delete_lvol_from_node returned). Otherwise the snapshot is soft-deleted; the clone's own delete-completion path will re-trigger the hard delete once SPDK has actually released it.

Tests: tests/test_snapshot_delete_race.py covers all three fixes (10 tests, all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: defer snapshot deletion if clones are in deletion state
* fix: update snapshot deletion status handling during clone deletion
* fix: remove unnecessary checks for deletion status in snapshot handling
* fix: update Docker image tag for snapshot delete race fix
* fix: reduce randomness range for snapshot ID generation to improve performance

---------

Co-authored-by: schmidt-scaled <schmidt@scaled.cloud>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
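The collision arithmetic behind fix 1 can be sanity-checked with a short sketch. This is an illustration only, not the code from this PR: `per_draw_collision`, `any_collision`, `used_suffixes`, and `new_vuid` are hypothetical names, and the suffix pattern assumes bdev names of the form `CLN_<n>`, `LVOL_<n>`, or `SNAP_<n>`.

```python
import math
import random
import re

# Assumed naming scheme: prefix, underscore, numeric suffix (e.g. "SNAP_4711").
_SUFFIX = re.compile(r"^(?:CLN|LVOL|SNAP)_(\d+)$")


def per_draw_collision(in_use: int, space: int) -> float:
    """Chance that ONE fresh uniform draw hits an ID that is already in use."""
    return min(in_use, space) / space


def any_collision(n: int, space: int) -> float:
    """Birthday bound: chance that n uniform draws contain at least one repeat."""
    if n > space:
        return 1.0
    log_no_collision = sum(math.log1p(-i / space) for i in range(n))
    return 1.0 - math.exp(log_no_collision)


def used_suffixes(bdev_names):
    """Collect the numeric suffixes of all names that match the assumed scheme."""
    used = set()
    for name in bdev_names:
        match = _SUFFIX.match(name)
        if match:
            used.add(int(match.group(1)))
    return used


def new_vuid(bdev_names, space=1_000_000, rng=random, max_tries=1000):
    """Draw a random ID in [1, space] that no existing bdev name already uses."""
    used = used_suffixes(bdev_names)
    for _ in range(max_tries):
        candidate = rng.randrange(1, space + 1)
        if candidate not in used:
            return candidate
    raise RuntimeError("no free VUID found; ID space may be exhausted")
```

With these helpers, `per_draw_collision(5_000, 10_000)` is 0.5, while the same 5,000 IDs in a 1M space give 0.005 per draw. Deduping against the existing names removes the remaining collisions at the cost of a retry loop.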
…s, verifying transfer of snapshots
…ere migrated lvol couldn't be deleted
…ation logic
- Updated `tasks_runner_lvol_migration` to handle cases where no new snapshots are migrated but preexisting snapshots exist on the target node.
- Enhanced logic to determine the correct composite name for the last snapshot on the target.
- Added `subsys_port` assignment to lvol objects in `migration_controller`.
- Added support for recognizing bdev aliases in `node_bdev_names` mapping.
- Improved `check_bdev` logic to validate `top_bdev` if the initial check fails.
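The alias lookup and the `top_bdev` fallback described above could look roughly like the sketch below. It is not the PR's implementation: the function signatures are hypothetical, and it assumes each bdev entry is a dict with a `name` key and an optional `aliases` list, as returned by SPDK's `bdev_get_bdevs`.

```python
from typing import Dict, Iterable, Optional


def build_node_bdev_names(bdevs: Iterable[dict]) -> Dict[str, dict]:
    """Map every bdev name AND every alias to its bdev entry."""
    names: Dict[str, dict] = {}
    for bdev in bdevs:
        names[bdev["name"]] = bdev
        for alias in bdev.get("aliases", []):
            # A primary name wins if an alias happens to collide with it.
            names.setdefault(alias, bdev)
    return names


def check_bdev(names: Dict[str, dict], bdev_name: str,
               top_bdev: Optional[str] = None) -> bool:
    """Return True if bdev_name is known, else fall back to top_bdev."""
    if bdev_name in names:
        return True
    return top_bdev is not None and top_bdev in names
```

Keeping the mapping keyed by both names and aliases means a lookup by composite name succeeds even when the node reports the bdev under its UUID.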
… to be more straightforward
…igration and connect lvol
8417063 to 2061357
…nces during cleanup
- Introduced `special_delete` logic to manage snapshots with open references.
- Enhanced deletion process to handle remaining snapshot instances and synchronize cleanup across nodes.
- Updated `rpc_client` to support `special_delete` flag in `delete_lvol` requests.
…ttps://github.com/simplyblock/sbcli into lvol-migration-fresh
- Added logic to create a new snapshot if `snap_plan` is empty during migration.
- Integrated snapshot creation via `snapshot_controller` with proper error handling.
- Introduced `pypass_lvol_migration_check` flag in `snapshot_controller` to allow snapshot creation during active migrations.
- Updated `tasks_runner_lvol_migration` and `migration_controller` to utilize the bypass flag for intermediate and migration-related snapshots.
…flow now creates intermediate snapshots if chain is initially empty
    _src_node = db.get_storage_node_by_id(migration.source_node_id)
    if _src_node.secondary_node_id:
        src_node_ids.add(_src_node.secondary_node_id)
except KeyError:

try:
    sec_rpc.listeners_del(nqn, nic.trtype.lower(),
                          nic.ip4_address, sec_port)
except Exception:

try:
    tgt_rpc.listeners_del(nqn, nic.trtype.lower(),
                          nic.ip4_address, tgt_port)
except Exception:

if tgt_sec_node is None:
    try:
        tgt_sec_node = db.get_storage_node_by_id(tgt_node.secondary_node_id)
    except KeyError:

if tgt_ter_node is None:
    try:
        tgt_ter_node = db.get_storage_node_by_id(tgt_node.tertiary_node_id)
    except KeyError:

    snap_bdev_info = leader_node.rpc_client().get_bdev(snap.snap_bdev)
    if snap_bdev_info[0]["driver_specific"]["lvol"]["open_ref"] > 1:
        special_delete = True
except Exception:

    snap_bdev_info = leader_node.rpc_client().get_bdev(snap.snap_bdev)
    if snap_bdev_info[0]["driver_specific"]["lvol"]["open_ref"] > 1:
        special_delete = True
except Exception:

    snap_bdev_info = rpc_client.rpc_client().get_bdev(snap.snap_bdev)
    if snap_bdev_info[0]["driver_specific"]["lvol"]["open_ref"] > 1:
        special_delete = True
except Exception:
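The `open_ref` check in the snippets above could be isolated into a small helper that catches only the lookup errors it expects instead of a blanket `except Exception:`. The sketch below is a suggestion, not code from this PR: `needs_special_delete` and its `threshold` parameter are hypothetical, and it assumes the bdev info is a list of dicts shaped like the ones indexed in the snippets.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def needs_special_delete(bdev_info: Optional[Sequence[dict]],
                         threshold: int = 1) -> bool:
    """Return True when the first bdev entry reports open_ref > threshold.

    Missing or malformed info is logged and treated as "no special delete",
    so the caller falls back to the normal delete path.
    """
    try:
        open_ref = bdev_info[0]["driver_specific"]["lvol"]["open_ref"]
    except (IndexError, KeyError, TypeError) as exc:
        logger.warning("could not read open_ref from bdev info: %r", exc)
        return False
    return open_ref > threshold
```

A caller would then write `special_delete = needs_special_delete(snap_bdev_info)` and keep a separate, narrower handler around the RPC call itself, so RPC failures and malformed responses are logged distinctly.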
* fix: refactor migration hub logic and streamline `transfer_hublvol` creation
- Centralized migration hub logic by introducing `transfer_hublvol` creation in `storage_node`.
- Updated `tasks_runner_lvol_migration` to leverage `transfer_hublvol` for unified hub management.
- Refactored `rpc_client` methods to simplify hublvol creation and deletion.

* fix: update `transfer_hublvol` checks and remove redundant task claim logic
- Refined `transfer_hublvol` checks to validate `bdev_name` instead of `uuid`.
- Removed commented-out `claim_task` logic in `tasks_runner_restart`.

* fix: update `get_bdev` calls to `get_bdevs` for accurate bdev info retrieval
- Replaced `get_bdev` with `get_bdevs` in `snapshot_monitor` and `snapshot_controller` to ensure proper invocation.
- Maintained `special_delete` logic to handle snapshots with open references consistently.

* prepare for merge

* Fix linter issues
…/sbcli into lvol-migration-fresh
This aims to add the migration feature to the sbcli and web API.
It includes a few new calls:
- migration list
- migrate
- pre-create-migration
- migrate-cancel