Skip to content

<fix>[xinfini]: fail blacklist to prevent split-brain on VM start#3888

Open
ZStack-Robot wants to merge 1 commit into5.5.1from
sync/yaohua.wu/bugfix/ZSTAC-84963
Open

<fix>[xinfini]: fail blacklist to prevent split-brain on VM start#3888
ZStack-Robot wants to merge 1 commit into5.5.1from
sync/yaohua.wu/bugfix/ZSTAC-84963

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

ZSTAC-84963

Root Cause

XInfiniStorageController.blacklist// todo 空操作,直接 comp.success()ExternalPrimaryStorageKvmFactory.beforeStartVmOnKvm 在目标 host 启动 VM 前会查询存储侧活跃 client,若发现旧 host 仍持有 volume,会先 nodeSvc.deactivate(...),失败时 fallback 到 nodeSvc.blacklist(...) 兜底。xinfini 的 blacklist 假装成功,导致旧 host 仍可写入 volume,新 host 同时启动,发生脑裂。

注意:beforeStartVmOnKvm 调用 blacklist 时使用 NopeCompletion(忽略 comp.fail()),所以仅靠 comp.fail() 不能阻断 VM 启动,必须抛异常。

Changes

Module Change
plugin/xinfini XInfiniStorageController.blacklist 改为抛 OperationFailureException,明确拒绝并阻断 VM 启动

Test

  • N/A(代码审查发现,行为变更:xinfini 上 deactivate 失败的卷不再静默放行 VM 启动,而是 fail-fast 阻断)

Behavior

  • 修复前:xinfini 上 deactivate 失败 → blacklist 假装成功 → VM 启动 → 脑裂风险
  • 修复后:xinfini 上 deactivate 失败 → blacklist 抛异常 → VM 启动被阻断(保守策略,待 xinfini 后续支持 volume 路径隔离能力后再恢复 fallback)

Related: ZSTAC-84963

sync from gitlab !9768

XInfiniStorageController.blacklist was a no-op (// todo,
comp.success()). When ExternalPrimaryStorageKvmFactory's
beforeStartVmOnKvm fell back to blacklist after a deactivate
failure, the no-op silently allowed the VM to start while a
stale client on another host still held write access, risking
a split-brain on the volume.

1. Why?
xinfini does not implement volume path isolation yet. The
blacklist no-op masked the missing capability and let the
caller proceed as if the stale client had been fenced.

2. How?
Throw OperationFailureException from blacklist with a clear
message. Because beforeStartVmOnKvm invokes blacklist with
NopeCompletion (which ignores fail()), only an exception can
abort the VM start. This makes the unsupported case explicit
and conservative until xinfini supports path isolation.

3. Side effects?
VM start is aborted when an old active client cannot be
deactivated on xinfini. This is the safe behavior; previously
such VMs would start with split-brain risk.

# Summary of changes (by module):
- plugin/xinfini: throw OperationFailureException in blacklist
  instead of silently returning success.

Related: ZSTAC-84963
Change-Id: Ib0092f39a7b23cf7b06442ac9616cb2ce39240b7
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 5ba0aea8-8779-468a-90ee-23f119443d71

📥 Commits

Reviewing files that changed from the base of the PR and between da511c3 and 024c445.

📒 Files selected for processing (1)
  • plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java

Walkthrough

blacklist() 方法现已主动拒绝操作而非报告成功。该改动将无操作的 comp.success() 调用替换为立即抛出 OperationFailureException 异常,包含详细信息(主机UUID/IP、卷路径和协议),表示不支持卷路径隔离。

Changes

Cohort / File(s) Summary
XInfini 存储控制器
plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageController.java
blacklist() 方法逻辑修改:将成功响应改为抛出包含主机UUID/IP、卷路径和协议信息的 OperationFailureException,明确指出不支持卷路径隔离功能。

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 一个小小的改动降临,
黑名单不再假装成功,
异常如实而起,
隔离梦想被坦诚拒绝——
诚实的代码,最闪耀!✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题遵循指定的[scope]: 格式,长度65字符,完整准确地描述了主要变更内容(修复blacklist以防止VM启动时的脑裂)。
Description check ✅ Passed 描述详细阐述了根本原因、具体变更、测试情况和行为变更,与代码变更高度相关且信息充分。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/yaohua.wu/bugfix/ZSTAC-84963

Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants