Skip to content

ZSTAC-81797: iSCSI fencer use platform compact hostId instead of storage bdc id#3423

Open
zstack-robot-2 wants to merge 1 commit into5.5.12from
sync/zstackio/fix/ZSTAC-81797@@2
Open

ZSTAC-81797: iSCSI fencer use platform compact hostId instead of storage bdc id#3423
zstack-robot-2 wants to merge 1 commit into5.5.12from
sync/zstackio/fix/ZSTAC-81797@@2

Conversation

@zstack-robot-2
Copy link
Collaborator

问题

iSCSI fencer 的 KvmIscsiNodeServer 使用 HeartbeatVolumeTO.hostId(存储侧 bdc id,递增可达 2000+)作为心跳写入偏移量的乘数。

offset = host_id * heartbeat_required_space (1MB)

当 host_id=2000 时,offset=2000MB,超过 2GB 心跳卷容量,导致心跳写入失败,物理机反复重联后失联。

根因对比

代码路径 hostId 来源 值范围 状态
CBD fencer (KvmCbdNodeServer) ExternalPrimaryStorageHostRefVO.hostId [1, 999] ✅ 正确
CBD check (CbdHostHeartbeatChecker) ExternalPrimaryStorageHostRefVO.hostId [1, 999] ✅ 正确
iSCSI fencer (KvmIscsiNodeServer) heartbeatVol.getHostId() = bdc id 递增, 2000+ ❌ BUG

修复

KvmIscsiNodeServer: 使用 ExternalPrimaryStorageHostRefVO.hostId(平台管理的紧凑 ID [1, 999]),与 CBD fencer 保持一致。

联动 MR: premium fix/ZSTAC-81797@@2

sync from gitlab !9274

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

概述

本次变更引入了基于指数移动平均(EMA)的自适应 ping 延迟跟踪机制、主机追踪器的抖动延迟启动、数据库 GPU 及 Mdev 设备模式迁移,以及异步定时器的延迟启动功能。涉及计算、存储、数据库架构优化。

变更清单

cohort / 文件 变更摘要
Ping 延迟自适应追踪
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java, compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java
引入 EMA 类追踪单主机 ping 延迟,计算自适应超时(全局超时的 1-3 倍),KVMHost 集成 EMA 并在 ping 响应时更新延迟数据。
主机追踪器抖动延迟启动
compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java, core/src/main/java/org/zstack/core/thread/AsyncTimer.java
替换同步启动为带随机抖动的延迟启动,避免大量主机同时重连导致的雷鸣羊群效应;AsyncTimer 新增 startWithDelay() 方法支持延迟调度。
iSCSI 节点服务主机 ID 分配
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java
在故障处理及自护栏流程中添加主机 ID 分配逻辑,当引用缺失或 ID 为 0 时通过 ExternalHostIdGetter 分配,记录相关日志。
数据库 GPU 设备范围迁移
conf/db/upgrade/V6.0.0.1__gpu_add_scope.sql
向 GpuDeviceVO 添加 scope(默认 'VM')和 chassisUuid 列;标记虚拟化 GPU(PciDeviceVO.virtStatus = 'HAMI_VIRTUALIZED')的 scope 为 'CONTAINER';创建 scope 列索引。
BareMetal2 GPU 设备迁移
conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql
条件化迁移 BareMetal2 GPU 设备至统一的 GpuDeviceVO,通过虚拟宿主机关联,包含 PCI 基础数据和 GPU 扩展数据迁移步骤。
MdevDeviceVO 子表迁移
conf/db/upgrade/V6.0.0.3__mdev_subtable.sql
将 MdevDeviceVO 从独立表转换为 PciDeviceVO 的子表,迁移字段及资源引用、创建外键关系,更新 AccountResourceRefVO 关联以保持一致性。
单元测试补充
compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java, compute/src/test/java/org/zstack/compute/host/PingLatencyEmaTest.java
新增 HostTrackJitterTest 验证抖动分布均匀性及范围,PingLatencyEmaTest 验证 EMA 计算、自适应超时夹取、收敛行为。

预估审查工作量

🎯 4 (复杂) | ⏱️ ~50 分钟

诗歌

兔儿跳跃带抖动,
EMA 计时更聪慧,
GPU 迁移归一统,
MdevDeviceVO 成子表妙,
自适应心跳永不哽! 🐰⏱️


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error The title does not follow the required '[scope]: ' format and exceeds 72 characters (79 characters). Reformat the title to follow '[scope]: ' pattern and keep it under 72 characters, e.g., 'fix[iscsi]: Use platform hostId instead of storage bdc id'.
Docstring Coverage ⚠️ Warning Docstring coverage is 12.37% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The description is clearly related to the changeset, explaining the problem, root cause, and fix for the iSCSI fencer hostId issue.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sync/zstackio/fix/ZSTAC-81797@@2
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

247-248: 建议去除 Line 247 的魔法值 999,改为具名常量。

这能减少后续修改范围并避免语义漂移。

♻️ 建议重构
 public class KvmIscsiNodeServer implements Component, KVMStartVmExtensionPoint, VmInstanceMigrateExtensionPoint,
         KVMConvertVolumeExtensionPoint, KVMDetachVolumeExtensionPoint, KVMAttachVolumeExtensionPoint,
         KVMPreAttachIsoExtensionPoint, KvmSetupSelfFencerExtensionPoint {
     private static final CLogger logger = Utils.getLogger(KvmIscsiNodeServer.class);
+    private static final int MAX_EXTERNAL_PRIMARY_STORAGE_HOST_ID = 999;
...
-                            ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
+                            ref = new ExternalHostIdGetter(MAX_EXTERNAL_PRIMARY_STORAGE_HOST_ID).getOrAllocateHostIdRef(
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid());

As per coding guidelines: “避免使用魔法值(Magic Value),应替换为常量或枚举。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 247 - 248, Replace the magic number 999 used when constructing
ExternalHostIdGetter with a named constant; define a descriptive static final
constant (e.g., DEFAULT_EXTERNAL_HOST_ID_GENERATOR_VERSION or
EXTERNAL_HOST_ID_GENERATOR_SEED) in the KvmIscsiNodeServer class (or a nearby
constants holder) and use it in the call to new ExternalHostIdGetter(...) so the
getOrAllocateHostIdRef invocation remains ref = new
ExternalHostIdGetter(CONSTANT).getOrAllocateHostIdRef(param.getHostUuid(),
param.getPrimaryStorage().getUuid()); ensure the constant name documents the
meaning of 999.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 244-249: After calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...) you must re-check that ref is
non-null and ref.getHostId() is a valid non-zero ID before using it to avoid
NullPointerException or invalid IDs; update the block around ref/param (where
logger.warn is used and getOrAllocateHostIdRef is invoked) to validate ref !=
null and ref.getHostId() != 0, handle the failure case (log an error and
abort/throw/return appropriate response) instead of proceeding to use
ref.getHostId(), and ensure any subsequent logic that relied on a successful
allocation (e.g., the code after line 257 that accesses ref.getHostId()) is
guarded by this check.

---

Nitpick comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 247-248: Replace the magic number 999 used when constructing
ExternalHostIdGetter with a named constant; define a descriptive static final
constant (e.g., DEFAULT_EXTERNAL_HOST_ID_GENERATOR_VERSION or
EXTERNAL_HOST_ID_GENERATOR_SEED) in the KvmIscsiNodeServer class (or a nearby
constants holder) and use it in the call to new ExternalHostIdGetter(...) so the
getOrAllocateHostIdRef invocation remains ref = new
ExternalHostIdGetter(CONSTANT).getOrAllocateHostIdRef(param.getHostUuid(),
param.getPrimaryStorage().getUuid()); ensure the constant name documents the
meaning of 999.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: fca80eec-18c0-44d0-a7da-1321344251fa

📥 Commits

Reviewing files that changed from the base of the PR and between 9826a0e and 33a3a40.

📒 Files selected for processing (1)
  • plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java

Comment on lines +244 to +249
if (ref == null || ref.getHostId() == 0) {
logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
param.getHostUuid(), param.getPrimaryStorage().getUuid()));
ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
param.getHostUuid(), param.getPrimaryStorage().getUuid());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "[1/3] Locate ExternalHostIdGetter implementation"
fd 'ExternalHostIdGetter\.java$' 

echo -e "\n[2/3] Show ExternalHostIdGetter class and getOrAllocateHostIdRef method"
fd 'ExternalHostIdGetter\.java$' -x cat {} 

echo -e "\n[3/3] Check KvmIscsiNodeServer.java around lines 240-270"
fd 'KvmIscsiNodeServer\.java$' -x cat {} | head -n 270 | tail -n 40

Repository: MatheMatrix/zstack

Length of output: 11605


在分配 hostId 后添加二次校验,防止空指针和无效 ID。

getOrAllocateHostIdRef(...) 方法在分配失败场景下(如 ConstraintViolationException 或重复条目)会返回 null,最终的 find() 查询也可能无法检索到实体。第 257 行直接调用 ref.getHostId() 存在空指针风险。

🔧 建议修复
                        if (ref == null || ref.getHostId() == 0) {
                            logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
                                    param.getHostUuid(), param.getPrimaryStorage().getUuid()));
                            ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
                                    param.getHostUuid(), param.getPrimaryStorage().getUuid());
                        }
+
+                        if (ref == null || ref.getHostId() <= 0) {
+                            trigger.fail(operr("failed to get valid hostId for hostUuid[%s], primaryStorageUuid[%s]",
+                                    param.getHostUuid(), param.getPrimaryStorage().getUuid()));
+                            return;
+                        }

                         KvmSetupSelfFencerCmd cmd = new KvmSetupSelfFencerCmd();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (ref == null || ref.getHostId() == 0) {
logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
param.getHostUuid(), param.getPrimaryStorage().getUuid()));
ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
param.getHostUuid(), param.getPrimaryStorage().getUuid());
}
if (ref == null || ref.getHostId() == 0) {
logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
param.getHostUuid(), param.getPrimaryStorage().getUuid()));
ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
param.getHostUuid(), param.getPrimaryStorage().getUuid());
}
if (ref == null || ref.getHostId() <= 0) {
trigger.fail(operr("failed to get valid hostId for hostUuid[%s], primaryStorageUuid[%s]",
param.getHostUuid(), param.getPrimaryStorage().getUuid()));
return;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 244 - 249, After calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...) you must re-check that ref is
non-null and ref.getHostId() is a valid non-zero ID before using it to avoid
NullPointerException or invalid IDs; update the block around ref/param (where
logger.warn is used and getOrAllocateHostIdRef is invoked) to validate ref !=
null and ref.getHostId() != 0, handle the failure case (log an error and
abort/throw/return appropriate response) instead of proceeding to use
ref.getHostId(), and ensure any subsequent logic that relied on a successful
allocation (e.g., the code after line 257 that accesses ref.getHostId()) is
guarded by this check.

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/fix/ZSTAC-81797@@2 branch from 33a3a40 to d0545cb Compare March 5, 2026 12:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

240-249: ⚠️ Potential issue | 🔴 Critical

分配 hostId 后需要二次校验,否则 Line 257 可能空指针。

当前仅在分配前判断 ref,但分配后直接在 Line 257 使用 ref.getHostId();若分配失败返回 null 或无效 ID,会导致流程在运行时中断。建议在分配后立即校验并 trigger.fail(...) 返回。

🔧 建议修复
                         if (ref == null || ref.getHostId() == 0) {
                             logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid()));
                             ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid());
                         }
+                        if (ref == null || ref.getHostId() <= 0) {
+                            trigger.fail(operr("failed to get valid hostId for hostUuid[%s], primaryStorageUuid[%s]",
+                                    param.getHostUuid(), param.getPrimaryStorage().getUuid()));
+                            return;
+                        }

                         KvmSetupSelfFencerCmd cmd = new KvmSetupSelfFencerCmd();

Also applies to: 257-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 240 - 249, After calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...) in KvmIscsiNodeServer,
immediately re-validate the returned ExternalPrimaryStorageHostRefVO (the
variable ref) before any use; if ref is null or ref.getHostId() == 0, log a
clear error and call trigger.fail(...) (with a descriptive message including
hostUuid and primaryStorageUuid) and return to avoid a NullPointerException when
later accessing ref.getHostId(); apply the same post-allocation check wherever
ExternalHostIdGetter.getOrAllocateHostIdRef is used.
🧹 Nitpick comments (1)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

247-248: 999 提取为具名常量,避免魔法值。

Line 247 直接使用 999 降低可读性且不利于后续统一调整,建议提取为类常量(如 MAX_EXTERNAL_HOST_ID)并复用。

As per coding guidelines “避免使用魔法值(Magic Value):直接使用未经定义的数值应替换为常量或枚举”。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 247 - 248, The code uses a magic number 999 when constructing
ExternalHostIdGetter; add a class constant in KvmIscsiNodeServer (e.g. private
static final int MAX_EXTERNAL_HOST_ID = 999) and replace the literal with that
constant in the call to new
ExternalHostIdGetter(...).getOrAllocateHostIdRef(param.getHostUuid(),
param.getPrimaryStorage().getUuid()); ensure the constant name is descriptive
(MAX_EXTERNAL_HOST_ID or EXTERNAL_HOST_ID_MAX) and reuse it wherever that limit
is applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 240-249: After calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...) in KvmIscsiNodeServer,
immediately re-validate the returned ExternalPrimaryStorageHostRefVO (the
variable ref) before any use; if ref is null or ref.getHostId() == 0, log a
clear error and call trigger.fail(...) (with a descriptive message including
hostUuid and primaryStorageUuid) and return to avoid a NullPointerException when
later accessing ref.getHostId(); apply the same post-allocation check wherever
ExternalHostIdGetter.getOrAllocateHostIdRef is used.

---

Nitpick comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 247-248: The code uses a magic number 999 when constructing
ExternalHostIdGetter; add a class constant in KvmIscsiNodeServer (e.g. private
static final int MAX_EXTERNAL_HOST_ID = 999) and replace the literal with that
constant in the call to new
ExternalHostIdGetter(...).getOrAllocateHostIdRef(param.getHostUuid(),
param.getPrimaryStorage().getUuid()); ensure the constant name is descriptive
(MAX_EXTERNAL_HOST_ID or EXTERNAL_HOST_ID_MAX) and reuse it wherever that limit
is applicable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: ccf42c51-be52-4d77-9c55-b958b8c425d2

📥 Commits

Reviewing files that changed from the base of the PR and between 33a3a40 and d0545cb.

📒 Files selected for processing (1)
  • plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/fix/ZSTAC-81797@@2 branch 2 times, most recently from 6533eee to 3fa9b47 Compare March 5, 2026 13:03
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

5020-5030: ⚠️ Potential issue | 🟠 Major

改用 System.nanoTime() 计算 RTT 差值,避免 NTP 时钟回拨污染 EMA。

当前代码在第 5020 和 5030 行使用 System.currentTimeMillis() 计算耗时。遇到 NTP 回拨会导致负数被传入 pingLatencyEma.update(),虽然 computeAdaptiveTimeout() 对负数的 EMA 有防护,但负值已经污染了 EMA 计算结果。建议改用 System.nanoTime() 获得单调递增的时间差,并在调用侧补充 Math.max(1L, latencyMs) 保护,避免极小值影响 EMA 稳定性。

建议修改
-                        final long pingStartMs = System.currentTimeMillis();
+                        final long pingStartNs = System.nanoTime();

                        restf.asyncJsonPost(pingPath, cmd, new JsonAsyncRESTCallback<PingResponse>(trigger) {
                            `@Override`
                            public void fail(ErrorCode err) {
                                trigger.fail(err);
                            }

                            `@Override`
                            public void success(PingResponse ret) {
-                                pingLatencyEma.update(System.currentTimeMillis() - pingStartMs);
+                                long latencyMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - pingStartNs);
+                                pingLatencyEma.update(Math.max(1L, latencyMs));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 5020 -
5030, Replace the use of System.currentTimeMillis() for RTT measurement with
System.nanoTime() to avoid NTP clock skew: record a pingStartNs before calling
restf.asyncJsonPost in KVMHost, compute latencyMs = Math.max(1L,
(System.nanoTime() - pingStartNs) / 1_000_000L) in the success callback, and
pass that latencyMs into pingLatencyEma.update(...) (also ensure any other
places that compute RTT for computeAdaptiveTimeout() use the same change).
🧹 Nitpick comments (6)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (2)

23-24: 建议使用明确的导入代替通配符导入。

通配符导入 org.zstack.header.storage.addon.primary.* 可能降低代码可读性,建议仅导入实际使用的类(如 ExternalPrimaryStorageHostRefVOExternalPrimaryStorageHostRefVO_HeartbeatVolumeTOHeartbeatVolumeTopology)。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 23 - 24, Replace the wildcard import
org.zstack.header.storage.addon.primary.* with explicit imports of only the
classes used in this file (e.g. ExternalPrimaryStorageHostRefVO,
ExternalPrimaryStorageHostRefVO_, HeartbeatVolumeTO, HeartbeatVolumeTopology,
ExternalHostIdGetter) to improve readability; update the import block at the top
of KvmIscsiNodeServer.java so each required class is imported explicitly instead
of using the * import.

247-248: 避免使用魔法值 999,建议定义为常量。

魔法值 999 表示 hostId 的最大范围上限,应定义为有意义的常量以提高可读性和可维护性。根据 PR 描述,此值代表平台管理的紧凑 hostId 范围 [1, 999]

♻️ 建议重构

在类中定义常量:

private static final int MAX_EXTERNAL_HOST_ID = 999;

然后使用:

-                            ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
+                            ref = new ExternalHostIdGetter(MAX_EXTERNAL_HOST_ID).getOrAllocateHostIdRef(
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid());

根据编码规范:避免使用魔法值(Magic Value),直接使用未经定义的数值应替换为枚举或常量。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 247 - 248, 在 KvmIscsiNodeServer 中替换魔法值 999:在类里新增有意义的常量(例如 private
static final int MAX_EXTERNAL_HOST_ID = 999)并将 ExternalHostIdGetter 的构造调用从 new
ExternalHostIdGetter(999) 改为 new
ExternalHostIdGetter(MAX_EXTERNAL_HOST_ID),同时检查并更新类中其它使用同一含义数值的地方以复用该常量(定位符:类
KvmIscsiNodeServer、构造调用 ExternalHostIdGetter(...)、方法 getOrAllocateHostIdRef)。
utils/src/main/java/org/zstack/utils/string/StringSimilarity.java (1)

290-297: 注释中的阶段数描述与实现不一致。

Line 290 写的是 “two-phase strategy”,但 Line 294-297 实际是三阶段(Phase 1/2/3)。建议改成 “three-phase strategy”,避免后续维护误解。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utils/src/main/java/org/zstack/utils/string/StringSimilarity.java` around
lines 290 - 297, The comment in StringSimilarity (around the javadoc describing
matching strategy) incorrectly says "two-phase strategy" while the subsequent
bullets enumerate Phase 1/2/3; update the textual summary to "three-phase
strategy" (or otherwise reconcile the header with the bullets) so the
description matches the implementation and bullets in the StringSimilarity class
documentation.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

172-173: 将 EMA 参数提取为具名常量或资源配置,避免魔法值。

Line 173 直接使用 0.33,语义不清晰且后续调参成本高。建议至少提取为具名常量(更优是走可配置项)。

♻️ 建议修改
+    private static final double PING_LATENCY_EMA_ALPHA = 0.3D;
+    private static final int PING_TIMEOUT_EMA_MULTIPLIER = 3;
     // Per-host ping latency EMA for adaptive timeout (ZSTAC-67534)
-    private final PingLatencyEma pingLatencyEma = new PingLatencyEma(0.3, 3);
+    private final PingLatencyEma pingLatencyEma = new PingLatencyEma(PING_LATENCY_EMA_ALPHA, PING_TIMEOUT_EMA_MULTIPLIER);

As per coding guidelines “避免使用魔法值(Magic Value)”。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 172 - 173,
The code uses magic numbers when constructing PingLatencyEma in KVMHost (new
PingLatencyEma(0.3, 3)); extract these literals into named constants (e.g.,
PING_LATENCY_EMA_ALPHA and PING_LATENCY_EMA_WINDOW) or ideally make them
configurable via a resource/config property, then construct pingLatencyEma using
those symbols so the meaning is clear and values can be tuned without code
changes.
compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java (1)

25-29: 当前用例主要在测试 ThreadLocalRandom,未直接覆盖 HostTrack 真实实现

这三处都直接生成随机数,和生产抖动逻辑没有直接绑定。建议将生产抖动计算抽成可调用方法(或注入随机源),这里直接断言该方法行为,避免“测试通过但实现回归”。

Also applies to: 48-52, 64-69

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java`
around lines 25 - 29, Refactor the jitter calculation so tests exercise the
actual production logic instead of calling ThreadLocalRandom directly: add a
callable method on the HostTrack class (e.g., calculateJitterBucket or
computeJitterMs) that accepts intervalMs and bucketCount and an injectable
random source (or LongSupplier) used to derive jitterMs and bucket index, then
update HostTrackJitterTest to invoke that method (instead of using
ThreadLocalRandom in the three loops) and assert its returned jitterMs/bucket
distribution; ensure tests cover both using a real random supplier and
deterministic suppliers to catch regressions in
HostTrack.calculateJitterBucket/computeJitterMs behavior.
conf/tools/install.sh (1)

106-106: 请为 cd $cwd 增加失败保护

cd 失败后脚本会继续执行,后续安装步骤可能在错误目录运行。建议改为 cd "$cwd" || exit 1

建议修改
-        cd $cwd
+        cd "$cwd" || exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/tools/install.sh` at line 106, 将当前的 cd $cwd 替换为对变量加引号并在失败时退出的写法(即使用
"$cwd" 并在 cd 失败时立即退出脚本),确保在脚本无法切换到目标目录时不会继续执行后续安装步骤;定位并修改包含 cd $cwd 的行即可。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java`:
- Around line 12-15: The PingLatencyEma constructor should validate its inputs:
ensure alpha is >0 and <=1 and timeoutMultiplier is >0; if values are out of
range, throw an IllegalArgumentException with a clear message. Update the
PingLatencyEma(double alpha, int timeoutMultiplier) constructor to perform these
checks before assigning fields (validate alpha and timeoutMultiplier, then set
this.alpha and this.timeoutMultiplier), so invalid configs fail fast.
- Around line 8-9: PingLatencyEma 中的读-改-写操作 emaMs = alpha * latencyMs + (1 -
alpha) * emaMs 不是原子的,会在并发异步回调下丢失更新;在 update()(类名 PingLatencyEma)中用同步保护(例如给实例加
synchronized 块 或 用 java.util.concurrent.atomic.DoubleAccumulator/AtomicReference
并使用 CAS 循环)确保对字段 emaMs 的更新原子;同时在构造函数校验参数范围:alpha 必须在 (0,1) 之间,timeoutMultiplier
必须为正整数,遇到非法值应抛出 IllegalArgumentException(引用字段 alpha、emaMs、timeoutMultiplier 与方法
update() 与构造器以定位修改点)。

In `@compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java`:
- Around line 14-15: Replace the Chinese test comments in HostTrackJitterTest
with clear, spell-checked English consistent with repository
guidelines—specifically update the class/test header that references "AC: Jitter
测试 — 重连时间戳呈均匀分布 (非全部 t=0)" and the other comments around lines referenced (also
applies to the comments at 56-58) to English; ensure the description mentions
that the test simulates the jitter logic from HostTrackImpl.trackHost() for 3000
hosts and keep phrasing concise and consistent with other tests.

In `@compute/src/test/java/org/zstack/compute/host/PingLatencyEmaTest.java`:
- Around line 8-10: Replace the Chinese sentence in the Javadoc comment in
PingLatencyEmaTest (the comment that currently reads "AC: 合成延迟序列 [100, 200, 150,
300, 250]ms → EMA 产出正确自适应超时") with an English equivalent, e.g. "AC: synthetic
latency series [100, 200, 150, 300, 250] ms → EMA yields correct adaptive
timeout", so the class/method comment in PingLatencyEmaTest uses only English.

In `@conf/db/upgrade/V5.5.12__schema.sql`:
- Around line 25-33: The UPDATE using STR_TO_DATE on table ModelEvaluationTaskVO
with format '%b %d, %Y %h:%i:%s %p' can return NULL if MySQL's lc_time_names
isn't English; before running this migration, verify and if necessary set the
session lc_time_names to an English locale (e.g., 'en_US') or run a SET SESSION
lc_time_names='en_US' prior to executing the UPDATE, or alternatively adjust the
migration to parse a locale-independent representation (e.g., store month
numbers) instead of relying on STR_TO_DATE; ensure the change references the
same Json_getKeyValue(`opaque`, 'end_time') usage and that the migration
includes the session locale set or validation step.

In `@conf/db/upgrade/V6.0.0.1__gpu_add_scope.sql`:
- Around line 7-33: Replace dynamic DATABASE() usage with the fixed schema
'zstack' and fully qualify all table references: change TABLE_SCHEMA =
DATABASE() checks to TABLE_SCHEMA = 'zstack' and prefix DDL/DML table names
(GpuDeviceVO, PciDeviceVO) with zstack.`...` (e.g. zstack.`GpuDeviceVO`,
zstack.`PciDeviceVO`) including the ALTER TABLE and UPDATE statements, and
ensure the CREATE INDEX targets zstack.`GpuDeviceVO` (index name
idxGpuDeviceVOScope) to avoid operating against the wrong schema during
upgrades.

In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql`:
- Around line 8-83: The script currently uses TABLE_SCHEMA = DATABASE() and
unqualified table names, which makes the upgrade depend on the current
connection DB; change the schema checks and all table references to the fixed
'zstack' schema. Replace TABLE_SCHEMA = DATABASE() with TABLE_SCHEMA = 'zstack'
in the IF condition and fully qualify every table reference (e.g.,
zstack.`BareMetal2ChassisGpuDeviceVO`, zstack.`BareMetal2ChassisVO`,
zstack.`HostEO`, zstack.`BareMetal2ChassisPciDeviceVO`, zstack.`PciDeviceVO`,
zstack.`GpuDeviceVO`) and any INSERT/SELECT/ALTER/UPDATE targets so the
migration is explicitly applied to the zstack schema.

In `@conf/db/upgrade/V6.0.0.3__mdev_subtable.sql`:
- Around line 9-95: The script uses DATABASE() and unqualified table names
(e.g., checks and queries against MdevDeviceVO, AccountResourceRefVO,
PciDeviceVO) which risks operating on the wrong schema; update all occurrences
of TABLE_SCHEMA = DATABASE() to TABLE_SCHEMA = 'zstack' and fully qualify table
references with the zstack schema (e.g., zstack.`MdevDeviceVO`,
zstack.`PciDeviceVO`, zstack.`AccountResourceRefVO`) throughout the migration
statements including the IF EXISTS checks, INSERT ... SELECT, DELETE, LEFT JOIN
and RENAME/CREATE/INSERT IGNORE steps so every reference to MdevDeviceVO,
PciDeviceVO, AccountResourceRefVO, MdevDeviceSpecVO and MttyDeviceVO is
explicitly prefixed with zstack.
- Around line 8-66: The script currently renames MdevDeviceVO to
MdevDeviceVO_old before creating the new subtable, which can leave the DB in a
non-reentrant state if any later step fails; update the conditional checks and
control flow around the RENAME/CREATE so the migration is idempotent and
recoverable by: 1) changing the IF/WHERE guards to also detect the presence of
MdevDeviceVO_old and absence of the new MdevDeviceVO (so a previously
interrupted run can resume), 2) only performing RENAME TABLE `MdevDeviceVO` TO
`MdevDeviceVO_old` when `MdevDeviceVO_old` does not already exist and the new
`MdevDeviceVO` does not exist, 3) skipping the INSERT INTO `PciDeviceVO` and
subsequent steps if `PciDeviceVO` already contains migrated rows, and 4)
ensuring the CREATE TABLE `MdevDeviceVO` and the INSERT IGNORE INTO
`MdevDeviceVO` from `MdevDeviceVO_old` are guarded by existence checks so
re-execution is safe.

In `@conf/tools/install.sh`:
- Line 123: The wrapper script invokes the virtualenv Ansible binary using an
unquoted $@ which will split/glob arguments containing spaces or wildcards;
update the two invocations of ${VIRTUAL_ENV}/bin/ansible $@ to pass all original
args safely as "${@}" (i.e., replace the unquoted $@ with the quoted form) so
arguments are preserved intact, and apply the same change at the other
occurrence noted (the second ${VIRTUAL_ENV}/bin/ansible $@).

In `@core/src/main/java/org/zstack/core/ansible/AnsibleFacadeImpl.java`:
- Around line 260-261: In AnsibleFacadeImpl update the ShellUtils.run
invocations that currently inject "-B %s" (where the playbook path is being
passed as the %s) so the "-B %s" argument is removed and the playbook path
(playBookPath) is passed as a positional argument after all options; adjust the
same change in the three occurrences around the ShellUtils.run calls (the ones
building the ansible-playbook command string), ensuring options like
--private-key, -e, -i, -vvvv remain before you append the playBookPath as a
standalone argument and that the formatting/escaping of arguments (e.g.
JSONObjectUtil.dumpPretty(arguments) and AnsibleConstant.LOG_PATH) is preserved.

In
`@core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java`:
- Around line 38-72: The loadAllJsonFiles method currently appends into
localeMessages without clearing it, so on component restart/hot-reload old
locales can remain; update loadAllJsonFiles to clear or reinitialize the
localeMessages map at the start (e.g., call localeMessages.clear() or assign a
new LinkedHashMap) before scanning and loading files so removed locale entries
are not retained.
- Around line 127-150: The localizeErrorCode method can infinitely recurse when
ErrorCode.cause or ErrorCodeList.causes form cycles or share nodes; modify
localizeErrorCode to detect and skip already-visited ErrorCode instances by
passing a visited set (e.g., Set<ErrorCode> or IdentityHashMap identity set)
through recursive calls, check/add the current error before processing
getGlobalErrorCode/getLocalizedMessage and before recursing into
error.getCause() and each entry of ((ErrorCodeList) error).getCauses(); ensure
overloaded/local helper method localizeErrorCode(ErrorCode error, String locale,
Set<ErrorCode> visited) is used so existing callers can remain unchanged while
preventing stack overflow from cycles.

In `@core/src/main/java/org/zstack/core/errorcode/LocaleUtils.java`:
- Around line 37-60: resolveLocale currently returns DEFAULT_LOCALE (en_US)
unconditionally when no acceptLanguage matches, which can be invalid if en_US
isn't loaded; update resolveLocale to check if DEFAULT_LOCALE is present in
availableLocales and if not, pick a sensible fallback from availableLocales
(e.g., the first element or a prioritized alternative) before returning
DEFAULT_LOCALE as a last resort; modify the logic inside resolveLocale after the
for-loop (use availableLocales.contains(DEFAULT_LOCALE) and otherwise select a
fallback locale from availableLocales) and ensure this behavior works with
normalizeTag, LANGUAGE_TO_LOCALE, parseAcceptLanguage and LocaleEntry handling.

In `@rest/src/main/java/org/zstack/rest/RestServer.java`:
- Around line 411-413: The failure branch calls
i18nService.localizeErrorCode(evt.getError(), locale) and
response.setError(evt.getError()) without null checks, which can NPE if
evt.getError() is null; update those spots (where resolveLocale(),
i18nService.localizeErrorCode(...), and response.setError(...) are used) to
first capture ErrorCode err = evt.getError(); if err != null then call
i18nService.localizeErrorCode(err, locale) and response.setError(err); otherwise
skip localization and set (or leave) the response error accordingly to avoid
NPE.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`:
- Around line 429-433: Add the symmetric null-check for destHost in
prepareVmResource to match preReleaseVmResource: before calling
activateVolumes(vols, spec.getDestHost(), false, completion) in method
prepareVmResource, check if spec.getDestHost() == null, log a debug message like
"skip activate volumes because no host associated", call completion.success(),
and return early; this prevents passing a null host into activateVolumes and
mirrors the existing behavior in preReleaseVmResource.

In
`@test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy`:
- Around line 150-154: In ErrorCodeI18nCase add explicit timeouts to every
HttpURLConnection instance created (the local variable conn where you call
(HttpURLConnection) new URL(url).openConnection()) by calling
conn.setConnectTimeout(...) and conn.setReadTimeout(...) with reasonable
millisecond values (e.g. 5s/15s) immediately after opening the connection but
before setRequestMethod/setRequestProperty; update all four occurrences of conn
in this file so none of the requests can block indefinitely.

In
`@test/src/test/java/org/zstack/test/core/errorcode/TestGlobalErrorCodeI18n.java`:
- Around line 36-50: The tests testZhCNMatch and testEnUSMatch only assert the
placeholder "abc123" is present and thus miss verifying the locale-specific
language; update each test to also assert that the returned message contains a
language-specific marker for the requested locale (for example, verify zh_CN
message contains Chinese characters or a known Chinese fragment and en_US
message contains an expected English fragment) when calling
i18nService.getLocalizedMessage with error code
"ORG_ZSTACK_STORAGE_PRIMARY_10039", so the assertions ensure the correct locale
text is returned rather than only the parameter substitution.

In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy`:
- Around line 453-457: The remove branch can call remove() on an immutable
Collections.emptySet() when clientId is absent, causing
UnsupportedOperationException; change the logic in the remove branch to retrieve
a mutable set (e.g. use iscsiClientSnapshots.computeIfAbsent(clientId, {
ConcurrentHashMap.newKeySet() }) or check
iscsiClientSnapshots.containsKey(clientId) and only call remove on the existing
mutable set) so that iscsiClientSnapshots and clientId handling is safe and no
immutable set is modified.

In `@utils/src/main/java/org/zstack/utils/string/StringSimilarity.java`:
- Around line 323-355: The early return on missed.get(formatted) incorrectly
aborts the whole similarity flow; instead, detect a "formatted missed" and only
skip the formatted-string regex branch. Concretely, in the method inside
StringSimilarity where you check missed.get(formatted) (the block that currently
calls logSearchSpend(sub, start, false) and return null), remove the return and
set a local boolean (e.g., formattedMissed = true) so you still log the miss but
allow subsequent fallback logic to run; then guard Phase 1 (the block that calls
findMostSimilarRegex(formatted)) with args != null && args.length > 0 &&
!formattedMissed so formatted misses don’t trigger formatted-regex attempts but
still allow the raw-template regex (findMostSimilarRegex(sub)) and distance
fallback to execute.

---

Outside diff comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 5020-5030: Replace the use of System.currentTimeMillis() for RTT
measurement with System.nanoTime() to avoid NTP clock skew: record a pingStartNs
before calling restf.asyncJsonPost in KVMHost, compute latencyMs = Math.max(1L,
(System.nanoTime() - pingStartNs) / 1_000_000L) in the success callback, and
pass that latencyMs into pingLatencyEma.update(...) (also ensure any other
places that compute RTT for computeAdaptiveTimeout() use the same change).

---

Nitpick comments:
In `@compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java`:
- Around line 25-29: Refactor the jitter calculation so tests exercise the
actual production logic instead of calling ThreadLocalRandom directly: add a
callable method on the HostTrack class (e.g., calculateJitterBucket or
computeJitterMs) that accepts intervalMs and bucketCount and an injectable
random source (or LongSupplier) used to derive jitterMs and bucket index, then
update HostTrackJitterTest to invoke that method (instead of using
ThreadLocalRandom in the three loops) and assert its returned jitterMs/bucket
distribution; ensure tests cover both using a real random supplier and
deterministic suppliers to catch regressions in
HostTrack.calculateJitterBucket/computeJitterMs behavior.

In `@conf/tools/install.sh`:
- Line 106: 将当前的 cd $cwd 替换为对变量加引号并在失败时退出的写法(即使用 "$cwd" 并在 cd
失败时立即退出脚本),确保在脚本无法切换到目标目录时不会继续执行后续安装步骤;定位并修改包含 cd $cwd 的行即可。

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 23-24: Replace the wildcard import
org.zstack.header.storage.addon.primary.* with explicit imports of only the
classes used in this file (e.g. ExternalPrimaryStorageHostRefVO,
ExternalPrimaryStorageHostRefVO_, HeartbeatVolumeTO, HeartbeatVolumeTopology,
ExternalHostIdGetter) to improve readability; update the import block at the top
of KvmIscsiNodeServer.java so each required class is imported explicitly instead
of using the * import.
- Around line 247-248: 在 KvmIscsiNodeServer 中替换魔法值 999:在类里新增有意义的常量(例如 private
static final int MAX_EXTERNAL_HOST_ID = 999)并将 ExternalHostIdGetter 的构造调用从 new
ExternalHostIdGetter(999) 改为 new
ExternalHostIdGetter(MAX_EXTERNAL_HOST_ID),同时检查并更新类中其它使用同一含义数值的地方以复用该常量(定位符:类
KvmIscsiNodeServer、构造调用 ExternalHostIdGetter(...)、方法 getOrAllocateHostIdRef)。

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 172-173: The code uses magic numbers when constructing
PingLatencyEma in KVMHost (new PingLatencyEma(0.3, 3)); extract these literals
into named constants (e.g., PING_LATENCY_EMA_ALPHA and PING_LATENCY_EMA_WINDOW)
or ideally make them configurable via a resource/config property, then construct
pingLatencyEma using those symbols so the meaning is clear and values can be
tuned without code changes.

In `@utils/src/main/java/org/zstack/utils/string/StringSimilarity.java`:
- Around line 290-297: The comment in StringSimilarity (around the javadoc
describing matching strategy) incorrectly says "two-phase strategy" while the
subsequent bullets enumerate Phase 1/2/3; update the textual summary to
"three-phase strategy" (or otherwise reconcile the header with the bullets) so
the description matches the implementation and bullets in the StringSimilarity
class documentation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 91a043cd-3068-4bd4-845d-47751ab03312

📥 Commits

Reviewing files that changed from the base of the PR and between d0545cb and 6533eee.

⛔ Files ignored due to path filters (21)
  • compute/pom.xml is excluded by !**/*.xml
  • conf/i18n/globalErrorCodeMapping/global-error-de-DE.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-en_US.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-fr-FR.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-id-ID.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-ja-JP.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-ko-KR.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-ru-RU.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-th-TH.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-zh_CN.json is excluded by !**/*.json
  • conf/i18n/globalErrorCodeMapping/global-error-zh_TW.json is excluded by !**/*.json
  • conf/springConfigXml/Error.xml is excluded by !**/*.xml
  • sdk/src/main/java/org/zstack/sdk/AddModelAction.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/ErrorCode.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuMetricsBatchAction.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuMetricsBatchResult.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuResourceTreeAction.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuResourceTreeResult.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GpuMetricsEntry.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GpuResourceTreeNode.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/ModelEvaluationTaskInventory.java is excluded by !sdk/**
📒 Files selected for processing (39)
  • compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java
  • compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java
  • compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java
  • compute/src/test/java/org/zstack/compute/host/PingLatencyEmaTest.java
  • conf/db/upgrade/V5.5.12__schema.sql
  • conf/db/upgrade/V6.0.0.1__gpu_add_scope.sql
  • conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql
  • conf/db/upgrade/V6.0.0.3__mdev_subtable.sql
  • conf/tools/install.sh
  • core/src/main/java/org/zstack/core/Platform.java
  • core/src/main/java/org/zstack/core/ansible/AnsibleFacadeImpl.java
  • core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nService.java
  • core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java
  • core/src/main/java/org/zstack/core/errorcode/LocaleUtils.java
  • core/src/main/java/org/zstack/core/rest/RESTApiController.java
  • core/src/main/java/org/zstack/core/thread/AsyncTimer.java
  • header/src/main/java/org/zstack/header/errorcode/ErrorCode.java
  • plugin/cbd/src/main/java/org/zstack/cbd/kvm/KvmCbdNodeServer.java
  • plugin/expon/src/main/java/org/zstack/expon/ExponStorageController.java
  • plugin/expon/src/main/java/org/zstack/expon/sdk/ExponClient.java
  • plugin/expon/src/main/java/org/zstack/expon/sdk/ExponConnectConfig.java
  • plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • rest/src/main/java/org/zstack/rest/RestServer.java
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
  • test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/StorageTest.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/PrimaryStorageTest.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/ExternalPrimaryStorageSelectBackupStorageCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/expon/ExponPrimaryStorageCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/xinfini/XinfiniPrimaryStorageCase.groovy
  • test/src/test/java/org/zstack/test/core/errorcode/TestGlobalErrorCodeI18n.java
  • test/src/test/java/org/zstack/test/core/errorcode/TestLocaleUtils.java
  • testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy
  • testlib/src/main/java/org/zstack/testlib/SpringSpec.groovy
  • testlib/src/main/java/org/zstack/testlib/Test.groovy
  • utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
  • utils/src/main/java/org/zstack/utils/string/StringSimilarity.java

Comment on lines +8 to +9
private volatile double emaMs = -1;
private final double alpha;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 查找并读取 PingLatencyEma.java 文件
find . -name "PingLatencyEma.java" -type f

Repository: MatheMatrix/zstack

Length of output: 129


🏁 Script executed:

#!/bin/bash
# 读取文件内容
cat -n compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java

Repository: MatheMatrix/zstack

Length of output: 1356


🏁 Script executed:

#!/bin/bash
# 搜索 PingLatencyEma 的所有使用和调用点
rg -n --type=java 'PingLatencyEma' -C 3

Repository: MatheMatrix/zstack

Length of output: 7280


🏁 Script executed:

#!/bin/bash
# 搜索 update 方法的调用情况
rg -n --type=java '\.update\s*\(' --no-heading -A 2

Repository: MatheMatrix/zstack

Length of output: 50375


🏁 Script executed:

#!/bin/bash
# 精确搜索 pingLatencyEma.update 的调用
rg -n --type=java 'pingLatencyEma\s*\.\s*update\s*\(' -C 5

Repository: MatheMatrix/zstack

Length of output: 1220


🏁 Script executed:

#!/bin/bash
# 搜索 KVMHost 类及其使用的并发相关信息
rg -n --type=java 'class KVMHost|pingLatencyEma.computeAdaptiveTimeout|pingLatencyEma.getEmaMs' -C 2

Repository: MatheMatrix/zstack

Length of output: 11080


🏁 Script executed:

#!/bin/bash
# 检查 alpha 参数的默认值和有效范围
rg -n --type=java 'new PingLatencyEma' -C 1

Repository: MatheMatrix/zstack

Length of output: 2705


🏁 Script executed:

#!/bin/bash
# 查找 ping 方法的完整上下文,了解回调执行环境
rg -n --type=java -B 30 'pingLatencyEma\.update\(System\.currentTimeMillis' | head -80

Repository: MatheMatrix/zstack

Length of output: 3663


🏁 Script executed:

#!/bin/bash
# 搜索 trigger 对象的类型和说明
rg -n --type=java 'AsyncBackend|trigger.*=' plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java | head -20

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

#!/bin/bash
# 检查 alpha 参数的有效范围说明
rg -n --type=java -A 5 'alpha.*EMA|exponential.*moving.*average' compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java

Repository: MatheMatrix/zstack

Length of output: 44


update() 的读改写操作不是原子的,在异步回调并发执行时 EMA 可能丢失更新

第 21 行是典型的读改写操作:emaMs = alpha * latencyMs + (1 - alpha) * emaMs。虽然使用了 volatile,但它仅保证可见性,无法保证这个复合操作的原子性。在 KVMHost 中,pingLatencyEma.update() 由异步 HTTP 回调触发(Line 5030),多个异步线程可能并发调用同一实例,导致某些更新丢失。

建议为读改写操作添加同步保护:

修复建议
-    private volatile double emaMs = -1;
+    private double emaMs = -1;
@@
-    public void update(long latencyMs) {
+    public synchronized void update(long latencyMs) {
         if (emaMs < 0) {
             emaMs = latencyMs;
         } else {
             emaMs = alpha * latencyMs + (1 - alpha) * emaMs;
         }
     }
@@
-    public long computeAdaptiveTimeout(long globalTimeoutSeconds) {
+    public synchronized long computeAdaptiveTimeout(long globalTimeoutSeconds) {
         if (emaMs <= 0) {
             return globalTimeoutSeconds;
         }
         long emaBasedTimeout = (long) (emaMs * timeoutMultiplier / 1000) + 1;
         return Math.min(Math.max(globalTimeoutSeconds, emaBasedTimeout), globalTimeoutSeconds * 3);
     }
@@
-    public double getEmaMs() {
+    public synchronized double getEmaMs() {
         return emaMs;
     }

另外,构造函数应校验参数范围:alpha 应在 (0, 1) 之间,timeoutMultiplier 应为正整数,以防止无效配置。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java` around
lines 8 - 9, PingLatencyEma 中的读-改-写操作 emaMs = alpha * latencyMs + (1 - alpha) *
emaMs 不是原子的,会在并发异步回调下丢失更新;在 update()(类名 PingLatencyEma)中用同步保护(例如给实例加
synchronized 块 或 用 java.util.concurrent.atomic.DoubleAccumulator/AtomicReference
并使用 CAS 循环)确保对字段 emaMs 的更新原子;同时在构造函数校验参数范围:alpha 必须在 (0,1) 之间,timeoutMultiplier
必须为正整数,遇到非法值应抛出 IllegalArgumentException(引用字段 alpha、emaMs、timeoutMultiplier 与方法
update() 与构造器以定位修改点)。

Comment on lines +12 to +15
public PingLatencyEma(double alpha, int timeoutMultiplier) {
this.alpha = alpha;
this.timeoutMultiplier = timeoutMultiplier;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

建议在构造函数增加参数范围校验

alpha 建议限制在 (0, 1]timeoutMultiplier 建议 > 0,避免错误配置导致超时计算异常且静默生效。

🔧 建议修复
     public PingLatencyEma(double alpha, int timeoutMultiplier) {
+        if (alpha <= 0 || alpha > 1) {
+            throw new IllegalArgumentException("alpha must be in (0, 1]");
+        }
+        if (timeoutMultiplier <= 0) {
+            throw new IllegalArgumentException("timeoutMultiplier must be > 0");
+        }
         this.alpha = alpha;
         this.timeoutMultiplier = timeoutMultiplier;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java` around
lines 12 - 15, The PingLatencyEma constructor should validate its inputs: ensure
alpha is >0 and <=1 and timeoutMultiplier is >0; if values are out of range,
throw an IllegalArgumentException with a clear message. Update the
PingLatencyEma(double alpha, int timeoutMultiplier) constructor to perform these
checks before assigning fields (validate alpha and timeoutMultiplier, then set
this.alpha and this.timeoutMultiplier), so invalid configs fail fast.

Comment on lines +14 to +15
* AC: Jitter 测试 — 重连时间戳呈均匀分布 (非全部 t=0)
* Simulates the jitter logic from HostTrackImpl.trackHost() for 3000 hosts.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

测试注释含中文,建议统一为英文

注释语言需要与仓库规范保持一致。

As per coding guidelines **/*.*: 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写。

Also applies to: 56-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java`
around lines 14 - 15, Replace the Chinese test comments in HostTrackJitterTest
with clear, spell-checked English consistent with repository
guidelines—specifically update the class/test header that references "AC: Jitter
测试 — 重连时间戳呈均匀分布 (非全部 t=0)" and the other comments around lines referenced (also
applies to the comments at 56-58) to English; ensure the description mentions
that the test simulates the jitter logic from HostTrackImpl.trackHost() for 3000
hosts and keep phrasing concise and consistent with other tests.

Comment on lines +8 to +10
/**
* AC: 合成延迟序列 [100, 200, 150, 300, 250]ms → EMA 产出正确自适应超时
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

请将注释中的中文改为英文

Line 9 的注释包含中文,不符合仓库统一规范。

As per coding guidelines **/*.*: 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/test/java/org/zstack/compute/host/PingLatencyEmaTest.java` around
lines 8 - 10, Replace the Chinese sentence in the Javadoc comment in
PingLatencyEmaTest (the comment that currently reads "AC: 合成延迟序列 [100, 200, 150,
300, 250]ms → EMA 产出正确自适应超时") with an English equivalent, e.g. "AC: synthetic
latency series [100, 200, 150, 300, 250] ms → EMA yields correct adaptive
timeout", so the class/method comment in PingLatencyEmaTest uses only English.

Comment on lines +25 to +33
UPDATE `zstack`.`ModelEvaluationTaskVO`
SET `endTime` = STR_TO_DATE(
Json_getKeyValue(`opaque`, 'end_time'),
'%b %d, %Y %h:%i:%s %p'
)
WHERE `opaque` IS NOT NULL
AND `endTime` IS NULL
AND Json_getKeyValue(`opaque`, 'end_time') IS NOT NULL
AND Json_getKeyValue(`opaque`, 'end_time') != '';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the file exists and check the exact content
cat -n conf/db/upgrade/V5.5.12__schema.sql | sed -n '20,40p'

Repository: MatheMatrix/zstack

Length of output: 722


🏁 Script executed:

# Search for similar STR_TO_DATE patterns with %b in all upgrade scripts
rg -n "STR_TO_DATE.*%b" conf/db/upgrade/

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# Also check if there are any other STR_TO_DATE usages in the upgrade directory
rg -n "STR_TO_DATE" conf/db/upgrade/ | head -20

Repository: MatheMatrix/zstack

Length of output: 129


🏁 Script executed:

# List all upgrade files to understand the context
ls -la conf/db/upgrade/ | grep -E "\.sql$"

Repository: MatheMatrix/zstack

Length of output: 12536


确认 STR_TO_DATE 的区域设置依赖,建议验证生产环境配置。

日期格式 '%b %d, %Y %h:%i:%s %p' 中的 %b(月份英文缩写)和 %p(AM/PM)依赖 MySQL 服务器的 lc_time_names 为英文。若生产环境该值非英文,STR_TO_DATE 会返回 NULL。虽然代码已包含 NULL 值检查和数据来源注释(Python agent 使用固定英文格式),但建议明确验证生产数据库的 lc_time_names 设置,确保兼容性。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/db/upgrade/V5.5.12__schema.sql` around lines 25 - 33, The UPDATE using
STR_TO_DATE on table ModelEvaluationTaskVO with format '%b %d, %Y %h:%i:%s %p'
can return NULL if MySQL's lc_time_names isn't English; before running this
migration, verify and if necessary set the session lc_time_names to an English
locale (e.g., 'en_US') or run a SET SESSION lc_time_names='en_US' prior to
executing the UPDATE, or alternatively adjust the migration to parse a
locale-independent representation (e.g., store month numbers) instead of relying
on STR_TO_DATE; ensure the change references the same Json_getKeyValue(`opaque`,
'end_time') usage and that the migration includes the session locale set or
validation step.

Comment on lines +429 to +433
if (spec.getDestHost() == null) {
logger.debug("skip deactivate volumes because no host associated");
completion.success();
return;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# 查看 ExternalPrimaryStorageFactory.java 中 preReleaseVmResource 的完整实现
rg -n 'preReleaseVmResource' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java -A 30 -B 5

# 查看该文件的整体结构
wc -l storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java

# 获取 429-433 行的完整上下文
sed -n '410,445p' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java

Repository: MatheMatrix/zstack

Length of output: 3590


🏁 Script executed:

# 查找 activateVolumes 的调用点,了解何时以及在哪个 host 上激活
rg -n 'activateVolumes' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java -B 10 -A 2

# 查找 VmInstanceSpec 中 destHost 和 currentHost 的定义和用途
find . -name VmInstanceSpec.java -type f | head -1 | xargs cat | head -200

# 查询是否存在其他地方对 destHost == null 的处理
rg -n 'getDestHost.*null|destHost.*null' --type=java | head -20

# 查看 preReleaseVmResource 的所有实现
rg -n '@Override' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java | grep -A 1 'preReleaseVmResource'

Repository: MatheMatrix/zstack

Length of output: 15128


🏁 Script executed:

# 查看 prepareVmResource 的完整实现(包括上下文)
sed -n '395,415p' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java

# 查看 ChangeImage 操作时 destHost 的处理
rg -n 'ChangeImage' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java -B 5 -A 5

# 查看普通虚拟机启动时 destHost 的赋值
rg -n 'setDestHost|destHost\s*=' compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | head -10

Repository: MatheMatrix/zstack

Length of output: 1848


🏁 Script executed:

# 查看 activateVolumes 的完整实现
sed -n '472,510p' storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java

# 查看 NFS 等其他存储实现中 prepareVmResource 是否有类似的 null 检查
rg -n 'preInstantiateVmResource' plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java -A 15

# 查看 SMPPrimaryStorageBase 中的处理
rg -n 'preInstantiateVmResource' plugin/sharedMountPointPrimaryStorage/src/main/java/org/zstack/storage/primary/smp/SMPPrimaryStorageBase.java -A 15

Repository: MatheMatrix/zstack

Length of output: 1757


🏁 Script executed:

# 查看 NFS 存储的 preInstantiateVmResource 实现
rg -n 'public void prepareVmResource|public void preInstantiateVmResource' plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/ -A 20 | head -60

# 查看 SMP 存储的 preInstantiateVmResource 实现
rg -n 'public void prepareVmResource|public void preInstantiateVmResource' plugin/sharedMountPointPrimaryStorage/src/main/java/org/zstack/storage/primary/smp/ -A 20 | head -60

# 确认 destHost 在启动虚拟机时是否必然非空
rg -n 'getCurrentVmOperation|VmOperation.Create|VmOperation.Start' compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | head -15

Repository: MatheMatrix/zstack

Length of output: 508


🏁 Script executed:

# 确认 NFS 存储的对应实现
rg -n 'public void preInstantiateVmResource|destHost.*null' plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java -B 5 -A 10

# 检查是否存在在 prepareVmResource 调用 activateVolumes 前的 destHost null 检查
rg -n 'preInstantiateVmResource' storage/src/main/java/org/zstack/storage/addon/primary/ -A 20 | grep -E '(destHost|null)'

Repository: MatheMatrix/zstack

Length of output: 1048


🏁 Script executed:

# 最后查询:确认 activate 方法的实现是否会处理 null host
rg -n 'void activate' plugin/ --type=java | grep -i 'primarystorage\|volumeinfo' | head -5

# 看 destHost 是否在所有启动路径上都被赋值
rg -n 'setDestHost' compute/src/main/java/org/zstack/compute/vm/ -B 3 -A 3

Repository: MatheMatrix/zstack

Length of output: 8450


prepareVmResource 侧添加对称的 destHost 空值检查。

当前 preReleaseVmResource(Line 429)中对 destHost == null 的检查是正确的防护。但配对的 prepareVmResource(Line 413)缺少对应的检查,直接调用 activateVolumes(vols, spec.getDestHost(), false, completion) 而未验证 destHost 是否为空。这形成不对称:若 destHost 为 null,激活操作会将 null 宿主机传递给后端服务,可能导致错误;而释放侧则因 null 检查而跳过反激活。

建议在 Line 413 前添加早返回逻辑,与 release 侧保持一致:

if (spec.getDestHost() == null) {
    logger.debug("skip activate volumes because no host associated");
    completion.success();
    return;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`
around lines 429 - 433, Add the symmetric null-check for destHost in
prepareVmResource to match preReleaseVmResource: before calling
activateVolumes(vols, spec.getDestHost(), false, completion) in method
prepareVmResource, check if spec.getDestHost() == null, log a debug message like
"skip activate volumes because no host associated", call completion.success(),
and return early; this prevents passing a null host into activateVolumes and
mirrors the existing behavior in preReleaseVmResource.

Comment on lines +150 to +154
HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection()
try {
conn.setRequestMethod("GET")
conn.setRequestProperty("Authorization", "OAuth ${sessionId}")
conn.setRequestProperty("Accept-Language", "zh-CN")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy | sed -n '148,156p; 196,207p; 222,231p; 283,291p'

Repository: MatheMatrix/zstack

Length of output: 2640


🏁 Script executed:

rg -n 'HttpURLConnection' test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy -A10 | head -60

Repository: MatheMatrix/zstack

Length of output: 2579


🏁 Script executed:

rg -n 'setConnectTimeout|setReadTimeout' test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy

Repository: MatheMatrix/zstack

Length of output: 44


HTTP 连接缺少超时配置,集成测试在网络故障时可能无期限阻塞

所有 4 个 HttpURLConnection 实例(第 150、198、224、285 行)未配置 setConnectTimeoutsetReadTimeout。在 CI 环境或网络抖动时会导致测试用例无期限卡死。

应添加超时设置:

建议补丁
 HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection()
 try {
+    conn.setConnectTimeout(5000)
+    conn.setReadTimeout(5000)
     conn.setRequestMethod("GET")
     conn.setRequestProperty("Authorization", "OAuth ${sessionId}")
     conn.setRequestProperty("Accept-Language", "zh-CN")
@@
 HttpURLConnection postConn = (HttpURLConnection) new URL(postUrl).openConnection()
 try {
+    postConn.setConnectTimeout(5000)
+    postConn.setReadTimeout(5000)
     postConn.setRequestMethod("POST")
@@
 HttpURLConnection jobConn = (HttpURLConnection) new URL(jobUrl).openConnection()
 try {
+    jobConn.setConnectTimeout(5000)
+    jobConn.setReadTimeout(5000)
     jobConn.setRequestMethod("GET")
@@
 HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection()
 try {
+    conn.setConnectTimeout(5000)
+    conn.setReadTimeout(5000)
     conn.setRequestMethod("GET")

另外三处位置也需要修改:198-205、224-229、285-289。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/groovy/org/zstack/test/integration/rest/ErrorCodeI18nCase.groovy`
around lines 150 - 154, In ErrorCodeI18nCase add explicit timeouts to every
HttpURLConnection instance created (the local variable conn where you call
(HttpURLConnection) new URL(url).openConnection()) by calling
conn.setConnectTimeout(...) and conn.setReadTimeout(...) with reasonable
millisecond values (e.g. 5s/15s) immediately after opening the connection but
before setRequestMethod/setRequestProperty; update all four occurrences of conn
in this file so none of the requests can block indefinitely.

Comment on lines +36 to +50
public void testZhCNMatch() {
String msg = i18nService.getLocalizedMessage(
"ORG_ZSTACK_STORAGE_PRIMARY_10039", "zh_CN", new String[]{"abc123"});
Assert.assertNotNull(msg);
Assert.assertTrue("should contain Chinese text", msg.contains("abc123"));
logger.debug(String.format("zh_CN message: %s", msg));
}

@Test
public void testEnUSMatch() {
String msg = i18nService.getLocalizedMessage(
"ORG_ZSTACK_STORAGE_PRIMARY_10039", "en_US", new String[]{"abc123"});
Assert.assertNotNull(msg);
Assert.assertTrue("should contain English text", msg.contains("abc123"));
logger.debug(String.format("en_US message: %s", msg));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

本地化断言未真正校验“语言”是否正确

这两段测试目前只校验了参数回填(abc123),即使返回了错误语言文案也会通过,容易漏掉 locale 回退/匹配回归。

✅ 建议补丁
 `@Test`
 public void testZhCNMatch() {
     String msg = i18nService.getLocalizedMessage(
             "ORG_ZSTACK_STORAGE_PRIMARY_10039", "zh_CN", new String[]{"abc123"});
     Assert.assertNotNull(msg);
-    Assert.assertTrue("should contain Chinese text", msg.contains("abc123"));
+    Assert.assertTrue("zh_CN should return Chinese template", msg.contains("未找到主存储"));
+    Assert.assertTrue(msg.contains("abc123"));
     logger.debug(String.format("zh_CN message: %s", msg));
 }

 `@Test`
 public void testEnUSMatch() {
     String msg = i18nService.getLocalizedMessage(
             "ORG_ZSTACK_STORAGE_PRIMARY_10039", "en_US", new String[]{"abc123"});
     Assert.assertNotNull(msg);
-    Assert.assertTrue("should contain English text", msg.contains("abc123"));
+    Assert.assertTrue("en_US should return English template", msg.contains("no primary storage"));
+    Assert.assertTrue(msg.contains("abc123"));
     logger.debug(String.format("en_US message: %s", msg));
 }

Based on learnings: Test files that validate internationalization functionality for Chinese locales are exempt from the "no Chinese text" coding guideline, as Chinese literals are required in assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/java/org/zstack/test/core/errorcode/TestGlobalErrorCodeI18n.java`
around lines 36 - 50, The tests testZhCNMatch and testEnUSMatch only assert the
placeholder "abc123" is present and thus miss verifying the locale-specific
language; update each test to also assert that the returned message contains a
language-specific marker for the requested locale (for example, verify zh_CN
message contains Chinese characters or a known Chinese fragment and en_US
message contains an expected English fragment) when calling
i18nService.getLocalizedMessage with error code
"ORG_ZSTACK_STORAGE_PRIMARY_10039", so the assertions ensure the correct locale
text is returned rather than only the parameter substitution.

Comment on lines +453 to +457
if ("add".equals(action)) {
iscsiClientSnapshots.computeIfAbsent(clientId, { ConcurrentHashMap.newKeySet() }).add(snapId.toString())
} else if ("remove".equals(action)) {
iscsiClientSnapshots.getOrDefault(clientId, Collections.emptySet()).remove(snapId.toString())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

remove 分支可能抛 UnsupportedOperationException

Line 456 在 clientId 不存在时会拿到 Collections.emptySet(),这是不可变集合,继续调用 remove() 会直接抛异常,导致模拟器在重复解绑场景下不稳定。

建议修复
-                            } else if ("remove".equals(action)) {
-                                iscsiClientSnapshots.getOrDefault(clientId, Collections.emptySet()).remove(snapId.toString())
+                            } else if ("remove".equals(action)) {
+                                def snapSet = iscsiClientSnapshots.get(clientId)
+                                if (snapSet != null) {
+                                    snapSet.remove(snapId.toString())
+                                    if (snapSet.isEmpty()) {
+                                        iscsiClientSnapshots.remove(clientId, snapSet)
+                                    }
+                                }
                             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testlib/src/main/java/org/zstack/testlib/ExternalPrimaryStorageSpec.groovy`
around lines 453 - 457, The remove branch can call remove() on an immutable
Collections.emptySet() when clientId is absent, causing
UnsupportedOperationException; change the logic in the remove branch to retrieve
a mutable set (e.g. use iscsiClientSnapshots.computeIfAbsent(clientId, {
ConcurrentHashMap.newKeySet() }) or check
iscsiClientSnapshots.containsKey(clientId) and only call remove on the existing
mutable set) so that iscsiClientSnapshots and clientId handling is safe and no
immutable set is modified.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

240-257: ⚠️ Potential issue | 🟠 Major

修复方向正确,但需补充分配后的二次校验防止空指针。

此处改用 ExternalPrimaryStorageHostRefVO.hostId(平台紧凑 ID,范围 [1, 999])替代原先的 heartbeatVol.getHostId()(存储侧 BDC ID,无上界),符合 PR 目标,能够解决心跳偏移量溢出问题。

getOrAllocateHostIdRef(...) 在分配失败时可能返回 null,第 257 行 ref.getHostId() 仍存在空指针风险。建议在分配后添加校验:

🛡️ 建议修复
                         if (ref == null || ref.getHostId() == 0) {
                             logger.warn(String.format("not found hostId for hostUuid[%s] and primaryStorageUuid[%s]",
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid()));
                             ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef(
                                     param.getHostUuid(), param.getPrimaryStorage().getUuid());
                         }
+
+                        if (ref == null || ref.getHostId() <= 0) {
+                            trigger.fail(operr("failed to allocate valid hostId for hostUuid[%s], primaryStorageUuid[%s]",
+                                    param.getHostUuid(), param.getPrimaryStorage().getUuid()));
+                            return;
+                        }

                         KvmSetupSelfFencerCmd cmd = new KvmSetupSelfFencerCmd();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 240 - 257, The code uses ExternalPrimaryStorageHostRefVO.ref and
then calls ref.getHostId() after possibly calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...); add a null check after the
allocation call to avoid NPE: after ref = new
ExternalHostIdGetter(999).getOrAllocateHostIdRef(...) verify ref != null and
ref.getHostId() != 0, and if the allocation failed log an error (including
hostUuid and primaryStorageUuid) and return/fail the operation (or throw an
appropriate exception) instead of proceeding to construct KvmSetupSelfFencerCmd
with a null ref; update the use sites (KvmSetupSelfFencerCmd.hostId assignment)
to only run when ref is valid.
🧹 Nitpick comments (1)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

23-24: 通配符导入可考虑改为显式导入。

org.zstack.header.storage.addon.primary.* 使用通配符导入会降低代码可读性,且可能引入不必要的类名冲突风险。建议显式导入实际使用的类(如 ExternalPrimaryStorageHostRefVOExternalPrimaryStorageHostRefVO_)。

♻️ 建议修改
-import org.zstack.header.storage.addon.primary.*;
+import org.zstack.header.storage.addon.primary.ExternalPrimaryStorageHostRefVO;
+import org.zstack.header.storage.addon.primary.ExternalPrimaryStorageHostRefVO_;
+import org.zstack.header.storage.addon.primary.HeartbeatVolumeTO;
+import org.zstack.header.storage.addon.primary.HeartbeatVolumeTopology;
+import org.zstack.header.storage.addon.primary.PrimaryStorageNodeSvc;
+import org.zstack.header.storage.addon.primary.BaseVolumeInfo;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 23 - 24, Replace the wildcard import
org.zstack.header.storage.addon.primary.* with explicit imports for only the
classes actually used in KvmIscsiNodeServer (e.g.,
ExternalPrimaryStorageHostRefVO and ExternalPrimaryStorageHostRefVO_) and keep
the existing ExternalHostIdGetter import; update the import section in
KvmIscsiNodeServer.java to list these concrete types to improve readability and
avoid accidental name conflicts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 240-257: The code uses ExternalPrimaryStorageHostRefVO.ref and
then calls ref.getHostId() after possibly calling
ExternalHostIdGetter.getOrAllocateHostIdRef(...); add a null check after the
allocation call to avoid NPE: after ref = new
ExternalHostIdGetter(999).getOrAllocateHostIdRef(...) verify ref != null and
ref.getHostId() != 0, and if the allocation failed log an error (including
hostUuid and primaryStorageUuid) and return/fail the operation (or throw an
appropriate exception) instead of proceeding to construct KvmSetupSelfFencerCmd
with a null ref; update the use sites (KvmSetupSelfFencerCmd.hostId assignment)
to only run when ref is valid.

---

Nitpick comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 23-24: Replace the wildcard import
org.zstack.header.storage.addon.primary.* with explicit imports for only the
classes actually used in KvmIscsiNodeServer (e.g.,
ExternalPrimaryStorageHostRefVO and ExternalPrimaryStorageHostRefVO_) and keep
the existing ExternalHostIdGetter import; update the import section in
KvmIscsiNodeServer.java to list these concrete types to improve readability and
avoid accidental name conflicts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 9901a224-6de1-4d7e-b375-0cdf804119d5

📥 Commits

Reviewing files that changed from the base of the PR and between 6533eee and 3fa9b47.

📒 Files selected for processing (1)
  • plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/fix/ZSTAC-81797@@2 branch from 3fa9b47 to 14a8604 Compare March 13, 2026 07:21
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql (1)

8-83: ⚠️ Potential issue | 🟠 Major

请将迁移脚本固定到 zstack schema,并统一使用全限定表名。

Line 9 仍使用 TABLE_SCHEMA = DATABASE(),且后续 HostEO / GpuDeviceVO 等表均未加 zstack. 前缀,迁移目标会依赖当前连接库,升级一致性风险较高。

建议修改示例
-    IF (SELECT COUNT(*) FROM information_schema.TABLES
-        WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'BareMetal2ChassisGpuDeviceVO') > 0 THEN
+    IF (SELECT COUNT(*) FROM `information_schema`.`TABLES`
+        WHERE `TABLE_SCHEMA` = 'zstack' AND `TABLE_NAME` = 'BareMetal2ChassisGpuDeviceVO') > 0 THEN

-INSERT INTO `HostEO` (
+INSERT INTO zstack.`HostEO` (

-FROM `BareMetal2ChassisVO` c
+FROM zstack.`BareMetal2ChassisVO` c

Based on learnings: In ZStack, the database schema name is fixed as 'zstack'. For SQL upgrade scripts under conf/db/upgrade, continue using TABLE_SCHEMA = 'zstack' and reference tables as zstack.UsedIpVO.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql` around lines 8 - 83, The
script uses unqualified table names and TABLE_SCHEMA = DATABASE(), causing
schema ambiguity; update the migration to target the fixed 'zstack' schema by
replacing TABLE_SCHEMA = DATABASE() with TABLE_SCHEMA = 'zstack' and fully
qualify every table reference (e.g., zstack.`HostEO`, zstack.`GpuDeviceVO`,
zstack.`PciDeviceVO`, zstack.`BareMetal2ChassisVO`,
zstack.`BareMetal2ChassisPciDeviceVO`, zstack.`BareMetal2ChassisGpuDeviceVO`) in
all SELECT, INSERT, JOIN, UPDATE and ALTER TABLE statements (including the
EXISTS subqueries and ALTER TABLE `_migrated` addition) so every occurrence is
prefixed with zstack.
🧹 Nitpick comments (2)
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)

23-24: 通配符导入可考虑改为显式导入

第 23 行使用了通配符导入 org.zstack.header.storage.addon.primary.*。虽然功能正常,但显式导入(如 ExternalPrimaryStorageHostRefVOExternalPrimaryStorageHostRefVO_HeartbeatVolumeTO 等)可提高代码可读性和可维护性。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 23 - 24, The file currently uses a wildcard import
org.zstack.header.storage.addon.primary.*; replace that wildcard with explicit
imports for the types actually used in KvmIscsiNodeServer (e.g.
ExternalPrimaryStorageHostRefVO, ExternalPrimaryStorageHostRefVO_,
HeartbeatVolumeTO, plus any other concrete classes referenced from that package)
and keep ExternalHostIdGetter as already imported; remove the wildcard import
and add only the specific import statements to improve readability and
maintainability.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

173-173: 将 EMA 参数提取为命名常量,避免魔法值。

0.33 直接写在构造参数里可读性较弱,后续调参也不利于维护。

♻️ 建议修改
+    private static final double PING_LATENCY_EMA_ALPHA = 0.3;
+    private static final int PING_LATENCY_EMA_MULTIPLIER = 3;
+
-    private final PingLatencyEma pingLatencyEma = new PingLatencyEma(0.3, 3);
+    private final PingLatencyEma pingLatencyEma =
+            new PingLatencyEma(PING_LATENCY_EMA_ALPHA, PING_LATENCY_EMA_MULTIPLIER);

As per coding guidelines: “避免使用魔法值(Magic Value):直接使用未经定义的数值或字符串应替换为常量或枚举。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` at line 173, Extract
the literal constructor arguments for PingLatencyEma into well-named constants
in KVMHost (e.g., private static final double PING_LATENCY_EMA_ALPHA = 0.3 and
private static final int PING_LATENCY_EMA_WINDOW = 3) and replace the magic
values in the pingLatencyEma initialization (new PingLatencyEma(...)) with those
constants so the purpose and tunable parameters are clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql`:
- Around line 8-10: The IF condition that gates the migration only checks for
BareMetal2ChassisGpuDeviceVO, which is too narrow; update the entry check to
verify the existence of all dependent tables (BareMetal2ChassisGpuDeviceVO,
BareMetal2ChassisVO, BareMetal2ChassisPciDeviceVO, HostEO, PciDeviceVO,
GpuDeviceVO) before proceeding and abort/skip the migration if any are missing.
Locate the IF block in V6.0.0.2__bm2_gpu_migrate.sql and replace the
single-table COUNT/EXISTS check with a combined condition that ensures every
listed table exists (e.g., a single query that counts matching table names in
information_schema.TABLES or multiple EXISTS checks) so the script only runs
when all required tables are present.
- Line 79: The ALTER TABLE using "ADD COLUMN IF NOT EXISTS" on table
BareMetal2ChassisPciDeviceVO (column `_migrated`) is MySQL 8+ specific and
breaks compatibility with MySQL 5.7; replace it with an existence check against
information_schema.COLUMNS inside the migration (as done in
V6.0.0.1__gpu_add_scope.sql and V6.0.0.3__mdev_subtable.sql): query
information_schema for TABLE_SCHEMA = DATABASE(), TABLE_NAME =
'BareMetal2ChassisPciDeviceVO' and COLUMN_NAME = '_migrated', and only run ALTER
TABLE `BareMetal2ChassisPciDeviceVO` ADD COLUMN `_migrated` TINYINT(1) DEFAULT 0
within an IF NOT EXISTS (...) THEN ... END IF block to ensure idempotence and
5.7 compatibility.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 172-174: This change introduces a new per-host ping latency EMA
(PingLatencyEma) field (pingLatencyEma) inside KVMHost which is an unrelated
behavioral change to this PR focused on iSCSI fencer hostId/offset; revert the
addition or move it to a separate PR: remove the PingLatencyEma declaration and
any usages from the KVMHost class (and related refs) in this branch, or extract
the EMA feature into a standalone PR referenced from this change, ensuring
KVMHost only contains fixes relevant to iSCSI fencer (hostId/offset) and no
adaptive timeout logic.

---

Duplicate comments:
In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql`:
- Around line 8-83: The script uses unqualified table names and TABLE_SCHEMA =
DATABASE(), causing schema ambiguity; update the migration to target the fixed
'zstack' schema by replacing TABLE_SCHEMA = DATABASE() with TABLE_SCHEMA =
'zstack' and fully qualify every table reference (e.g., zstack.`HostEO`,
zstack.`GpuDeviceVO`, zstack.`PciDeviceVO`, zstack.`BareMetal2ChassisVO`,
zstack.`BareMetal2ChassisPciDeviceVO`, zstack.`BareMetal2ChassisGpuDeviceVO`) in
all SELECT, INSERT, JOIN, UPDATE and ALTER TABLE statements (including the
EXISTS subqueries and ALTER TABLE `_migrated` addition) so every occurrence is
prefixed with zstack.

---

Nitpick comments:
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 23-24: The file currently uses a wildcard import
org.zstack.header.storage.addon.primary.*; replace that wildcard with explicit
imports for the types actually used in KvmIscsiNodeServer (e.g.
ExternalPrimaryStorageHostRefVO, ExternalPrimaryStorageHostRefVO_,
HeartbeatVolumeTO, plus any other concrete classes referenced from that package)
and keep ExternalHostIdGetter as already imported; remove the wildcard import
and add only the specific import statements to improve readability and
maintainability.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Line 173: Extract the literal constructor arguments for PingLatencyEma into
well-named constants in KVMHost (e.g., private static final double
PING_LATENCY_EMA_ALPHA = 0.3 and private static final int
PING_LATENCY_EMA_WINDOW = 3) and replace the magic values in the pingLatencyEma
initialization (new PingLatencyEma(...)) with those constants so the purpose and
tunable parameters are clear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 753d39e6-a2d2-49e0-b37c-d49e1cc6b5e6

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa9b47 and 14a8604.

⛔ Files ignored due to path filters (7)
  • compute/pom.xml is excluded by !**/*.xml
  • sdk/src/main/java/org/zstack/sdk/GetGpuMetricsBatchAction.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuMetricsBatchResult.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuResourceTreeAction.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GetGpuResourceTreeResult.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GpuMetricsEntry.java is excluded by !sdk/**
  • sdk/src/main/java/org/zstack/sdk/GpuResourceTreeNode.java is excluded by !sdk/**
📒 Files selected for processing (10)
  • compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java
  • compute/src/main/java/org/zstack/compute/host/PingLatencyEma.java
  • compute/src/test/java/org/zstack/compute/host/HostTrackJitterTest.java
  • compute/src/test/java/org/zstack/compute/host/PingLatencyEmaTest.java
  • conf/db/upgrade/V6.0.0.1__gpu_add_scope.sql
  • conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql
  • conf/db/upgrade/V6.0.0.3__mdev_subtable.sql
  • core/src/main/java/org/zstack/core/thread/AsyncTimer.java
  • plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java

Comment on lines +8 to +10
IF (SELECT COUNT(*) FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'BareMetal2ChassisGpuDeviceVO') > 0 THEN

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

升级场景判定过窄:仅检查一张表存在性不够稳妥。

当前 IF 只检查 BareMetal2ChassisGpuDeviceVO,但后续还依赖 BareMetal2ChassisVOBareMetal2ChassisPciDeviceVOHostEOPciDeviceVOGpuDeviceVO。建议在入口一次性检查全部依赖表,避免部分环境直接报错中断升级。

As per coding guidelines: Review the SQL code, make sure has no errors and confirm that upgrading scene has been carefully handled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql` around lines 8 - 10, The IF
condition that gates the migration only checks for BareMetal2ChassisGpuDeviceVO,
which is too narrow; update the entry check to verify the existence of all
dependent tables (BareMetal2ChassisGpuDeviceVO, BareMetal2ChassisVO,
BareMetal2ChassisPciDeviceVO, HostEO, PciDeviceVO, GpuDeviceVO) before
proceeding and abort/skip the migration if any are missing. Locate the IF block
in V6.0.0.2__bm2_gpu_migrate.sql and replace the single-table COUNT/EXISTS check
with a combined condition that ensures every listed table exists (e.g., a single
query that counts matching table names in information_schema.TABLES or multiple
EXISTS checks) so the script only runs when all required tables are present.

);

-- Step 4: Mark old table as migrated (preserve for 1 version)
ALTER TABLE `BareMetal2ChassisPciDeviceVO` ADD COLUMN IF NOT EXISTS `_migrated` TINYINT(1) DEFAULT 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 1) 查找仓库中声明的 MySQL/GreatSQL/MariaDB 版本信息 =="
rg -n -i "mysql|greatsql|mariadb|8\\.0|5\\.7|db version|database version" README* **/*.md **/*.yml **/*.yaml **/pom.xml || true

echo
echo "== 2) 统计项目内是否已有同类语法使用(判断项目基线) =="
rg -n "ADD COLUMN IF NOT EXISTS" conf/db || true

echo
echo "== 3) 对比升级脚本中常见替代写法(information_schema + dynamic SQL) =="
rg -n "information_schema\\.TABLES|information_schema\\.COLUMNS|PREPARE|EXECUTE|ALTER TABLE" conf/db/upgrade || true

Repository: MatheMatrix/zstack

Length of output: 50442


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 1) 查找 pom.xml 中的 MySQL Connector 版本信息 =="
rg -A 3 "mysql-connector-j" --type xml | head -20

echo
echo "== 2) 查找项目文档中明确的数据库版本要求 =="
fd -e md -e txt -e rst | xargs grep -l -i "mysql\|database.*version\|db.*version\|supported.*version" 2>/dev/null | head -5 | xargs cat 2>/dev/null | grep -i "mysql\|5\.7\|8\.0\|version" | head -10 || true

echo
echo "== 3) 检查 V6.0.0.2__bm2_gpu_migrate.sql 文件的完整内容 =="
cat -n conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql

echo
echo "== 4) 查看同版本或相邻版本的升级脚本模式 =="
ls -lah conf/db/upgrade/ | grep -i "v6\|v5" | tail -10

Repository: MatheMatrix/zstack

Length of output: 5620


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 1) 查找项目 README 或安装文档中的数据库版本要求 =="
find . -maxdepth 2 -name "README*" -o -name "INSTALL*" -o -name "CONTRIBUTING*" | head -5 | xargs cat 2>/dev/null | grep -i -A 2 "mysql\|database\|version\|requirement" | head -20 || true

echo
echo "== 2) 查看相邻升级脚本 V6.0.0.1 和 V6.0.0.3 的内容 =="
echo "--- V6.0.0.1 内容 ---"
cat conf/db/upgrade/V6.0.0.1__gpu_add_scope.sql

echo
echo "--- V6.0.0.3 内容 ---"
cat conf/db/upgrade/V6.0.0.3__mdev_subtable.sql

echo
echo "== 3) 搜索整个项目中是否有关于 MySQL 8.0+ 特性的其他注释 =="
rg -n "mysql.*8|MySQL.*8\.0|version.*8\." --type-list | head -20 || true

echo
echo "== 4) 检查是否在其他脚本中使用过存储过程内的 information_schema 列检查 =="
rg -n "information_schema.COLUMNS" conf/db/upgrade/ || echo "No COLUMNS checks found in upgrade scripts"

Repository: MatheMatrix/zstack

Length of output: 7343


改用 information_schema.COLUMNS 检查列存在性,替代 MySQL 8.0+ 特有的 ADD COLUMN IF NOT EXISTS 语法。

第 79 行使用的 ADD COLUMN IF NOT EXISTS 仅在 MySQL 8.0.0+ 中被支持。若项目需要兼容 MySQL 5.7(当前 mysql-connector-j 8.2.0 支持 5.7+),该语法会导致升级失败。

同一版本发布的相邻升级脚本 V6.0.0.1__gpu_add_scope.sqlV6.0.0.3__mdev_subtable.sql 已经采用了在存储过程内使用 information_schema.COLUMNS 进行列存在性检查的方式,既保证了幂等性,又确保了跨版本兼容性。建议第 79 行改为:

IF NOT EXISTS (SELECT 1 FROM information_schema.COLUMNS
               WHERE TABLE_SCHEMA = DATABASE()
               AND TABLE_NAME = 'BareMetal2ChassisPciDeviceVO'
               AND COLUMN_NAME = '_migrated') THEN
    ALTER TABLE `BareMetal2ChassisPciDeviceVO` ADD COLUMN `_migrated` TINYINT(1) DEFAULT 0;
END IF;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/db/upgrade/V6.0.0.2__bm2_gpu_migrate.sql` at line 79, The ALTER TABLE
using "ADD COLUMN IF NOT EXISTS" on table BareMetal2ChassisPciDeviceVO (column
`_migrated`) is MySQL 8+ specific and breaks compatibility with MySQL 5.7;
replace it with an existence check against information_schema.COLUMNS inside the
migration (as done in V6.0.0.1__gpu_add_scope.sql and
V6.0.0.3__mdev_subtable.sql): query information_schema for TABLE_SCHEMA =
DATABASE(), TABLE_NAME = 'BareMetal2ChassisPciDeviceVO' and COLUMN_NAME =
'_migrated', and only run ALTER TABLE `BareMetal2ChassisPciDeviceVO` ADD COLUMN
`_migrated` TINYINT(1) DEFAULT 0 within an IF NOT EXISTS (...) THEN ... END IF
block to ensure idempotence and 5.7 compatibility.

Comment on lines +172 to +174
// Per-host ping latency EMA for adaptive timeout (ZSTAC-67534)
private final PingLatencyEma pingLatencyEma = new PingLatencyEma(0.3, 3);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

该改动与本 PR 目标不一致,建议拆分或回退本提交。

本 PR 目标是修复 iSCSI fencer 的 hostId/offset 问题,但当前实际改动是 KVM ping 自适应超时逻辑(EMA + timeout 替换),属于额外行为变更,放在 5.5.12 分支有发布风险。建议将此改动移到独立 PR,或确认 PR 描述与提交内容是否错位。

Based on learnings: “ZStack项目在cherry-pick操作中,即使发现了性能优化机会,也严格遵循不做额外修改的政策,优先保证cherry-pick的完整性和一致性。”

Also applies to: 5018-5020, 5030-5030, 5060-5060

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 172 - 174,
This change introduces a new per-host ping latency EMA (PingLatencyEma) field
(pingLatencyEma) inside KVMHost which is an unrelated behavioral change to this
PR focused on iSCSI fencer hostId/offset; revert the addition or move it to a
separate PR: remove the PingLatencyEma declaration and any usages from the
KVMHost class (and related refs) in this branch, or extract the EMA feature into
a standalone PR referenced from this change, ensuring KVMHost only contains
fixes relevant to iSCSI fencer (hostId/offset) and no adaptive timeout logic.

…or heartbeat offset

ZSTAC-81797: iSCSI fencer used HeartbeatVolumeTO.hostId (storage bdc id,
incremental up to 2000+) as heartbeat write offset multiplier. With 1MB
heartbeat_required_space, host_id=2000 causes 2000MB offset, exceeding
the 2GB heartbeat volume and failing heartbeat writes.

Fix: use ExternalPrimaryStorageHostRefVO.hostId (platform-managed compact
ID in [1, 999]) instead, same as CBD fencer already does.

Resolves: ZSTAC-81797

Change-Id: Icac5ee2059df7cf93ca4bc23829ce855d6a7184c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MatheMatrix MatheMatrix force-pushed the sync/zstackio/fix/ZSTAC-81797@@2 branch from 14a8604 to ecc93eb Compare March 13, 2026 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants