<fix>[compute]: meta#3468
MatheMatrix wants to merge 1 commit into feature-zsv-5.0.0-vm-registration from
Conversation
Walkthrough
Adds a VM instance metadata management system with support for metadata dirty-marking, dirty polling, path-fingerprint tracking, orphan detection, and recovery. The change spans the database schema, API messages, interceptors, storage-backend integration, and global configuration.
Changes
Sequence Diagram(s)
sequenceDiagram
actor User
participant API as API Gateway
participant Interceptor as MetadataUpdateInterceptor
participant Marker as VmMetadataDirtyMarker
participant Storage as Primary Storage
participant Hypervisor as KVM Hypervisor
User->>API: Send API (e.g., AttachVolume)
activate API
API->>Interceptor: BeforeDeliveryMessageInterceptor<br/>Resolve VM UUIDs via resolver chain
activate Interceptor
Interceptor-->>API: Cached VM UUIDs
deactivate Interceptor
API->>API: Process API Request
API->>Interceptor: BeforePublishEventInterceptor<br/>API succeeded
activate Interceptor
Interceptor->>Marker: markDirty(vmUuid, storageStructureChange)
activate Marker
Marker->>Marker: Insert/Update VmMetadataDirtyVO
Marker-->>Interceptor: Dirty record created
deactivate Marker
Interceptor-->>API: Done
deactivate Interceptor
API-->>User: API Response
deactivate API
Note over Marker: Background polling
Marker->>Marker: Polling task claims dirty row
activate Marker
Marker->>Marker: Build VmInstanceMetadataDTO<br/>via VmMetadataBuilder
Marker->>Storage: UpdateVmInstanceMetadataOnPrimaryStorageMsg
activate Storage
Storage->>Hypervisor: UpdateVmInstanceMetadataOnHypervisorMsg
activate Hypervisor
Hypervisor->>Hypervisor: Write metadata to path
Hypervisor-->>Storage: Success/Error reply
deactivate Hypervisor
Storage-->>Marker: Metadata updated
deactivate Storage
Marker->>Marker: Update VmMetadataDirtyVO<br/>Clear claim, save fingerprint
deactivate Marker
sequenceDiagram
actor User
participant API as API Gateway
participant Validator as VmInstanceMetadataValidator
participant Processor as VmInstanceMetadataFieldProcessor
participant Storage as Primary Storage
participant DB as Database
User->>API: APIRegisterVmInstanceFromMetadataMsg<br/>(metadataPath, primaryStorageUuid, ...)
activate API
API->>API: Pre-check validation
activate API
API->>Validator: validate(DTO, schemaVersion)
activate Validator
Validator->>Validator: Check schemaVersion match
Validator->>Validator: Validate resource UUID consistency
Validator->>Validator: Validate snapshot group integrity
Validator-->>API: Validation result
deactivate Validator
deactivate API
API->>Storage: ReadVmInstanceMetadataMsg<br/>from metadataPath
activate Storage
Storage-->>API: Metadata JSON payload
deactivate Storage
API->>Processor: processVmInstanceVO<br/>processVolumeVO<br/>processSystemTagVO<br/>etc.
activate Processor
Processor->>Processor: Field sanitization<br/>Null fields, set zone/account<br/>Normalize install paths<br/>Regenerate UUIDs
Processor-->>API: Cleaned metadata
deactivate Processor
API->>DB: Create VM, volumes,<br/>system tags, configs
activate DB
DB-->>API: VM registered
deactivate DB
API->>API: Update path fingerprint<br/>Save to VmMetadataPathFingerprintVO
API-->>User: APIRegisterVmInstanceFromMetadataEvent<br/>with VmInstanceInventory
deactivate API
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (18)
header/src/main/java/org/zstack/header/storage/primary/RegisterVmInstanceException.java-3-3 (1)
3-3: ⚠️ Potential issue | 🟠 Major
Misleading interface name: the Exception suffix should be reserved for exception classes.
This type is named RegisterVmInstanceException, yet it is an interface, not an exception class. Per the coding guideline that "exception classes should end with Exception", only genuine throwable types should carry this suffix; the current name misleads developers into treating it as an exception type. Consider renaming it to something that reflects its responsibility, for example:
RegisterVmInstancePathUpdater, VmInstanceRegistrationHandler, or RegisterVmInstanceExceptionHandler (if its job is handling registration failures).
♻️ Suggested rename
-public interface RegisterVmInstanceException { +public interface RegisterVmInstancePathUpdater {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/primary/RegisterVmInstanceException.java` at line 3, The interface currently named RegisterVmInstanceException is misleading because the Exception suffix should be reserved for throwable classes; rename the interface to a clear responsibility-driven name (e.g., VmInstanceRegistrationHandler or RegisterVmInstancePathUpdater), update the interface declaration (RegisterVmInstanceException -> VmInstanceRegistrationHandler) and all references/usages across the codebase, and adjust any Javadoc/comments and implementing classes to reflect the new name so imports and builds remain consistent.
header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotTree.java-446-460 (1)
446-460: ⚠️ Potential issue | 🟠 Major
Traversal results no longer have a stable order.
Children are enqueued here in their current insertion order, and that order in turn depends on the input order fromInventories() receives. Whenever the upstream query order fluctuates, traversing the same snapshot tree yields different results; if the output later feeds metadata serialization or fingerprint computation, this shows up as false drift.
🛠️ Suggested fix
public List<VolumeSnapshotInventory> levelOrderTraversal() { List<VolumeSnapshotInventory> result = new ArrayList<>(); if (this.root == null) { return result; } Queue<SnapshotLeaf> queue = new LinkedList<>(); queue.offer(this.root); while (!queue.isEmpty()) { SnapshotLeaf currentLeaf = queue.poll(); result.add(currentLeaf.getInventory()); - for (SnapshotLeaf child : currentLeaf.getChildren()) { + List<SnapshotLeaf> orderedChildren = new ArrayList<>(currentLeaf.getChildren()); + orderedChildren.sort(Comparator.comparing(SnapshotLeaf::getUuid)); + for (SnapshotLeaf child : orderedChildren) { queue.offer(child); } } return result; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotTree.java` around lines 446 - 460, The levelOrderTraversal() result is non-deterministic because it enqueues SnapshotLeaf children in the order returned by getChildren(), which depends on input ordering; to fix, make the enqueue order stable by sorting the children collection before offering them to the queue (inside levelOrderTraversal()). Use a stable key from each child’s inventory (e.g. SnapshotLeaf.getInventory().getUuid() or a timestamp like getInventory().getCreateDate() if present) and sort via Comparator before the for-loop that calls queue.offer(child) so the BFS traversal order becomes deterministic.
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java-1966-1968 (1)
1966-1968: ⚠️ Potential issue | 🟠 Major
The backend should be selected by hostUuid here.
The target host has already been resolved above, yet this code falls back to getUsableBackend(), which returns an arbitrary usable implementation. In a multi-cluster / multi-hypervisor setup this can yield a backend that does not match hostUuid; this file already has getBackendByHostUuid(hostUuid), which can be reused directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1966 - 1968, The message handler currently picks an arbitrary backend via getUsableBackend(), which can select the wrong backend in multi-cluster / multi-hypervisor setups; before calling backend.handle(msg, hostUuid, ...), obtain the backend with getBackendByHostUuid(hostUuid) (i.e., replace getUsableBackend with getBackendByHostUuid), and handle the not-found case with proper error handling, either throwing or returning a failure reply; relevant identifiers include getUsableBackend(), getBackendByHostUuid(hostUuid), and backend.handle(...).
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java-1959-1960 (1)
1959-1960: ⚠️ Potential issue | 🟠 Major
If the root volume is deleted while queued, this throws an exception instead of replying with a failure.
getHostUuidFromVolume() dereferences the result of dbf.findByUuid() directly. The metadata update runs as an async chain task, and it is not rare for rootVolumeUuid to be cleaned up before execution; add a null check first and return a structured error. Based on learnings, dbf.findByUuid() may return null when no record is found, so a null check is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1959 - 1960, Do not use the return value of getHostUuidFromVolume(msg.getRootVolumeUuid()) directly; getHostUuidFromVolume dereferences the result of dbf.findByUuid(), which may return null when the root volume is deleted while queued, leading to an NPE. Fix: in the NfsPrimaryStorage handling logic, null/empty-check hostUuid (verify msg.getRootVolumeUuid() exists and check the return value of getHostUuidFromVolume); if null or empty, do not throw; construct and return a structured failure reply (for example a FailureReply/ReturnValue with a clear error code and description) and log rootVolumeUuid plus context for troubleshooting; also consider guarding against a null dbf.findByUuid() result inside getHostUuidFromVolume and returning null instead of dereferencing it.
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java-3801-3823 (1)
3801-3823: ⚠️ Potential issue | 🟠 Major
Do not call bus.reply() directly inside this handle().
This branch receives a ReturnValueCompletion yet bypasses it and replies directly, which is inconsistent with the other handle(..., completion) methods in this class and can leave the upper dispatch chain without a completion signal. It also prints self.getUuid() as the VM UUID in the error message, when what is actually printed is the primary storage UUID.
🛠 Suggested fix
- bus.send(umsg, new CloudBusCallBack(msg) { + bus.send(umsg, new CloudBusCallBack(completion) { `@Override` public void run(MessageReply r) { UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply(); if (!r.isSuccess()) { - reply.setError(Platform.operr("failed to update vm[uuid=%s] on hypervisor.", self.getUuid()) + reply.setError(Platform.operr( + "failed to update VM metadata for rootVolume[uuid=%s] on host[uuid=%s].", + msg.getRootVolumeUuid(), hostUuid) .withCause(r.getError())); } - bus.reply(msg, reply); + completion.success(reply); } });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java` around lines 3801 - 3823, The handler currently calls bus.reply(...) inside the async callback and uses self.getUuid() in the error message; instead, follow the method contract that accepts ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply> by invoking the completion callbacks instead of bus.reply: inside the CloudBusCallBack.run, construct an UpdateVmInstanceMetadataOnPrimaryStorageReply, and on success call completion.success(reply) (or completion.complete(reply) per local helper) and on failure call completion.fail(reply.getError()) (or completion.fail(...) with the constructed ErrorCode); also fix the error message to reference the VM UUID from the incoming msg (e.g. msg.getVmInstanceUuid()) instead of self.getUuid() so the logged UUID is the VM's.
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java-3198-3208 (1)
3198-3208: ⚠️ Potential issue | 🟠 Major
The Registering state also blocks read-only APIs.
Line 3202 only lets APIDestroyVmInstanceMsg through, so the subsequent APIGet*/read-only requests also fail outright during Registering, contradicting the comment that says "allow Destroy/read". Polling and queries during registration get caught in the crossfire; either block only write operations or complete the read-only whitelist.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 3198 - 3208, The guard in handleApiMessage currently only allows APIDestroyVmInstanceMsg during VmInstanceState.Registering, causing read/query APIs to be blocked contrary to the comment; update the check to allow read-only API messages (e.g., APIQuery*/APIGet* classes) or invert the logic to block only write/mutating messages: locate handleApiMessage and the Registering-state branch in VmInstanceBase, and either extend the whitelist beyond APIDestroyVmInstanceMsg to include the relevant read message classes (APIGetVmInstanceMsg, APIQueryVmInstanceMsg, etc.) or change the condition to reject only known mutating message types rather than permitting a single destroy type. Ensure bus.replyErrorByMessageType remains used for blocked messages.
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java-9447-9450 (1)
9447-9450: ⚠️ Potential issue | 🟠 Major
The root volume query has no fallback, turning failures into timeouts.
findTuple() at line 9447 may find no root volume early in registration, under concurrent deletion, or after a rollback; the direct dereference at lines 9449/9450 then throws, so what VmMetadataDirtyMarker observes is a timeout / missing reply rather than a proper failure reply. At minimum, null-check first and reply explicitly via bus.reply(...).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 9447 - 9450, The current code calls Q.New(VolumeVO.class)...findTuple() and immediately dereferences the returned Tuple into primaryStorageUuid and rootVolumeUuid, which can be null when the root volume is gone/never created; modify the logic around the Tuple returned by findTuple() (in VmInstanceBase where Tuple is assigned and primaryStorageUuid/rootVolumeUuid are set) to check for null (or empty) before accessing elements, and if missing construct and send an explicit failure reply on the bus (using the existing message/reply mechanism) instead of letting a NullPointerException propagate and cause a timeout; ensure the failure reply includes sufficient context (vm UUID and reason) so callers like VmMetadataDirtyMarker observe a proper failure not a timeout.
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java-2058-2064 (1)
2058-2064: ⚠️ Potential issue | 🟠 Major
Potential NullPointerException: installPath may be null.
When no VolumeVO exists for rootVolumeUuid, findValue() returns null, and the subsequent replaceFirst call at line 2062 throws a NullPointerException.
Additionally, if installPath does not match the expected regex pattern ^(.+/vol-[^/]+/).*$, replaceFirst returns the original string, producing a wrong metadataPath.
🐛 Suggested fix
String installPath = Q.New(VolumeVO.class) .eq(VolumeVO_.uuid, msg.getRootVolumeUuid()) .select(VolumeVO_.installPath) .findValue(); + if (installPath == null) { + completion.success(new UpdateVmInstanceMetadataOnPrimaryStorageReply()); + return; + } String path = installPath.replaceFirst("^(.+/vol-[^/]+/).*$", "$1"); + if (path.equals(installPath)) { + logger.warn(String.format("failed to extract volume directory from installPath[%s] for vm[uuid:%s]", + installPath, msg.getVmInstanceUuid())); + completion.success(new UpdateVmInstanceMetadataOnPrimaryStorageReply()); + return; + } String metadataPath = String.format("%s%s", path, VM_META_SUFFIX);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java` around lines 2058 - 2064, The code in NfsPrimaryStorageKVMBackend that builds metadataPath uses installPath from VolumeVO without null or format checks; guard against a null installPath (from Q.New(...).findValue()) and handle it (return an error response, throw a meaningful exception, or set a safe default) before calling replaceFirst, and ensure the installPath matches the expected pattern (e.g., via Pattern/Matcher or String.matches) so you only apply replaceFirst when the regex will produce the intended prefix; update the code paths around installPath, path, and umsg.setMetadataPath to handle the null/invalid-format cases with a clear error or fallback.
compute/src/main/java/org/zstack/compute/vm/VmExpungeMetadataFlow.java-51-71 (1)
51-71: ⚠️ Potential issue | 🟠 Major
Re-querying VolumeVO here short-circuits the metadata cleanup.
The class comment already states this flow runs after the root volume is deleted; by then, lines 60-63 most likely cannot find primaryStorageUuid anymore, and lines 65-70 then skip cleanup entirely, leaving post-expunge metadata stranded on the primary storage. Put the root PS UUID into the flow data before the volume is deleted, or take it from the root volume info already present in VmInstanceSpec, instead of relying on the post-deletion VolumeVO record.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeMetadataFlow.java` around lines 51 - 71, The flow currently queries VolumeVO for the root volume's primaryStorageUuid at runtime (using Q.New(VolumeVO.class) with VolumeVO_.primaryStorageUuid), which can return null after the root volume is deleted and short-circuit metadata cleanup; instead obtain the root primaryStorageUuid from the pre-deletion data: read it from VmInstanceSpec (spec.getVmInventory() / spec.getRootVolume() or existing root volume info) or store the root primaryStorageUuid into the flow data before the volume-delete flow step, and then use that stored value in VmExpungeMetadataFlow to drive metadata cleanup rather than querying VolumeVO after deletion.
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java-75-78 (1)
75-78: ⚠️ Potential issue | 🟠 Major
With multiple MNs, the same batch of stale records gets processed repeatedly.
managementNodeReady() starts this task on every MN, but the query here and the subsequent staleRecoveryCount = staleRecoveryCount + 1 carry no claim/CAS. When two MNs scan the same row simultaneously, both markDirty() and both increment the counter, so maxCycles is exhausted early, multiplied by the number of MNs. Claim the row with a conditional update first; only the winner may requeue and increment the counter.
Also applies to: 133-174
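The conditional-update claim suggested above can be prototyped outside the database. Below is a minimal sketch with hypothetical names (not from the patch), where `putIfAbsent` stands in for a conditional `UPDATE ... SET ownerUuid = :mn WHERE ownerUuid IS NULL`:

```java
import java.util.concurrent.ConcurrentHashMap;

public class StaleClaim {
    // Simulates the proposed ownerUuid column: key = stale record uuid,
    // value = uuid of the management node that claimed it.
    static final ConcurrentHashMap<String, String> ownerByRecord = new ConcurrentHashMap<>();

    // Atomic claim: only the MN whose putIfAbsent wins (returns null)
    // may requeue the record and bump staleRecoveryCount.
    public static boolean tryClaim(String recordUuid, String mnUuid) {
        return ownerByRecord.putIfAbsent(recordUuid, mnUuid) == null;
    }

    public static void main(String[] args) {
        System.out.println(tryClaim("m1", "mn-a")); // prints true: first claimer wins
        System.out.println(tryClaim("m1", "mn-b")); // prints false: second MN must skip
    }
}
```

The same winner-takes-the-row semantics is what a `WHERE owner IS NULL` conditional UPDATE gives you in SQL: exactly one MN sees an affected-row count of 1.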
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java` around lines 75 - 78, The task currently starts concurrently on every management node (managementNodeReady()/startTask()), but pending stale entries are not atomically claimed after the query, so multiple MNs can process the same entry, each calling markDirty() and incrementing staleRecoveryCount, causing maxCycles to be consumed multiple times; implement a conditional update (CAS) to claim the record before processing, for example with a conditional UPDATE, or by adding an owner/ownerUuid field set to the current MN's id under a condition like WHERE staleRecoveryCount = <expected> or WHERE owner IS NULL; only on a successful update proceed with the subsequent requeue, staleRecoveryCount accumulation, and markDirty() call, and apply the same change to the code blocks with the same behavior (the methods/logic corresponding to lines 133-174 in the scope).
docs/design/vm-metadata-04c-sblk写入流程.md-74-79 (1)
74-79: ⚠️ Potential issue | 🟠 Major
Inconsistent offset definitions in the summary region.
The VM summary region is written here as [96, 928), yet SummaryChecksum hashes bytes[96:896]. Whether 896-928 is summary data or checksum storage is now ambiguous; a byte-layout document like this should be split into two explicit ranges.
📝 Suggested split of the data region and the checksum region
- VM 摘要区 [96, 928):保持旧值不变(Phase 1 不更新摘要) + VM 摘要数据区 [96, 896):保持旧值不变(Phase 1 不更新摘要) + SummaryChecksum 区 [896, 928):保持旧值不变 ... - VM 摘要区 [96, 928): + VM 摘要数据区 [96, 896): VmCategory = vm_category ← 此时更新 VmUuid = vm_uuid ← 此时更新(首次写入后不变) VmNameLen = len(vm_name_utf8) ← 此时更新 VmName = vm_name_utf8 ← 此时更新 SummaryChecksum = SHA-256(bytes[96:896])
Also applies to: 122-129
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04c-sblk写入流程.md` around lines 74 - 79, Document splits the VM summary region ambiguously: the header says VM summary is [96, 928) but SummaryChecksum is defined as hash of bytes[96:896], leaving bytes 896–928 unclear; update the layout to explicitly separate "summary data" and "checksum" ranges, e.g., define Summary Data = bytes[96:896) and SummaryChecksum = bytes[896:928), and adjust the SummaryChecksum description to state it is the hash of bytes[96:896]; apply the same explicit interval-splitting and consistent range references where the doc repeats the layout (the section covering lines 122–129).
compute/src/main/java/org/zstack/compute/vm/metadata/ResourceBasedVmUuidFromApiResolver.java-58-64 (1)
58-64: ⚠️ Potential issue | 🟠 Major
UserTagVO must also be queried to fully handle tag-deletion messages.
The current implementation only queries SystemTagVO, but APIDeleteTagMsg can delete either a system tag or a user tag. When a UserTagVO is being deleted, the query returns null and resourceType / resourceUuid cannot be resolved. Follow the approach in TagManagerImpl.handle(APIDeleteTagMsg) and use a cascade: query SystemTagVO first, then fall back to UserTagVO if absent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/ResourceBasedVmUuidFromApiResolver.java` around lines 58 - 64, The APIDeleteTagMsg handling in ResourceBasedVmUuidFromApiResolver currently only queries SystemTagVO via dbf.findByUuid and misses UserTagVO cases; update the APIDeleteTagMsg branch in the resolver to first attempt to load the tag as SystemTagVO (using dbf.findByUuid), and if that returns null then query UserTagVO the same way, then extract resourceType and resourceUuid from whichever VO is found; follow the same cascade logic as TagManagerImpl.handle(APIDeleteTagMsg) so both SystemTagVO and UserTagVO deletions are handled.
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java-1833-1864 (1)
1833-1864: ⚠️ Potential issue | 🟠 Major
The error is not set on the response when the operation fails.
When registerVmInstance fails, the fail callback never sets errorCode on the event, so the caller cannot learn the actual cause of the failure.
🐛 Suggested fix
`@Override` public void fail(ErrorCode errorCode) { + event.setError(errorCode); bus.publish(event); chain.next(); }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java` around lines 1833 - 1864, The APIRegisterVmInstanceMsg handler does not populate the error into the reply when registerVmInstance fails; in the fail callback of registerVmInstance inside handle(APIRegisterVmInstanceMsg) set the error on the APIRegisterVmInstanceReply (event) using the provided ErrorCode (e.g., event.setError(errorCode) or equivalent setter) before publishing the event so callers receive the failure details.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-898-917 (1)
898-917: ⚠️ Potential issue | 🟠 Major
The batch initialization / upgrade refresh bypasses the "KVM UserVm only" constraint.
These two sets of SQL filter only on v.type = 'UserVm' with no hypervisor restriction, so enabling the feature or upgrading also generates dirty rows for non-KVM UserVms, and the poller will later send them UpdateVmInstanceMetadataMsg too. Since markDirty() already scopes to KVM UserVm, add the same predicate here.
💡 Suggested fix
- "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid " + + "WHERE v.type = 'UserVm' AND v.hypervisorType = :hvType AND v.uuid > :lastUuid " +
Both SQL statements in submitFullRefresh(), as well as the initialization SQL and paging SQL in submitBatchInitialization(), need this condition added with the same bound parameter.
Also applies to: 995-1003
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 898 - 917, The batch INSERT/UPDATE SQL in submitFullRefresh() (and the initialization/paging SQL in submitBatchInitialization()) only filter on v.type = 'UserVm' and thus wrongly include non-KVM UserVm; update those SQL statements to also filter by hypervisor (the same predicate used in markDirty(), e.g., add AND v.hypervisorType = :hypervisor or equivalent) and bind the new :hypervisor parameter in the SQL.New(...).param(...) calls so only KVM UserVm rows are inserted/updated and the poller won’t target non-KVM VMs.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-739-748 (1)
739-748: ⚠️ Potential issue | 🟠 Major
Exhausting retries on the first flush also deletes the recovery entry point.
This code first updates VmMetadataPathFingerprintVO.lastFlushFailed and then deletes the dirty row outright; but lines 833-846 show the fingerprint is only created after a successful flush. So when the very first sync fails up to the retry limit, the UPDATE most likely hits 0 rows, the dirty row is then deleted, and the recovery task has no record left to take over. At minimum, upsert a fingerprint/stale record before the delete, or keep the dirty row and explicitly mark it stale.
Also applies to: 833-846
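The ordering invariant argued above can be shown with an in-memory sketch; the two maps below are hypothetical stand-ins for the fingerprint and dirty tables, not ZStack code:

```java
import java.util.HashMap;
import java.util.Map;

public class StaleHandoff {
    // vmUuid -> lastFlushFailed flag (stand-in for VmMetadataPathFingerprintVO)
    static final Map<String, Boolean> fingerprint = new HashMap<>();
    // vmUuid -> pending payload (stand-in for VmMetadataDirtyVO)
    static final Map<String, String> dirty = new HashMap<>();

    // Buggy order from the review: a bare UPDATE matches nothing when the
    // fingerprint does not exist yet, then the only recovery handle is deleted.
    public static void exhaustRetriesBuggy(String vmUuid) {
        fingerprint.computeIfPresent(vmUuid, (k, v) -> true); // UPDATE, may hit 0 rows
        dirty.remove(vmUuid);
    }

    // Suggested fix: upsert the fingerprint first so recovery always has a
    // record to take over, then drop the dirty row.
    public static void exhaustRetriesFixed(String vmUuid) {
        fingerprint.put(vmUuid, true); // upsert, creates the row if missing
        dirty.remove(vmUuid);
    }

    public static void main(String[] args) {
        dirty.put("vm-demo", "payload");
        exhaustRetriesFixed("vm-demo");
        System.out.println(fingerprint.containsKey("vm-demo")); // prints true
    }
}
```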
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 739 - 748, The current sequence updates VmMetadataPathFingerprintVO.lastFlushFailed then deletes the VmMetadataDirtyVO row, but if no fingerprint row exists (first flush failure) the UPDATE affects 0 rows and deleting the dirty row removes any recovery handle; fix by ensuring a fingerprint/stale record exists before deleting the dirty entry: either upsert a VmMetadataPathFingerprintVO with lastFlushFailed=true (create if missing) or set a “stale” flag on the VmMetadataDirtyVO and skip deletion; modify the code path around the SQL.New(...) UPDATE and the subsequent SQL.New(VmMetadataDirtyVO.class)...delete() calls so you perform an upsert/create of VmMetadataPathFingerprintVO (or mark VmMetadataDirtyVO as stale) prior to deleting the dirty row, ensuring recovery tasks can find the record.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-490-496 (1)
490-496: ⚠️ Potential issue | 🟠 Major
The read-back condition in claimDirtyRows() is not precise enough.
Using lastClaimTime >= now-5s to read back the rows "just claimed this cycle" also pulls in rows grabbed by triggerFlushForVm() on the same MN within the last 5 seconds. Combined with the dedup callback at lines 550-553, the duplicate submission releases a claim that is still flushing, and another MN can then take over the same VM concurrently. Write this cycle's claimAt explicitly into lastClaimTime, then read back precisely by managementNodeUuid + claimAt.
💡 Suggested fix
- int claimed = SQL.New("UPDATE VmMetadataDirtyVO " + - "SET managementNodeUuid = :myId, lastClaimTime = CURRENT_TIMESTAMP " + + Timestamp claimAt = Timestamp.from(Instant.now()); + int claimed = SQL.New("UPDATE VmMetadataDirtyVO " + + "SET managementNodeUuid = :myId, lastClaimTime = :claimAt " + "WHERE managementNodeUuid IS NULL " + "AND (nextRetryTime IS NULL OR nextRetryTime <= CURRENT_TIMESTAMP) " + "ORDER BY lastOpDate ASC, vmInstanceUuid ASC " + "LIMIT :batchSize") .param("myId", Platform.getManagementServerId()) + .param("claimAt", claimAt) .param("batchSize", VmGlobalConfig.VM_METADATA_DIRTY_BATCH_SIZE.value(Integer.class)) .execute(); @@ - Timestamp thisCycleCutoff = Timestamp.from(Instant.now().minus(Duration.ofSeconds(5))); return Q.New(VmMetadataDirtyVO.class) .eq(VmMetadataDirtyVO_.managementNodeUuid, Platform.getManagementServerId()) - .gte(VmMetadataDirtyVO_.lastClaimTime, thisCycleCutoff) + .eq(VmMetadataDirtyVO_.lastClaimTime, claimAt) .list();
Also applies to: 550-553
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 490 - 496, In claimDirtyRows(), instead of using a time range filter on lastClaimTime, record an explicit claimId/timestamp into each row (e.g. set VmMetadataDirtyVO.claimAt = now when claiming in the claimDirtyRows() flow) and then query rows by managementNodeUuid AND that exact claimAt value (use VmMetadataDirtyVO_.claimAt in the Q.New(...).eq(...) call) so only rows claimed in this CAS cycle are returned; also update the deduplication/release logic around triggerFlushForVm()/the callback (the code at the current dedupe block referenced by lines ~550-553) to use managementNodeUuid + claimAt for identifying owned rows instead of lastClaimTime.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataUpdateInterceptor.java-149-157 (1)
149-157: ⚠️ Potential issue | 🟠 Major
Do not let resolver exceptions blow back into the main API flow.
resolveVmUuids() calls supports()/resolveVmUuids() directly without isolating any resolver exception. If a single resolver throws a RuntimeException, it propagates out of BeforeDeliveryMessageInterceptor and fails the main API along with it; metadata tracking here should be best-effort, not a dependency of the main flow. Catch per resolver, log, and continue with the next one; only degrade to an empty result when all of them fail.
💡 Suggested fix
private List<String> resolveVmUuids(APIMessage msg) { for (VmUuidFromApiResolver resolver : resolvers) { - if (resolver.supports(msg)) { - List<String> vmUuids = resolver.resolveVmUuids(msg); - if (vmUuids != null && !vmUuids.isEmpty()) { - return vmUuids; - } + try { + if (!resolver.supports(msg)) { + continue; + } + + List<String> vmUuids = resolver.resolveVmUuids(msg); + if (vmUuids != null && !vmUuids.isEmpty()) { + return vmUuids; + } + } catch (Exception e) { + logger.warn(String.format("failed to resolve vmUuids from %s via %s", + msg.getClass().getName(), resolver.getClass().getName()), e); } } logger.debug(String.format("no resolver could extract vmUuids from %s", msg.getClass().getName())); return Collections.emptyList(); }
Also applies to: 205-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataUpdateInterceptor.java` around lines 149 - 157, The call to resolveVmUuids() must not let a resolver's RuntimeException bubble up and fail the main API; update the implementation in VmMetadataUpdateInterceptor so that when iterating resolver instances you wrap each resolver.supports(...) and resolver.resolveVmUuids(...) call in a try/catch, log the exception (including resolver identity and the exception) and continue to the next resolver, collecting results; if all resolvers throw or none return values, treat the result as empty and proceed (so pendingApis.put still receives an empty vmUuids list). Apply the same per-resolver exception isolation to the other resolveVmUuids usage around the 205-215 area to ensure metadata resolution is best-effort and never aborts the API flow.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-518-526 (1)
518-526: ⚠️ Potential issue | 🟠 Major
The global concurrency limiter has a race window.
The read at line 520 and the increment at line 525 are not atomic; concurrent submitFlushTask() calls can jointly slip past VM_METADATA_GLOBAL_MAX_CONCURRENT, so the limit does not actually hold. Use CAS, or an atomic "increment first, check for overflow, then roll back" path.
💡 Suggested fix
- if (globalFlushInFlight.get() >= maxConcurrent) { - // 全局并发已满,释放 claim,Poller 下轮重试 - releaseClaim(vmUuid); - return; - } - globalFlushInFlight.incrementAndGet(); + while (true) { + int inFlight = globalFlushInFlight.get(); + if (inFlight >= maxConcurrent) { + // 全局并发已满,释放 claim,Poller 下轮重试 + releaseClaim(vmUuid); + return; + } + if (globalFlushInFlight.compareAndSet(inFlight, inFlight + 1)) { + break; + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 518 - 526, The global concurrency check in VmMetadataDirtyMarker is racy because you read globalFlushInFlight then call globalFlushInFlight.incrementAndGet(), allowing multiple threads to exceed VM_METADATA_GLOBAL_MAX_CONCURRENT; change to an atomic check-and-increment pattern (e.g. use getAndIncrement() or incrementAndGet() then immediately test the returned value against maxConcurrent and, if it exceeds the limit, decrement/back out and call releaseClaim(vmUuid) before returning) so the path in submitFlushTask()/wherever this block lives is atomic; ensure you reference and update the same AtomicInteger (globalFlushInFlight) and call releaseClaim(vmUuid) on the rollback path.
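The CAS loop from the suggested fix can be isolated into a small helper to see why it closes the race; the class and field names below are illustrative, not from the patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FlushLimiter {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int maxConcurrent;

    public FlushLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Check-and-increment as one atomic step: a thread either reserves a slot
    // via compareAndSet or observes the limit and backs off, never both, so
    // the cap cannot be exceeded even under concurrent callers.
    public boolean tryAcquire() {
        while (true) {
            int cur = inFlight.get();
            if (cur >= maxConcurrent) {
                return false; // caller releases its claim; poller retries next cycle
            }
            if (inFlight.compareAndSet(cur, cur + 1)) {
                return true;
            }
            // CAS lost to a concurrent update; re-read and retry
        }
    }

    public void release() {
        inFlight.decrementAndGet();
    }

    public static void main(String[] args) {
        FlushLimiter limiter = new FlushLimiter(1);
        System.out.println(limiter.tryAcquire()); // prints true
        System.out.println(limiter.tryAcquire()); // prints false
    }
}
```

With the original read-then-increment sequence, two threads can both pass the `cur >= maxConcurrent` check before either increments; the CAS collapses the check and the reservation into one step.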
🟡 Minor comments (27)
conf/db/upgrade/V4.10.29__schema.sql-21-21 (1)
21-21: ⚠️ Potential issue | 🟡 Minor
The index creation lacks an idempotency check.
The CREATE INDEX statement has no existence check, so the script fails if it runs twice. Add an existence check so the upgrade script stays idempotent.
🛠️ Suggested fix
-- Poller CAS claim query optimization: WHERE managementNodeUuid IS NULL AND lastClaimTime ... AND nextRetryTime <= NOW() -CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `VmMetadataDirtyVO` (`managementNodeUuid`, `lastClaimTime`, `nextRetryTime`); +DROP PROCEDURE IF EXISTS AddIndexIfNotExists; +DELIMITER $$ +CREATE PROCEDURE AddIndexIfNotExists() +BEGIN + IF NOT EXISTS ( + SELECT 1 FROM INFORMATION_SCHEMA.STATISTICS + WHERE TABLE_SCHEMA = 'zstack' + AND TABLE_NAME = 'VmMetadataDirtyVO' + AND INDEX_NAME = 'idxVmMetadataDirtyUnclaimed' + ) THEN + CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `VmMetadataDirtyVO` (`managementNodeUuid`, `lastClaimTime`, `nextRetryTime`); + END IF; +END$$ +DELIMITER ; +CALL AddIndexIfNotExists(); +DROP PROCEDURE IF EXISTS AddIndexIfNotExists;
Based on learnings: "ZStack database upgrade scripts ... need to check table and constraint existence via information_schema to ensure the script can be safely re-run."
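An alternative to the stored-procedure wrapper, which avoids DELIMITER handling in migration runners, is a prepared statement guarded by the same information_schema lookup. A sketch; resolving the schema via DATABASE() instead of hard-coding 'zstack' is an assumption about how the migration connects:

```sql
-- Guard the index creation with an information_schema lookup
SET @idx_exists = (
  SELECT COUNT(*) FROM information_schema.statistics
  WHERE table_schema = DATABASE()
    AND table_name = 'VmMetadataDirtyVO'
    AND index_name = 'idxVmMetadataDirtyUnclaimed'
);
SET @ddl = IF(@idx_exists = 0,
  'CREATE INDEX idxVmMetadataDirtyUnclaimed ON VmMetadataDirtyVO (managementNodeUuid, lastClaimTime, nextRetryTime)',
  'SELECT 1');
PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Both variants are no-ops on a second run; the prepared-statement form keeps the script a plain statement sequence.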
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V4.10.29__schema.sql` at line 21, The CREATE INDEX for idxVmMetadataDirtyUnclaimed on table VmMetadataDirtyVO lacks an existence check and will fail if the script is re-run; modify the upgrade SQL to first query information_schema.statistics (filtering by table_schema, table_name='VmMetadataDirtyVO' and index_name='idxVmMetadataDirtyUnclaimed') and only execute CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `VmMetadataDirtyVO` (...) when that query returns no rows, ensuring idempotency of the schema change.
header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.java-8-11 (1)
8-11: ⚠️ Potential issue | 🟡 Minor
@RestResponse(allTo = "inventory") may drop the warnings field from the REST response.
With allTo = "inventory", all response fields are mapped onto the inventory field. But this class carries both an inventory and a warnings field, so warnings may not be serialized into the REST response correctly.
Use fieldsTo to map both fields explicitly:
💡 Suggested change
-@RestResponse(allTo = "inventory") +@RestResponse(fieldsTo = {"inventory", "warnings"})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.java` around lines 8 - 11, The `@RestResponse`(allTo = "inventory") on class APIRegisterVmInstanceFromMetadataEvent causes the warnings field to be omitted from REST responses; change the annotation to explicitly map both fields (e.g. use `@RestResponse`(fieldsTo = {"inventory", "warnings"}) or equivalent supported syntax) so both the VmInstanceInventory inventory and List<String> warnings are included in the serialized response.
docs/design/vm-metadata-07a-单元测试计划.md-222-222 (1)
222-222: ⚠️ Potential issue | 🟡 Minor
The interval notation here should be corrected.
[96:928) looks like a typo and can make implementers misread the offset range; use the more common half-open notation, e.g. [96, 928).
✏️ Suggested change
-| UT-BIN-20 | 摘要区序列化 | vmUuid(32) + vmName(256) + vmCategory(1) | 写入 [96:928) 区域 | +| UT-BIN-20 | 摘要区序列化 | vmUuid(32) + vmName(256) + vmCategory(1) | 写入 [96, 928) 区域 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-07a-单元测试计划.md` at line 222, Update the interval notation for UT-BIN-20's write range to use a standard half-open format: change the current "[96:928)" to "[96, 928)" in the line describing "UT-BIN-20 | 摘要区序列化 | vmUuid(32) + vmName(256) + vmCategory(1)" so implementers clearly understand the offset range.
compute/src/main/java/org/zstack/compute/vm/AbstractVmInstance.java-208-213 (1)
208-213: ⚠️ Potential issue | 🟡 Minor

The comment does not match the allow-list. Lines 208-209 state that the Registering phase permits "metadata-related reads", but only ChangeVmMetaDataMsg and the two destroy messages are actually registered here. If the registration flow really issues metadata read requests, they will still be blocked by validateOperationByState(); if it does not, the comment should drop that wording to avoid establishing a false contract. As per coding guidelines, "longer comments must be carefully proofread and kept up to date with the code to ensure they stay accurate."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/AbstractVmInstance.java` around lines 208 - 213, The comment for VmInstanceState.Registering in AbstractVmInstance is inconsistent with the allowedOperations.addState call: it claims "metadata-related reads" are permitted but only ChangeVmMetaDataMsg, APIDestroyVmInstanceMsg, and DestroyVmInstanceMsg are actually registered; update one of the two to match the intent—either add the specific metadata-read message classes to allowedOperations.addState (so validateOperationByState() won’t block those reads) or remove/adjust the "metadata-related reads" wording from the comment to reflect that only ChangeVmMetaDataMsg and destroy messages are allowed; reference VmInstanceState.Registering, allowedOperations.addState, ChangeVmMetaDataMsg, APIDestroyVmInstanceMsg, DestroyVmInstanceMsg and validateOperationByState() when making the change.

docs/design/vm-metadata-07d-性能与补充测试.md-91-95 (1)
91-95: ⚠️ Potential issue | 🟡 Minor

Fix the column count of the Poller throughput table. Line 95 has five columns while the header has only four, so Markdown rendering swallows the last cell and the < 5ms baseline is lost. Fold "500 dirty 行中 450 行有 nextRetryTime > now" back into the scenario column to match the header.

📝 Suggested change

-| PERF-32 | 大量退避行跳过 | 500 dirty 行中 450 行有 nextRetryTime > now | WHERE 过滤效率 | < 5ms |
+| PERF-32 | 大量退避行跳过(500 dirty 行中 450 行有 nextRetryTime > now) | WHERE 过滤效率 | < 5ms |

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-07d-性能与补充测试.md` around lines 91 - 95, The table row for PERF-32 has five columns causing Markdown to drop the last cell; edit the PERF-32 row so it has exactly four pipe-separated columns to match the header by moving "500 dirty 行中 450 行有 nextRetryTime > now" into the 场景 column (so the row becomes: PERF-32 | 大量退避行跳过 — 500 dirty 行中 450 行有 nextRetryTime > now | WHERE 过滤效率 | < 5ms), ensuring the benchmark "< 5ms" remains in the last column.

docs/design/vm-metadata-04e-sblk运维与IO.md-7-20 (1)
7-20: ⚠️ Potential issue | 🟡 Minor

Add language tags to these fenced code blocks. They currently all trigger MD040. Most of them are pseudocode, so labeling them text is enough; keep labeling the Python snippets python. Also applies to: 44-62, 87-96, 100-111, 115-125, 129-137, 214-218, 224-251

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04e-sblk运维与IO.md` around lines 7 - 20, Update the fenced code blocks in the document to include explicit language markers to avoid MD040: mark pseudocode blocks (including the example VM metadata block and the scan_metadata_lvs pseudocode) as "text" and keep any real Python snippets marked as "python"; specifically search for and update the blocks containing the VM metadata example, the scan_metadata_lvs pseudocode, and the ranges noted (44-62, 87-96, 100-111, 115-125, 129-137, 214-218, 224-251) so each fenced block starts with ```text or ```python as appropriate.

docs/design/vm-metadata-04e-sblk运维与IO.md-148-152 (1)
148-152: ⚠️ Potential issue | 🟡 Minor

The signature and implementation of _io_sanity_check() do not match. The call site passes only fd, but the function body uses lv_path directly. With this pseudocode as written, a failing sanity check raises NameError first and the real I/O root cause is lost.

🛠 Suggested change

 def initialize_metadata_lv(lv_path, lv_size):
     fd = os.open(lv_path, os.O_RDWR | os.O_DIRECT | os.O_SYNC)
     try:
         # Step 0: O_DIRECT sanity check
-        _io_sanity_check(fd)
+        _io_sanity_check(fd, lv_path)
@@
-def _io_sanity_check(fd):
+def _io_sanity_check(fd, lv_path):

Also applies to: 191-206

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04e-sblk运维与IO.md` around lines 148 - 152, The _io_sanity_check signature and usage are mismatched: initialize_metadata_lv calls _io_sanity_check(fd) but the implementation uses lv_path and will raise NameError, hiding the real I/O error; fix by updating the call sites (e.g., in initialize_metadata_lv and the similar block at 191-206) to pass both fd and lv_path (or else change _io_sanity_check to accept only fd and stop referencing lv_path), and ensure the _io_sanity_check function signature and its implementation consistently use the same parameters so real I/O errors are propagated instead of NameError.

docs/design/vm-metadata-04d-sblk读取与恢复.md-5-23 (1)
5-23: ⚠️ Potential issue | 🟡 Minor

Add language tags to the code blocks to stop the recurring MD040 warnings. None of these fenced blocks declares a language; most are pseudocode, so text is fine, and real Python snippets should be labeled python. Also applies to: 31-44, 50-66, 72-92, 109-118, 128-153, 159-169, 173-192, 196-211, 231-259, 263-289, 293-302, 351-368, 378-390

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04d-sblk读取与恢复.md` around lines 5 - 23, Fenced code blocks in the file are missing language tags causing MD040 warnings; update each block (including the one around read_metadata(lv_path, lv_size) and others listed) to include an appropriate language token (use "text" for pseudocode blocks and "python" for real Python snippets) so linters stop flagging them; ensure you update blocks that contain identifiers like read_metadata, Header, SlotAOffset, SlotACapacity, SlotBOffset, SlotBCapacity and the other ranges called out (31-44, 50-66, etc.) so every fenced block has a language specifier.

docs/design/vm-metadata-04b-sblk二进制布局.md-283-288 (1)
283-288: ⚠️ Potential issue | 🟡 Minor

The heading uses the Japanese kanji "不変量" instead of the simplified Chinese "不变量". "不変" is a Japanese form; in simplified Chinese it should be "不变". Correct the heading to "约束与不变量".

📝 Suggested fix

-## 3. 约束与不変量
+## 3. 约束与不变量

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04b-sblk二进制布局.md` around lines 283 - 288, The header and occurrences use the Japanese kanji "不変量" instead of the Simplified Chinese term "不变量"; update the section title and any in-text occurrences (e.g., the header "3. 约束与不変量" and the labeled items mentioning "不変量") to use "不变量" consistently throughout the document, preserving surrounding punctuation and formatting.

plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-7382-7387 (1)
7382-7387: ⚠️ Potential issue | 🟡 Minor

Do not keep back-filling metadata on failure. The current branch still executes reply.setMetadata(rsp.metadata) when rsp.isSuccess() is false, so the caller receives both an error and data, which makes the response semantics inconsistent.

💡 Suggested change

 new Http<>(readVmInstanceMetadataPath, cmd, ReadVmInstanceMetadataRsp.class).call(new ReturnValueCompletion<ReadVmInstanceMetadataRsp>(msg) {
     @Override
     public void success(ReadVmInstanceMetadataRsp rsp) {
         if (!rsp.isSuccess()) {
             reply.setError(operr("operation error, because:%s", rsp.getError()));
+        } else {
+            reply.setMetadata(rsp.metadata);
         }
-        reply.setMetadata(rsp.metadata);
         bus.reply(msg, reply);
         completion.done();
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 7382 - 7387, In the success(ReadVmInstanceMetadataRsp rsp) handler, avoid setting reply metadata when rsp.isSuccess() is false: if rsp.isSuccess() is false, set the error via reply.setError(...) and immediately reply (bus.reply(msg, reply)) or return, and only call reply.setMetadata(rsp.metadata) in the successful branch; update the logic around reply.setError, reply.setMetadata, and bus.reply(msg, reply) so callers never receive both an error and metadata.

docs/design/vm-metadata-02-脏标记与Poller.md-331-331 (1)
331-331: ⚠️ Potential issue | 🟡 Minor

Typo in the document: on line 331, "兗底" should be "兜底".

📝 Suggested fix

-对注册场景,即使元数据暂时落后于 DB,Part 3 §3.4 的 installPath 存在性检查提供额外兗底。
+对注册场景,即使元数据暂时落后于 DB,Part 3 §3.4 的 installPath 存在性检查提供额外兜底。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02-脏标记与Poller.md` at line 331, Replace the typo "兗底" with the correct word "兜底" in the sentence that reads "对注册场景,即使元数据暂时落后于 DB,Part 3 §3.4 的 installPath 存在性检查提供额外兗底。" (referencing the phrase "installPath 存在性检查" and "兗底"); also scan the document for other occurrences of "兗底" and correct them to "兜底" to ensure consistency.

docs/design/vm-metadata-01a-数据模型与序列化.md-315-319 (1)
315-319: ⚠️ Potential issue | 🟡 Minor

Duplicated list numbering: lines 317 and 318 both use the number "4.", which is a list formatting error.

📝 Suggested fix

 3. 初始 ResourceConfig 白名单:`NESTED_VIRTUALIZATION`、`VM_CPU_QUOTA`、`VM_CLOCK_TRACK`、`LIBVIRT_CACHE_MODE` 等
 4. 可通过 `@NeedRestoreOnVmApplySnapshot` 注解自动扩展白名单
-4. 未命中白名单的 SystemTag/ResourceConfig 不写入元数据
+5. 未命中白名单的 SystemTag/ResourceConfig 不写入元数据

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01a-数据模型与序列化.md` around lines 315 - 319, Fix the duplicated list numbering by renumbering the repeated "4." entries so the sequence is continuous: change the line containing "`@NeedRestoreOnVmApplySnapshot` 注解自动扩展白名单" from "4." to "5." and change the line "未命中白名单的 SystemTag/ResourceConfig 不写入元数据" from "4." to "6." so the ordered list reads 2, 3, 4, 5, 6.

docs/design/vm-metadata-02-脏标记与Poller.md-147-191 (1)
147-191: ⚠️ Potential issue | 🟡 Minor

The pseudocode is missing its try block. In the markDirty pseudocode, line 187 has } catch (Exception e) { but there is no matching try { before it. The example cannot compile as shown and may mislead implementers.

📝 Suggested change

 public boolean markDirty(String vmInstanceUuid, boolean storageStructureChange) {
     // Q23 修复:返回 boolean 表示标脏是否成功(供 MetadataStaleRecoveryTask DP-03 使用)
     // 前置检查:功能开关
     if (!VmGlobalConfig.VM_METADATA_ENABLED.value(Boolean.class)) {
         return false;
     }
+    try {
     // Q2-2: Galera 集群兼容写法,避免 INSERT ON DUPLICATE KEY 在高并发下死锁
     // Step 1: INSERT IGNORE(新行)
     int inserted = SQL.New("INSERT IGNORE INTO VmMetadataDirtyVO (vmInstanceUuid, dirtyVersion, storageStructureChange) " +

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02-脏标记与Poller.md` around lines 147 - 191, The markDirty method's pseudo-code ends with a catch block but lacks the corresponding try; wrap the DB operations and triggerFlushForVm call inside a try { ... } so the existing catch (Exception e) in markDirty can catch exceptions; locate the markDirty method (references: markDirty, SQL.New(...) calls, VmMetadataDirtyVO, triggerFlushForVm) and add the try starting before the INSERT IGNORE sequence and closing just before the catch to ensure proper exception handling and compilation.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java-2053-2053 (1)
2053-2053: ⚠️ Potential issue | 🟡 Minor

Missing @Override annotation. The method at line 2053 implements the handle method defined in the NfsPrimaryStorageBackend interface and should carry @Override so the compiler verifies the method signature. Other implementations such as LocalStorageKvmBackend already use the annotation correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java` at line 2053, Add the missing `@Override` annotation to the handle method in NfsPrimaryStorageKVMBackend: locate the method public void handle(UpdateVmInstanceMetadataOnPrimaryStorageMsg msg, String hostUuid, ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply> completion) and annotate it with `@Override` to indicate it implements the NfsPrimaryStorageBackend.handle(...) contract and enable compile-time signature checking, consistent with other implementations like LocalStorageKvmBackend.

docs/design/vm-metadata-01c-存储层与模板虚拟机.md-227-243 (1)
227-243: ⚠️ Potential issue | 🟡 Minor

These fenced code blocks need language tags. They keep triggering MD040 and also render without syntax highlighting; label them text, java, sql, or whatever language matches the content. Also applies to: 287-300, 372-383

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01c-存储层与模板虚拟机.md` around lines 227 - 243, The fenced code blocks (e.g., the block containing "INSERT IGNORE INTO VmMetadataDirtyVO ..." and the steps referencing markDirty(), initializeMetadataOnTargetPS(), readMetadata(), and nextRetryTime) lack language labels causing MD040 and missing syntax highlighting; fix by adding appropriate language identifiers (e.g., sql, text, or java) to each fenced block mentioned (lines around the INSERT/UPDATE block, and the other blocks at the reported ranges) so the Markdown linter stops flagging MD040 and rendering shows correct highlighting.

docs/design/vm-metadata-01c-存储层与模板虚拟机.md-5-9 (1)
5-9: ⚠️ Potential issue | 🟡 Minor

Several anchors in the table of contents no longer reach their target headings. The fragment on line 6 does not cover the "(H3 修复)" suffix in the actual heading, and lines 8-9 point at section names that do not exist in the file, so clicking those TOC entries simply fails. Realign the TOC entries with the real headings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01c-存储层与模板虚拟机.md` around lines 5 - 9, The TOC anchors don't match actual headings: update the TOC entries so each anchor text exactly matches the target heading text (for example align "存储迁移 Poller 暂停的崩溃恢复" in the TOC with the heading that includes the "(H3 修复)" suffix, and remove or rename TOC items that currently point to non‑existent sections like the incorrect "模板虚拟机与链式克隆元数据" variant); ensure any headings referenced in the TOC (e.g., the H3 heading that contains "(H3 修复)") exist verbatim in the document or change the TOC to the actual heading strings, and re-generate/verify the markdown anchors so clicking each TOC entry navigates to the correct in-document heading.

header/src/main/java/org/zstack/header/vm/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java-9-13 (1)
9-13: ⚠️ Potential issue | 🟡 Minor

The path still uses a singular resource name. This new API uses /primary-storage/..., which is inconsistent with the repository's API design conventions; since the interface is not yet frozen, switch to a plural resource path now, before the naming hardens into SDKs and docs. As per coding guidelines, "path: use the plural form for resources."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java` around lines 9 - 13, The `@RestRequest` path in class APIGetVmInstanceMetadataFromPrimaryStorageMsg uses singular "primary-storage" which violates the guideline to use plural resource names; update the path value in the `@RestRequest` annotation on APIGetVmInstanceMetadataFromPrimaryStorageMsg to use the plural form (e.g. "/primary-storages/vm-instances/metadata") so it matches repository API naming conventions and prevents inconsistent SDK/docs naming.

docs/design/vm-metadata-02b-高可用与运维.md-721-721 (1)
721-721: ⚠️ Potential issue | 🟡 Minor

Heading levels are skipped. ### 10.0 appears directly under # 10, and #### D1... appears directly under ## 12.4; this triggers MD001 and destabilizes the TOC hierarchy. Also applies to: 829-829

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02b-高可用与运维.md` at line 721, The document has skipped heading levels (e.g., under "# 10" there is "### 10.0 容量公式与常量(QX-8)" and under "## 12.4" there is "#### D1..."), triggering MD001 and breaking the TOC; fix by normalizing heading levels so each heading increments by only one level from its parent (for example change "### 10.0 容量公式与常量(QX-8)" to "## 10.0 容量公式与常量(QX-8)" or adjust the parent "# 10" to a higher level as appropriate, and similarly demote or promote "#### D1..." to "### D1..."), apply the same correction at the other occurrence (line ~829) to ensure consistent, non-skipped heading hierarchy throughout.

docs/design/vm-metadata-04c-sblk写入流程.md-43-53 (1)
43-53: ⚠️ Potential issue | 🟡 Minor

These fenced code blocks also need language tags; they currently keep triggering markdownlint MD040. Labeling these sequence diagrams, state diagrams, and pseudocode blocks text is sufficient. Also applies to: 57-80, 84-101, 105-133, 173-194, 232-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04c-sblk写入流程.md` around lines 43 - 53, The fenced code blocks containing pseudocode like the snippet with symbols target_slot, new_seq, new_lv_size, calculate_extend_size, lvextend, and calculate_slot_layout should include a language tag to avoid markdownlint MD040; update those triple-backtick blocks to use ```text (e.g., replace ``` with ```text) for this file and the other similar blocks (the other pseudocode/diagram fenced regions) so the linter treats them as plain text.

docs/design/vm-metadata-02b-高可用与运维.md-28-40 (1)
28-40: ⚠️ Potential issue | 🟡 Minor

Add language tags to these fenced code blocks. They currently keep triggering markdownlint MD040; labeling the timing/pseudocode blocks text and the Java snippets java is enough. Also applies to: 82-87, 104-108, 112-119, 155-161, 843-858

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02b-高可用与运维.md` around lines 28 - 40, The fenced code blocks (e.g., the sequence block beginning with "T0: MN-A Poller 认领 dirty(vm-1)" and other similar timing/psuedocode blocks at the noted ranges) are missing language tags and trigger markdownlint MD040; update each such timing/sequence/psuedocode fence to begin with ```text and update any Java snippets to begin with ```java so the linter recognizes the language (apply the change to the blocks referenced in the comment: the shown block plus the other ranges called out).

docs/design/vm-metadata-02b-高可用与运维.md-677-683 (1)
677-683: ⚠️ Potential issue | 🟡 Minor

The SQL strategy description here is the old version. The body of §9.2 has already been changed to INSERT IGNORE + UPDATE, but this comparison table still says INSERT ... ON DUPLICATE KEY UPDATE. The design document now contradicts itself, which can easily mislead implementation and review.

📝 Suggested wording fix

-| SQL 策略 | `INSERT ... ON DUPLICATE KEY UPDATE`(保证已有行也递增版本) | `INSERT IGNORE ... LEFT JOIN`(仅初始化新行) |
+| SQL 策略 | `INSERT IGNORE + UPDATE`(先补新行,再对已有行递增版本) | `INSERT IGNORE ... LEFT JOIN`(仅初始化新行) |

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02b-高可用与运维.md` around lines 677 - 683, The table's SQL strategy cell is inconsistent with §9.2: change the entry under "SQL 策略" for the "§9 升级全量刷新" column from "INSERT ... ON DUPLICATE KEY UPDATE" to the updated "INSERT IGNORE + UPDATE" to match the §9.2 text and remove the contradiction so the design doc is consistent.

docs/design/vm-metadata-01b-API拦截与VM解析.md-393-393 (1)
393-393: ⚠️ Potential issue | 🟡 Minor

Table formatting error: two rows were merged into one. The table row on line 393 joins two API entries with ||, which breaks Markdown table rendering.

📝 Suggested fix

-| `APIUpdateVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false || `APIRecoverVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |
+| `APIUpdateVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |
+| `APIRecoverVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01b-API拦截与VM解析.md` at line 393, The table row at line containing "APIUpdateVmInstanceMsg | CONFIG | DirectVmUuidResolver | false || APIRecoverVmInstanceMsg | CONFIG | DirectVmUuidResolver | false" merged two entries; split it into two separate markdown table rows so each API has its own line: one row for APIUpdateVmInstanceMsg with columns CONFIG, DirectVmUuidResolver, false, and a second row for APIRecoverVmInstanceMsg with columns CONFIG, DirectVmUuidResolver, false, matching the table's column structure used elsewhere.

docs/design/vm-metadata-03-注册与运维.md-561-561 (1)
561-561: ⚠️ Potential issue | 🟡 Minor

The section heading has the same Japanese kanji problem: "不変量" here should be "不变量", matching the table of contents.

Suggested fix

-## 9. 约束与不変量
+## 9. 约束与不变量

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-03-注册与运维.md` at line 561, Update the section heading that currently reads "## 9. 约束与不変量" to use the correct Chinese term "不变量" so it matches the rest of the document and the table of contents; locate the heading string "## 9. 约束与不変量" and replace "不変量" with "不变量".

docs/design/vm-metadata-03-注册与运维.md-8-8 (1)
8-8: ⚠️ Potential issue | 🟡 Minor

Broken anchor link. The TOC link #8-运维指南注册失败后的清理 does not match the anchor generated from the actual heading ## 8. 运维指南:注册失败后的清理 (the colon's encoding is missing), so clicking the TOC entry does not jump to the right place. Note that Markdown anchor generation removes the Chinese colon : from headings; make the TOC entry match the anchor ID that is actually generated (the visible text of the fix can look identical because the difference lies in the colon's encoding), or consider using an ASCII colon in the heading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-03-注册与运维.md` at line 8, The TOC anchor and the heading's generated ID are out of sync, breaking the link; align the TOC entry `#8-运维指南注册失败后的清理` with the heading `## 8. 运维指南:注册失败后的清理`: either change the TOC entry to the generator's actual anchor (the exact form with the Chinese colon removed), or add an explicit custom anchor after the heading so both IDs match exactly; the edit points are the TOC entry and the heading `## 8. 运维指南:注册失败后的清理` in this Markdown file.

docs/design/vm-metadata-03-注册与运维.md-13-13 (1)
13-13: ⚠️ Potential issue | 🟡 Minor

Character error: a Japanese kanji is used. The "変" in "不変量" is a Japanese form and should be the simplified Chinese "变".

Suggested fix

-9. [约束与不変量](#9-约束与不変量)
+9. [约束与不变量](#9-约束与不变量)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-03-注册与运维.md` at line 13, Replace the Japanese kanji "変" with the Chinese simplified character "变" wherever it appears in the document heading and links; specifically update the heading and anchor text "9. [约束与不変量](`#9-约束与不変量`)" to "9. [约束与不变量](`#9-约束与不变量`)" so both the visible heading and its anchor/reference match the corrected term.

storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java-1886-1891 (1)
1886-1891: ⚠️ Potential issue | 🟡 Minor

The error message references the wrong UUID. It uses self.getUuid() (the primary storage UUID), but the failing operation is a VM metadata read, so the message should carry more accurate context.

🐛 Suggested fix

 if (!r.isSuccess()) {
-    trigger.fail(operr("failed to update vm[uuid=%s] on hypervisor.",
-            self.getUuid()).withCause(r.getError()));
+    trigger.fail(operr("failed to read vm metadata from path[%s] on host[uuid=%s].",
+            msg.getMetadataPath(), msg.getHostUuid()).withCause(r.getError()));
     return;
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java` around lines 1886 - 1891, The error message uses self.getUuid() (the primary storage UUID) but the failing operation is about VM metadata; update the callback in run(MessageReply r) so the operr() includes the VM's UUID instead of self.getUuid() — capture and use the VM identifier available in scope (for example vm.getUuid() or a vmUuid variable passed into the closure) when calling trigger.fail(operr(...).withCause(r.getError())); leave r.getError() as the cause.

compute/src/main/java/org/zstack/compute/vm/VmInstanceMetadataFieldProcessor.java-226-232 (1)
226-232: ⚠️ Potential issue | 🟡 Minor

The path replacement can produce unexpected results. String.replace() substitutes every match; if oldPathIdentifier occurs more than once in the path, or appears as a substring of another component, the replacement can corrupt the path.

🛡️ Suggested, more precise replacement logic

 private static void replaceInstallPath(Map<String, Object> voMap, String fieldName, VmInstanceMetadataRegistrationSpec spec) {
     String path = (String) voMap.get(fieldName);
     if (path != null && spec.getOldPathIdentifier() != null && spec.getNewPathIdentifier() != null) {
-        voMap.put(fieldName, path.replace(spec.getOldPathIdentifier(), spec.getNewPathIdentifier()));
+        // Replace only the first occurrence to avoid touching other parts of the path
+        voMap.put(fieldName, path.replaceFirst(
+                java.util.regex.Pattern.quote(spec.getOldPathIdentifier()),
+                java.util.regex.Matcher.quoteReplacement(spec.getNewPathIdentifier())));
     }
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceMetadataFieldProcessor.java` around lines 226 - 232, The current replaceInstallPath method uses String.replace which blindly replaces all substrings and can corrupt paths; update replaceInstallPath to perform a precise replacement using spec.getOldPathIdentifier() and spec.getNewPathIdentifier() by treating the path as path segments (split on file separator) or by matching only whole-segment or prefix occurrences (not arbitrary substrings), replace only the exact segment(s) that equal oldPathIdentifier (or the specific leading segment if intended), then rejoin segments to form the new path; keep null checks and use the same method name replaceInstallPath and parameters (voMap, fieldName, VmInstanceMetadataRegistrationSpec spec) so callers remain unchanged.
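The hazard the reviewer describes is easy to reproduce outside ZStack. The sketch below is purely illustrative (class name, method names, and the sample paths are invented for the demo, not taken from the PR): the naive variant rewrites every occurrence of the old identifier, while a quoted replaceFirst touches only the leading one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathReplaceDemo {
    // Naive variant: String.replace rewrites every occurrence of the identifier.
    static String replaceAll(String path, String oldId, String newId) {
        return path.replace(oldId, newId);
    }

    // Safer variant: quote the literal and rewrite only the first occurrence.
    static String replaceOnce(String path, String oldId, String newId) {
        return path.replaceFirst(Pattern.quote(oldId), Matcher.quoteReplacement(newId));
    }

    public static void main(String[] args) {
        // The old identifier "ps-1" also appears deeper in the path.
        String installPath = "/ps-1/vm-ps-1/ps-1.qcow2";
        System.out.println(replaceAll(installPath, "ps-1", "ps-2"));  // /ps-2/vm-ps-2/ps-2.qcow2
        System.out.println(replaceOnce(installPath, "ps-1", "ps-2")); // /ps-2/vm-ps-1/ps-1.qcow2
    }
}
```

Pattern.quote prevents regex metacharacters in the identifier from being interpreted, and Matcher.quoteReplacement protects `$` and `\` in the replacement, so the quoted replaceFirst is a literal, single-occurrence substitution.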
public List<String> resolveVmUuids(APIMessage msg) {
    String groupUuid = ((VolumeSnapshotGroupMessage) msg).getGroupUuid();
    if (groupUuid == null) {
        return Collections.emptyList();
    }

    // groupUuid → VolumeSnapshotGroupRefVO → list of volumeUuids
    List<String> volumeUuids = SQL.New(
            "SELECT DISTINCT ref.volumeUuid FROM VolumeSnapshotGroupRefVO ref " +
            "WHERE ref.volumeSnapshotGroupUuid = :groupUuid",
            String.class
    ).param("groupUuid", groupUuid).list();

    if (volumeUuids.isEmpty()) {
        return Collections.emptyList();
    }

    // volumeUuids → VolumeVO.vmInstanceUuid (deduplicated, NULLs excluded)
    return SQL.New(
            "SELECT DISTINCT v.vmInstanceUuid FROM VolumeVO v " +
            "WHERE v.uuid IN (:volumeUuids) AND v.vmInstanceUuid IS NOT NULL",
            String.class
    ).param("volumeUuids", volumeUuids).list();
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
for f in $(fd 'APICreateVolumeSnapshotGroupMsg.java$|VolumeSnapshotGroupMessage.java$'); do
printf '\n=== %s ===\n' "$f"
sed -n '1,220p' "$f"
done
printf '\n=== related usages ===\n'
rg -n -C3 'class APICreateVolumeSnapshotGroupMsg|interface VolumeSnapshotGroupMessage|getGroupUuid\s*\(' --type java
This resolver cannot handle the snapshot-group creation scenario; remove the Create declaration from the class-level comment.
SnapshotGroupBasedVmUuidFromApiResolver's implementation depends entirely on VolumeSnapshotGroupMessage.getGroupUuid(), but APICreateVolumeSnapshotGroupMsg does not implement the VolumeSnapshotGroupMessage interface (it only implements CreateVolumeSnapshotGroupMessage) and cannot be cast to it, so resolving a create message throws ClassCastException.
Moreover, at API-call time a create operation has no stable groupUuid yet (the group has not been persisted). This resolver only fits operations on already-existing snapshot groups: Delete, Revert, Update, Ungroup. Remove APICreateVolumeSnapshotGroupMsg from the class-level comment and keep only the message types confirmed to work.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotGroupBasedVmUuidFromApiResolver.java`
around lines 38 - 60, The resolver SnapshotGroupBasedVmUuidFromApiResolver
relies on VolumeSnapshotGroupMessage.getGroupUuid() and thus cannot handle
APICreateVolumeSnapshotGroupMsg (which doesn't implement
VolumeSnapshotGroupMessage and leads to ClassCastException and missing groupUuid
at create time); update the class-level annotation to remove
APICreateVolumeSnapshotGroupMsg and only list message types that operate on
existing groups (e.g., DeleteVolumeSnapshotGroupMsg,
RevertVolumeSnapshotGroupMsg, UpdateVolumeSnapshotGroupMsg,
UngroupVolumeSnapshotGroupMsg or the concrete message classes used elsewhere),
and adjust any related comments to reflect that this resolver only applies to
existing-group operations.
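For context on what the resolver's two chained queries compute, here is a minimal, self-contained imitation of the data flow, with plain maps standing in for VolumeSnapshotGroupRefVO and VolumeVO (all names and sample data are invented for illustration; the real resolver issues the SQL shown in the quoted diff):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class SnapshotGroupResolveDemo {
    // refTable stands in for VolumeSnapshotGroupRefVO: group uuid -> volume uuids.
    // volumeOwner stands in for VolumeVO: volume uuid -> vmInstanceUuid (null = detached).
    static List<String> resolveVmUuids(String groupUuid,
                                       Map<String, List<String>> refTable,
                                       Map<String, String> volumeOwner) {
        List<String> volumeUuids = refTable.getOrDefault(groupUuid, List.of());
        if (volumeUuids.isEmpty()) {
            return List.of();
        }
        // Mirrors the second query: DISTINCT vmInstanceUuid, NULLs excluded.
        return volumeUuids.stream()
                .map(volumeOwner::get)
                .filter(Objects::nonNull)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of("g1", List.of("vol-1", "vol-2", "vol-3"));
        Map<String, String> owners = new HashMap<>();
        owners.put("vol-1", "vm-1");
        owners.put("vol-2", "vm-1"); // same VM twice -> deduplicated
        owners.put("vol-3", null);   // detached volume -> filtered out
        System.out.println(resolveVmUuids("g1", refs, owners)); // [vm-1]
    }
}
```

This also makes the create-time failure mode intuitive: a not-yet-persisted group has no row in the ref table, so there is nothing for the first lookup to resolve.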
public class VmMetadataBuilder {
    private static final CLogger logger = Utils.getLogger(VmMetadataBuilder.class);
Missing Spring bean annotation.
The class declaration lacks @Component or @Service, so the class is not managed by the Spring container and its @Autowired and @Transactional annotations will have no effect.

🐛 Suggested fix: add the Spring annotation

+import org.springframework.stereotype.Component;
+
 /**
  * Spring component that builds the VM metadata payload.
  * ...
  */
+@Component
 public class VmMetadataBuilder {

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import org.springframework.stereotype.Component;

/**
 * Spring component that builds the VM metadata payload.
 */
@Component
public class VmMetadataBuilder {
    private static final CLogger logger = Utils.getLogger(VmMetadataBuilder.class);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataBuilder.java`
around lines 47 - 48, The VmMetadataBuilder class is missing a Spring bean
annotation so @Autowired and @Transactional won't be applied; add a Spring
stereotype (e.g., annotate VmMetadataBuilder with `@Component` or `@Service`) and
import the corresponding org.springframework.stereotype.* annotation so the
class is registered in the Spring context and its dependency-injection and
transaction annotations become effective; keep the existing class name
VmMetadataBuilder and logger declaration unchanged.
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java
private void handle(UpdateVmInstanceMetadataMsg msg) {
    thdf.chainSubmit(new ChainTask(msg) {
        @Override
        public String getSyncSignature() {
            return String.format("handle-update-vm-%s-metadata", msg.getUuid());
        }

        @Override
        public void run(SyncTaskChain chain) {
            doHandleUpdateVmInstanceMetadata(msg);
            chain.next();
        }
Metadata flushes for the same VM are not actually serialized.
Line 9416 calls chain.next() before bus.send() has returned, while the actual primary-storage write only finishes in the asynchronous callback. Subsequent flushes for the same VM therefore run concurrently with the in-flight write, and an old payload can land on disk after a newer one, which is the opposite of the serial execution this chain is meant to guarantee.

Suggested change
private void handle(UpdateVmInstanceMetadataMsg msg) {
thdf.chainSubmit(new ChainTask(msg) {
@Override
public String getSyncSignature() {
return String.format("handle-update-vm-%s-metadata", msg.getUuid());
}
@Override
public void run(SyncTaskChain chain) {
- doHandleUpdateVmInstanceMetadata(msg);
- chain.next();
+ doHandleUpdateVmInstanceMetadata(msg, chain);
}
@Override
public String getName() {
return String.format("handle-update-vm-%s-metadata-task", msg.getUuid());
}
});
}
-private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg) {
+private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg, SyncTaskChain chain) {
String metadata = vmMetadataBuilder.buildVmInstanceMetadata(msg.getUuid());
int payloadSize = metadata.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
if (payloadSize > org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD) {
logger.error(String.format("metadata payload too large: %d bytes for vm[uuid:%s], rejecting",
payloadSize, msg.getUuid()));
MessageReply reply = new MessageReply();
reply.setError(Platform.operr("metadata payload too large (%d bytes, limit %d) for vm[uuid=%s]",
payloadSize, org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD, msg.getUuid()));
bus.reply(msg, reply);
+ chain.next();
return;
}
bus.send(umsg, new CloudBusCallBack(msg) {
@Override
public void run(MessageReply r) {
UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
if (!r.isSuccess()) {
reply.setError(Platform.operr("failed to update vm[uuid=%s] metadata on primary storage",
msg.getUuid()).withCause(r.getError()));
}
bus.reply(msg, reply);
+ chain.next();
}
});
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
9406 - 9417, The current ChainTask (handle(UpdateVmInstanceMetadataMsg msg) ->
run) calls doHandleUpdateVmInstanceMetadata(msg) and immediately chain.next(),
but the actual main-storage flush completes asynchronously in the bus.send()
callback, so subsequent metadata flushes for the same VM can run concurrently;
fix by making the task wait for the async reply before calling chain.next():
either change doHandleUpdateVmInstanceMetadata to accept a callback/Consumer or
return a Future/CompletionStage and invoke chain.next() from the bus.send reply
handler (or when the Future completes), ensuring the chain.next() call occurs
only after the bus.send completion callback (the code paths around
doHandleUpdateVmInstanceMetadata, ChainTask.run, and the bus.send reply handler
must be updated accordingly).
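The fix pattern the prompt describes (gate chain.next() on the asynchronous completion) can be sketched independently of ZStack's ChainTask API. The sketch below is illustrative only: all names are invented, and CompletableFuture stands in for bus.send(). Task A finishes asynchronously; because it calls next() from its completion callback rather than right after dispatch, task B cannot start until A's work has actually finished.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

public class SerialChainDemo {
    interface Chain { void next(); }
    interface Task { void run(Chain chain); }

    private final Queue<Task> pending = new ArrayDeque<>();
    private boolean running;

    // Queue a task; start draining if the chain is idle.
    synchronized void submit(Task t) {
        pending.add(t);
        if (!running) {
            running = true;
            runNext();
        }
    }

    // Run the next task. Each task must call chain.next() only when
    // its (possibly asynchronous) work has actually finished.
    private synchronized void runNext() {
        Task t = pending.poll();
        if (t == null) {
            running = false;
            return;
        }
        t.run(this::runNext);
    }

    // Task A completes asynchronously after a delay; task B is plain CPU work.
    // With next() gated on completion, "A" is always appended before "B".
    static String runScenario() {
        SerialChainDemo chain = new SerialChainDemo();
        StringBuilder order = new StringBuilder();
        CountDownLatch done = new CountDownLatch(1);
        chain.submit(c -> CompletableFuture.runAsync(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            order.append("A");
        }).whenComplete((v, e) -> c.next()));
        chain.submit(c -> {
            order.append("B");
            c.next();
            done.countDown();
        });
        try { done.await(); } catch (InterruptedException ignored) { }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // AB
    }
}
```

If task A instead called c.next() immediately after launching its async work (the bug flagged in the review), task B would run while A's write was still in flight and "B" could be appended before "A".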
Crashed(VmInstanceStateEvent.crashed),
Registering(null);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: find usages of the Registering state in the codebase and confirm whether state transitions are needed
# Search for references to the Registering state
rg -n "VmInstanceState\.Registering" --type java -C 3
# Search for any code that handles Registering state transitions
rg -n "Registering" --type java | grep -i "state\|transition\|next"
🏁 Script executed:
cat -n header/src/main/java/org/zstack/header/vm/VmInstanceState.java | head -100
🏁 Script executed:
# Search for where Registering state is set/transitioned to
rg -n "setState.*Registering\|Registering.*setState\|changeState.*Registering" --type java -A 3 -B 3Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search for nextState usage and how it's called - to understand if missing transitions would be problematic
rg -n "nextState\(\)" --type java -A 2 -B 2 | head -50Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
cat -n header/src/main/java/org/zstack/header/vm/VmInstanceState.java | tail -100Repository: MatheMatrix/zstack
Length of output: 6287
🏁 Script executed:
# Search for how VM enters and exits Registering state
rg -n "Registering" header/src/main/java/org/zstack/header/vm/VmInstanceState.java -A 5 -B 5Repository: MatheMatrix/zstack
Length of output: 1051
🏁 Script executed:
# Search for all places where VM state is transitioned, particularly for Registering
rg -n "\.setState\(VmInstanceState\." --type java | grep -i registeringRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if there's any code that moves VM from Registering to another state
rg -n "from.*Registering\|Registering.*to\|transition.*Registering" --type java -iRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search for ChangeVmMetaDataMsg to understand how state transitions from Registering
rg -n "ChangeVmMetaDataMsg" --type java -A 5 -B 2 | head -80Repository: MatheMatrix/zstack
Length of output: 5951
🏁 Script executed:
# Search for direct setState calls (bypass nextState mechanism) for Registering
rg -n "setState" --type java | grep -i "registering\|state.*transition" | head -20Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check how Error state (also null with no transactions) is handled
rg -n "VmInstanceState.Error" --type java | head -20Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search for how ChangeVmMetaDataMsg actually changes VM state
rg -n "changeMetaData\|AtomicVmState" --type java -A 10 | head -100Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if Registering transitions are handled via AtomicVmState in ChangeVmMetaDataMsg
rg -n "expected\|value\|AtomicVmState" header/src/main/java/org/zstack/header/vm/ChangeVmMetaDataMsg.java -A 3Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look for actual usage of Registering in tests or real workflows
rg -n "setState.*Registering\|Registering.*state" --type java -B 2 -A 2Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look for the actual implementation of changeMetaData and how it uses AtomicVmState
rg -n "private void changeMetaData" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -A 50Repository: MatheMatrix/zstack
Length of output: 2236
🏁 Script executed:
# Check how AtomicVmState is used for state transitions
rg -n "getAtomicVmStates\|AtomicVmState" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -B 2 -A 5Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Verify if Error state (also null, no transactions) is used or if there's a pattern here
rg -n "VmInstanceState.Error" --type java -B 2 -A 2Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search for where Registering state is actually set from (what state comes before it)
rg -n "changeVmStateInDb.*registered\|Registering.*drivenEvent\|VmInstanceStateEvent\." compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -B 3 -A 3 | head -80Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check what states have Registering as a possible nextState (looking at all transaction definitions)
grep -rn "Registering" header/src/main/java/org/zstack/header/vm/VmInstanceState.javaRepository: MatheMatrix/zstack
Length of output: 130
🏁 Script executed:
# Look for where VM enters Registering state from Created or other states
rg -n "register\|Register" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java -i | grep -i "state\|transition" | head -20Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if Registering can be transitioned from Created state
cat -n header/src/main/java/org/zstack/header/vm/VmInstanceState.java | grep -A 3 "Created.transactions"Repository: MatheMatrix/zstack
Length of output: 394
🏁 Script executed:
# Search for where a VM transitions TO Registering state (what's the source state)
rg -n "registered\|Register" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java -i -B 2 -A 2 | head -60Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check VmInstanceStateEvent to see if there's a registered event
rg -n "class VmInstanceStateEvent\|registered" header/src/main/java/org/zstack/header/vm/VmInstanceStateEvent.java -iRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search for test files that might show how Registering state is used
find . -name "*Test.java" -o -name "*Case.java" | xargs rg -l "Registering" 2>/dev/nullRepository: MatheMatrix/zstack
Length of output: 46
🏁 Script executed:
# Search for VmInstanceStateEvent definition to understand available events
cat header/src/main/java/org/zstack/header/vm/VmInstanceStateEvent.java | head -50Repository: MatheMatrix/zstack
Length of output: 632
🏁 Script executed:
# Look for where Registering state is set using setState directly (not nextState)
rg -n "\.setState\(" compute/src/main/java/org/zstack/compute/vm/ --type java -B 2 -A 2 | grep -i "registering" -B 3 -A 3Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if there are workflow/action classes that might handle Registering transitions
find compute/src/main/java -name "*Register*" -type fRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look at how Error state (also null with no transactions) is used
rg -n "VmInstanceState\.Error\|Error.transactions" header/src/main/java/org/zstack/header/vm/VmInstanceState.javaRepository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check the Registering state workflow more carefully - search for where it transitions from/to
rg -n "getState\(\) == VmInstanceState\|equals.*Registering" compute/src/main/java/org/zstack/compute/vm/ --type java -B 1 -A 1 | head -50Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look at the complete changeMetaData method to understand the state change mechanism
cat -n compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | sed -n '1863,1920p'Repository: MatheMatrix/zstack
Length of output: 2785
🏁 Script executed:
# Check changeVmStateInDb method to see how it changes state
rg -n "private.*changeVmStateInDb\|protected.*changeVmStateInDb" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -A 15Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Find changeVmStateInDb method definition
rg -n "void changeVmStateInDb" compute/src/main/java/org/zstack/compute/vm/ --type java -A 20 | head -80Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look for where Registering state is actually created/set in the codebase
rg -n "setState\|changeVmStateInDb" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | head -30Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if there are any comments or documentation about Registering state design
rg -n "Registering\|registration\|register" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -i -B 1 -A 1 | head -60Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Search more broadly for how state changes happen
rg -n "changeVmStateInDb\|setState" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java | head -40Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if there's documentation about the Registering state feature
rg -n "Registering\|registration" --type java -i | head -40Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check how the state is currently being handled - look at the entire VmInstanceBase file for state management patterns
wc -l compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.javaRepository: MatheMatrix/zstack
Length of output: 130
🏁 Script executed:
# Find changeVmStateInDb in parent classes
rg -n "changeVmStateInDb" compute/src/main/java/org/zstack/compute/vm/AbstractVmInstance.java --type java -A 25Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Get the context around the Registering state use in VmInstanceBase
rg -n "Registering" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java -B 10 -A 10Repository: MatheMatrix/zstack
Length of output: 1400
🏁 Script executed:
# Check if the state transitions for Registering are supposed to be user-provided via ChangeVmMetaDataMsg
rg -n "changeVmStateInDb.*drivenEvent\|s.getValue\(\).getDrivenEvent" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -B 3 -A 3Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Verify the pattern - check what happens when changeVmStateInDb is called with Registering as current state
rg -n "changeVmStateInDb" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java -B 2 -A 5 | head -100Repository: MatheMatrix/zstack
Length of output: 5086
🏁 Script executed:
# Confirm the design - check if Registering->nextState() would ever be called
rg -n "\.nextState\(" compute/src/main/java/org/zstack/compute/vm/ --type java -B 3 -A 3 | head -60Repository: MatheMatrix/zstack
Length of output: 2067
🏁 Script executed:
# Confirm the issue - check if changeVmStateInDb can be called with Registering as source state
# and what would happen when nextState is called
rg -n "self.getState\(\).nextState" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java -B 5 -A 2Repository: MatheMatrix/zstack
Length of output: 415
🏁 Script executed:
# Double-check: when ChangeVmMetaDataMsg is used to transition FROM Registering, what happens?
# The call is: changeVmStateInDb(s.getValue().getDrivenEvent())
# where s.getValue() is the TARGET state, not current state
# So if current is Registering and target is Stopped, it calls:
# changeVmStateInDb(VmInstanceStateEvent.stopped)
# which then tries: Registering.nextState(VmInstanceStateEvent.stopped)
# But Registering has no transactions!
# Search for all calls to changeVmStateInDb to verify this pattern
rg -n "changeVmStateInDb\(" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java | wc -lRepository: MatheMatrix/zstack
Length of output: 64
🏁 Script executed:
# Look for any special handling for Registering state in changeVmStateInDb
rg -n "changeVmStateInDb\|Registering" compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java --type java | grep -E "(changeVmStateInDb|Registering)" | head -20Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Let me check the full changeVmStateInDb implementation to see if there's special handling for certain states
cat -n compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | sed -n '346,400p'Repository: MatheMatrix/zstack
Length of output: 2936
🏁 Script executed:
# Verify: does the code have any try-catch or special handling for states without transactions?
cat -n compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | sed -n '346,380p'Repository: MatheMatrix/zstack
Length of output: 1743
🏁 Script executed:
# Now I see the issue - verify that when ChangeVmMetaDataMsg tries to transition FROM Registering,
# it will call nextState on Registering which has no transactions.
# Let me verify the exact flow one more time by looking at how AtomicVmState works
cat header/src/main/java/org/zstack/header/vm/ChangeVmMetaDataMsg.java | head -80Repository: MatheMatrix/zstack
Length of output: 2119
🏁 Script executed:
# Confirm: when changeMetaData calls changeVmStateInDb(s.getValue().getDrivenEvent()),
# s.getValue() is the TARGET state, so getDrivenEvent() returns the event of the TARGET state
# Then changeVmStateInDb tries: currentState.nextState(targetStateEvent)
# So if current is Registering, it tries: Registering.nextState(stopped/running/etc)
# which requires Registering to have those transitions defined
rg -n "getDrivenEvent\|getExpected\|getValue" header/src/main/java/org/zstack/header/vm/ChangeVmMetaDataMsg.java -A 2Repository: MatheMatrix/zstack
Length of output: 44
The Registering state lacks transition definitions.
The new Registering state was added to the offlineStates set, but no transitions (.transactions(...)) are defined for it in the static block.
When a VM is in the Registering state, ChangeVmMetaDataMsg will attempt to move it to another state by calling changeVmStateInDb(targetState.getDrivenEvent()). That triggers self.getState().nextState(event), i.e. Registering.nextState(event). Because Registering defines no transition rules, nextState() throws CloudRuntimeException.
Suggested transition definitions for Registering:

```java
Registering.transactions(
        new Transaction(VmInstanceStateEvent.stopped, VmInstanceState.Stopped),
        new Transaction(VmInstanceStateEvent.running, VmInstanceState.Running),
        new Transaction(VmInstanceStateEvent.destroying, VmInstanceState.Destroying)
);
```

🤖 Prompt for AI Agents
);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@header/src/main/java/org/zstack/header/vm/VmInstanceState.java` around lines
33 - 34, Registering enum value in VmInstanceState is added to offlineStates but
lacks any .transactions(...) definitions, so calls like ChangeVmMetaDataMsg ->
changeVmStateInDb(...) -> self.getState().nextState(event) will throw
CloudRuntimeException; fix by adding transactions for Registering inside the
static initialization of VmInstanceState (the same place other states call
.transactions(...)) to map relevant events to target states (for example map
VmInstanceStateEvent.stopped -> VmInstanceState.Stopped,
VmInstanceStateEvent.running -> VmInstanceState.Running, and
VmInstanceStateEvent.destroying -> VmInstanceState.Destroying) so
nextState(event) can resolve transitions correctly.
```java
public void run(SyncTaskChain chain) {
    doHandleUpdateMetadata(msg);
    chain.next();
```
Don't call chain.next() before the asynchronous write completes.
The chain slot is released as soon as bkd.handle(...) is dispatched, so VM_METADATA_PS_MAX_CONCURRENT in practice only throttles dispatch rate, not the actual concurrency of PS metadata writes; and if doHandleUpdateMetadata() throws synchronously, this queue can get stuck. Move chain.next() into a single exit point that covers success, failure, and synchronous exceptions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3350 - 3352, run(SyncTaskChain chain) currently calls chain.next()
immediately, releasing the chain slot before the asynchronous metadata write
(for example the async callback of bkd.handle(...)) completes; this defeats the
VM_METADATA_PS_MAX_CONCURRENT concurrency limit, and a synchronous exception can
stall the queue. Remove chain.next() from run and instead call it from the
success/failure callbacks of the asynchronous write initiated by
doHandleUpdateMetadata, and also in the catch branch for synchronous exceptions
thrown by doHandleUpdateMetadata, so the chain slot is released only after the
write finishes or handling ends, regardless of outcome (relevant symbols:
run(SyncTaskChain), doHandleUpdateMetadata, and bkd.handle, which triggers the
async write).
```java
public void run(SyncTaskChain chain) {
    doHandleUpdateMetadata(msg);
    chain.next();
```
Don't call chain.next() before the asynchronous callback returns.
As written, the sync queue only wraps message dispatch; the actual NFS metadata write still runs concurrently, and if doHandleUpdateMetadata() throws synchronously, the queue may stop advancing. Move chain.next() into a single completion exit point.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 1944 - 1946, the run(SyncTaskChain chain) callback currently
executes chain.next() right after calling doHandleUpdateMetadata(msg), so the
sync queue only wraps message dispatch rather than the actual NFS metadata
write. Remove the chain.next() call from the run method and trigger it at the
unified completion point of doHandleUpdateMetadata (on both success and
failure), so the SyncTaskChain advances only after the metadata write has
actually completed or its handling has ended; also make sure the exception path
calls chain.next() (or the corresponding failure callback) to keep the task
queue from blocking. Relevant symbols: run, doHandleUpdateMetadata,
SyncTaskChain, chain.next().
Force-pushed from a57104d to f12598a (compare)
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
♻️ Duplicate comments (3)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)
9413-9416: ⚠️ Potential issue | 🔴 Critical
Metadata flushes for the same VM are still not truly serialized.
chain.next() is executed here before the asynchronous bus.send() finishes, so subsequent flushes for the same VM can still enter concurrently, and an old payload may land on disk after a newer one. chain.next() must move into the async callback and every early-return branch. Suggested change:

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateVmInstanceMetadata(msg);
-    chain.next();
+    doHandleUpdateVmInstanceMetadata(msg, chain);
 }

-private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg) {
+private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg, SyncTaskChain chain) {
     String metadata = vmMetadataBuilder.buildVmInstanceMetadata(msg.getUuid());
     int payloadSize = metadata.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
     if (payloadSize > org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD) {
         logger.error(String.format("metadata payload too large: %d bytes for vm[uuid:%s], rejecting",
                 payloadSize, msg.getUuid()));
         MessageReply reply = new MessageReply();
         reply.setError(Platform.operr("metadata payload too large (%d bytes, limit %d) for vm[uuid=%s]",
                 payloadSize, org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD, msg.getUuid()));
         bus.reply(msg, reply);
+        chain.next();
         return;
     }
     bus.send(umsg, new CloudBusCallBack(msg) {
         @Override
         public void run(MessageReply r) {
             UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
             if (!r.isSuccess()) {
                 reply.setError(Platform.operr("failed to update vm[uuid=%s] metadata on primary storage",
                         msg.getUuid()).withCause(r.getError()));
             }
             bus.reply(msg, reply);
+            chain.next();
         }
     });
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 9413 - 9416, run(SyncTaskChain) currently calls chain.next() right after doHandleUpdateVmInstanceMetadata(msg), letting the next task through before the asynchronous bus.send() returns, so metadata flushes for the same VM are not serialized. Remove the synchronous chain.next() from run and invoke it inside doHandleUpdateVmInstanceMetadata or its asynchronous callbacks (including every early-return/error branch) only after the async send/write has completed; ensure every branch (success callback, failure callback, and any early-return path) calls chain.next() exactly once, so flushes for the same VM stay ordered (see doHandleUpdateVmInstanceMetadata and the chain.next()/bus.send() call sites).

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java (1)
1944-1946: ⚠️ Potential issue | 🔴 Critical
Don't call chain.next() before the asynchronous write completes.
As it stands, this throttle only wraps message dispatch; the actual NFS metadata write still runs concurrently, and if doHandleUpdateMetadata() throws synchronously, the chain stalls outright. chain.next() must move into a unified completion/exception exit point.
💡 Suggestion: hand chain advancement over to the async completion

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateMetadata(msg);
-    chain.next();
+    doHandleUpdateMetadata(msg, chain);
 }
```

The success/fail/early-return/exception paths of doHandleUpdateMetadata(...) must all call chain.next().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1944 - 1946, the current run(SyncTaskChain) implementation calls chain.next() immediately after doHandleUpdateMetadata(msg), which either advances the chain before the asynchronous write finishes or leaves the chain stuck when doHandleUpdateMetadata throws synchronously. Move chain advancement to a unified completion/exception exit point of doHandleUpdateMetadata: make sure chain.next() is eventually called on every success, failure, early-return, or caught-exception branch (or advance in the async callback/CompletionHandler), and remove the immediate chain.next() call from run(...), so throttling matches the lifetime of the metadata write and the chain cannot hang.

plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java (1)
3349-3378: ⚠️ Potential issue | 🔴 Critical
Don't call chain.next() before the asynchronous write completes.
doHandleUpdateMetadata() only kicks off the asynchronous bkd.handle(...). Because Line 3352 releases the queue immediately, VM_METADATA_PS_MAX_CONCURRENT effectively limits "dispatch concurrency" rather than "write concurrency"; in addition, if a synchronous exception is thrown before Line 3363-3366, this chain has no unified exit point. Put chain.next() at a single exit shared by success, failure, and synchronous exceptions.
🔧 Suggested change

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateMetadata(msg);
-    chain.next();
+    doHandleUpdateMetadata(msg, chain);
 }

-private void doHandleUpdateMetadata(final UpdateVmInstanceMetadataOnPrimaryStorageMsg msg) {
-    final String hostUuid = getHostUuidByResourceUuid(msg.getRootVolumeUuid());
-    LocalStorageHypervisorFactory f = getHypervisorBackendFactoryByHostUuid(hostUuid);
-    LocalStorageHypervisorBackend bkd = f.getHypervisorBackend(self);
-    bkd.handle(msg, hostUuid, new ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply>(msg) {
-        @Override
-        public void success(UpdateVmInstanceMetadataOnPrimaryStorageReply returnValue) {
-            bus.reply(msg, returnValue);
-        }
-
-        @Override
-        public void fail(ErrorCode errorCode) {
-            UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
-            reply.setError(errorCode);
-            bus.reply(msg, reply);
-        }
-    });
-}
+private void doHandleUpdateMetadata(final UpdateVmInstanceMetadataOnPrimaryStorageMsg msg, final SyncTaskChain chain) {
+    try {
+        final String hostUuid = getHostUuidByResourceUuid(msg.getRootVolumeUuid());
+        LocalStorageHypervisorFactory f = getHypervisorBackendFactoryByHostUuid(hostUuid);
+        LocalStorageHypervisorBackend bkd = f.getHypervisorBackend(self);
+        bkd.handle(msg, hostUuid, new ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply>(msg) {
+            @Override
+            public void success(UpdateVmInstanceMetadataOnPrimaryStorageReply returnValue) {
+                bus.reply(msg, returnValue);
+                chain.next();
+            }
+
+            @Override
+            public void fail(ErrorCode errorCode) {
+                UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
+                reply.setError(errorCode);
+                bus.reply(msg, reply);
+                chain.next();
+            }
+        });
+    } catch (Exception e) {
+        UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
+        reply.setError(operr("failed to update VM metadata on local primary storage[uuid:%s], %s",
+                self.getUuid(), e.getMessage()));
+        bus.reply(msg, reply);
+        chain.next();
+    }
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java` around lines 3349 - 3378, run() currently calls chain.next() immediately, releasing the queue before the asynchronous write started by doHandleUpdateMetadata() completes. Remove the immediate chain.next() from run() and release the queue at the unified exit of bkd.handle(...): call chain.next() in both success(...) and fail(...) of the ReturnValueCompletion, and wrap the synchronous call to bkd.handle(...) in try/catch so synchronous exceptions also go through the same failure branch (reply to msg, then chain.next()); this guarantees chain.next() is called and msg is replied to on every path. Relevant symbols: run, doHandleUpdateMetadata, bkd.handle, ReturnValueCompletion.success, ReturnValueCompletion.fail, chain.next.
🟠 Major comments (20)
conf/db/upgrade/V4.10.29__schema.sql (1)
20-21: ⚠️ Potential issue | 🟠 Major
Add idempotency protection to the index creation.
Line 21 executes CREATE INDEX directly. If an upgrade already succeeded through the first half and is rerun to this point, it fails because the index exists, so the recovery path is unsafe. Check information_schema.STATISTICS first and create the index only when it does not exist. Suggested change:

```diff
 -- Poller CAS claim query optimization: WHERE managementNodeUuid IS NULL AND lastClaimTime ... AND nextRetryTime <= NOW()
-CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `VmMetadataDirtyVO` (`managementNodeUuid`, `lastClaimTime`, `nextRetryTime`);
+DROP PROCEDURE IF EXISTS `zstack`.`sp_add_idx_vm_metadata_dirty_unclaimed`;
+DELIMITER $$
+CREATE PROCEDURE `zstack`.`sp_add_idx_vm_metadata_dirty_unclaimed`()
+BEGIN
+    IF NOT EXISTS (
+        SELECT 1
+        FROM information_schema.STATISTICS
+        WHERE TABLE_SCHEMA = 'zstack'
+          AND TABLE_NAME = 'VmMetadataDirtyVO'
+          AND INDEX_NAME = 'idxVmMetadataDirtyUnclaimed'
+    ) THEN
+        SET @sql = 'CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `zstack`.`VmMetadataDirtyVO` (`managementNodeUuid`, `lastClaimTime`, `nextRetryTime`)';
+        PREPARE stmt FROM @sql;
+        EXECUTE stmt;
+        DEALLOCATE PREPARE stmt;
+    END IF;
+END $$
+DELIMITER ;
+CALL `zstack`.`sp_add_idx_vm_metadata_dirty_unclaimed`();
+DROP PROCEDURE `zstack`.`sp_add_idx_vm_metadata_dirty_unclaimed`;
```

Based on learnings: upgrade scripts must check object existence via information_schema so they can be rerun safely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V4.10.29__schema.sql` around lines 20 - 21, add an idempotency check for CREATE INDEX `idxVmMetadataDirtyUnclaimed` ON `VmMetadataDirtyVO`: before creating the index, query information_schema.STATISTICS (checking TABLE_NAME='VmMetadataDirtyVO' AND INDEX_NAME='idxVmMetadataDirtyUnclaimed') and execute CREATE INDEX only when the index does not exist, so a rerun or partially completed upgrade script does not fail because the index already exists.

compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)
3199-3207: ⚠️ Potential issue | 🟠 Major
The Registering guard also blocks read-only APIs.
The comment here says "allow Destroy and query/read operations", but the implementation only lets APIDestroyVmInstanceMsg through. Read-only APIs such as APIGetVmDnsMsg, APIGetVmHostnameMsg, and APIGetVmCapabilitiesMsg are also rejected outright during Registering, contradicting the stated intent. Explicitly allow read-only messages, or push the check down into the mutating-API branches.
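One way to realize the explicit allowlist suggested above is sketched below; the message classes are hypothetical stand-ins for the ZStack API message types, not the real ones:

```java
import java.util.Set;

// Sketch: an explicit allowlist for the Registering state, so destroy plus
// read-only queries pass while mutating APIs are rejected.
public class RegisteringGuard {
    static class Message {}
    static class DestroyMsg extends Message {}       // stands in for APIDestroyVmInstanceMsg
    static class GetDnsMsg extends Message {}        // stands in for APIGetVmDnsMsg
    static class AttachVolumeMsg extends Message {}  // a mutating API

    private static final Set<Class<? extends Message>> ALLOWED_WHILE_REGISTERING =
            Set.of(DestroyMsg.class, GetDnsMsg.class);

    // true if the message may proceed while the VM is in the Registering state
    public static boolean allowed(Message msg) {
        return ALLOWED_WHILE_REGISTERING.contains(msg.getClass());
    }
}
```

Keeping the allowlist in one place makes the "destroy plus reads" intent checkable at a glance instead of being scattered across instanceof branches.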
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 3199 - 3207, the guard in VmInstanceBase rejects all messages except APIDestroyVmInstanceMsg while in VmInstanceState.Registering, but the intent per the comment is to allow destroy and read-only APIs; update the check so that read-only messages (e.g. APIGetVmDnsMsg, APIGetVmHostnameMsg, APIGetVmCapabilitiesMsg) are also permitted during Registering, or alternatively invert the logic to block only mutating messages instead of allowing only APIDestroyVmInstanceMsg.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java (1)
1966-1968: ⚠️ Potential issue | 🟠 Major
Select the backend by target host.
A concrete hostUuid has already been resolved here, so falling back to getUsableBackend() degenerates into "a backend from any usable cluster". When the PS is attached to multiple hypervisor types, the request can easily be handed to the wrong implementation; this class's handle(BatchSyncVolumeSizeOnPrimaryStorageMsg) already avoids the same problem by using getBackendByHostUuid().
💡 Suggestion: look up the backend by host

```diff
-final NfsPrimaryStorageBackend backend = getUsableBackend();
+final NfsPrimaryStorageBackend backend = getBackendByHostUuid(hostUuid);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1966 - 1968, the code calls getUsableBackend() even though hostUuid is known, which can route the request to the wrong hypervisor backend; change the lookup to getBackendByHostUuid(hostUuid) (the same approach used in handle(BatchSyncVolumeSizeOnPrimaryStorageMsg)) so the backend for the specific host is selected before invoking backend.handle(msg, hostUuid, ...); handle a null return from getBackendByHostUuid and keep the existing ReturnValueCompletion flow.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java (1)
2058-2064: ⚠️ Potential issue | 🟠 Major
Don't reverse-engineer the metadata directory with a /vol-.../ regex.
This hard-codes the directory layout into replaceFirst(). If installPath is null, or the path no longer matches this pattern, metadata is written to the wrong location, or this line simply throws an NPE. Cutting the directory at the last '/' after a null check is more robust.
💡 A more robust alternative

```diff
 String installPath = Q.New(VolumeVO.class)
         .eq(VolumeVO_.uuid, msg.getRootVolumeUuid())
         .select(VolumeVO_.installPath)
         .findValue();
-String path = installPath.replaceFirst("^(.+/vol-[^/]+/).*$", "$1");
+if (installPath == null || !installPath.contains("/")) {
+    completion.fail(operr("cannot resolve metadata path from root volume[uuid:%s]", msg.getRootVolumeUuid()));
+    return;
+}
+String path = installPath.substring(0, installPath.lastIndexOf('/') + 1);
 String metadataPath = String.format("%s%s", path, VM_META_SUFFIX);
```
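The substring approach in the diff above can be exercised on its own; MetadataPathUtil here is a hypothetical helper used only for illustration:

```java
// Sketch: derive the metadata directory by cutting at the last '/' instead of
// assuming a "/vol-.../" layout; unusable paths yield null so the caller can
// fail the operation rather than write metadata to a wrong location.
public class MetadataPathUtil {
    public static String metadataDir(String installPath) {
        if (installPath == null || !installPath.contains("/")) {
            return null;
        }
        return installPath.substring(0, installPath.lastIndexOf('/') + 1);
    }
}
```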
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java` around lines 2058 - 2064, deriving the metadata directory from installPath via the hard-coded regex installPath.replaceFirst("^(.+/vol-[^/]+/).*$", "$1") can NPE or write to the wrong location when installPath is null or does not match. Instead, after querying VolumeVO.installPath (via Q.New(...).select(VolumeVO_.installPath)), null-check it first, then take the directory by cutting at the last '/' (failing safely when there is no '/'), and use that directory plus VM_META_SUFFIX for umsg.setMetadataPath(...), so there is a safe fallback and no NPE when the installPath resolved from msg.getRootVolumeUuid() is unusable.

header/src/main/java/org/zstack/header/vm/VmInstanceMetadataValidator.java (1)
124-140: ⚠️ Potential issue | 🟠 Major
Don't silently pass snapshot-group entries that are missing UUIDs.
Line 127-130 and Line 135-136 only continue validating when the field is present; if snapshotGroups[*].uuid or snapshotGroupRefs[*].volumeSnapshotGroupUuid is missing, the current implementation passes validation outright. Bad data is then carried into the later registration/recovery flows, where it is much harder to track down.
Reference fix:

```diff
-Set<String> groupUuids = new HashSet<>();
-for (String groupJson : dto.snapshotGroups) {
-    Map<String, Object> groupMap = JSONObjectUtil.toObject(groupJson, Map.class);
-    Object uuid = groupMap.get("uuid");
-    if (uuid != null) {
-        groupUuids.add(uuid.toString());
-    }
+Set<String> groupUuids = new HashSet<>();
+for (int i = 0; i < dto.snapshotGroups.size(); i++) {
+    Map<String, Object> groupMap = JSONObjectUtil.toObject(dto.snapshotGroups.get(i), Map.class);
+    Object uuid = groupMap.get("uuid");
+    if (uuid == null || uuid.toString().isEmpty()) {
+        throw new CloudRuntimeException(String.format(
+                "metadata snapshotGroups[%s].uuid is missing", i));
+    }
+    groupUuids.add(uuid.toString());
 }
-for (String refJson : dto.snapshotGroupRefs) {
-    Map<String, Object> refMap = JSONObjectUtil.toObject(refJson, Map.class);
-    Object groupUuid = refMap.get("volumeSnapshotGroupUuid");
-    if (groupUuid != null && !groupUuids.contains(groupUuid.toString())) {
+for (int i = 0; i < dto.snapshotGroupRefs.size(); i++) {
+    Map<String, Object> refMap = JSONObjectUtil.toObject(dto.snapshotGroupRefs.get(i), Map.class);
+    Object groupUuid = refMap.get("volumeSnapshotGroupUuid");
+    if (groupUuid == null || groupUuid.toString().isEmpty()) {
+        throw new CloudRuntimeException(String.format(
+                "metadata snapshotGroupRefs[%s].volumeSnapshotGroupUuid is missing", i));
+    }
+    if (!groupUuids.contains(groupUuid.toString())) {
         throw new CloudRuntimeException(String.format(
                 "metadata snapshotGroupRef references non-existent group[uuid:%s]", groupUuid));
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmInstanceMetadataValidator.java` around lines 124 - 140, stop silently accepting entries that are missing UUIDs: when iterating dto.snapshotGroups (variable groupMap), require the "uuid" key to exist and be non-empty, and throw a CloudRuntimeException with a clear message identifying the offending group if it is not; likewise when iterating dto.snapshotGroupRefs (variable refMap), require "volumeSnapshotGroupUuid" to exist and be non-empty, throwing a CloudRuntimeException that describes the bad snapshotGroupRef if it is missing, so the invalid metadata is rejected early.

header/src/main/java/org/zstack/header/vm/VmInstanceMetadataValidator.java (1)
92-93: ⚠️ Potential issue | 🟠 Major
Convert JSON parse failures uniformly into CloudRuntimeException.
Line 92, Line 126, and Line 134 let the underlying JSON parse exception propagate as-is, so callers get no concrete failure path, and this is inconsistent with the other validation branches of this class, which uniformly throw CloudRuntimeException. If the metadata registration flow maps error codes by exception type, a "format error" here leaks through as a generic runtime exception.
CloudRuntimeException的语义不一致。元数据注册链路如果按异常类型做错误码映射,这里会把“格式错误”漏成通用运行时异常。Also applies to: 126-126, 134-134
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmInstanceMetadataValidator.java` around lines 92 - 93, the JSON parsing calls using JSONObjectUtil.toObject(rm.vo, Map.class) in VmInstanceMetadataValidator should catch underlying JSON parsing exceptions and rethrow them as CloudRuntimeException with a clear message (e.g. "failed to parse vm metadata JSON for uuid field" or similar) so behavior matches the other validation branches; update each occurrence (the one that assigns voMap and the other two parsing sites around voUuid and related fields) to try/catch the parse, wrap the original exception as the cause in a new CloudRuntimeException, and include identifying context such as the affected rm.vo or VM identifier in the message.

compute/src/main/java/org/zstack/compute/vm/VmGlobalConfig.java (1)
140-141: 🛠️ Refactor suggestion | 🟠 Major — The VM_METADATA config is missing the @GlobalConfigDef annotation.
The VM_METADATA entry validates with validValues = {"true", "false"} but has no @GlobalConfigDef annotation declaring its default value and type. Add it for consistency with the other config entries.

Suggested fix:
```diff
+@GlobalConfigDef(defaultValue = "false", type = Boolean.class,
+        description = "Enable VM metadata feature")
 @GlobalConfigValidation(validValues = {"true", "false"})
 public static GlobalConfig VM_METADATA = new GlobalConfig(CATEGORY, "vm.metadata");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmGlobalConfig.java` around lines 140 - 141, VM_METADATA is missing the `@GlobalConfigDef` annotation; update the declaration for the GlobalConfig named VM_METADATA to add `@GlobalConfigDef` with an explicit defaultValue and type (e.g. defaultValue = "true" and type = "Boolean" or the project's canonical boolean type) while keeping the existing `@GlobalConfigValidation`(validValues = {"true","false"}); ensure you annotate the VM_METADATA field (the GlobalConfig VM_METADATA) so it matches other configs' metadata declarations.
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java-133-138 (1)
133-138: ⚠️ Potential issue | 🟠 Major — limit(batchSize) lets a small set of failed records monopolize the recovery batch indefinitely.
The current filter is only lastFlushFailed=true plus limit(batchSize), with no claim/rotation/cursor semantics. If some VMs in the previous batch can never be advanced and keep lastFlushFailed=true, the stale VMs behind them may never make it into this task's batch, producing long-term starvation. This needs an ordering/claim field that advances on every attempt, or an explicit claim mechanism, rather than a fixed rate-limited query.
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java-135-156 (1)
135-156: ⚠️ Potential issue | 🟠 Major — Use true/false instead of 1/0 in the queries and updates on the lastFlushFailed field.
VmMetadataPathFingerprintVO.lastFlushFailed is a boolean field, but the code uses the numeric literals 1 and 0 in its HQL queries and updates. Per the JPQL/HQL specification, boolean entity attributes must be compared with the boolean literals true and false, not numeric literals; using 0/1 may fail type checking or cause portability issues across database dialects. Places to change:

- Line 135: WHERE fp.lastFlushFailed = 1 → WHERE fp.lastFlushFailed = true
- Line 154: SET lastFlushFailed = 0 → SET lastFlushFailed = false
- Line 171: SET lastFlushFailed = 0 → SET lastFlushFailed = false

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java` around lines 135 - 156, The HQL/JPQL uses numeric literals for a boolean entity field causing potential type/portability issues; in MetadataStaleRecoveryTask locate the query that selects VmMetadataPathFingerprintVO where fp.lastFlushFailed = 1 and change the predicate to fp.lastFlushFailed = true, and update the SQL.New(...) update statements that set lastFlushFailed = 0 to set lastFlushFailed = false (these refer to the VmMetadataPathFingerprintVO.lastFlushFailed updates used when staleRecoveryCount >= maxCycles and the later reset), making all boolean comparisons/assignments use true/false instead of 1/0.plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-7342-7345 (1)
7342-7345: ⚠️ Potential issue | 🟠 Major — Do not set the metadata field when reading metadata fails.
The logic at lines 7383-7386 is flawed: even when !rsp.isSuccess() is true, the code still executes reply.setMetadata(rsp.metadata). Set metadata only on success:

```java
if (rsp.isSuccess()) {
    reply.setMetadata(rsp.metadata);
} else {
    reply.setError(operr("operation error, because:%s", rsp.getError()));
}
```

Also confirm that metadataPath is normalized and prefix-validated on the agent side, especially for user-supplied paths coming from APIRegisterVmInstanceMsg.

Also applies to: 7342-7345
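The success-only guard can be sketched standalone; `Rsp` and `Reply` below are hypothetical stand-ins for the real KVM agent response and message reply classes:

```java
public class ReplyGuardDemo {
    // Minimal stand-ins for the agent response and the message reply (illustrative).
    public static class Rsp {
        public boolean success;
        public String metadata;
        public String error;
    }

    public static class Reply {
        public String metadata;
        public String error;
    }

    // Only propagate metadata on success; on failure set the error instead,
    // so partial or stale agent content never reaches upstream callers.
    public static Reply toReply(Rsp rsp) {
        Reply reply = new Reply();
        if (rsp.success) {
            reply.metadata = rsp.metadata;
        } else {
            reply.error = String.format("operation error, because:%s", rsp.error);
        }
        return reply;
    }

    public static void main(String[] args) {
        Rsp bad = new Rsp();
        bad.success = false;
        bad.error = "agent read failed";
        bad.metadata = "stale-bytes";
        Reply r = toReply(bad);
        System.out.println(r.metadata == null); // prints true: stale metadata dropped
    }
}
```

The branch structure mirrors the suggested fix: the two outcomes are mutually exclusive, so a reply can never carry both an error and metadata.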
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 7342 - 7345, The current handler uses WriteVmInstanceMetadataCmd / Http<WriteVmInstanceMetadataRsp> to call the agent but sets reply.setMetadata(rsp.metadata) even when rsp.isSuccess() is false; change the response-handling logic so you only call reply.setMetadata(...) when rsp.isSuccess() is true and otherwise call reply.setError(operr("operation error, because:%s", rsp.getError())); also ensure metadataPath normalization and prefix validation is performed on the agent side (and validate inputs coming from APIRegisterVmInstanceMsg) before using metadataPath to prevent path traversal or invalid prefixes.compute/src/main/java/org/zstack/compute/vm/metadata/MetadataOrphanDetector.java-171-184 (1)
171-184: ⚠️ Potential issue | 🟠 Major — This scan path still aggregates a synchronous int, which conflicts with the new handler's async contract.
detectOrphans() accumulates orphanCount from synchronous return values, but MetadataStorageHandler.scanMetadataVmUuids() introduced in this PR is already defined in callback form. Wiring the handler in directly would either block the periodic thread waiting for results, or print the summary before the scan actually finishes; returning 0 directly, as now, also turns the task into a silent no-op.

Also applies to: 196-224
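One way to reconcile the synchronous aggregation with a callback-style handler is to count pending completions and emit the summary only when the last callback fires. This is a sketch with stand-in types, not the real MetadataStorageHandler:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class AsyncAggregateDemo {
    // Stand-in for MetadataStorageHandler.scanMetadataVmUuids(callback):
    // it delivers its result via a callback instead of a return value.
    static void scan(String psUuid, Consumer<List<String>> onDone) {
        onDone.accept(Arrays.asList(psUuid + "-vm1", psUuid + "-vm2"));
    }

    // Aggregate across storages without blocking the periodic thread:
    // count down completions and only emit the summary after the last scan.
    public static void detectOrphans(List<String> psUuids, Consumer<Integer> onSummary) {
        AtomicInteger orphanCount = new AtomicInteger();
        AtomicInteger pending = new AtomicInteger(psUuids.size());
        for (String ps : psUuids) {
            scan(ps, uuids -> {
                orphanCount.addAndGet(uuids.size());
                if (pending.decrementAndGet() == 0) {
                    onSummary.accept(orphanCount.get()); // summary only after the last scan
                }
            });
        }
    }

    public static void main(String[] args) {
        detectOrphans(Arrays.asList("ps1", "ps2"),
                n -> System.out.println("orphans: " + n)); // prints "orphans: 4"
    }
}
```

In the real task the summary consumer would do the logging that currently happens at the end of the synchronous loop; the atomic counters make the pattern safe even when callbacks fire on different threads.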
header/src/main/java/org/zstack/header/vm/VmInstanceMetadataCodec.java-54-58 (1)
54-58: ⚠️ Potential issue | 🟠 Major — The decode path lacks a resourceUuid/vo.uuid consistency check.
VmInstanceMetadataDTO.ResourceMetadata already states this constraint as part of the DTO contract, but these entry points currently just call toObject directly. Corrupted or forged metadata can make the outer UUID and the inner VO point to different resources, and subsequent registration/repair flows would keep running against the wrong object.

Also applies to: 67-70, 93-94
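The fail-fast check can be sketched like this; `ResourceMetadata` and `Vo` below are hypothetical stand-ins for the DTO types, shown only to illustrate the shape of the validation:

```java
public class DecodeCheckDemo {
    public static class Vo {
        public final String uuid;
        public Vo(String uuid) { this.uuid = uuid; }
    }

    public static class ResourceMetadata {
        public final String resourceUuid;
        public final Vo vo;
        public ResourceMetadata(String resourceUuid, Vo vo) {
            this.resourceUuid = resourceUuid;
            this.vo = vo;
        }
    }

    // Fail fast when the outer resourceUuid and the embedded VO disagree,
    // instead of letting registration/repair run against the wrong object.
    public static void checkConsistency(ResourceMetadata rm) {
        if (rm.vo == null || !rm.resourceUuid.equals(rm.vo.uuid)) {
            throw new IllegalArgumentException(String.format(
                    "metadata resourceUuid[%s] does not match vo.uuid[%s]",
                    rm.resourceUuid, rm.vo == null ? null : rm.vo.uuid));
        }
    }

    public static void main(String[] args) {
        checkConsistency(new ResourceMetadata("u1", new Vo("u1"))); // passes silently
        try {
            checkConsistency(new ResourceMetadata("u1", new Vo("u2")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run once per deserialized entry right after `toObject`, before anything downstream dereferences the VO.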
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmInstanceMetadataCodec.java` around lines 54 - 58, The decode path (VmInstanceMetadataCodec.decode and the other similar decode methods at the referenced locations) must validate DTO contract consistency: after deserializing with JSONObjectUtil.toObject into VmInstanceMetadataDTO, iterate its ResourceMetadata entries and ensure each ResourceMetadata.resourceUuid equals the contained vo.uuid (e.g., resource.getVo().getUuid()); if any mismatch throw a clear exception (IllegalArgumentException or a domain-specific ValidationException) instead of returning the object so corrupted/forged metadata cannot proceed. Update VmInstanceMetadataCodec.decode and the other decode entry points to perform this check and fail-fast on inconsistency.header/src/main/java/org/zstack/header/vm/MetadataStorageHandler.java-79-92 (1)
79-92: 🛠️ Refactor suggestion | 🟠 Major — A boolean parameter encoding the write mode is easy to get wrong at call sites.
storageStructureChange decides both the tmp suffix and the underlying OP type, yet true/false is not self-explanatory at the call site. Since this interface is a newly introduced public boundary, change it now to an explicit enum, or split it into two methods. As per coding guidelines, "avoid boolean parameters with unclear meaning."

♻️ Suggested explicit mode:
```diff
-    void writeMetadata(String psUuid, String vmUuid, String payloadJson,
-                       boolean storageStructureChange, Completion completion);
+    void writeMetadata(String psUuid, String vmUuid, String payloadJson,
+                       MetadataWriteMode writeMode, Completion completion);
```

```java
enum MetadataWriteMode {
    NORMAL_UPDATE,
    STORAGE_STRUCTURE_CHANGE
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/MetadataStorageHandler.java` around lines 79 - 92, The writeMetadata API on MetadataStorageHandler currently uses a boolean storageStructureChange which is unclear at call sites; replace this boolean with an explicit enum (e.g., MetadataWriteMode with values NORMAL_UPDATE and STORAGE_STRUCTURE_CHANGE) or split into two intent-named methods (e.g., writeMetadataNormal / writeMetadataForStorageChange), update the signature of writeMetadata accordingly, and change all callers to use the enum constant or the new method names so the tmp suffix and OP type decision is explicit (ensure downstream implementations and tests that reference writeMetadata/storageStructureChange are updated to handle the new enum or new method variants).compute/src/main/java/org/zstack/compute/vm/metadata/MetadataPathDriftDetector.java-153-156 (1)
153-156: ⚠️ Potential issue | 🟠 Major — This amplifies the full inspection into N+1 queries.
Each fingerprint issues its own VolumeVO query, plus a VolumeSnapshotVO query when volumes exist. With a large VM count, this periodic task will reliably generate tens of thousands of SQL statements. Collect the batch's vmUuid values first, then bulk-fetch volumes/snapshots and group them by VM when building the snapshot.

Also applies to: 193-212
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataPathDriftDetector.java` around lines 153 - 156, The loop in MetadataPathDriftDetector (iterating VmMetadataPathFingerprintVO batch and calling buildCurrentPathSnapshot per vm) causes an N+1 query pattern because each fingerprint triggers separate VolumeVO and VolumeSnapshotVO lookups; instead, collect all vmUuid values from the batch up front, perform two bulk queries to fetch all related VolumeVOs and VolumeSnapshotVOs (by vmUuid or volumeUuid as appropriate), group those results by vmUuid, and then iterate the batch to buildCurrentPathSnapshot using the preloaded, grouped volume/snapshot data so no per-fingerprint DB calls are made (apply same fix for the code region around lines 193-212).header/src/main/java/org/zstack/header/vm/VmInstanceMetadataDTO.java-53-68 (1)
53-68: ⚠️ Potential issue | 🟠 Major — Base64 cannot protect the sensitive data declared here.
If systemTags/resourceConfigs can really end up holding passwords, keys, and similar secrets, Base64-encoding the JSON is just an encoding, not encryption; anyone with access to the primary-storage content can decode it directly. At minimum, exclude or redact sensitive entries at build time, or encrypt the payload with a managed key before persisting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmInstanceMetadataDTO.java` around lines 53 - 68, The fields systemTags and resourceConfigs on VmInstanceMetadataDTO are documented as Base64-encoded JSON but Base64 does not protect secrets; update the construction/persistence flow to avoid storing raw secrets: either filter/sanitize sensitive keys during the build that produces the SystemTagVO/ResourceConfig JSON (exclude or mask password/secret/privateKey fields) or encrypt the JSON payload with a managed key before assigning to systemTags/resourceConfigs; locate the producers that serialize SystemTagVO list and ResourceConfig list (the code paths that create the Base64 JSON for VmInstanceMetadataDTO) and implement either redaction of sensitive fields or integration with the existing key management/encryption utility to store encrypted payloads instead of plain Base64.compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-833-846 (1)
833-846: ⚠️ Potential issue | 🟠 Major — The failure flag is not cleared after a successful flush.
onFlushFailure() sets VmMetadataPathFingerprintVO.lastFlushFailed to true when retries are exhausted, but the update path for an existing fingerprint only touches pathSnapshot and lastFlushTime. Even after later flushes succeed, the same VM keeps the stale "last flush failed" state.

💡 Suggested fix:
```diff
 } else {
     fp.setPathSnapshot(pathJson);
     fp.setLastFlushTime(new Timestamp(System.currentTimeMillis()));
+    fp.setLastFlushFailed(false);
     dbf.update(fp);
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 833 - 846, The existing update branch for VmMetadataPathFingerprintVO (in the block that finds by vmUuid and updates fp) doesn't clear the stale failure flag set by onFlushFailure(); update the code that handles the existing VmMetadataPathFingerprintVO (the else branch that calls fp.setPathSnapshot and fp.setLastFlushTime and dbf.update(fp)) to also reset fp.setLastFlushFailed(false) and reset fp.setStaleRecoveryCount(0) (or appropriate recovery fields) so a successful flush clears the prior failure state for that VM.storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java-1866-2131 (1)
1866-2131: ⚠️ Potential issue | 🟠 Major — The VM obtained after successful registration still lacks key metadata.
Only VolumeVO, VolumeSnapshot*, and VmInstanceVO are restored; the vmNicVOs, system tag / resource config, and volumeSnapshotReference* fields in VmMetadata are not persisted at all, and the tag/config logic below is still commented out. After the API returns success, the VM state in the database does not match the metadata file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java` around lines 1866 - 2131, registerVmInstance currently persists only VolumeVO, VolumeSnapshot(*/Tree) and VmInstanceVO; you must also persist vmMetadata.vmNicVOs, vmMetadata.volumeSnapshotReference* (refs), vmMetadata.vmSystemTags and vmMetadata.vmResourceConfigs so DB matches metadata. In register-volume/register-snapshot/register-vmInstance flows: convert vmNicVOs to VmNicVO objects, set their vmInstanceUuid/owner fields and dbf.persistCollection them; convert snapshot reference entries to the appropriate VolumeSnapshotReferenceVO/VolumeSnapshotReferenceTreeVO (use the same pattern as VolumeSnapshotVO/TreeVO), set primaryStorageUuid/volumeUuid/treeUuid and dbf.persist or persistCollection them; restore and fix the commented system tag/resource config logic (SystemTagVO and ResourceConfigVO creation) to correctly match system tags via getResourceSystemTagFromSystem and resource configs via getResourceConfigFromSystem, set resourceUuid to newVm.getUuid(), then dbf.persistCollection for tags and configs; ensure accountUuid/primaryStorageUuid fields are set where applicable and that all dbf.persist/persistCollection calls occur before completion.success(vmInstanceInventory).compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-897-925 (1)
897-925: ⚠️ Potential issue | 🟠 Major — Full refresh and initialization bypass the KVM filter applied earlier.
markDirty() explicitly supports only UserVm + KVM, but these SQL statements create dirty rows / advance pagination by type='UserVm' alone. Once the feature flag is turned on or after an upgrade, non-KVM UserVm instances will also be picked up by the Poller and sent UpdateVmInstanceMetadataMsg, producing nothing but useless flushes and retry noise.

💡 Suggested fix:
```diff
-        "SELECT v.uuid, 1, 1 FROM VmInstanceVO v " +
-        "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid " +
+        "SELECT v.uuid, 1, 1 FROM VmInstanceVO v " +
+        "WHERE v.type = 'UserVm' AND v.hypervisorType = 'KVM' AND v.uuid > :lastUuid " +
         "ORDER BY v.uuid ASC LIMIT :batchSize")
@@
-        "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid " +
+        "WHERE v.type = 'UserVm' AND v.hypervisorType = 'KVM' AND v.uuid > :lastUuid " +
         "ORDER BY v.uuid ASC LIMIT :batchSize")
@@
-        List<String> batch = SQL.New("SELECT v.uuid FROM VmInstanceVO v " +
-                "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid " +
+        List<String> batch = SQL.New("SELECT v.uuid FROM VmInstanceVO v " +
+                "WHERE v.type = 'UserVm' AND v.hypervisorType = 'KVM' AND v.uuid > :lastUuid " +
                 "ORDER BY v.uuid ASC LIMIT :batchSize", String.class)
@@
-        "SELECT v.uuid, 1, 0 FROM VmInstanceVO v " +
-        "LEFT JOIN VmMetadataDirtyVO d ON v.uuid = d.vmInstanceUuid " +
-        "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid AND d.vmInstanceUuid IS NULL " +
+        "SELECT v.uuid, 1, 0 FROM VmInstanceVO v " +
+        "LEFT JOIN VmMetadataDirtyVO d ON v.uuid = d.vmInstanceUuid " +
+        "WHERE v.type = 'UserVm' AND v.hypervisorType = 'KVM' AND v.uuid > :lastUuid AND d.vmInstanceUuid IS NULL " +
         "ORDER BY v.uuid ASC LIMIT :batchSize")
@@
-        List<String> batchUuids = SQL.New("SELECT v.uuid FROM VmInstanceVO v " +
-                "WHERE v.type = 'UserVm' AND v.uuid > :lastUuid " +
+        List<String> batchUuids = SQL.New("SELECT v.uuid FROM VmInstanceVO v " +
+                "WHERE v.type = 'UserVm' AND v.hypervisorType = 'KVM' AND v.uuid > :lastUuid " +
                 "ORDER BY v.uuid ASC", String.class)
```

Also applies to: 995-1015
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 897 - 925, The SQL batches in VmMetadataDirtyMarker (used by markDirty()) currently only filter on v.type='UserVm' and thus include non-KVM UserVm rows; update each SQL.New(...) statement that inserts, updates, or selects VmInstanceVO rows to also filter by the KVM hypervisor (e.g. add "AND v.hypervisorType = 'KVM'" or a parameterized "AND v.hypervisorType = :hypervisor" to the WHERE clauses), pass the hypervisor param where used, and ensure both the insert-ignore, update, and subsequent SELECT for pagination use this additional constraint so only UserVm + KVM instances are processed.compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-518-526 (1)
518-526: ⚠️ Potential issue | 🟠 Major — The global concurrency cap can be bypassed under contention.
get() >= maxConcurrent and incrementAndGet() are not atomic as a pair. When triggerFlushForVm() and the Poller submit concurrently, several threads can all pass the check first and then bump globalFlushInFlight together, so the actual number of in-flight flushes exceeds the configured value.
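The race and the CAS fix can be demonstrated standalone; `tryAcquire` below is a sketch of the check-and-reserve step only, not ZStack's actual flush path:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyCapDemo {
    // check-then-increment as one atomic step: either the CAS reserves a slot,
    // or the cap is observed and the caller backs off — no race window between them.
    public static boolean tryAcquire(AtomicInteger inFlight, int maxConcurrent) {
        while (true) {
            int current = inFlight.get();
            if (current >= maxConcurrent) {
                return false; // cap reached; the caller releases its claim and retries later
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int max = 4;
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger acquired = new AtomicInteger();
        Thread[] threads = new Thread[32];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (tryAcquire(inFlight, max)) {
                    acquired.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Nobody releases in this demo, so exactly `max` threads can win the slot.
        System.out.println(acquired.get()); // prints 4
    }
}
```

The CAS retry loop only spins when another thread changed the counter between the read and the compare-and-set, so the common uncontended case is a single CAS.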
💡 Suggested fix:
```diff
 int maxConcurrent = VmGlobalConfig.VM_METADATA_GLOBAL_MAX_CONCURRENT.value(Integer.class);
-if (globalFlushInFlight.get() >= maxConcurrent) {
-    // global concurrency saturated; release the claim, Poller retries next round
-    releaseClaim(vmUuid);
-    return;
+while (true) {
+    int current = globalFlushInFlight.get();
+    if (current >= maxConcurrent) {
+        // global concurrency saturated; release the claim, Poller retries next round
+        releaseClaim(vmUuid);
+        return;
+    }
+    if (globalFlushInFlight.compareAndSet(current, current + 1)) {
+        break;
+    }
 }
-globalFlushInFlight.incrementAndGet();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 518 - 526, The current check-then-increment (globalFlushInFlight.get() >= maxConcurrent followed by globalFlushInFlight.incrementAndGet()) is not atomic and can be bypassed concurrently; in the trigger paths of VmMetadataDirtyMarker (triggerFlushForVm() / Poller), switch to a CAS loop or another atomic operation that checks and reserves in one step: read globalFlushInFlight, if >= VmGlobalConfig.VM_METADATA_GLOBAL_MAX_CONCURRENT.value(...) call releaseClaim(vmUuid) and return, otherwise proceed only after globalFlushInFlight.compareAndSet(old, old+1) succeeds; make sure every early-return and failure path still releases/decrements correctly (matching the original releaseClaim behavior) to avoid counter leaks.
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java-1851-1854 (1)
1851-1854: ⚠️ Potential issue | 🟠 Major — The failure branch does not return the error to the API caller.
It calls publish(event) directly without event.setError(errorCode), so the caller receives a "success" response with neither an inventory nor an error; the failure is silently swallowed.

💡 Suggested fix:
```diff
 @Override
 public void fail(ErrorCode errorCode) {
+    event.setError(errorCode);
     bus.publish(event);
     chain.next();
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java` around lines 1851 - 1854, The overridden fail(ErrorCode errorCode) branch in PrimaryStorageBase does not pass the error back to the API; call event.setError(errorCode) (and set the event's success/failure fields as needed) before bus.publish(event) and chain.next(), so the caller receives the error information instead of an empty "success" response.
🟡 Minor comments (15)
conf/db/upgrade/V4.10.29__schema.sql-14-17 (1)
14-17: ⚠️ Potential issue | 🟡 Minor — Apply the zstack schema qualifier consistently.
The new table names here are explicitly qualified with zstack, but the foreign-key references and the cleanup statement still use unqualified table names. That is inconsistent with the convention of the upgrade scripts in this directory, and it makes execution depend on the current default schema. Qualify these references as zstack.... Suggested change:

```diff
     CONSTRAINT `fkVmMetadataDirtyVOVmInstanceEO` FOREIGN KEY (`vmInstanceUuid`)
-        REFERENCES `VmInstanceEO` (`uuid`) ON DELETE CASCADE,
+        REFERENCES `zstack`.`VmInstanceEO` (`uuid`) ON DELETE CASCADE,
     CONSTRAINT `fkVmMetadataDirtyVOManagementNodeVO` FOREIGN KEY (`managementNodeUuid`)
-        REFERENCES `ManagementNodeVO` (`uuid`) ON DELETE SET NULL
+        REFERENCES `zstack`.`ManagementNodeVO` (`uuid`) ON DELETE SET NULL
@@
     CONSTRAINT `fkVmMetadataPathFingerprintVOVmInstanceEO` FOREIGN KEY (`vmInstanceUuid`)
-        REFERENCES `VmInstanceEO` (`uuid`) ON DELETE CASCADE
+        REFERENCES `zstack`.`VmInstanceEO` (`uuid`) ON DELETE CASCADE
@@
-DELETE FROM `GarbageCollectorVO` WHERE `name` LIKE 'update-vm-%-metadata-gc';
+DELETE FROM `zstack`.`GarbageCollectorVO` WHERE `name` LIKE 'update-vm-%-metadata-gc';
```

As per coding guidelines: In ZStack, the database schema name is fixed as 'zstack'. For SQL upgrade scripts under conf/db/upgrade, continue using TABLE_SCHEMA = 'zstack' and reference tables as zstack.`UsedIpVO`.

Also applies to: 31-32, 36-36
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V4.10.29__schema.sql` around lines 14 - 17, The foreign key constraints fkVmMetadataDirtyVOVmInstanceEO and fkVmMetadataDirtyVOManagementNodeVO currently reference VmInstanceEO and ManagementNodeVO without the fixed schema; update these references to use the zstack schema prefix (e.g. zstack.`VmInstanceEO`, zstack.`ManagementNodeVO`) and similarly update any related cleanup/drop statements in this script that reference tables (e.g. UsedIpVO and other tables noted) so all table references consistently use zstack.`...` as per upgrade script conventions.
docs/design/vm-metadata-04d-sblk读取与恢复.md-5-5 (1)
5-5: ⚠️ Potential issue | 🟡 Minor — Add language identifiers to these fenced code blocks.
All of these locations trigger markdownlint MD040. Most of them are flow diagrams or pseudocode, so labeling them text is enough; keep concrete languages such as python or bash for the few real code blocks.

Also applies to: 31-31, 50-50, 72-72, 109-109, 128-128, 159-159, 173-173, 196-196, 231-231, 263-263, 293-293, 351-351, 378-378
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04d-sblk读取与恢复.md` at line 5, Several fenced code blocks in the document lack a language identifier, causing markdownlint to report MD040; add a language identifier to every triple-backtick block (default flow/pseudocode blocks to text, and keep concrete languages such as python or bash for real code blocks) to clear the MD040 warnings and keep syntax highlighting consistent.
docs/design/vm-metadata-04b-sblk二进制布局.md-15-15 (1)
15-15: ⚠️ Potential issue | 🟡 Minor — Add language identifiers to these fenced code blocks.
These locations all trigger markdownlint MD040. They are mostly layout diagrams and pseudocode, so adding text is sufficient; use a concrete language only for real code samples.

Also applies to: 155-155, 200-200, 224-224
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-04b-sblk二进制布局.md` at line 15, The fenced code blocks in docs/design/vm-metadata-04b-sblk二进制布局.md are missing language identifiers and trigger markdownlint MD040; update each triple-backtick fence at the indicated locations (including the blocks around lines 15, 155, 200, 224) to use a language tag (use "text" for layout/pseudocode blocks) by changing ``` to ```text so the linter recognizes the block language.
docs/design/vm-metadata-00-设计约束索引.md-33-45 (1)
33-45: ⚠️ Potential issue | 🟡 Minor — The constraint total does not match the Part 2 count.
Counting the current table at lines 37-45, Part 2 actually lists 9 constraints, but line 81 still says 8 for Part 2 and 47 in total. This index document is the entry point for later cross-references; inconsistent numbers will skew test and design reconciliation.

📝 Suggested change:
```diff
-**总计**:6(Part 1b)+ 11(Part 1c)+ 8(Part 2)+ 14(Part 2b)+ 8(Part 3)= **47 条约束**
+**总计**:6(Part 1b)+ 11(Part 1c)+ 9(Part 2)+ 14(Part 2b)+ 8(Part 3)= **48 条约束**
```

Also applies to: 81-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-00-设计约束索引.md` around lines 33 - 45, The Part 2 constraints table lists nine items (C-DM-01 through C-TF-09) but the document still shows "8 条 / 总计 47 条" on Line 81; update that summary to "9 条 / 总计 48 条" (and any other occurrences of the old counts) so the Part 2 count and overall total match the table entries; locate the table under the "Part 2 — Dirty Mark & Poller" heading and the summary text near Line 81 to apply the change.
docs/design/vm-metadata-00-设计约束索引.md-73-73 (1)
73-73: ⚠️ Potential issue | 🟡 Minor — The quotes on line 73 are unbalanced.
回滚"由外到内"+ 空树清理 SQL breaks the readability of the table cell; use paired quotes and fix the spacing while at it.

📝 Suggested change:
```diff
-| C-03-4 | 回滚"由外到内"+ 空树清理 SQL | §9 |
+| C-03-4 | 回滚“由外到内” + 空树清理 SQL | §9 |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-00-设计约束索引.md` at line 73, The table entry for row C-03-4 contains an unmatched quote and poor spacing ("回滚"由外到内"+ 空树清理 SQL); update the cell text (identify by the symbol C-03-4) to use paired quotes and consistent spacing, e.g. change to 回滚“由外到内”+ 空树清理 SQL or 回滚 "由外到内" + 空树清理 SQL so quotes are balanced and spaces around the + are normalized.
docs/design/vm-metadata-07d-性能与补充测试.md-91-95 (1)
91-95: ⚠️ Potential issue | 🟡 Minor — The PERF-32 row has one more column than the header.
The PERF-32 row at line 95 has 5 cells, but the header only has 4 columns. Markdown renders this as a broken table and the last column gets truncated. Merge the condition into the scenario column:

📝 Suggested change:
```diff
-| PERF-32 | 大量退避行跳过 | 500 dirty 行中 450 行有 nextRetryTime > now | WHERE 过滤效率 | < 5ms |
+| PERF-32 | 大量退避行跳过(500 dirty 行中 450 行有 nextRetryTime > now) | WHERE 过滤效率 | < 5ms |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-07d-性能与补充测试.md` around lines 91 - 95, The PERF-32 table row has one extra cell causing a 5th column mismatch with the 4-column header; update the PERF-32 row so it only has four pipe-separated cells by merging the "500 dirty 行中 450 行有 nextRetryTime > now" condition into the "场景" column (i.e., make PERF-32's 场景 include the nextRetryTime condition) and keep the 度量指标 and 基准 cells as-is so the table columns align.
docs/design/vm-metadata-01b-API拦截与VM解析.md-391-393 (1)
391-393: ⚠️ Potential issue | 🟡 Minor — Malformed table row: two API records on one line.
Line 393 joins APIUpdateVmInstanceMsg and APIRecoverVmInstanceMsg into a single table row, separated by ||. Markdown tables cannot parse this format, so the second record renders incompletely or is dropped.

📝 Suggested change:
```diff
-| `APIUpdateVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false || `APIRecoverVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |
+| `APIUpdateVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |
+| `APIRecoverVmInstanceMsg` | CONFIG | `DirectVmUuidResolver` | false |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01b-API拦截与VM解析.md` around lines 391 - 393, The table row mistakenly combines two API records into one using "||"; split the single line into two separate Markdown table rows so `APIUpdateVmInstanceMsg` and `APIRecoverVmInstanceMsg` each have their own row with the same columns (Level: CONFIG, Resolver: DirectVmUuidResolver, updateOnFailure: false), ensuring each row begins and ends with proper pipe delimiters and matches the table header format.
docs/design/vm-metadata-01b-API拦截与VM解析.md-79-88 (1)
79-88: ⚠️ Potential issue | 🟡 Minor — The code blocks need language identifiers.
The fenced code blocks at lines 79-88 and 269-273 lack language identifiers, which violates MD040; the text identifier is recommended. In addition, the table rows at lines 391-393 are malformed: two table rows are merged onto one line and should each be on their own line.

📝 Suggested changes: at lines 79-88 and 269-273, change each opening fence from ``` to ```text. At lines 391-393, start the second table row on its own line instead of sharing the first row's line.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01b-API拦截与VM解析.md` around lines 79 - 88, Add the missing language identifiers for the fenced code blocks around the described flow and the other block by changing their opening fences to use ```text (refer to the fenced code blocks currently showing the timing flow and the other block at the second location) and fix the malformed table by splitting the merged row into two separate lines so the second table row begins on its own line (the problematic table appears where the lines around 391-393 are merged).
docs/design/vm-metadata-01c-存储层与模板虚拟机.md-227-243 (1)
227-243: ⚠️ Potential issue | 🟡 Minor — Add language identifiers to these three fenced code blocks.
The three code blocks at lines 227-243, 287-300, and 372-383 all lack a language identifier, which triggers markdownlint MD040 and loses syntax highlighting. They show SQL statements, a timeline diagram, and an ASCII structure diagram; labeling all three with the text identifier is sufficient.

📝 Example change: replace each opening ``` fence with ```text.
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01c-存储层与模板虚拟机.md` around lines 227 - 243, Add explicit language identifiers ("text") to the three fenced code blocks that lack them: the block beginning with "Step 1: 暂停该 VM 的 Poller flush" (which contains the SQL INSERT/UPDATE), the subsequent block that shows the time-series diagram, and the block containing the ASCII storage/structure diagram; replace each opening ``` with ```text so markdownlint MD040 is satisfied and syntax highlighting is applied.
docs/design/vm-metadata-01c-存储层与模板虚拟机.md-5-9 (1)
5-9: ⚠️ Potential issue | 🟡 Minor — Fix the dead anchors in the table of contents.

- Line 8 points to a "新增/修改代码文件清单" section that does not exist in the body; the document jumps from section 2 straight to section 3.
- Line 9 numbers "约束与不変量" as section 4, but that heading is actually section 3 in the body.
- Line 6's anchor #16-存储迁移-poller-暂停的崩溃恢复 does not match the actual heading — the heading carries a "(H3 修复)" suffix, which changes the auto-generated anchor.

Suggested change:
```diff
-3. [新增/修改代码文件清单](#3-新增修改代码文件清单)
-4. [约束与不変量](#4-约束与不変量)
+3. [约束与不変量](#3-约束与不変量)
```

The line 6 anchor needs to be adjusted to match the actual heading (suffix included):
```diff
-    1.6. [存储迁移 Poller 暂停的崩溃恢复](#16-存储迁移-poller-暂停的崩溃恢复)
+    1.6. [存储迁移 Poller 暂停的崩溃恢复](#16-存储迁移-poller-暂停的崩溃恢复h3-修复)
```

Alternatively, change the heading itself and drop the parenthesized suffix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-01c-存储层与模板虚拟机.md` around lines 5 - 9, Fix the dead TOC anchors by making the TOC entries exactly match the body headings: change the line 6 anchor `#16-存储迁移-poller-暂停的崩溃恢复` to include the actual heading suffix (e.g. "(H3 修复)") or drop that suffix from the heading to keep the anchor clean; change the line 8 entry "新增/修改代码文件清单" to point at a real section (or add that section to the body); and renumber the line 9 entry "约束与不変量" so its number/anchor matches the heading, which is currently section 3 — ensure the TOC text and anchor identifiers exactly match the corresponding headings so auto-generated anchors resolve.
docs/design/vm-metadata-02b-高可用与运维.md-675-683 (1)
675-683: ⚠️ Potential issue | 🟡 Minor — The SQL strategy in the comparison table still uses the old wording.
§9.2 has already been changed to the two-step INSERT IGNORE + UPDATE semantics, but this table still says INSERT ... ON DUPLICATE KEY UPDATE; later implementation work or review can easily be misled by this table.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02b-高可用与运维.md` around lines 675 - 683, The "SQL 策略" column in the table still lists `INSERT ... ON DUPLICATE KEY UPDATE`, which is inconsistent with the two-step semantics agreed in §9.2; change that cell to match §9.2 (e.g. state "INSERT IGNORE + UPDATE (two-step semantics)" or an equivalent wording) and note the correspondence to §9.2 in a side note or parentheses to avoid confusion and keep terminology consistent.
docs/design/vm-metadata-02-脏标记与Poller.md-198-228 (1)
198-228: ⚠️ Potential issue | 🟡 Minor — Add language identifiers to these fenced code blocks.
These blocks will keep triggering markdownlint MD040, and syntax highlighting is missing. Add the closest matching language, such as java, sql, json, or text.

Also applies to: 647-647, 795-795, 847-847, 878-878
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02-脏标记与Poller.md` around lines 198 - 228, Add language identifiers to the fenced code blocks in docs/design/vm-metadata-02-脏标记与Poller.md so markdownlint MD040 stops firing and syntax highlighting works; for the SQL/DB flow blocks (showing INSERT IGNORE / DELETE / UPDATE sequences) use "sql", for the Java method overload snippet that includes "public boolean markDirty(String vmInstanceUuid) { return markDirty(vmInstanceUuid, false); }" use "java", and choose appropriate tags (e.g., "text" or "json") for the other examples mentioned (lines around 647, 795, 847, 878) to match their content. Ensure each triple-backtick fence is immediately followed by the language token.docs/design/vm-metadata-02b-高可用与运维.md-171-199 (1)
171-199: ⚠️ Potential issue | 🟡 Minor — The recovery-strategy description here is now inconsistent with later sections.
This passage says that after retries are exhausted the system relies solely on the next API-triggered retry, but §8.2.3 of the same document and Part 2 §4.8 already introduce the background re-enqueue path via lastFlushFailed + MetadataStaleRecoveryTask. Align the wording; otherwise the implementation notes and the operations notes reach opposite conclusions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/vm-metadata-02b-高可用与运维.md` around lines 171 - 199, The document is inconsistent: §8.1 states "no background recovery, rely on API-triggered retries", but §8.2.3 and Part 2 §4.8 introduce lastFlushFailed + MetadataStaleRecoveryTask (background requeue), so reconcile them by choosing one canonical recovery flow and updating all sections and cross-references accordingly; either (A) remove/replace references to lastFlushFailed and MetadataStaleRecoveryTask in §8.2.3 and Part 2 §4.8 and adjust examples to match the API-triggered retry behavior described in §8.1, or (B) augment §8.1 to describe how retryCount>=maxRetry interacts with lastFlushFailed + MetadataStaleRecoveryTask (what sets lastFlushFailed, when MetadataStaleRecoveryTask will requeue or escalate, and how markDirty/nextRetryTime are affected), and then update the tables and alerting language to include the background recovery path; update all mentions of MetadataStaleEvent/MetadataStaleRecoveryTask/lastFlushFailed to the chosen single behavior and ensure cross-section references point to the authoritative description.plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-7382-7387 (1)
7382-7387: ⚠️ Potential issue | 🟡 Minor — Do not backfill metadata on the failure branch.
Line 7386 still puts rsp.metadata into the reply when rsp.isSuccess() is false, which can easily pass partial or stale content upstream. Set metadata only in the success branch.

Suggested change:
```diff
 public void success(ReadVmInstanceMetadataRsp rsp) {
     if (!rsp.isSuccess()) {
         reply.setError(operr("operation error, because:%s", rsp.getError()));
+    } else {
+        reply.setMetadata(rsp.metadata);
     }
-    reply.setMetadata(rsp.metadata);
     bus.reply(msg, reply);
     completion.done();
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 7382 - 7387, The success callback for ReadVmInstanceMetadataRsp improperly sets reply metadata even when rsp.isSuccess() is false; update the anonymous success(ReadVmInstanceMetadataRsp rsp) handler so that reply.setMetadata(rsp.metadata) is only executed inside the successful branch (i.e., when rsp.isSuccess() returns true) and ensure that when rsp.isSuccess() is false you set reply.setError(...) and immediately bus.reply(msg, reply) without touching metadata; locate the code by the success(ReadVmInstanceMetadataRsp) method, the ReadVmInstanceMetadataRsp type, and the reply.setMetadata / reply.setError calls to make the change.
header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsg.java-59-61 (1)
59-61: ⚠️ Potential issue | 🟡 Minor — __example__() currently returns an empty message, so the generated API example is unusable.
At minimum, fill in the required fields metadataPath, primaryStorageUuid, and clusterUuid; otherwise the Groovy/Markdown documentation cannot show a minimal callable request. As per coding guidelines, "API classes must implement the example method for API doc generation, and ensure the corresponding Groovy API Template and API Markdown files are generated."

📝 Suggested example:
```diff
 public static APIRegisterVmInstanceMsg __example__() {
     APIRegisterVmInstanceMsg msg = new APIRegisterVmInstanceMsg();
+    msg.setMetadataPath("/path/to/vm-metadata.json");
+    msg.setPrimaryStorageUuid(uuid());
+    msg.setClusterUuid(uuid());
     return msg;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsg.java` around lines 59 - 61, The __example__ method in APIRegisterVmInstanceMsg returns an empty message; update APIRegisterVmInstanceMsg.__example__() to construct and return a minimal valid example by setting required fields such as metadataPath, primaryStorageUuid and clusterUuid (and any other mandatory properties used by validation) on the APIRegisterVmInstanceMsg instance so the generated Groovy/Markdown API template and documentation show a runnable minimal request.
```java
private static final List<String> SUPPORTED_PS_TYPES = List.of(
        "SharedBlock",
        "LocalStorage",
        "NFS"
);
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== build files =="
fd 'pom\.xml$|build\.gradle$|build\.gradle\.kts$' -a
echo
echo "== Maven compiler settings =="
rg -n --glob 'pom.xml' '<maven.compiler.source>|<maven.compiler.target>|<release>|<source>|<target>' .
echo
echo "== Gradle Java compatibility settings =="
rg -n --glob 'build.gradle' --glob 'build.gradle.kts' 'sourceCompatibility|targetCompatibility|JavaVersion' .
```

Repository: MatheMatrix/zstack
Use Arrays.asList(...) or Collections.unmodifiableList(...) instead.
The repository compiles for Java 8 (the root pom.xml sets maven.compiler.source and maven.compiler.target to 8 at lines 21-22), but List.of(...) is only available on Java 9+, so this code will fail to compile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataOrphanDetector.java`
around lines 73 - 77, The static field SUPPORTED_PS_TYPES in
MetadataOrphanDetector uses List.of(...) which requires Java 9+; replace it with
a Java 8 compatible immutable list, e.g., create the list with
Arrays.asList("SharedBlock", "LocalStorage", "NFS") and wrap with
Collections.unmodifiableList(...) (or use Collections.unmodifiableList(new
ArrayList<>(Arrays.asList(...)))) to maintain immutability; update the
SUPPORTED_PS_TYPES declaration in class MetadataOrphanDetector accordingly.
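The suggested Java 8-compatible replacement can be sketched in isolation as follows (the class name here is illustrative, not the real MetadataOrphanDetector):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SupportedPsTypesDemo {
    // Java 8 equivalent of List.of(...): fixed contents behind an unmodifiable view.
    // Arrays.asList alone is fixed-size but still allows set(); wrapping it with
    // Collections.unmodifiableList blocks all mutation.
    private static final List<String> SUPPORTED_PS_TYPES =
            Collections.unmodifiableList(Arrays.asList("SharedBlock", "LocalStorage", "NFS"));

    public static void main(String[] args) {
        System.out.println(SUPPORTED_PS_TYPES.contains("NFS")); // true
        try {
            SUPPORTED_PS_TYPES.add("Ceph"); // rejected at runtime
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }
    }
}
```

Unlike List.of, this wrapper does not reject null elements at construction time, but for a constant list of string literals the two are interchangeable.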
```java
// Step 2: query the rows just claimed (DP-01 fix: add a lastClaimTime filter so
// only rows CAS-claimed in this round are returned, keeping out rows claimed
// concurrently by triggerFlushForVm)
Timestamp thisCycleCutoff = Timestamp.from(Instant.now().minus(Duration.ofSeconds(5)));
return Q.New(VmMetadataDirtyVO.class)
        .eq(VmMetadataDirtyVO_.managementNodeUuid, Platform.getManagementServerId())
        .gte(VmMetadataDirtyVO_.lastClaimTime, thisCycleCutoff)
        .list();
```
This also treats rows just claimed by other paths as this poller round's result.
lastClaimTime >= now-5s cannot distinguish rows just claimed by this round's claimDirtyRows() from rows claimed a few seconds earlier by triggerFlushForVm(). Once the same vmUuid has been passed to submitFlushTask() twice, exceedMaxPendingCallback() runs releaseClaim(vmUuid) and releases a claim whose flush is still in progress, so another MN can grab the row and flush concurrently. Read back the claimed rows with a unique claim token/batch marker instead of approximating with a time window.
Also applies to: 550-554
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java`
around lines 490 - 496, The query using lastClaimTime
(VmMetadataDirtyVO_.lastClaimTime >= thisCycleCutoff) cannot reliably
distinguish rows claimed by thisCycle's claimDirtyRows() from rows claimed
earlier by triggerFlushForVm(); change the claiming flow to generate a unique
claim token/batch id in claimDirtyRows(), persist it on VmMetadataDirtyVO (e.g.
add claimToken field), set that token when claiming rows, and then replace the
time-based filter in the retrieval code with an equality filter on
VmMetadataDirtyVO_.claimToken to return only rows belonging to the current claim
batch; also update submitFlushTask(), exceedMaxPendingCallback(), and
releaseClaim(vmUuid) handling to set/clear or check the claimToken
appropriately.
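The token-based claim the comment asks for can be sketched with an in-memory stand-in (DirtyRow and claimBatch here are hypothetical substitutes for VmMetadataDirtyVO and the real Q-based CAS update, not ZStack's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class ClaimTokenDemo {
    static class DirtyRow {
        final String vmUuid;
        String claimToken; // null = unclaimed
        DirtyRow(String vmUuid) { this.vmUuid = vmUuid; }
    }

    // Claim every unclaimed row with a fresh batch token, then return exactly
    // the rows carrying that token. Rows claimed earlier by another path keep
    // their own token and are never mixed into this batch, so there is no
    // time-window guessing.
    static List<DirtyRow> claimBatch(List<DirtyRow> table) {
        String token = UUID.randomUUID().toString();
        for (DirtyRow r : table) {
            if (r.claimToken == null) {
                r.claimToken = token; // in the real code: a CAS-style UPDATE
            }
        }
        return table.stream()
                .filter(r -> token.equals(r.claimToken))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DirtyRow> table = new ArrayList<>();
        DirtyRow earlier = new DirtyRow("vm-1");
        earlier.claimToken = "someone-elses-claim"; // claimed seconds ago by another path
        table.add(earlier);
        table.add(new DirtyRow("vm-2"));
        table.add(new DirtyRow("vm-3"));

        // Only vm-2 and vm-3 belong to this batch; vm-1 is excluded.
        claimBatch(table).forEach(r -> System.out.println(r.vmUuid));
    }
}
```

releaseClaim would then clear claimToken only when the token matches, which prevents one path from releasing another path's in-flight claim.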
```java
@RestResponse(allTo = "inventory")
public class APIRegisterVmInstanceFromMetadataEvent extends APIEvent {
    private VmInstanceInventory inventory;
    private List<String> warnings;
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Find API event classes annotated with allTo = "inventory" that contain multiple fields
rg -l '@RestResponse.*allTo.*inventory' --type java | xargs -I {} sh -c 'echo "=== {} ===" && rg -A 5 "class.*extends.*API" {}'
```

🏁 Script executed:

```bash
# Get the full content of APIRegisterVmInstanceFromMetadataEvent.java
find . -name "APIRegisterVmInstanceFromMetadataEvent.java" -type f | head -1 | xargs wc -l
```

🏁 Script executed:

```bash
# Read the file content
find . -name "APIRegisterVmInstanceFromMetadataEvent.java" -type f | head -1 | xargs cat -n
```

🏁 Script executed:

```bash
# Find similar classes with a warnings field that might have @RestResponse
rg -l "warnings" --type java | xargs -I {} rg -B 2 "private List<String> warnings" {} | head -50
```

🏁 Script executed:

```bash
# Check APIReadVmInstanceMetadataReply specifically
fd "APIReadVmInstanceMetadataReply.java" | xargs cat -n
```
Fix the @RestResponse annotation so all fields are exposed.
The class contains two fields, inventory and warnings, but is annotated with @RestResponse(allTo = "inventory"). That annotation exposes only the inventory field in the REST response, so warnings is never serialized.
Change it to @RestResponse(fieldsTo = {"all"}) so both fields are exposed correctly. Also have the __example__() method set the warnings field (e.g. warnings = java.util.Collections.emptyList()) to give the API documentation a complete example.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.java`
around lines 8 - 11, Replace the `@RestResponse`(allTo = "inventory") on class
APIRegisterVmInstanceFromMetadataEvent with `@RestResponse`(fieldsTo = {"all"}) so
both fields (inventory and warnings) are exposed in the REST response, and
update the class' __example__() method to populate the warnings field (e.g.,
warnings = java.util.Collections.emptyList()) along with the existing inventory
example to ensure the API documentation includes both fields.
```java
        responseClass = APIReimageVmInstanceEvent.class,
        category = "vmInstance"
)
@MetadataImpact(MetadataImpact.Impact.STORAGE)
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Locate where the MetadataImpact annotation is defined
echo "=== find the MetadataImpact definition ==="
fd -e java MetadataImpact --exec grep -l "^public.*@interface\|^@interface\|^public @interface" {} \; 2>/dev/null || \
ast-grep --pattern '@interface MetadataImpact { $$$ }'
echo ""
echo "=== MetadataImpact class/annotation declarations ==="
rg -n "(@interface|class|enum)\s+MetadataImpact" --type java
echo ""
echo "=== how other files import MetadataImpact ==="
rg -n "import.*MetadataImpact" --type java | head -10
```

🏁 Script executed:

```bash
cat -n header/src/main/java/org/zstack/header/vm/APIReimageVmInstanceMsg.java | head -30
```
Missing import statement for the MetadataImpact annotation
The @MetadataImpact(MetadataImpact.Impact.STORAGE) annotation was added, but the file has no corresponding import. Note that types in the same package resolve without an import, so if MetadataImpact really lives in this file's package (org.zstack.header.vm) no import is needed; otherwise the missing import will break compilation. Verify the annotation's package and add the import if required.
🔧 Suggested import

```diff
 import org.springframework.http.HttpMethod;
 import org.zstack.header.message.APIMessage;
 import org.zstack.header.message.APIParam;
 import org.zstack.header.rest.RestRequest;
+import org.zstack.header.vm.MetadataImpact;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@header/src/main/java/org/zstack/header/vm/APIReimageVmInstanceMsg.java` at
line 18, Add an explicit import for the MetadataImpact annotation so the
`@MetadataImpact`(MetadataImpact.Impact.STORAGE) usage in APIReimageVmInstanceMsg
compiles; specifically import the MetadataImpact type (e.g., MetadataImpact) at
the top of the file before the class declaration to resolve the annotation
reference used on APIReimageVmInstanceMsg.
```java
bus.send(umsg, new CloudBusCallBack(msg) {
    @Override
    public void run(MessageReply r) {
        UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
        if (!r.isSuccess()) {
            reply.setError(Platform.operr("failed to update vm[uuid=%s] on hypervisor.", self.getUuid())
                    .withCause(r.getError()));
        }
        bus.reply(msg, reply);
    }
```
Return the result through the completion; don't call bus.reply() here directly.
This override already holds a ReturnValueCompletion, but the current branch bypasses it entirely. That breaks the contract followed by this class's other handle(..., completion) methods; the outer caller can easily hang, or the message can be replied to twice. While at it, fix the resource identifier in the failure message: self.getUuid() is used as if it were the VM UUID, but it is actually the primary storage UUID.
🛠️ Suggested change
```diff
 bus.send(umsg, new CloudBusCallBack(msg) {
     @Override
     public void run(MessageReply r) {
         UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
         if (!r.isSuccess()) {
-            reply.setError(Platform.operr("failed to update vm[uuid=%s] on hypervisor.", self.getUuid())
-                    .withCause(r.getError()));
+            completion.fail(Platform.operr(
+                    "failed to update vm metadata on hypervisor for root volume[uuid:%s]",
+                    msg.getRootVolumeUuid()
+            ).withCause(r.getError()));
+            return;
         }
-        bus.reply(msg, reply);
+        completion.success(reply);
     }
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java`
around lines 3814 - 3823, 在 LocalStorageKvmBackend 中替换当前 bus.send 回调里的直接
bus.reply(msg, reply) 行为,改为使用外层传入的 ReturnValueCompletion(completion)来返回结果:在回调中根据
r.isSuccess() 调用 completion.success(reply) 或 completion.fail(err);不要直接对 msg 调用
bus.reply 以免重复回复或卡住调用者;同时修正文案里用到的资源标识,将错误信息中把 self.getUuid() 误当作 VM UUID
的描述改为主存储(primary storage)UUID,并把 r.getError() 包装成合适的 PlatformOpErr 传入
completion.fail。
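The completion contract this comment asks for can be illustrated with toy types (ReturnValueCompletion here is a hypothetical stand-in, not ZStack's real interface): every outcome is routed through the completion exactly once, and the failure branch returns early so the success path cannot also run.

```java
public class CompletionContractDemo {
    interface ReturnValueCompletion<T> {
        void success(T value);
        void fail(String error);
    }

    // Simulates the bus callback: each branch ends in exactly one completion
    // call, so the outer caller can never hang or receive a duplicate reply.
    static void onReply(boolean replySucceeded, String error,
                        ReturnValueCompletion<String> completion) {
        if (!replySucceeded) {
            completion.fail("failed to update vm metadata on hypervisor: " + error);
            return; // early return prevents a second completion call below
        }
        completion.success("metadata updated");
    }

    public static void main(String[] args) {
        ReturnValueCompletion<String> c = new ReturnValueCompletion<String>() {
            public void success(String v) { System.out.println("success: " + v); }
            public void fail(String e) { System.out.println("fail: " + e); }
        };
        onReply(true, null, c);
        onReply(false, "host unreachable", c);
    }
}
```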
```java
List<VolumeVO> volumes = new ArrayList<>();
volumesString.forEach(v -> volumes.add(JSONObjectUtil.toObject(v, VolumeVO.class)));

List<VolumeVO> newVolumes = new ArrayList<>();
volumes.forEach(v -> {
    VolumeVO vo = new VolumeVO();
    // vo.setRootImageUuid(vo.getRootImageUuid());
    vo.setAccountUuid(msg.getSession().getAccountUuid());
    vo.setPrimaryStorageUuid(msg.getPrimaryStorageUuid());
    vo.setInstallPath(v.getInstallPath());

    vo.setCreateDate(v.getCreateDate());
    vo.setDescription(v.getDescription());
    vo.setName(v.getName());
    vo.setSize(v.getSize());
    vo.setActualSize(v.getActualSize());
    vo.setState(v.getState());
    vo.setUuid(v.getUuid());
    vo.setVmInstanceUuid(v.getVmInstanceUuid());
    vo.setType(v.getType());
    vo.setCreateDate(v.getCreateDate());
    vo.setLastOpDate(v.getLastOpDate());
    vo.setDeviceId(v.getDeviceId());
    vo.setStatus(v.getStatus());
    vo.setFormat(v.getFormat());
    vo.setShareable(v.isShareable());
    vo.setVolumeQos(v.getVolumeQos());
    vo.setLastDetachDate(v.getLastDetachDate());
    vo.setLastVmInstanceUuid(v.getLastVmInstanceUuid());
    vo.setLastAttachDate(v.getLastAttachDate());
    vo.setProtocol(v.getProtocol());
    newVolumes.add(vo);
});
dbf.persistCollection(newVolumes);
trigger.next();
    }
});

flow(new NoRollbackFlow() {
    String __name__ = "register-snapshot";

    @Override
    public void run(FlowTrigger trigger, Map data) {
        // snapshots
        vmMetadata.volumeSnapshots.forEach((volumeUuid, snapshotList) -> {
            // one volume may have multiple snapshot trees
            // key = treeUuid
            // value = snapshots
            Map<String, List<VolumeSnapshotInventory>> snapshotsByTreeUuid = new HashMap<>();
            snapshotList.forEach(snapshot -> {
                VolumeSnapshotInventory inv = JSONObjectUtil.toObject(snapshot, VolumeSnapshotInventory.class);
                if (snapshotsByTreeUuid.containsKey(inv.getTreeUuid())) {
                    snapshotsByTreeUuid.get(inv.getTreeUuid()).add(inv);
                } else {
                    snapshotsByTreeUuid.put(inv.getTreeUuid(), new ArrayList<>());
                    snapshotsByTreeUuid.get(inv.getTreeUuid()).add(inv);
                }
            });

            // walk each tree
            snapshotsByTreeUuid.forEach((treeUuid, snapshots) -> {
                // build the snapshot tree
                VolumeSnapshotTree tree = VolumeSnapshotTree.fromInventories(snapshots);
                // level-order traversal of the snapshots
                List<VolumeSnapshotInventory> levelOrderTraversals = tree.levelOrderTraversal();
                // check whether the current tree has a latest node
                boolean treeIsCurrent = levelOrderTraversals.stream().anyMatch(VolumeSnapshotInventory::isLatest);

                // create the snapshot tree first; VolumeSnapshotVO has a foreign key on VolumeSnapshotTreeVO
                VolumeSnapshotTreeVO newTree = new VolumeSnapshotTreeVO();
                newTree.setCurrent(treeIsCurrent);
                newTree.setVolumeUuid(volumeUuid);
                newTree.setUuid(treeUuid);
                newTree.setStatus(VolumeSnapshotTreeStatus.Completed);
                dbf.persist(newTree);

                // build the snapshot rows from the level-order traversal
                levelOrderTraversals.forEach(snapshot -> {
                    VolumeSnapshotVO vo = new VolumeSnapshotVO();
                    vo.setPrimaryStorageUuid(msg.getPrimaryStorageUuid());
                    vo.setPrimaryStorageInstallPath(snapshot.getPrimaryStorageInstallPath());

                    vo.setName(snapshot.getName());
                    vo.setCreateDate(snapshot.getCreateDate());
                    vo.setDescription(snapshot.getDescription());
                    vo.setLastOpDate(snapshot.getLastOpDate());
                    vo.setParentUuid(snapshot.getParentUuid());
                    vo.setState(VolumeSnapshotState.valueOf(snapshot.getState()));
                    vo.setType(snapshot.getType());
                    vo.setVolumeUuid(snapshot.getVolumeUuid());
                    vo.setFormat(snapshot.getFormat());
                    vo.setUuid(snapshot.getUuid());
                    vo.setStatus(VolumeSnapshotStatus.valueOf(snapshot.getStatus()));
                    vo.setLatest(snapshot.isLatest());
                    vo.setSize(snapshot.getSize());
                    vo.setVolumeType(snapshot.getVolumeType());
                    vo.setTreeUuid(snapshot.getTreeUuid());
                    vo.setDistance(snapshot.getDistance());
                    dbf.persist(vo);
                });
            });
        });

        // snapshot groups
        List<VolumeSnapshotGroupVO> newGroups = new ArrayList<>();
        vmMetadata.volumeSnapshotGroupVO.forEach(group -> {
            VolumeSnapshotGroupVO vo = JSONObjectUtil.toObject(group, VolumeSnapshotGroupVO.class);
            vo.setAccountUuid(msg.getSession().getAccountUuid());
            newGroups.add(vo);
        });
        dbf.persistCollection(newGroups);

        // snapshot group refs
        List<VolumeSnapshotGroupRefVO> newGroupRefs = new ArrayList<>();
        vmMetadata.volumeSnapshotGroupRefVO.forEach(group -> {
            VolumeSnapshotGroupRefVO vo = JSONObjectUtil.toObject(group, VolumeSnapshotGroupRefVO.class);
            newGroupRefs.add(vo);
        });
        dbf.persistCollection(newGroupRefs);

        trigger.next();
    }
});

flow(new NoRollbackFlow() {
    String __name__ = "register-vmInstance";

    @Override
    public void run(FlowTrigger trigger, Map data) {
        VmInstanceVO metaVm = JSONObjectUtil.toObject(vmMetadata.vmInstanceVO, VmInstanceVO.class);
        VmInstanceVO newVm = new VmInstanceVO();

        newVm.setClusterUuid(msg.getClusterUuid());
        newVm.setHostUuid(msg.getHostUuid());
        // look for a cached tag lv to build the imageCache
        // newVm.setImageUuid();

        newVm.setUuid(metaVm.getUuid());
        newVm.setName(metaVm.getName());
        newVm.setDescription(metaVm.getDescription());
        newVm.setType(metaVm.getType());
        newVm.setHypervisorType(metaVm.getHypervisorType());
        newVm.setCreateDate(metaVm.getCreateDate());
        newVm.setLastOpDate(metaVm.getLastOpDate());
        newVm.setState(metaVm.getState());
        newVm.setRootVolumeUuid(metaVm.getRootVolumeUuid());
        newVm.setInternalId(metaVm.getInternalId());
        newVm.setCpuNum(metaVm.getCpuNum());
        newVm.setCpuSpeed(metaVm.getCpuSpeed());
        newVm.setMemorySize(metaVm.getMemorySize());
        newVm.setReservedMemorySize(metaVm.getReservedMemorySize());
        newVm.setAllocatorStrategy(metaVm.getAllocatorStrategy());
        newVm.setPlatform(metaVm.getPlatform());
        newVm.setArchitecture(metaVm.getArchitecture());
        newVm.setGuestOsType(metaVm.getGuestOsType());
        dbf.persist(newVm);
```
This registration flow has no transaction/rollback protection.
It persists rows into several tables in a row via persist/persistCollection, yet every step is a NoRollbackFlow and error() only calls completion.fail(). If any snapshot/group/vm step fails, the volume/tree/snapshot rows already written remain in the database, and the next retry will likely hit duplicate primary keys or leave orphan records.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java`
around lines 1907 - 2062, The flows "register-snapshot" and
"register-vmInstance" (and the earlier volume persist block) perform multiple
dbf.persist / dbf.persistCollection calls inside NoRollbackFlow, so partial
writes will remain on error; convert these NoRollbackFlow blocks into
transactional flows (or wrap the body in a single DB transaction) so that on
failure the transaction is rolled back: replace NoRollbackFlow with a
Flow/SyncFlow implementation that starts a DB transaction (via
dbf.beginTransaction()/commit()/rollback() or the project's transactional
helper), perform all dbf.persist/dbf.persistCollection inside that transaction,
and on exception call rollback and trigger.fail()/completion.fail(); ensure the
flow names ("register-snapshot", "register-vmInstance") and the volume persist
section use the same transaction so any error undoes prior writes to avoid
duplicate PKs or orphaned records.
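The all-or-nothing behavior being requested can be sketched with a toy in-memory "database" (the beginTransaction/commit/rollback names mirror the dbf calls mentioned above but are hypothetical here, not ZStack's real API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TxSketch {
    static class MiniDb {
        final Map<String, List<String>> tables = new HashMap<>();
        private Map<String, List<String>> snapshot;

        // Remember the pre-transaction state so rollback can restore it.
        void beginTransaction() {
            snapshot = new HashMap<>();
            tables.forEach((k, v) -> snapshot.put(k, new ArrayList<>(v)));
        }
        void persist(String table, String row) {
            tables.computeIfAbsent(table, k -> new ArrayList<>()).add(row);
        }
        void rollback() { tables.clear(); tables.putAll(snapshot); }
        void commit() { snapshot = null; }
    }

    public static void main(String[] args) {
        MiniDb db = new MiniDb();
        db.beginTransaction();
        try {
            db.persist("VolumeVO", "vol-1");
            db.persist("VolumeSnapshotTreeVO", "tree-1");
            // a later step fails, as in the register-vmInstance flow
            throw new RuntimeException("register-vmInstance failed");
        } catch (RuntimeException e) {
            db.rollback(); // the earlier volume/tree writes are undone
        }
        System.out.println(db.tables.isEmpty()); // true: no orphan rows survive
    }
}
```

Wrapping all three register flows in one such transaction means a retry after failure starts from a clean state instead of colliding with leftover primary keys.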
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
header/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkRouterInterfaceIpMsg.java (1)
28-33: ⚠️ Potential issue | 🟡 Minor

__example__ has the wrong return type (pre-existing issue). Although this snippet is not changed by this PR, it contains a pre-existing copy-paste error: the __example__ method declares a return type of APIGetL3NetworkMtuMsg, but the enclosing class is APIGetL3NetworkRouterInterfaceIpMsg. This produces incorrect API documentation.

🐛 Suggested fix

```diff
-    public static APIGetL3NetworkMtuMsg __example__() {
-        APIGetL3NetworkMtuMsg msg = new APIGetL3NetworkMtuMsg();
+    public static APIGetL3NetworkRouterInterfaceIpMsg __example__() {
+        APIGetL3NetworkRouterInterfaceIpMsg msg = new APIGetL3NetworkRouterInterfaceIpMsg();
         msg.setL3NetworkUuid(uuid());
         return msg;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkRouterInterfaceIpMsg.java` around lines 28 - 33, The __example__ method in class APIGetL3NetworkRouterInterfaceIpMsg incorrectly declares and constructs APIGetL3NetworkMtuMsg (copy/paste error); update the method signature return type and the instantiated object to APIGetL3NetworkRouterInterfaceIpMsg and keep setting the L3 network UUID via setL3NetworkUuid(uuid()) so the example returns the correct message type.
🧹 Nitpick comments (6)
core/src/main/java/org/zstack/core/debug/APICleanQueueMsg.java (1)
33-33: Pre-existing naming issue: managementiUuid is misspelled. The field name managementiUuid contains an extra letter i and should be managementUuid. This is a pre-existing issue outside the scope of this PR and can be fixed in later maintenance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/debug/APICleanQueueMsg.java` at line 33, the field name managementiUuid is misspelled and should be managementUuid; rename the field in APICleanQueueMsg and update all related accessors and references (getters/setters, constructors, serialization/deserialization, JSON mappings, database column mappings, and references to managementiUuid in other classes) to avoid compile or runtime errors; if backward compatibility with external serialization formats is needed, keep a compatibility strategy (such as @JsonProperty or a temporary bridge method) until external consumers have migrated.

header/src/main/java/org/zstack/header/storage/backup/APIScanBackupStorageMsg.java (1)
21-24: __example__ sets no example values. The message instance returned by the existing __example__ method does not set backupStorageUuid, which leaves the API documentation example incomplete. Populate it with a sample UUID.

♻️ Suggested improvement

```diff
 public static APIScanBackupStorageMsg __example__() {
     APIScanBackupStorageMsg msg = new APIScanBackupStorageMsg();
+    msg.setBackupStorageUuid(uuid());
     return msg;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/backup/APIScanBackupStorageMsg.java` around lines 21 - 24, The __example__ method in APIScanBackupStorageMsg returns an instance without setting example data; update APIScanBackupStorageMsg.__example__ to populate the backupStorageUuid field with a sample UUID (e.g., a valid UUID string) on the msg instance before returning so the API example includes a concrete backupStorageUuid value.

header/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageUsageReportMsg.java (1)
8-9: Import order could be tidied. The new org.zstack.header.vm.MetadataImpact import was placed after java.util.List; group it with the other org.zstack imports (lines 4-6) to keep the import grouping consistent.

♻️ Suggested change

```diff
 import org.zstack.header.message.APIParam;
 import org.zstack.header.message.APISyncCallMessage;
 import org.zstack.header.rest.RestRequest;
+import org.zstack.header.vm.MetadataImpact;
 import java.util.List;
-import org.zstack.header.vm.MetadataImpact;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageUsageReportMsg.java` around lines 8 - 9, The import order is inconsistent: move the new org.zstack.header.vm.MetadataImpact import to the group with the other org.zstack imports in APIGetPrimaryStorageUsageReportMsg so all org.zstack.* imports (e.g., existing imports on lines 4-6) are together and java.util.List remains in the standard java.* group; update the import section to follow the project's grouping (java.*, javax.*, third-party, org.zstack.*) and keep MetadataImpact adjacent to the other org.zstack imports.

header/src/main/java/org/zstack/header/network/service/APIDetachNetworkServiceProviderFromL2NetworkMsg.java (1)
7-11: Inconsistent formatting: extra blank lines around the annotation. The blank lines at lines 8 and 10 make the formatting inconsistent with other files in this PR. Remove them so the annotation sits directly between the imports and the class declaration.

♻️ Suggested fix

```diff
 import org.zstack.header.network.l2.L2NetworkVO;
 import org.zstack.header.vm.MetadataImpact;
-
 @MetadataImpact(MetadataImpact.Impact.NONE)
-
 public class APIDetachNetworkServiceProviderFromL2NetworkMsg extends APIMessage {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/network/service/APIDetachNetworkServiceProviderFromL2NetworkMsg.java` around lines 7 - 11, remove the extra blank lines around the @MetadataImpact annotation: in the APIDetachNetworkServiceProviderFromL2NetworkMsg class, put the import statement import org.zstack.header.vm.MetadataImpact;, the annotation @MetadataImpact(MetadataImpact.Impact.NONE), and the class declaration public class APIDetachNetworkServiceProviderFromL2NetworkMsg extends APIMessage on consecutive lines, so the annotation sits adjacent to the imports and class declaration, consistent with the formatting of other files in the repository.

header/src/main/java/org/zstack/header/network/l2/APICreateL2NetworkMsg.java (1)
47-49: Inconsistent blank line after the annotation. The blank line at line 48 leaves an extra blank line between the @MetadataImpact annotation and the class declaration, unlike other files in this PR (e.g. APIDeleteL2NetworkMsg.java). Delete it for consistency.

♻️ Suggested fix

```diff
 @MetadataImpact(MetadataImpact.Impact.NONE)
-
 public abstract class APICreateL2NetworkMsg extends APICreateMessage implements APIAuditor {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/network/l2/APICreateL2NetworkMsg.java` around lines 47 - 49, Remove the extra blank line between the @MetadataImpact annotation and the class declaration in APICreateL2NetworkMsg so the annotation sits immediately above the class definition (i.e., ensure @MetadataImpact(...) is directly followed by "public abstract class APICreateL2NetworkMsg ..."), matching the formatting used in files like APIDeleteL2NetworkMsg.java.

header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java (1)
10-18: Consider including the primary storage UUID in the REST path. The current API path /primary-storage/vm-instances/metadata has no {uuid} path variable identifying the target primary storage. By comparison, APIReadVmInstanceMetadataMsg in the same module uses the /vm-instances/{vmUuid}/metadata pattern. Per REST conventions and the project coding guidelines, a GET request's resource identifier should normally be carried in a path variable, not the request body. Suggested change:

♻️ Suggested path change

```diff
 @RestRequest(
-        path = "/primary-storage/vm-instances/metadata",
+        path = "/primary-storage/{uuid}/vm-instances/metadata",
         method = HttpMethod.GET,
         responseClass = APIGetVmInstanceMetadataFromPrimaryStorageReply.class
 )
```

Please confirm whether the current path design is intentional (for example, to support batch queries of metadata across multiple primary storages). For a single-primary-storage query, prefer the path-variable form.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java` around lines 10 - 18, The REST path currently on class APIGetVmInstanceMetadataFromPrimaryStorageMsg lacks a path variable for the target primary storage UUID; update the `@RestRequest` path to include a path variable (e.g. "/primary-storage/{uuid}/vm-instances/metadata") and ensure the existing `@APIParam` field uuid is used as that path variable (adjust any annotations or parameter binding if your framework requires explicit path-to-field binding). If the API is intended for batch queries, leave the path and document the behavior; otherwise switch to the single-resource path variable approach and update related clients/tests that call this endpoint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@header/src/main/java/org/zstack/header/image/APICreateRootVolumeTemplateFromRootVolumeMsg.java`:
- Line 24: The MetadataImpact on APICreateRootVolumeTemplateFromRootVolumeMsg is
incorrectly set to NONE; update the annotation on class
APICreateRootVolumeTemplateFromRootVolumeMsg to the correct impact (e.g.,
MetadataImpact.Impact.WRITE or the enum constant used for
storage/template-affecting operations) so metadata dirty/refresh flows run;
mirror the value used by other template/storage-related messages to ensure
consistency.
In
`@header/src/main/java/org/zstack/header/storage/snapshot/APIBackupVolumeSnapshotMsg.java`:
- Line 36: The MetadataImpact annotation on the class is wrong: on
APIBackupVolumeSnapshotMsg change @MetadataImpact(MetadataImpact.Impact.NONE) to
@MetadataImpact(MetadataImpact.Impact.STORAGE); locate the APIBackupVolumeSnapshotMsg
class and replace the annotation value to stay consistent with other snapshot-related
APIs, ensuring the metadata refresh/interception logic correctly recognizes it as a
STORAGE impact.
In
`@header/src/main/java/org/zstack/header/storage/snapshot/APIBatchDeleteVolumeSnapshotMsg.java`:
- Line 29: The current @MetadataImpact(MetadataImpact.Impact.NONE) annotation on
APIBatchDeleteVolumeSnapshotMsg is misclassified and should be STORAGE, reflecting
that the operation triggers storage metadata changes; change the MetadataImpact
annotation on APIBatchDeleteVolumeSnapshotMsg to MetadataImpact.Impact.STORAGE
(locators: APIBatchDeleteVolumeSnapshotMsg, the MetadataImpact annotation), then run
the related unit/integration tests to make sure the snapshot deletion flow triggers a
storage metadata update.
---
Outside diff comments:
In
`@header/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkRouterInterfaceIpMsg.java`:
- Around line 28-33: The __example__ method in class
APIGetL3NetworkRouterInterfaceIpMsg incorrectly declares and constructs
APIGetL3NetworkMtuMsg (copy/paste error); update the method signature return
type and the instantiated object to APIGetL3NetworkRouterInterfaceIpMsg and keep
setting the L3 network UUID via setL3NetworkUuid(uuid()) so the example returns
the correct message type.
---
Nitpick comments:
In `@core/src/main/java/org/zstack/core/debug/APICleanQueueMsg.java`:
- Line 33: The field name managementiUuid is misspelled and should be managementUuid;
rename the field in APICleanQueueMsg and update all related accessors and references
(getters/setters, constructors, serialization/deserialization, JSON mappings,
database column mappings, and references to managementiUuid in other classes) to
avoid compile or runtime errors; if backward compatibility with external
serialization formats is needed, keep a compatibility strategy (such as @JsonProperty
or a temporary bridge method) until external consumers have migrated.
In
`@header/src/main/java/org/zstack/header/network/l2/APICreateL2NetworkMsg.java`:
- Around line 47-49: Remove the extra blank line between the `@MetadataImpact`
annotation and the class declaration in APICreateL2NetworkMsg so the annotation
sits immediately above the class definition (i.e., ensure `@MetadataImpact`(...)
is directly followed by "public abstract class APICreateL2NetworkMsg ..."),
matching the formatting used in files like APIDeleteL2NetworkMsg.java.
In
`@header/src/main/java/org/zstack/header/network/service/APIDetachNetworkServiceProviderFromL2NetworkMsg.java`:
- Around line 7-11: Remove the extra blank lines around the @MetadataImpact
annotation: in APIDetachNetworkServiceProviderFromL2NetworkMsg, put the import
statement import org.zstack.header.vm.MetadataImpact;, the annotation
@MetadataImpact(MetadataImpact.Impact.NONE), and the class declaration public class
APIDetachNetworkServiceProviderFromL2NetworkMsg extends APIMessage on consecutive
lines so the annotation sits adjacent to the imports and class declaration,
consistent with the formatting of other files in the repository.
In
`@header/src/main/java/org/zstack/header/storage/backup/APIScanBackupStorageMsg.java`:
- Around line 21-24: The __example__ method in APIScanBackupStorageMsg returns
an instance without setting example data; update
APIScanBackupStorageMsg.__example__ to populate the backupStorageUuid field with
a sample UUID (e.g., a valid UUID string) on the msg instance before returning
so the API example includes a concrete backupStorageUuid value.
In
`@header/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageUsageReportMsg.java`:
- Around line 8-9: The import order is inconsistent: move the new
org.zstack.header.vm.MetadataImpact import to the group with the other
org.zstack imports in APIGetPrimaryStorageUsageReportMsg so all org.zstack.*
imports (e.g., existing imports on lines 4-6) are together and java.util.List
remains in the standard java.* group; update the import section to follow the
project's grouping (java.*, javax.*, third-party, org.zstack.*) and keep
MetadataImpact adjacent to the other org.zstack imports.
In
`@header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java`:
- Around line 10-18: The REST path currently on class
APIGetVmInstanceMetadataFromPrimaryStorageMsg lacks a path variable for the
target primary storage UUID; update the `@RestRequest` path to include a path
variable (e.g. "/primary-storage/{uuid}/vm-instances/metadata") and ensure the
existing `@APIParam` field uuid is used as that path variable (adjust any
annotations or parameter binding if your framework requires explicit
path-to-field binding). If the API is intended for batch queries, leave the path
and document the behavior; otherwise switch to the single-resource path variable
approach and update related clients/tests that call this endpoint.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f0d2ca25-1916-4b1b-a8bd-b02fe287e308
📒 Files selected for processing (242)
core/src/main/java/org/zstack/core/captcha/APIRefreshCaptchaMsg.java
core/src/main/java/org/zstack/core/config/APIGetGlobalConfigOptionsMsg.java
core/src/main/java/org/zstack/core/config/APIQueryGlobalConfigMsg.java
core/src/main/java/org/zstack/core/config/APIResetGlobalConfigMsg.java
core/src/main/java/org/zstack/core/config/APIUpdateGlobalConfigMsg.java
core/src/main/java/org/zstack/core/debug/APICleanQueueMsg.java
core/src/main/java/org/zstack/core/debug/APIDebugSignalMsg.java
core/src/main/java/org/zstack/core/debug/APIGetDebugSignalMsg.java
core/src/main/java/org/zstack/core/errorcode/APICheckElaborationContentMsg.java
core/src/main/java/org/zstack/core/errorcode/APIGetElaborationCategoriesMsg.java
core/src/main/java/org/zstack/core/errorcode/APIGetElaborationsMsg.java
core/src/main/java/org/zstack/core/errorcode/APIReloadElaborationMsg.java
core/src/main/java/org/zstack/core/eventlog/APIQueryEventLogMsg.java
core/src/main/java/org/zstack/core/gc/APIDeleteGCJobMsg.java
core/src/main/java/org/zstack/core/gc/APIQueryGCJobMsg.java
core/src/main/java/org/zstack/core/gc/APITriggerGCJobMsg.java
header/src/main/java/org/zstack/header/APIIsOpensourceVersionMsg.java
header/src/main/java/org/zstack/header/allocator/APIGetCpuMemoryCapacityMsg.java
header/src/main/java/org/zstack/header/allocator/APIGetHostAllocatorStrategiesMsg.java
header/src/main/java/org/zstack/header/apimediator/APIIsReadyToGoMsg.java
header/src/main/java/org/zstack/header/cluster/APIChangeClusterStateMsg.java
header/src/main/java/org/zstack/header/cluster/APICreateClusterMsg.java
header/src/main/java/org/zstack/header/cluster/APIDeleteClusterMsg.java
header/src/main/java/org/zstack/header/cluster/APIQueryClusterMsg.java
header/src/main/java/org/zstack/header/cluster/APIUpdateClusterMsg.java
header/src/main/java/org/zstack/header/cluster/APIUpdateClusterOSMsg.java
header/src/main/java/org/zstack/header/configuration/APIChangeDiskOfferingStateMsg.java
header/src/main/java/org/zstack/header/configuration/APIChangeInstanceOfferingStateMsg.java
header/src/main/java/org/zstack/header/configuration/APICreateDiskOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APICreateInstanceOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIDeleteDiskOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIDeleteInstanceOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateApiJsonTemplateMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateApiTypeScriptDefinitionMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateGroovyClassMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateSqlForeignKeyMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateSqlIndexMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateSqlVOViewMsg.java
header/src/main/java/org/zstack/header/configuration/APIGenerateTestLinkDocumentMsg.java
header/src/main/java/org/zstack/header/configuration/APIGetGlobalPropertyMsg.java
header/src/main/java/org/zstack/header/configuration/APIQueryDiskOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIQueryInstanceOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIUpdateDiskOfferingMsg.java
header/src/main/java/org/zstack/header/configuration/APIUpdateInstanceOfferingMsg.java
header/src/main/java/org/zstack/header/console/APIQueryConsoleProxyAgentMsg.java
header/src/main/java/org/zstack/header/console/APIReconnectConsoleProxyAgentMsg.java
header/src/main/java/org/zstack/header/console/APIRequestConsoleAccessMsg.java
header/src/main/java/org/zstack/header/console/APIUpdateConsoleProxyAgentMsg.java
header/src/main/java/org/zstack/header/core/APIGetChainTaskMsg.java
header/src/main/java/org/zstack/header/core/encrypt/APIGetEncryptedFieldMsg.java
header/src/main/java/org/zstack/header/core/encrypt/APIUpdateEncryptKeyMsg.java
header/src/main/java/org/zstack/header/core/external/service/APIGetExternalServicesMsg.
javaheader/src/main/java/org/zstack/header/core/external/service/APIReloadExternalServiceMsg.javaheader/src/main/java/org/zstack/header/core/progress/APIGetTaskProgressMsg.javaheader/src/main/java/org/zstack/header/core/webhooks/APICreateWebhookMsg.javaheader/src/main/java/org/zstack/header/core/webhooks/APIDeleteWebhookMsg.javaheader/src/main/java/org/zstack/header/core/webhooks/APIQueryWebhookMsg.javaheader/src/main/java/org/zstack/header/core/webhooks/APIUpdateWebhookMsg.javaheader/src/main/java/org/zstack/header/host/APIAddHostMsg.javaheader/src/main/java/org/zstack/header/host/APIChangeHostStateMsg.javaheader/src/main/java/org/zstack/header/host/APIDeleteHostMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHostBlockDevicesMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHostPowerStatusMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHostSensorsMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHostTaskMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHostWebSshUrlMsg.javaheader/src/main/java/org/zstack/header/host/APIGetHypervisorTypesMsg.javaheader/src/main/java/org/zstack/header/host/APIGetPhysicalMachineBlockDevicesMsg.javaheader/src/main/java/org/zstack/header/host/APIMountBlockDeviceMsg.javaheader/src/main/java/org/zstack/header/host/APIPowerOnHostMsg.javaheader/src/main/java/org/zstack/header/host/APIPowerResetHostMsg.javaheader/src/main/java/org/zstack/header/host/APIQueryHostMsg.javaheader/src/main/java/org/zstack/header/host/APIReconnectHostMsg.javaheader/src/main/java/org/zstack/header/host/APIShutdownHostMsg.javaheader/src/main/java/org/zstack/header/host/APIUpdateHostIpmiMsg.javaheader/src/main/java/org/zstack/header/host/APIUpdateHostMsg.javaheader/src/main/java/org/zstack/header/host/APIUpdateHostNqnMsg.javaheader/src/main/java/org/zstack/header/host/APIUpdateHostnameMsg.javaheader/src/main/java/org/zstack/header/identity/APIChangeResourceOwnerMsg.javaheader/src/main/java/org/zstack/header/identity/APICrea
teAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APIDeleteAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APIGetAccountQuotaUsageMsg.javaheader/src/main/java/org/zstack/header/identity/APIGetResourceAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APILogInByAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APILogOutMsg.javaheader/src/main/java/org/zstack/header/identity/APIQueryAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APIQueryAccountResourceRefMsg.javaheader/src/main/java/org/zstack/header/identity/APIQueryQuotaMsg.javaheader/src/main/java/org/zstack/header/identity/APIRenewSessionMsg.javaheader/src/main/java/org/zstack/header/identity/APIRevokeResourceSharingMsg.javaheader/src/main/java/org/zstack/header/identity/APIShareResourceMsg.javaheader/src/main/java/org/zstack/header/identity/APIUpdateAccountMsg.javaheader/src/main/java/org/zstack/header/identity/APIUpdateQuotaMsg.javaheader/src/main/java/org/zstack/header/identity/APIValidateSessionMsg.javaheader/src/main/java/org/zstack/header/identity/login/APIGetLoginProceduresMsg.javaheader/src/main/java/org/zstack/header/identity/login/APILogInMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIAttachRoleToAccountMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APICreateRoleMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIDeleteRoleMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIDetachRoleFromAccountMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIGetRolePolicyActionsMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIQueryRoleAccountRefMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIQueryRoleMsg.javaheader/src/main/java/org/zstack/header/identity/role/api/APIUpdateRoleMsg.javaheader/src/main/java/org/zstack/header/image/APIAddImageMsg.javaheader/src/main/java/org/zstack/header/image/APICalculateImageHashMsg.javahea
der/src/main/java/org/zstack/header/image/APIChangeImageStateMsg.javaheader/src/main/java/org/zstack/header/image/APICreateDataVolumeTemplateFromVolumeMsg.javaheader/src/main/java/org/zstack/header/image/APICreateDataVolumeTemplateFromVolumeSnapshotMsg.javaheader/src/main/java/org/zstack/header/image/APICreateRootVolumeTemplateFromRootVolumeMsg.javaheader/src/main/java/org/zstack/header/image/APICreateRootVolumeTemplateFromVolumeSnapshotMsg.javaheader/src/main/java/org/zstack/header/image/APIDeleteImageMsg.javaheader/src/main/java/org/zstack/header/image/APIExpungeImageMsg.javaheader/src/main/java/org/zstack/header/image/APIGetCandidateBackupStorageForCreatingImageMsg.javaheader/src/main/java/org/zstack/header/image/APIGetCandidateImagesForCreatingVmMsg.javaheader/src/main/java/org/zstack/header/image/APIGetUploadImageJobDetailsMsg.javaheader/src/main/java/org/zstack/header/image/APIQueryImageMsg.javaheader/src/main/java/org/zstack/header/image/APIRecoverImageMsg.javaheader/src/main/java/org/zstack/header/image/APISetImageBootModeMsg.javaheader/src/main/java/org/zstack/header/image/APISyncImageSizeMsg.javaheader/src/main/java/org/zstack/header/image/APIUpdateImageMsg.javaheader/src/main/java/org/zstack/header/longjob/APICancelLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APICleanLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APIDeleteLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APIQueryLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APIRerunLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APIResumeLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APISubmitLongJobMsg.javaheader/src/main/java/org/zstack/header/longjob/APIUpdateLongJobMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIGetCurrentTimeMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIGetManagementNodeArchMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIGetManagementNodeOSMsg.
javaheader/src/main/java/org/zstack/header/managementnode/APIGetPlatformTimeZoneMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIGetSupportAPIsMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIGetVersionMsg.javaheader/src/main/java/org/zstack/header/managementnode/APIQueryManagementNodeMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIAttachL2NetworkToClusterMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIAttachL2NetworkToHostMsg.javaheader/src/main/java/org/zstack/header/network/l2/APICreateL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APICreateL2NoVlanNetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APICreateL2VlanNetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIDeleteL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIDetachL2NetworkFromClusterMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIDetachL2NetworkFromHostMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIGetCandidateClustersForAttachingL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIGetCandidateL2NetworksForAttachingClusterMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIGetL2NetworkTypesMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIGetVSwitchTypesMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIQueryL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIQueryL2VlanNetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIUpdateL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l2/APIUpdateL2NetworkVirtualNetworkIdMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddDnsToL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddHostRouteToL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddIpRangeByNetworkCidrMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddIpRangeMsg.javaheader/src/main/java/org/
zstack/header/network/l3/APIAddIpv6RangeByNetworkCidrMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddIpv6RangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIAddReservedIpRangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIChangeL3NetworkStateMsg.javaheader/src/main/java/org/zstack/header/network/l3/APICheckIpAvailabilityMsg.javaheader/src/main/java/org/zstack/header/network/l3/APICreateL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIDeleteIpAddressMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIDeleteIpRangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIDeleteL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIDeleteReservedIpRangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIGetFreeIpMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIGetIpAddressCapacityMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkMtuMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkRouterInterfaceIpMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIGetL3NetworkTypesMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIQueryAddressPoolMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIQueryIpAddressMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIQueryIpRangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIQueryL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIRemoveDnsFromL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIRemoveHostRouteFromL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/l3/APISetL3NetworkMtuMsg.javaheader/src/main/java/org/zstack/header/network/l3/APISetL3NetworkRouterInterfaceIpMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIUpdateIpRangeMsg.javaheader/src/main/java/org/zstack/header/network/l3/APIUpdateL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/service/APIA
ddNetworkServiceProviderMsg.javaheader/src/main/java/org/zstack/header/network/service/APIAttachNetworkServiceProviderToL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/service/APIAttachNetworkServiceToL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/service/APIDetachNetworkServiceFromL3NetworkMsg.javaheader/src/main/java/org/zstack/header/network/service/APIDetachNetworkServiceProviderFromL2NetworkMsg.javaheader/src/main/java/org/zstack/header/network/service/APIGetNetworkServiceTypesMsg.javaheader/src/main/java/org/zstack/header/network/service/APIQueryNetworkServiceL3NetworkRefMsg.javaheader/src/main/java/org/zstack/header/network/service/APIQueryNetworkServiceProviderMsg.javaheader/src/main/java/org/zstack/header/query/APIGenerateInventoryQueryDetailsMsg.javaheader/src/main/java/org/zstack/header/query/APIGenerateQueryableFieldsMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APICreateResourceAttributeKeyMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APICreateResourceAttributeValueMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APIDeleteResourceAttributeKeyMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APIDeleteResourceAttributeValueMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APIQueryResourceAttributeKeyMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APIQueryResourceAttributeValueMsg.javaheader/src/main/java/org/zstack/header/resourceattribute/api/APIUpdateResourceAttributeKeyMsg.javaheader/src/main/java/org/zstack/header/search/APICreateSearchIndexMsg.javaheader/src/main/java/org/zstack/header/search/APIDeleteSearchIndexMsg.javaheader/src/main/java/org/zstack/header/storage/addon/backup/APIAddExternalBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/addon/primary/APIAddExternalPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/addon/primary/APIDiscoverExternalPrimaryS
torageMsg.javaheader/src/main/java/org/zstack/header/storage/addon/primary/APIUpdateExternalPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIAddBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIAttachBackupStorageToZoneMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIChangeBackupStorageStateMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APICleanUpTrashOnBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIDeleteBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIDeleteExportedImageFromBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIDetachBackupStorageFromZoneMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIGetBackupStorageCapacityMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIGetBackupStorageTypesMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIGetTrashOnBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIQueryBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIReconnectBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIScanBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/backup/APIUpdateBackupStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIAddPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIAddStorageProtocolMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIAttachPrimaryStorageToClusterMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIChangePrimaryStorageStateMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APICleanUpImageCacheOnPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APICleanUpStorageTrashOnPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APICleanUpTrashOnPrimaryStorageMsg.javaheader/src/
main/java/org/zstack/header/storage/primary/APIDeletePrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIDetachPrimaryStorageFromClusterMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageAllocatorStrategiesMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageCapacityMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageLicenseInfoMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageTypesMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetPrimaryStorageUsageReportMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetTrashOnPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIQueryImageCacheMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIQueryPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIReconnectPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APISyncPrimaryStorageCapacityMsg.javaheader/src/main/java/org/zstack/header/storage/primary/APIUpdatePrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/storage/snapshot/APIBackupVolumeSnapshotMsg.javaheader/src/main/java/org/zstack/header/storage/snapshot/APIBatchDeleteVolumeSnapshotMsg.java
🚧 Files skipped from review as they are similar to previous changes (1)
- header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsg.java
```java
)
@TagResourceType(ImageVO.class)
@DefaultTimeout(timeunit = TimeUnit.HOURS, value = 72)
@MetadataImpact(MetadataImpact.Impact.NONE)
```
The MetadataImpact classification looks wrong here.
APICreateRootVolumeTemplateFromRootVolumeMsg creates a template from a root volume and involves writes to backup storage; under this PR's classification scheme, template/storage-related APIs of this kind should not be marked NONE. If NONE is kept here, any metadata dirty-marking or refresh chain that keys off this annotation may be skipped later.
Suggested change

```diff
-@MetadataImpact(MetadataImpact.Impact.NONE)
+@MetadataImpact(MetadataImpact.Impact.STORAGE)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`header/src/main/java/org/zstack/header/image/APICreateRootVolumeTemplateFromRootVolumeMsg.java`
at line 24, The MetadataImpact on APICreateRootVolumeTemplateFromRootVolumeMsg
is incorrectly set to NONE; update the annotation on class
APICreateRootVolumeTemplateFromRootVolumeMsg to the correct impact (e.g.,
MetadataImpact.Impact.WRITE or the enum constant used for
storage/template-affecting operations) so metadata dirty/refresh flows run;
mirror the value used by other template/storage-related messages to ensure
consistency.
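Why the classification matters can be seen in a reduced model of the annotation-driven flow. All names below are illustrative, not ZStack's actual classes (the real logic lives in the MetadataUpdateInterceptor described in the walkthrough): a NONE classification simply short-circuits the dirty-marking, which is exactly what makes a mislabeled template API skip metadata refresh.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Reduced model: an interceptor reads the message class's annotation and
// skips dirty-marking entirely for Impact.NONE -- so a mislabelled
// template/snapshot API never triggers a metadata refresh.
class MetadataImpactSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface MetadataImpact { Impact value(); }

    enum Impact { NONE, CONFIG, STORAGE }

    @MetadataImpact(Impact.NONE)    // the bug under review: should be STORAGE
    static class CreateTemplateMsg {}

    @MetadataImpact(Impact.STORAGE)
    static class AttachVolumeMsg {}

    // Mirrors the decision the real interceptor would make per message class.
    static boolean marksDirty(Class<?> msgClass) {
        MetadataImpact a = msgClass.getAnnotation(MetadataImpact.class);
        return a != null && a.value() != Impact.NONE;
    }

    public static void main(String[] args) {
        System.out.println(marksDirty(CreateTemplateMsg.class)); // false: refresh skipped
        System.out.println(marksDirty(AttachVolumeMsg.class));   // true: VM marked dirty
    }
}
```

The sketch only shows the short-circuit; the real interceptor additionally resolves VM UUIDs and writes VmMetadataDirtyVO rows.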
```java
 * @since 0.1.0
 */

@MetadataImpact(MetadataImpact.Impact.NONE)
```
The MetadataImpact classification looks mislabeled.
This API is a storage operation on volume snapshots, and the PR's goals already classify snapshot CRUD as STORAGE. Marking it Impact.NONE will make the downstream metadata refresh/interception logic based on this annotation treat it as "no impact". Change it to @MetadataImpact(MetadataImpact.Impact.STORAGE) to stay consistent with the other snapshot APIs.
Suggested change

```diff
-@MetadataImpact(MetadataImpact.Impact.NONE)
+@MetadataImpact(MetadataImpact.Impact.STORAGE)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `header/src/main/java/org/zstack/header/storage/snapshot/APIBackupVolumeSnapshotMsg.java` at line 36, the class-level MetadataImpact annotation is wrong: change @MetadataImpact(MetadataImpact.Impact.NONE) on APIBackupVolumeSnapshotMsg to @MetadataImpact(MetadataImpact.Impact.STORAGE). Locate the APIBackupVolumeSnapshotMsg class and replace the annotation value so it stays consistent with the other snapshot-related APIs and the metadata refresh/interception logic correctly recognizes the STORAGE impact.
```java
        responseClass = APIBatchDeleteVolumeSnapshotEvent.class
)
@DefaultTimeout(timeunit = TimeUnit.HOURS, value = 6)
@MetadataImpact(MetadataImpact.Impact.NONE)
```
The MetadataImpact classification appears to be wrong.
According to this PR's commit message, snapshot CRUD is classified as STORAGE. Marking batch volume-snapshot deletion as Impact.NONE lets this delete path skip the storage-metadata-change marking, so VM metadata may not be refreshed after snapshots are deleted.
Suggested change

```diff
-@MetadataImpact(MetadataImpact.Impact.NONE)
+@MetadataImpact(MetadataImpact.Impact.STORAGE)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `header/src/main/java/org/zstack/header/storage/snapshot/APIBatchDeleteVolumeSnapshotMsg.java` at line 29, the annotation @MetadataImpact(MetadataImpact.Impact.NONE) on APIBatchDeleteVolumeSnapshotMsg is misclassified; it should reflect that the operation triggers storage metadata changes. Change the MetadataImpact annotation on APIBatchDeleteVolumeSnapshotMsg to MetadataImpact.Impact.STORAGE (locators: APIBatchDeleteVolumeSnapshotMsg, the MetadataImpact annotation), then save and run the relevant unit/integration tests to make sure the snapshot-deletion flow triggers a storage metadata update.
Resolves: ZSV-10000 Change-Id: I746a6d7266686d637a6d76756861636a74766b74
1cd28ae to 8345c56
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
header/src/main/java/org/zstack/header/vm/APIAttachVmNicToVmMsg.java (1)
8-13: ⚠️ Potential issue | 🟡 Minor — Typo in the REST path: "nices" should be "nics"
The path in this API definition uses "nices", while every other VM NIC-related API in the project consistently uses "nics" (e.g. /vm-instances/nics/{vmNicUuid}, /security-groups/{securityGroupUuid}/vm-instances/nics). The typo has already propagated into the auto-generated SDK code and the Groovy docs. Use "nics" uniformly to match the project's API naming convention.

Suggested fix

```diff
 @RestRequest(
-    path = "/vm-instances/{vmInstanceUuid}/nices/{vmNicUuid}",
+    path = "/vm-instances/{vmInstanceUuid}/nics/{vmNicUuid}",
     method = HttpMethod.POST,
     responseClass = APIAttachVmNicToVmEvent.class,
     parameterName = "params"
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APIAttachVmNicToVmMsg.java` around lines 8 - 13, The REST path in the RestRequest annotation on APIAttachVmNicToVmMsg uses the misspelled segment "nices" instead of "nics"; update the path value in the `@RestRequest` on class APIAttachVmNicToVmMsg to use "/vm-instances/{vmInstanceUuid}/nics/{vmNicUuid}" so it matches other VM NIC APIs and regenerates SDK/docs consistently, then run any codegen/tests that validate REST paths.
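A cheap guard against this class of typo is a naming-convention check over the collected path strings. The sketch below hard-codes two paths for illustration; in practice the list would presumably be gathered by reflecting over the @RestRequest-annotated message classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: lint REST path strings for the misspelled "nices" segment.
// The hard-coded paths stand in for what a reflection scan over
// @RestRequest annotations would collect.
class RestPathLintSketch {
    static List<String> findBadPaths(List<String> paths) {
        return paths.stream()
                .filter(p -> p.contains("/nices/"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "/vm-instances/{vmInstanceUuid}/nices/{vmNicUuid}", // the typo
                "/vm-instances/nics/{vmNicUuid}");
        System.out.println(findBadPaths(paths));
    }
}
```

Run as a unit test, this fails the build the moment a misspelled segment is introduced, instead of letting it propagate into generated SDKs.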
♻️ Duplicate comments (10)
header/src/main/java/org/zstack/header/vm/VmInstanceState.java (1)
33-34: ⚠️ Potential issue | 🔴 Critical — Add state-transition rules for Registering.
Registering has been added to the enum and to offlineStates, but there is no corresponding .transactions(...). As soon as any flow calls nextState() in this state, a CloudRuntimeException is thrown.

🔧 Suggested fix

```diff
     Crashed.transactions(
             new Transaction(VmInstanceStateEvent.running, VmInstanceState.Running),
             new Transaction(VmInstanceStateEvent.unknown, VmInstanceState.Unknown),
             new Transaction(VmInstanceStateEvent.stopping, VmInstanceState.Stopping),
             new Transaction(VmInstanceStateEvent.stopped, VmInstanceState.Stopped),
             new Transaction(VmInstanceStateEvent.rebooting, VmInstanceState.Rebooting),
             new Transaction(VmInstanceStateEvent.crashed, VmInstanceState.Crashed),
             new Transaction(VmInstanceStateEvent.destroyed, VmInstanceState.Destroyed),
             new Transaction(VmInstanceStateEvent.noState, VmInstanceState.NoState),
             new Transaction(VmInstanceStateEvent.destroying, VmInstanceState.Destroying)
     );
+    Registering.transactions(
+            new Transaction(VmInstanceStateEvent.stopped, VmInstanceState.Stopped),
+            new Transaction(VmInstanceStateEvent.running, VmInstanceState.Running),
+            new Transaction(VmInstanceStateEvent.destroying, VmInstanceState.Destroying),
+            new Transaction(VmInstanceStateEvent.unknown, VmInstanceState.Unknown)
+    );
     NoState.transactions(
             new Transaction(VmInstanceStateEvent.running, VmInstanceState.Running),
```

Also applies to: 56-56
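The failure mode is easy to reproduce with a reduced model of the same transition-table pattern. All names below are illustrative, not ZStack's actual classes: a state that is declared in the enum but given no transitions fails only at runtime, the first time nextState() is called on it.

```java
import java.util.EnumMap;
import java.util.Map;

// Minimal sketch of a transition-table state machine. A state that is
// declared but never given transitions compiles fine and fails only at
// runtime -- which is why Registering needs its own transactions entry.
class StateMachineSketch {
    enum State { Stopped, Running, Registering }
    enum Event { start, stop }

    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        TABLE.put(State.Stopped, Map.of(Event.start, State.Running));
        TABLE.put(State.Running, Map.of(Event.stop, State.Stopped));
        // Registering intentionally has no entry, mirroring the bug.
    }

    static State nextState(State s, Event e) {
        Map<Event, State> t = TABLE.get(s);
        if (t == null || !t.containsKey(e)) {
            throw new IllegalStateException("no transition from " + s + " on " + e);
        }
        return t.get(e);
    }

    public static void main(String[] args) {
        System.out.println(nextState(State.Stopped, Event.start)); // Running
        try {
            nextState(State.Registering, Event.start);
        } catch (IllegalStateException ex) {
            System.out.println("caught: " + ex.getMessage());
        }
    }
}
```

Because the gap is invisible to the compiler, a unit test that drives every declared state through nextState() is the only cheap way to catch a missing transitions block.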
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `header/src/main/java/org/zstack/header/vm/VmInstanceState.java` around lines 33 - 34, The enum VmInstanceState defines a Registering value but lacks a .transactions(...) entry, so calls to nextState() from Registering throw CloudRuntimeException; add a transactions(...) block for Registering in VmInstanceState (matching the pattern used for other states) that lists the allowed VmInstanceStateEvent -> target state transitions (and mirror any offlineStates inclusion) so nextState() can resolve valid transitions from Registering; update any related offlineStates usage if necessary to keep consistency with the new transactions entry.

storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java (1)
1866-2130: ⚠️ Potential issue | 🔴 Critical — The VM registration flow still lacks atomicity.
The "register-volume", "register-snapshot", and "register-vmInstance" steps write to several tables in succession across multiple NoRollbackFlow steps; if any step fails, the data already written by earlier steps is never rolled back. On retry this easily leaves orphaned snapshot/tree/group/volume records, or runs straight into duplicate primary keys.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java` around lines 1866 - 2130, The registerVmInstance flow writes to multiple tables across the NoRollbackFlow steps ("register-volume", "register-snapshot", "register-vmInstance") so failures leave partial data and orphaned records; fix by making the DB writes atomic: either wrap all persistence into a single transactional/rollbackable flow or add compensating rollback logic for each step. Concretely, modify registerVmInstance to (a) replace the NoRollbackFlow steps that call dbf.persist/dbf.persistCollection with a single Flow (or transactional block) that uses dbf.startTransaction/commit/rollback (or a RollbackFlow implementation) to persist volumes, snapshot trees, snapshots, groups, refs and the VM together, or (b) implement rollback handlers for each NoRollbackFlow that delete the entities created via dbf on failure; ensure the FlowChain error handler triggers the transaction rollback or executes the compensating deletes so no orphaned VolumeVO/VolumeSnapshotVO/VolumeSnapshotTreeVO/VolumeSnapshotGroupVO/VolumeSnapshotGroupRefVO or VmInstanceVO remain.

compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotGroupBasedVmUuidFromApiResolver.java (1)
11-29: ⚠️ Potential issue | 🟡 Minor — The class Javadoc does not match the message types actually supported.
APICreateVolumeSnapshotGroupMsg, which the Javadoc lists, actually implements the CreateVolumeSnapshotGroupMessage interface rather than VolumeSnapshotGroupMessage. The supports() method therefore never matches the create message, and the Javadoc is misleading. Drop the create scenario from the list and keep only the message types that are actually supported (Delete, Revert, Update, Ungroup and other operations on an existing snapshot group).

📝 Suggested doc fix

```diff
 /**
  * 快照组关联 VM UUID 解析器:从实现 {@link VolumeSnapshotGroupMessage} 接口的 API 消息中获取 groupUuid,
  * 通过 VolumeSnapshotGroupRefVO 查询关联的所有 volumeUuid,再查询 VolumeVO 得到 vmInstanceUuid。
  *
  * <p>覆盖的 API:</p>
  * <ul>
  * <li>{@code APIDeleteVolumeSnapshotGroupMsg}(STORAGE, updateOnFailure=true)</li>
- * <li>{@code APICreateVolumeSnapshotGroupMsg}(STORAGE, updateOnFailure=true)</li>
+ * <li>{@code APIRevertVmFromSnapshotGroupMsg}(STORAGE)</li>
+ * <li>{@code APIUpdateVolumeSnapshotGroupMsg}(CONFIG)</li>
+ * <li>其他实现 VolumeSnapshotGroupMessage 的 API</li>
  * </ul>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotGroupBasedVmUuidFromApiResolver.java` around lines 11 - 29, Update the class Javadoc of SnapshotGroupBasedVmUuidFromApiResolver to reflect the actual supported API message types used by its supports() method: remove APICreateVolumeSnapshotGroupMsg (which implements CreateVolumeSnapshotGroupMessage) from the listed scenarios and explicitly list only the messages that implement VolumeSnapshotGroupMessage (e.g., APIDeleteVolumeSnapshotGroupMsg and other delete/revert/update/ungroup messages that the supports() method matches), so the documentation is consistent with the supports() logic in the class.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java (1)
1943-1946: ⚠️ Potential issue | 🔴 Critical — Do not advance the SyncTaskChain before the async write completes.
Line 1945 only brings message dispatch under the PS-level concurrency control; the real backend.handle(...) still runs concurrently after chain.next(), and if doHandleUpdateMetadata() throws synchronously the queue can get stuck. Move chain.next() into a single convergence point that covers success, failure, and synchronous exceptions.

🛠️ Suggested change

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateMetadata(msg);
-    chain.next();
+    try {
+        doHandleUpdateMetadata(msg, new NoErrorCompletion(msg) {
+            @Override
+            public void done() {
+                chain.next();
+            }
+        });
+    } catch (Exception e) {
+        chain.next();
+        throw e;
+    }
 }
```

```java
private void doHandleUpdateMetadata(UpdateVmInstanceMetadataOnPrimaryStorageMsg msg, NoErrorCompletion completion) {
    ...
    backend.handle(msg, hostUuid, new ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply>(msg) {
        @Override
        public void success(UpdateVmInstanceMetadataOnPrimaryStorageReply r) {
            bus.reply(msg, r);
            completion.done();
        }

        @Override
        public void fail(ErrorCode errorCode) {
            reply.setError(errorCode);
            bus.reply(msg, reply);
            completion.done();
        }
    });
}
```
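The shape of the fix — advance the serial chain only from a unified completion path — can be sketched independently of ZStack's classes. Everything below is illustrative, not the project's actual API:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Illustrative sketch: a serial task queue that advances only when the task
// signals completion via chain.next(), and that still advances when a task
// throws synchronously, so one bad task cannot stall the queue.
class SerialChainSketch {
    interface Chain { void next(); }

    private final Queue<Consumer<Chain>> tasks = new ArrayDeque<>();
    private boolean running = false;

    void submit(Consumer<Chain> task) {
        tasks.add(task);
        if (!running) { running = true; runNext(); }
    }

    private void runNext() {
        Consumer<Chain> t = tasks.poll();
        if (t == null) { running = false; return; }
        try {
            t.accept(this::runNext);  // the task must call chain.next() itself
        } catch (RuntimeException e) {
            runNext();                // synchronous failure must not stall the queue
        }
    }

    static String demo() {
        SerialChainSketch q = new SerialChainSketch();
        StringBuilder log = new StringBuilder();
        q.submit(chain -> { log.append("A"); chain.next(); });
        q.submit(chain -> { throw new RuntimeException("boom"); });
        q.submit(chain -> { log.append("B"); chain.next(); });
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // AB: the queue survives the failing middle task
    }
}
```

Without the catch in runNext(), the synchronously throwing task would leave running stuck at true and the third task would never execute — the same stall the review describes.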
Verify each finding against the current code and only fix it if needed. In `plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1943 - 1946, The current SyncTaskChain task calls doHandleUpdateMetadata(msg) and immediately calls chain.next(), which allows backend.handle(...) to run concurrently and can leave the chain stuck if doHandleUpdateMetadata throws synchronously; modify the task to not call chain.next() immediately but instead wrap doHandleUpdateMetadata to accept a NoErrorCompletion (or similar) and call chain.next() only from the unified completion/failure path (i.e., after backend.handle(...) invokes its ReturnValueCompletion.success or .fail, and also in the catch block for any synchronous exception), so move the chain.next() into that single success/fail/exception handler and wire doHandleUpdateMetadata to reply on success/fail before calling completion.done()/chain.next().

plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java (1)
3349-3352: ⚠️ Potential issue | 🔴 Critical — Do not release the concurrency slot before the async write completes.
Line 3351 currently only throttles message dispatch; the actual metadata write in bkd.handle(...) keeps running concurrently after chain.next(), and if doHandleUpdateMetadata() throws synchronously this chain also gets stuck. Move chain.next() into a single convergence point that covers success, failure, and synchronous exceptions.

🛠️ Suggested change

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateMetadata(msg);
-    chain.next();
+    try {
+        doHandleUpdateMetadata(msg, new NoErrorCompletion(msg) {
+            @Override
+            public void done() {
+                chain.next();
+            }
+        });
+    } catch (Exception e) {
+        chain.next();
+        throw e;
+    }
 }
```

```java
private void doHandleUpdateMetadata(UpdateVmInstanceMetadataOnPrimaryStorageMsg msg, NoErrorCompletion completion) {
    ...
    bkd.handle(msg, hostUuid, new ReturnValueCompletion<UpdateVmInstanceMetadataOnPrimaryStorageReply>(msg) {
        @Override
        public void success(UpdateVmInstanceMetadataOnPrimaryStorageReply returnValue) {
            bus.reply(msg, returnValue);
            completion.done();
        }

        @Override
        public void fail(ErrorCode errorCode) {
            UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
            reply.setError(errorCode);
            bus.reply(msg, reply);
            completion.done();
        }
    });
}
```
Verify each finding against the current code and only fix it if needed. In `plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java` around lines 3349 - 3352, run(SyncTaskChain) currently calls chain.next() immediately after doHandleUpdateMetadata(...), which releases the concurrency slot before the async write completes, and if doHandleUpdateMetadata throws synchronously the chain stalls; move chain.next() into the async callbacks inside doHandleUpdateMetadata (the success and fail branches of bkd.handle(...)) and into a single convergence point for synchronous exceptions (e.g. a finally block, or the done/complete path of a NoErrorCompletion/ReturnValueCompletion), so the slot is released only after processing truly finishes, regardless of success, failure, or exception. Relevant method names: run(SyncTaskChain), doHandleUpdateMetadata(...), bkd.handle(...).

compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)
9413-9416: ⚠️ Potential issue | 🔴 Critical

Do not call chain.next() ahead of the primary-storage callback.

Line 9416 now releases the serial chain while bus.send() is still in flight; subsequent flushes for the same VM can still enter concurrently, and an old payload gets the chance to land on disk later than a newer one.

🐛 Suggested fix

```diff
 @Override
 public void run(SyncTaskChain chain) {
-    doHandleUpdateVmInstanceMetadata(msg);
-    chain.next();
+    doHandleUpdateVmInstanceMetadata(msg, chain);
 }
@@
-private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg) {
+private void doHandleUpdateVmInstanceMetadata(UpdateVmInstanceMetadataMsg msg, SyncTaskChain chain) {
@@
     if (payloadSize > org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD) {
@@
         reply.setError(Platform.operr("metadata payload too large (%d bytes, limit %d) for vm[uuid=%s]",
                 payloadSize, org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD, msg.getUuid()));
         bus.reply(msg, reply);
+        chain.next();
         return;
     }
@@
     bus.send(umsg, new CloudBusCallBack(msg) {
         @Override
         public void run(MessageReply r) {
             UpdateVmInstanceMetadataOnPrimaryStorageReply reply = new UpdateVmInstanceMetadataOnPrimaryStorageReply();
@@
             }
             bus.reply(msg, reply);
+            chain.next();
         }
     });
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 9413 - 9416, the run(SyncTaskChain) handler currently calls chain.next() immediately after doHandleUpdateVmInstanceMetadata(msg), releasing the serial chain before the async main-storage/bus.send() work completes; remove the premature chain.next() and instead invoke chain.next() only after the asynchronous operation completes (e.g. in the callback/response handler inside doHandleUpdateVmInstanceMetadata or the bus.send completion/main-storage callback) so subsequent flushes for the same VM cannot run concurrently and old payloads cannot overtake newer ones.

header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.java (1)
8-11: ⚠️ Potential issue | 🟠 Major

The warnings field no longer appears in the REST response.

Line 8 restricts the response to inventory, so the warnings field on line 11 is dropped by both REST serialization and the generated example docs; the current __example__() does not populate it either.

Directly applicable fix

```diff
-@RestResponse(allTo = "inventory")
+@RestResponse(fieldsTo = {"all"})
 public class APIRegisterVmInstanceFromMetadataEvent extends APIEvent {
@@
     public static APIRegisterVmInstanceFromMetadataEvent __example__() {
         APIRegisterVmInstanceFromMetadataEvent evt = new APIRegisterVmInstanceFromMetadataEvent();
         VmInstanceInventory vm = new VmInstanceInventory();
         vm.setUuid(uuid());
         vm.setName("recovered-vm");
         evt.setInventory(vm);
+        evt.setWarnings(java.util.Collections.emptyList());
         return evt;
     }
 }
```

Also applies to: 37-43
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.java` around lines 8 - 11, the @RestResponse(allTo = "inventory") on class APIRegisterVmInstanceFromMetadataEvent excludes the warnings field from REST serialization and the generated example; update the annotation to include warnings (e.g., add "warnings" to allTo or remove the allTo restriction) so warnings are returned, and also populate the warnings list inside the class's __example__() method; apply the same fix for the duplicate class variant referenced around lines 37-43 to ensure both the REST response and example include the warnings field.

compute/src/main/java/org/zstack/compute/vm/metadata/MetadataOrphanDetector.java (1)
73-77: ⚠️ Potential issue | 🔴 Critical

List.of() requires Java 9+ and is incompatible with the project's Java 8 target.

This was already flagged in a previous review. The project's pom.xml sets maven.compiler.source and maven.compiler.target to 8, but List.of(...) is only available on Java 9+, so this fails to compile.

🔧 Java 8-compatible fix

```diff
-    private static final List<String> SUPPORTED_PS_TYPES = List.of(
-            "SharedBlock",
-            "LocalStorage",
-            "NFS"
-    );
+    private static final List<String> SUPPORTED_PS_TYPES = Collections.unmodifiableList(
+            Arrays.asList("SharedBlock", "LocalStorage", "NFS")
+    );
```

The imports `import java.util.Arrays;` and `import java.util.Collections;` need to be added.
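As a standalone illustration of the Java 8-compatible construction (plain Java, independent of the ZStack classes discussed above):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SupportedTypes {
    // Compiles on Java 8; List.of(...) would require Java 9+.
    static final List<String> SUPPORTED_PS_TYPES = Collections.unmodifiableList(
            Arrays.asList("SharedBlock", "LocalStorage", "NFS"));

    public static void main(String[] args) {
        System.out.println(SUPPORTED_PS_TYPES.contains("NFS"));
        try {
            SUPPORTED_PS_TYPES.add("Ceph"); // the unmodifiable wrapper rejects mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }
    }
}
```

The unmodifiableList wrapper gives the same read-only guarantee List.of() would, at the cost of one extra allocation.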
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataOrphanDetector.java` around lines 73 - 77, the constant SUPPORTED_PS_TYPES in MetadataOrphanDetector uses List.of(...) which requires Java 9+; replace it with a Java 8-compatible construction such as Collections.unmodifiableList(Arrays.asList("SharedBlock", "LocalStorage", "NFS")) (or Arrays.asList(...) if mutability is acceptable) and add the required imports (java.util.Arrays and java.util.Collections) so the code compiles under Java 8.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java (2)
490-496: ⚠️ Potential issue | 🔴 Critical

This query still picks up rows just claimed by other code paths.

lastClaimTime >= now-5s cannot distinguish rows claimed by this round of claimDirtyRows() from rows claimed within the last 5 seconds by triggerFlushForVm() or a previous poller cycle. The subsequent submitFlushTask() will treat those foreign claims as this round's results, so the duplicate-flush and premature claim-release races persist. A batch token is still needed here, not a time-window approximation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 490 - 496, The current filter in VmMetadataDirtyMarker that uses lastClaimTime (Timestamp.thisCycleCutoff) still picks up rows claimed by other paths; change the claim semantics to use a per-claim batch token instead of a time window: when claimDirtyRows() performs the CAS/claim step, generate and persist a unique claimBatchToken (or claimId) into the VmMetadataDirtyVO rows it claims, then in the subsequent query replace the time-based filter (.gte(VmMetadataDirtyVO_.lastClaimTime, thisCycleCutoff)) with an equality on that token (e.g. .eq(VmMetadataDirtyVO_.claimBatchToken, myBatchToken)); ensure triggerFlushForVm(), submitFlushTask() and any code that releases or touches claims use the same token field when matching/releasing so only rows from this claim batch are processed.
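A minimal in-memory sketch of the token-based claim semantics (the DirtyRow type and field names here are illustrative; the real fix would stamp the token in the claiming UPDATE itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class ClaimDemo {
    static class DirtyRow {
        final String vmUuid;
        String claimBatchToken; // null means unclaimed
        DirtyRow(String vmUuid) { this.vmUuid = vmUuid; }
    }

    // Claim up to 'limit' unclaimed rows by stamping a unique batch token,
    // then select strictly by that token instead of by claim time.
    static List<DirtyRow> claimDirtyRows(List<DirtyRow> table, int limit) {
        String myToken = UUID.randomUUID().toString();
        table.stream()
                .filter(r -> r.claimBatchToken == null)
                .limit(limit)
                .forEach(r -> r.claimBatchToken = myToken);
        return table.stream()
                .filter(r -> myToken.equals(r.claimBatchToken))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DirtyRow> table = new ArrayList<>(Arrays.asList(
                new DirtyRow("vm-1"), new DirtyRow("vm-2"), new DirtyRow("vm-3")));
        table.get(2).claimBatchToken = "someone-else"; // claimed by another path
        List<DirtyRow> mine = claimDirtyRows(table, 10);
        // Only the two rows this batch claimed come back; vm-3 is excluded
        // no matter how recently the other path claimed it.
        System.out.println(mine.size());
    }
}
```

Because selection is by token equality rather than by recency, a concurrent claimer can never leak rows into another batch's result set.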
641-645: ⚠️ Potential issue | 🔴 Critical

The target VM is still set on the wrong field.

UpdateVmInstanceMetadataMsg needs vmInstanceUuid here, not uuid. As written, the handling side cannot locate the target VM.

💡 Suggested fix

```diff
 UpdateVmInstanceMetadataMsg msg = new UpdateVmInstanceMetadataMsg();
-msg.setUuid(vmUuid);
+msg.setVmInstanceUuid(vmUuid);
 msg.setStorageStructureChange(latestDirty.isStorageStructureChange());
 msg.setTimeout(TimeUnit.MINUTES.toMillis(5));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 641 - 645, The code is setting the wrong field on UpdateVmInstanceMetadataMsg: replace the call to msg.setUuid(vmUuid) with the correct setter for the target VM (e.g. msg.setVmInstanceUuid(vmUuid)) so the consumer can locate the VM; keep the rest (setStorageStructureChange, setTimeout, bus.makeLocalServiceId(..., VmInstanceConstant.SERVICE_ID)) unchanged.
🟠 Major comments (17)
header/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusMsg.java-22-37 (1)
22-37: ⚠️ Potential issue | 🟠 Major

Do not store the caller's vmUuids list in the message body by reference.

NeedReplyMessage travels along an asynchronous path; keeping the original List reference means the request content changes whenever the sender later reuses, clears, or appends to that list. Copy defensively in the setter and normalize null to an empty list to avoid TOCTOU-style problems.

Suggested fix

```diff
+import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
@@
-    private List<String> vmUuids;
+    private List<String> vmUuids = Collections.emptyList();
@@
     public List<String> getVmUuids() {
         return vmUuids;
     }
@@
     public void setVmUuids(List<String> vmUuids) {
-        this.vmUuids = vmUuids;
+        this.vmUuids = vmUuids == null
+                ? Collections.emptyList()
+                : Collections.unmodifiableList(new ArrayList<>(vmUuids));
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusMsg.java` around lines 22 - 37, BatchCheckMetadataStatusMsg currently stores the caller's vmUuids list by reference which can lead to TOCTOU mutations when the message is sent asynchronously; modify setVmUuids to defensively copy and normalize null to an empty list (e.g., this.vmUuids = (vmUuids == null ? Collections.emptyList() : new ArrayList<>(vmUuids))) and consider making getVmUuids return an unmodifiable view or a copy to prevent external mutation (refer to class BatchCheckMetadataStatusMsg, field vmUuids, methods setVmUuids and getVmUuids).

header/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusReply.java-17-24 (1)
17-24: ⚠️ Potential issue | 🟠 Major

Avoid exposing a mutable Map directly across the reply boundary.

The caller-supplied results is stored and returned as-is. If the code that builds the reply keeps reusing or modifying the same Map after sending, the reply content gets polluted retroactively, and null pushes downstream consumers into extra null-check branches. Copy defensively in the setter and normalize null to an empty map.

Suggested fix

```diff
+import java.util.Collections;
+import java.util.LinkedHashMap;
 import java.util.Map;
@@
-    private Map<String, MetadataStatusResult> results;
+    private Map<String, MetadataStatusResult> results = Collections.emptyMap();
@@
     public Map<String, MetadataStatusResult> getResults() {
         return results;
     }
@@
     public void setResults(Map<String, MetadataStatusResult> results) {
-        this.results = results;
+        this.results = results == null
+                ? Collections.emptyMap()
+                : Collections.unmodifiableMap(new LinkedHashMap<>(results));
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusReply.java` around lines 17 - 24, do not expose the mutable Map field results directly from BatchCheckMetadataStatusReply: in setResults(Map<String, MetadataStatusResult> results) copy the incoming map defensively and normalize null to Collections.emptyMap() (or new HashMap<>()), and have getResults() return an unmodifiable view or defensive copy so external code cannot mutate it (see class BatchCheckMetadataStatusReply and methods getResults/setResults), ensuring the reply is immutable or an independent copy to the outside world.

testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy-684-691 (1)
684-691: ⚠️ Potential issue | 🟠 Major

Keep the READ/WRITE simulators stateful.

WRITE currently returns success unconditionally and READ always returns a fixed payload, so the write path, path routing, and read-after-write verification are never actually covered; even if production code writes the wrong metadata, the tests still pass. At minimum, cache the written content keyed by the metadata path or VM identifier in the request, and return that same data on READ.
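A plain-Java sketch of the shared state the two simulator handlers could use (class and method names here are illustrative, not the testlib API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetadataSimulatorStore {
    private final Map<String, String> byPath = new ConcurrentHashMap<>();

    // WRITE handler: remember what was written, keyed by metadata path.
    public void write(String metadataPath, String metadata) {
        byPath.put(metadataPath, metadata);
    }

    // READ handler: return what was written, falling back to a fixed payload
    // only when nothing was stored (mirrors the current stateless behavior).
    public String read(String metadataPath) {
        return byPath.getOrDefault(metadataPath, "{\"fixed\":\"payload\"}");
    }

    public static void main(String[] args) {
        MetadataSimulatorStore store = new MetadataSimulatorStore();
        System.out.println(store.read("/ps/vol-1/.vm-meta"));
        store.write("/ps/vol-1/.vm-meta", "{\"vmUuid\":\"vm-1\"}");
        System.out.println(store.read("/ps/vol-1/.vm-meta"));
    }
}
```

With the map shared between both handlers, a test that writes bad metadata will read the bad metadata back and fail its assertion, instead of always seeing the canned payload.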
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy` around lines 684 - 691, the READ/WRITE KVM metadata simulators are stateless: spec.simulator handlers for KVMConstant.WRITE_VM_INSTANCE_METADATA_PATH and KVMConstant.READ_VM_INSTANCE_METADATA_PATH always return fixed responses (WriteVmInstanceMetadataRsp / ReadVmInstanceMetadataRsp) so tests cannot detect incorrect writes; change the spec.simulator for WRITE_VM_INSTANCE_METADATA_PATH to store the incoming metadata keyed by VM id or metadata path (extract from the HttpEntity payload or request headers) into an in-memory map and have the READ_VM_INSTANCE_METADATA_PATH handler look up and return the stored value (falling back to the existing fixed payload only if no write exists), ensuring the simulator functions (the two spec.simulator closures and the ReadVmInstanceMetadataRsp/WriteVmInstanceMetadataRsp flow) share that map and keying logic.

header/src/main/java/org/zstack/header/vm/APICleanupVmInstanceMetadataMsg.java-11-17 (1)
11-17: ⚠️ Potential issue | 🟠 Major

Add the missing @MetadataImpact to this API.

This PR's stated goal is explicit: CI verifies that every APIMessage subclass carries this annotation. APICleanupVmInstanceMetadataMsg is now missing it and deviates from that constraint; at minimum add an impact consistent with the "clean up metadata on primary storage" semantics.

🩹 Suggested fix

```diff
+@MetadataImpact(MetadataImpact.Impact.STORAGE)
 @RestRequest(
         path = "/vm-instances/metadata/cleanup",
         method = HttpMethod.PUT,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APICleanupVmInstanceMetadataMsg.java` around lines 11 - 17, add the @MetadataImpact annotation to the APICleanupVmInstanceMetadataMsg class to reflect that this API modifies VM metadata (it performs cleanup/removal); update the class declaration for APICleanupVmInstanceMetadataMsg to include @MetadataImpact with the appropriate impact type/flags that indicate metadata deletion/cleanup so it satisfies the CI rule requiring all APIMessage subclasses to carry this annotation.

conf/db/upgrade/V4.10.29__schema.sql-20-21 (1)
20-21: ⚠️ Potential issue | 🟠 Major

Make the index creation re-entrant.

The CREATE INDEX on line 21 has no existence guard. If the upgrade partially fails afterwards and is rerun, the preceding CREATE TABLE IF NOT EXISTS is skipped, but this statement aborts because the index already exists, and recovery cannot proceed. Query information_schema.STATISTICS first and create the index conditionally. Based on learnings: "The UPGRADE_VM_METADATA_TABLES_DIRECT stored procedure in ZStack's V4.10.16__schema.sql upgrade script uses direct RENAME TABLE operations and is not idempotent. Existence checks against information_schema for tables and constraints are needed to ensure the script can be rerun safely."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V4.10.29__schema.sql` around lines 20 - 21, CREATE INDEX `idxVmMetadataDirtyUnclaimed` on VmMetadataDirtyVO is not idempotent; change the upgrade step to first query information_schema.STATISTICS for table_name='VmMetadataDirtyVO' and index_name='idxVmMetadataDirtyUnclaimed' and only run CREATE INDEX `idxVmMetadataDirtyUnclaimed` (...) when that query returns no rows so the migration can be safely re-run; update the V4.10.29__schema.sql migration to perform this conditional check (use the same index name `idxVmMetadataDirtyUnclaimed` and table `VmMetadataDirtyVO`) rather than unconditionally issuing CREATE INDEX.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java-1966-1968 (1)
1966-1968: ⚠️ Potential issue | 🟠 Major

Select the backend by the target host here.

hostUuid has already been resolved just before line 1966; going through getUsableBackend() again binds the request to "any usable backend" rather than the implementation for the target host. That can pick the wrong backend, and returns null outright when no other connected host exists. Fetch the backend by hostUuid instead.

🛠️ Suggested fix

```diff
-final NfsPrimaryStorageBackend backend = getUsableBackend();
+final NfsPrimaryStorageBackend backend = getBackendByHostUuid(hostUuid);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java` around lines 1966 - 1968, before the NfsPrimaryStorage.handle call do not use getUsableBackend() (it returns an arbitrary usable backend); select the backend based on the already-resolved hostUuid instead (e.g., add or call something like getBackendByHostUuid(hostUuid) / findBackendForHost(hostUuid)), handle null explicitly / return an error when no matching backend exists, and make sure the subsequent backend.handle(msg, hostUuid, ...) uses the backend instance selected by hostUuid.

compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java-3198-3208 (1)
3198-3208: ⚠️ Potential issue | 🟠 Major

The Registering state blocks read-only APIs as well.

The current whitelist contains only APIDestroyVmInstanceMsg, so read-only requests handled later in this same method, such as APIGetVmBootOrderMsg, APIGetVmDnsMsg, and APIGetVmCapabilitiesMsg, also get an immediate error. That contradicts the comment's "allow Destroy and query/read operations" and breaks query capability during registration. Complete the allow list, or extract a unified isAllowedWhileRegistering() check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 3198 - 3208, the current guard in handleApiMessage treats VmInstanceState.Registering as allowing only APIDestroyVmInstanceMsg, which incorrectly blocks read/query messages like APIGetVmBootOrderMsg, APIGetVmDnsMsg, APIGetVmCapabilitiesMsg; update the check to allow all read-only APIs (either by extending the whitelist to include those API*Msg classes or by adding a helper isAllowedWhileRegistering(Message) used in handleApiMessage) so that queries are permitted during Registering while destructive operations remain blocked.

plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java-2058-2064 (1)
2058-2064: ⚠️ Potential issue | 🟠 Major

The metadata path derivation is too fragile and can write to the wrong location.

Neither is installPath == null handled, nor is the directory extraction anything more than a hard-coded vol-* regex. If the root volume's install path cannot be found, or the path deviates slightly from that pattern, metadataPath becomes an NPE or a bogus address like *.qcow2<suffix>, and subsequent updates silently land in the wrong place.

A more robust direction

```diff
 String installPath = Q.New(VolumeVO.class)
         .eq(VolumeVO_.uuid, msg.getRootVolumeUuid())
         .select(VolumeVO_.installPath)
         .findValue();
-String path = installPath.replaceFirst("^(.+/vol-[^/]+/).*$", "$1");
-String metadataPath = String.format("%s%s", path, VM_META_SUFFIX);
+if (installPath == null) {
+    completion.fail(operr("cannot find install path for root volume[uuid:%s]", msg.getRootVolumeUuid()));
+    return;
+}
+
+int dirEnd = installPath.lastIndexOf('/') + 1;
+if (dirEnd <= 0) {
+    completion.fail(operr("invalid root volume install path[%s]", installPath));
+    return;
+}
+
+String metadataPath = String.format("%s%s", installPath.substring(0, dirEnd), VM_META_SUFFIX);
 umsg.setMetadataPath(metadataPath);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java` around lines 2058 - 2064, the code in NfsPrimaryStorageKVMBackend derives metadataPath from installPath unsafely: the query using Q.New(VolumeVO.class) may return null and the replaceFirst with a hardcoded "vol-*" regex can produce NPEs or incorrect paths; update the logic to null-check the installPath returned by the VolumeVO query, validate its expected directory structure (avoid relying solely on the "vol-*" regex), and compute metadataPath by using safe path operations (e.g., locate the parent directory containing the volume folder or match a configurable segment) before calling umsg.setMetadataPath(metadataPath); ensure fallback handling/logging when installPath is missing or malformed so no silent writes occur.

compute/src/main/java/org/zstack/compute/vm/VmExpungeMetadataFlow.java-51-71 (1)
51-71: ⚠️ Potential issue | 🟠 Major

A root-volume query timing conflict can silently skip metadata cleanup.

The flow documentation (line 27) states explicitly that VmExpungeMetadataFlow runs after the Root/Memory/Cache volumes are deleted, yet the code at lines 60-63 still queries VolumeVO for primaryStorageUuid. If the actual orchestration follows the documented order, this query reliably returns null, metadata cleanup always takes the skip branch, and only a debug log remains.

Fix: read the cached PS information directly from spec.getVmInventory().getRootVolume().getPrimaryStorageUuid() instead of depending on an already-deleted VolumeVO record. This approach is already well established in other flows such as VmImageSelectBackupStorageFlow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeMetadataFlow.java` around lines 51 - 71, the code currently queries VolumeVO for primaryStorageUuid (Q.New(...).select(...)) which races with the document-described ordering and can return null after volumes are deleted; replace that DB lookup in VmExpungeMetadataFlow with reading the cached value via spec.getVmInventory().getRootVolume().getPrimaryStorageUuid() (use that as psUuid) and only fall back to the VolumeVO query if the cached rootVolume or its primaryStorageUuid is null; update logic around psUuid null-handling accordingly to avoid silent skips.

compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java-133-138 (1)
133-138: ⚠️ Potential issue | 🟠 Major

The batch scan lacks a stable advancement condition and starves records beyond one batch.

The query applies limit(batchSize) with no ORDER BY or cursor-style advancement. Once the records pending recovery exceed one batch, the periodic task may repeatedly hit the same small subset, and other VMs never get a retry chance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java` around lines 133 - 138, the query in MetadataStaleRecoveryTask uses SQL.New(...).limit(batchSize) on VmMetadataPathFingerprintVO and can repeatedly hit the same subset when >batchSize records exist; make selection deterministic and page through results by adding an ORDER BY (e.g., ORDER BY fp.id or ORDER BY fp.lastUpdate) and implement cursor-style pagination (e.g., WHERE fp.id > :lastProcessedId or WHERE fp.lastUpdate <= :lastProcessedTimestamp) persisted between runs (or advanced each invocation) instead of just limit(batchSize); update the MetadataStaleRecoveryTask to supply the ordering parameter and maintain/advance the cursor so all stale records get fair retry opportunities.

header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataMsg.java-23-25 (1)
23-25: ⚠️ Potential issue | 🟠 Major

Two writable VM UUID sources make the API contract ambiguous.

This message extends APICreateMessage and therefore automatically exposes resourceUuid (see header/src/main/java/org/zstack/header/message/APICreateMessage.java:8-42), while metadataContent itself carries a vmUuid. If the two disagree, neither the caller nor the server can tell which one is authoritative; either forbid one of them at the message level, or explicitly validate that they match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataMsg.java` around lines 23 - 25, APIRegisterVmInstanceFromMetadataMsg currently exposes two writable VM UUID sources (inherited resourceUuid from APICreateMessage and vmUuid inside metadataContent), causing ambiguity; update APIRegisterVmInstanceFromMetadataMsg to validate and reconcile them: in the message validation (e.g. override a validate/validateMessage method or add a custom check before processing) parse vmUuid from metadataContent and if resourceUuid is non-null ensure it equals the parsed vmUuid (reject the request on mismatch), otherwise set resourceUuid to the parsed vmUuid so only one authoritative UUID is used; reference APIRegisterVmInstanceFromMetadataMsg, APICreateMessage, metadataContent and resourceUuid when implementing this check.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-675-706 (1)
675-706: ⚠️ Potential issue | 🟠 Major

The saved path fingerprint does not equal the payload that was just written.

The deleted == 0 branch means the metadata just written successfully is already behind the current DB version, yet savePathFingerprint() is still called. Additionally, the volume set that savePathFingerprint() queries only looks at vmInstanceUuid; it neither includes detached volumes carrying lastVmInstanceUuid nor excludes shareable volumes. The fingerprint can therefore run ahead of the actual storage content and diverge from the payload's resource set, so MetadataPathDriftDetector will easily misjudge. Also applies to: 813-829
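A minimal sketch of the volume-set filter a fingerprint for one VM should use (the Volume stand-in and its field names mirror VolumeVO loosely and are illustrative only):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FingerprintVolumeFilter {
    // Minimal stand-in for VolumeVO.
    static class Volume {
        String vmInstanceUuid;
        String lastVmInstanceUuid;
        boolean shareable;
        String installPath;
        Volume(String vm, String lastVm, boolean share, String path) {
            vmInstanceUuid = vm; lastVmInstanceUuid = lastVm;
            shareable = share; installPath = path;
        }
    }

    // Select the volumes a path fingerprint for vmUuid should cover:
    // attached OR previously-attached volumes, excluding shareable ones.
    static List<String> fingerprintPaths(List<Volume> volumes, String vmUuid) {
        return volumes.stream()
                .filter(v -> !v.shareable)
                .filter(v -> vmUuid.equals(v.vmInstanceUuid)
                        || vmUuid.equals(v.lastVmInstanceUuid))
                .map(v -> v.installPath)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Volume> vols = Arrays.asList(
                new Volume("vm-1", null, false, "/ps/vol-a"),
                new Volume(null, "vm-1", false, "/ps/vol-b"),    // detached volume
                new Volume("vm-1", null, true, "/ps/vol-share"), // shareable: excluded
                new Volume("vm-2", null, false, "/ps/vol-c"));
        System.out.println(fingerprintPaths(vols, "vm-1"));
    }
}
```

Matching on both vmInstanceUuid and lastVmInstanceUuid, and dropping shareable volumes, keeps the recorded fingerprint aligned with the resource set that actually lands in the flushed payload.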
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 675 - 706, the current onFlushSuccess() always calls savePathFingerprint(vmUuid) even when deleted==0 (meaning the DB has newer changes), and savePathFingerprint() queries volumes only by vmInstanceUuid and does not exclude shareable volumes or include volumes with lastVmInstanceUuid, leading to fingerprints that don't match the just-flushed payload; fix by: (1) only invoke savePathFingerprint(vmUuid) when the conditional delete succeeded (deleted != 0) inside onFlushSuccess() in VmMetadataDirtyMarker so the fingerprint reflects the flushed snapshotVersion, and (2) update savePathFingerprint(...) to build its volume query to include volumes where vmInstanceUuid == vmUuid OR lastVmInstanceUuid == vmUuid while excluding shareable volumes (and any other detached-but-relevant volumes), ensuring the recorded path fingerprint matches the actual resources in the flushed payload.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataUpdateInterceptor.java-149-157 (1)
149-157: ⚠️ Potential issue | 🟠 Major

Impact.NONE must be skipped outright.

This PR marks 500+ APIs as NONE, yet they are still cached here and markDirty(false) fires after success. Every "no-impact API" that resolves a vmUuid then triggers a metadata flush, amplifying the load considerably. Return early in beforeDeliveryMessage() and add another defensive layer in submitMarkDirty().

💡 Suggested fix

```diff
-        APIMessage apiMsg = (APIMessage) msg;
-        List<String> vmUuids = resolveVmUuids(apiMsg);
+        MetadataImpact impact = msg.getClass().getAnnotation(MetadataImpact.class);
+        if (impact == null || impact.value() == MetadataImpact.Impact.NONE) {
+            return;
+        }
+
+        APIMessage apiMsg = (APIMessage) msg;
+        List<String> vmUuids = resolveVmUuids(apiMsg);
         if (vmUuids.isEmpty()) {
             return;
         }
-
-        MetadataImpact impact = msg.getClass().getAnnotation(MetadataImpact.class);
         pendingApis.put(apiMsg.getId(), new MetadataImpactInfo(
                 vmUuids, impact.value(), impact.updateOnFailure()));
@@
     void submitMarkDirty(String vmInstanceUuid, MetadataImpact.Impact impact) {
+        if (impact == MetadataImpact.Impact.NONE) {
+            return;
+        }
         boolean storageStructureChange = (impact == MetadataImpact.Impact.STORAGE);
```

Also applies to: 230-235
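The early-return pattern can be shown in isolation with a toy annotation (the enum, annotation, and message classes below are stand-ins, not the ZStack types):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ImpactFilterDemo {
    enum Impact { NONE, CONFIG, STORAGE }

    @Retention(RetentionPolicy.RUNTIME)
    @interface Impacts { Impact value(); }

    @Impacts(Impact.NONE) static class GetVmConsoleMsg {}
    @Impacts(Impact.STORAGE) static class AttachVolumeMsg {}

    // Mirrors the interceptor's early return: skip anything that declares
    // no metadata impact before doing any further resolution work.
    static boolean shouldTrack(Object msg) {
        Impacts ann = msg.getClass().getAnnotation(Impacts.class);
        return ann != null && ann.value() != Impact.NONE;
    }

    public static void main(String[] args) {
        System.out.println(shouldTrack(new GetVmConsoleMsg()));
        System.out.println(shouldTrack(new AttachVolumeMsg()));
    }
}
```

Checking the annotation before resolving VM UUIDs means the 500+ NONE-marked APIs pay only one reflective lookup and never enter the dirty-marking path.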
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataUpdateInterceptor.java` around lines 149 - 157, in VmMetadataUpdateInterceptor, when resolving MetadataImpact for a message (the code that builds pendingApis using apiMsg.getId() and new MetadataImpactInfo(...)), skip and return immediately if impact.value() == MetadataImpact.Impact.NONE so NONE-marked APIs are not cached; additionally add a defensive check in submitMarkDirty (the method that processes pendingApis and calls markDirty) to ignore entries whose MetadataImpactInfo indicates Impact.NONE before submitting any markDirty work. Ensure you reference VmMetadataUpdateInterceptor, the pendingApis map, MetadataImpact annotation usage, and submitMarkDirty/markDirty when making the changes.

compute/src/main/java/org/zstack/compute/vm/VmInstanceMetadataFieldProcessor.java-196-199 (1)
196-199: ⚠️ Potential issue | 🟠 Major

Tighten the match/replace boundaries for path identifiers.

Both contains() and a blanket replace() are used here. In the spec, oldPathIdentifier is either a path prefix or a VG UUID; the current code treats an incidental hit of the same fragment inside some other path segment as the same primary storage and rewrites every matching position. Both cross-storage filtering and installPath rewriting will misjudge and mis-rewrite. Also applies to: 226-230
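A sketch of segment-boundary matching versus the naive contains() check (plain Java; the identifier values are hypothetical):

```java
import java.util.regex.Pattern;

public class PathBoundaryMatch {
    // Match an identifier only as a whole path segment (or leading prefix),
    // never as an incidental substring of another segment.
    static boolean matchesSegment(String installPath, String identifier) {
        Pattern p = Pattern.compile("(^|/)" + Pattern.quote(identifier) + "(/|$)");
        return p.matcher(installPath).find();
    }

    public static void main(String[] args) {
        String vg = "vg-1234";
        System.out.println(matchesSegment("/dev/vg-1234/vol-a", vg));
        System.out.println(matchesSegment("/dev/vg-12345/vol-a", vg)); // different VG
        // contains() wrongly reports a hit for the second path:
        System.out.println("/dev/vg-12345/vol-a".contains(vg));
    }
}
```

The same boundary pattern can drive the rewrite side: replacing only the matched prefix/segment (rather than calling replace() on the whole string) keeps unrelated path fragments untouched.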
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceMetadataFieldProcessor.java` around lines 196 - 199, the belongsToPrimaryStorage check and the related installPath rewrite use naive contains()/replace(), causing false positives when oldPathIdentifier appears inside other path segments; update belongsToPrimaryStorage to only match when installPath has the identifier at the intended boundary (e.g., startsWith for prefix matches, exact equality for UUID matches, or a regex that enforces segment boundaries such as (^|/)<identifier>(/|$)); likewise change the corresponding replacement code (the code around lines 226-230 that does a full replace) to only replace the matched prefix/segment (or use Path-based normalization) instead of replacing all occurrences so only the intended path portion is rewritten.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-899-917 (1)
899-917: ⚠️ Potential issue | 🟠 Major

The batch initialize/upgrade refresh bypasses the KVM filter.

The regular markDirty() entry point explicitly handles only KVM UserVms, but these two SQL statements filter solely on type = 'UserVm'. When the feature flag is enabled or an upgrade refresh runs, non-KVM UserVms are also pushed into the dirty table, diverging from the target set of the day-to-day incremental path. Also applies to: 995-1003
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 899 - 917, the batch INSERT/UPDATE for VmMetadataDirtyVO is only filtering on v.type = 'UserVm' and thus includes non-KVM User VMs; update the SQL in the methods that perform the batch initialize/upgrade (the SQL.New(...) statements operating on VmInstanceVO -> VmMetadataDirtyVO and referenced by markDirty()) to also filter by the KVM hypervisor (e.g., add a condition like v.hypervisorType = 'KVM' or the equivalent column used in VmInstanceVO), and apply the same fix to the other instance of the SQL block (the similar statements around the 995-1003 region) so the batch path matches the normal markDirty() KVM-only behavior.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-842-845 (1)
842-845: ⚠️ Potential issue | 🟠 Major

Clear the failure marker after a successful flush.

The existing-fingerprint branch updates only the timestamp and pathSnapshot without resetting lastFlushFailed. One historical failure lingers forever, and subsequent recovery/inspection logic keeps treating this VM as "last flush failed".

💡 Suggested fix

```diff
 } else {
     fp.setPathSnapshot(pathJson);
     fp.setLastFlushTime(new Timestamp(System.currentTimeMillis()));
+    fp.setLastFlushFailed(false);
     dbf.update(fp);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 842 - 845, the current success branch in VmMetadataDirtyMarker updates fp.setPathSnapshot(pathJson) and fp.setLastFlushTime(...) but does not clear the failure marker, so a prior failure remains forever; after a successful flush, reset the failure flag on the fingerprint object (e.g. call fp.setLastFlushFailed(false) or the appropriate setter) before calling dbf.update(fp) so the record no longer indicates a lastFlushFailed state. Ensure you reference VmMetadataDirtyMarker and the fp instance when making this change.

compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java-519-526 (1)
519-526: ⚠️ Potential issue | 🔴 Critical

The global concurrency-cap check is not atomic.

get() and incrementAndGet() execute separately, so multiple threads can pass this point simultaneously and push globalFlushInFlight above maxConcurrent. The limiter is unreliable for both triggerFlushForVm() and concurrent poller submissions.

💡 Suggested fix

```diff
 int maxConcurrent = VmGlobalConfig.VM_METADATA_GLOBAL_MAX_CONCURRENT.value(Integer.class);
-if (globalFlushInFlight.get() >= maxConcurrent) {
-    // global concurrency saturated; release the claim and let the poller retry next round
-    releaseClaim(vmUuid);
-    return;
-}
-globalFlushInFlight.incrementAndGet();
+int inFlight = globalFlushInFlight.incrementAndGet();
+if (inFlight > maxConcurrent) {
+    globalFlushInFlight.decrementAndGet();
+    // global concurrency saturated; release the claim and let the poller retry next round
+    releaseClaim(vmUuid);
+    return;
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java` around lines 519 - 526, The current check uses separate globalFlushInFlight.get() and incrementAndGet(), which is not atomic and lets multiple threads exceed maxConcurrent; change it to an atomic compare-and-increment (e.g., a CAS loop or AtomicInteger.updateAndGet/getAndUpdate with a lambda) inside VmMetadataDirtyMarker where you currently check maxConcurrent so that you only increment when the previous value is < maxConcurrent; if the atomic update fails because value >= maxConcurrent, call releaseClaim(vmUuid) and return. Also ensure the corresponding decrement happens reliably (e.g., in the existing completion/finally path) so globalFlushInFlight is balanced. Reference symbols: globalFlushInFlight, triggerFlushForVm(), releaseClaim(vmUuid).
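An atomic check-and-increment limiter can be sketched with AtomicInteger.getAndUpdate (plain Java; the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyLimiter {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int max;

    ConcurrencyLimiter(int max) { this.max = max; }

    // Atomic check-and-increment: the counter is only bumped when it is
    // below the cap, so no interleaving can push it past 'max'.
    boolean tryAcquire() {
        int prev = inFlight.getAndUpdate(n -> n < max ? n + 1 : n);
        return prev < max;
    }

    void release() { inFlight.decrementAndGet(); }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(2);
        System.out.println(limiter.tryAcquire()); // first slot
        System.out.println(limiter.tryAcquire()); // second slot
        System.out.println(limiter.tryAcquire()); // cap reached
        limiter.release();
        System.out.println(limiter.tryAcquire()); // slot freed, acquire again
    }
}
```

getAndUpdate retries its CAS loop internally, so the decision "is n below max" and the increment happen as one atomic step; the increment-then-check variant in the suggested diff achieves the same guarantee with a compensating decrement.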
🟡 Minor comments (2)
conf/db/upgrade/V4.10.29__schema.sql-35-36 (1)
35-36: ⚠️ Potential issue | 🟡 Minor

Qualify the cleanup statement with the zstack schema too.

The bare DELETE FROM `GarbageCollectorVO` still depends on the default database, which is inconsistent with the fixed-schema convention of the upgrade scripts in this directory; change it to DELETE FROM `zstack`.`GarbageCollectorVO`. Based on learnings: "In ZStack, the database schema name is fixed as 'zstack'. For SQL upgrade scripts under conf/db/upgrade, continue using TABLE_SCHEMA = 'zstack' and reference tables as `zstack`.`UsedIpVO`."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V4.10.29__schema.sql` around lines 35 - 36, update the DELETE statement so it explicitly targets the fixed zstack schema instead of relying on the default database; change the statement that references the table GarbageCollectorVO in V4.10.29__schema.sql to DELETE FROM `zstack`.`GarbageCollectorVO` WHERE `name` LIKE 'update-vm-%-metadata-gc'; ensuring the schema-qualified table name (`zstack`.`GarbageCollectorVO`) is used anywhere this table is referenced in the upgrade script.

plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java-7382-7386 (1)
7382-7386: ⚠️ Potential issue | 🟡 Minor

Do not backfill metadata when the read fails.

When the agent returns a failure, rsp.metadata is still copied into the reply; a caller that consumes the body fields without first checking error can easily treat the failed result as valid metadata.

💡 Suggested fix

```diff
 public void success(ReadVmInstanceMetadataRsp rsp) {
     if (!rsp.isSuccess()) {
         reply.setError(operr("operation error, because:%s", rsp.getError()));
+    } else {
+        reply.setMetadata(rsp.metadata);
     }
-    reply.setMetadata(rsp.metadata);
     bus.reply(msg, reply);
     completion.done();
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 7382 - 7386, In success(ReadVmInstanceMetadataRsp rsp) ensure you do not populate reply.metadata when the response indicates failure: if !rsp.isSuccess() set reply.setError(operr(...)) and return (or otherwise skip setting reply.setMetadata), otherwise set reply.setMetadata(rsp.metadata); this prevents failed responses from writing rsp.metadata into reply.
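The success-guard can be shown in miniature (the Rsp/Reply stand-ins below are illustrative, not the ZStack types):

```java
public class ReplyGuardDemo {
    static class Rsp { boolean success; String error; String metadata; }
    static class Reply { String error; String metadata; }

    // Populate the payload only on success; on failure set the error alone,
    // so callers can never mistake a failed read for valid metadata.
    static Reply toReply(Rsp rsp) {
        Reply reply = new Reply();
        if (!rsp.success) {
            reply.error = "operation error, because:" + rsp.error;
        } else {
            reply.metadata = rsp.metadata;
        }
        return reply;
    }

    public static void main(String[] args) {
        Rsp failed = new Rsp();
        failed.success = false;
        failed.error = "agent timeout";
        failed.metadata = "stale-bytes"; // must not leak into the reply
        Reply r = toReply(failed);
        System.out.println(r.metadata == null);
        System.out.println(r.error);
    }
}
```

Keeping error and payload mutually exclusive makes the reply unambiguous even for callers that skip the error check.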
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 382725ba-4cc9-4e3b-904a-6aa152d2bb90
⛔ Files ignored due to path filters (4)
conf/globalConfig/vm.xml is excluded by !**/*.xml
conf/persistence.xml is excluded by !**/*.xml
conf/serviceConfig/primaryStorage.xml is excluded by !**/*.xml
conf/springConfigXml/VmInstanceManager.xml is excluded by !**/*.xml
📒 Files selected for processing (136)

compute/src/main/java/org/zstack/compute/vm/AbstractVmInstance.java
compute/src/main/java/org/zstack/compute/vm/VmExpungeMetadataFlow.java
compute/src/main/java/org/zstack/compute/vm/VmGlobalConfig.java
compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
compute/src/main/java/org/zstack/compute/vm/VmInstanceMetadataFieldProcessor.java
compute/src/main/java/org/zstack/compute/vm/VmInstanceUtils.java
compute/src/main/java/org/zstack/compute/vm/metadata/DefaultVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataCascadeExtension.java
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataOrphanDetector.java
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataPathDriftDetector.java
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataPathSnapshotBuilder.java
compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java
compute/src/main/java/org/zstack/compute/vm/metadata/NicBasedVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/ReflectionBasedVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/ResourceBasedVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotBasedVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotGroupBasedVmUuidFromApiResolver.java
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataBuilder.java
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java
compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataUpdateInterceptor.java
compute/src/main/java/org/zstack/compute/vm/metadata/VolumeBasedVmUuidFromApiResolver.java
conf/db/upgrade/V4.10.29__schema.sql
header/src/main/java/org/zstack/header/storage/backup/APIExportImageFromBackupStorageMsg.java
header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageMsg.java
header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageReply.java
header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceEventDoc_zh_cn.groovy
header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsg.java
header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsgDoc_zh_cn.groovy
header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceReply.java
header/src/main/java/org/zstack/header/storage/primary/CleanupVmInstanceMetadataOnPrimaryStorageMsg.java
header/src/main/java/org/zstack/header/storage/primary/CleanupVmInstanceMetadataOnPrimaryStorageReply.java
header/src/main/java/org/zstack/header/storage/primary/GetVmInstanceMetadataFromPrimaryStorageMsg.java
header/src/main/java/org/zstack/header/storage/primary/GetVmInstanceMetadataFromPrimaryStorageReply.java
header/src/main/java/org/zstack/header/storage/primary/ReadVmInstanceMetadataMsg.java
header/src/main/java/org/zstack/header/storage/primary/ReadVmInstanceMetadataOnHypervisorMsg.java
header/src/main/java/org/zstack/header/storage/primary/ReadVmInstanceMetadataOnHypervisorReply.java
header/src/main/java/org/zstack/header/storage/primary/ReadVmInstanceMetadataReply.java
header/src/main/java/org/zstack/header/storage/primary/RegisterVmInstanceException.java
header/src/main/java/org/zstack/header/storage/snapshot/APIDeleteVolumeSnapshotMsg.java
header/src/main/java/org/zstack/header/storage/snapshot/APIRevertVolumeFromSnapshotMsg.java
header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotTree.java
header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotVO_.java
header/src/main/java/org/zstack/header/storage/snapshot/group/APIDeleteVolumeSnapshotGroupMsg.java
header/src/main/java/org/zstack/header/tag/APICreateSystemTagMsg.java
header/src/main/java/org/zstack/header/tag/APIDeleteTagMsg.java
header/src/main/java/org/zstack/header/tag/APIUpdateSystemTagMsg.java
header/src/main/java/
org/zstack/header/vm/APIAttachVmNicToVmMsg.javaheader/src/main/java/org/zstack/header/vm/APIChangeVmNicNetworkMsg.javaheader/src/main/java/org/zstack/header/vm/APIChangeVmNicStateMsg.javaheader/src/main/java/org/zstack/header/vm/APICheckVmInstanceMetadataConsistencyMsg.javaheader/src/main/java/org/zstack/header/vm/APICheckVmInstanceMetadataConsistencyReply.javaheader/src/main/java/org/zstack/header/vm/APICleanupVmInstanceMetadataEvent.javaheader/src/main/java/org/zstack/header/vm/APICleanupVmInstanceMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/APIConvertTemplatedVmInstanceToVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/APIConvertVmInstanceToTemplatedVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/APIDeleteVmBootModeMsg.javaheader/src/main/java/org/zstack/header/vm/APIDeleteVmHostnameMsg.javaheader/src/main/java/org/zstack/header/vm/APIDeleteVmNicMsg.javaheader/src/main/java/org/zstack/header/vm/APIDeleteVmSshKeyMsg.javaheader/src/main/java/org/zstack/header/vm/APIPreCheckVmMetadataRegistrationMsg.javaheader/src/main/java/org/zstack/header/vm/APIPreCheckVmMetadataRegistrationReply.javaheader/src/main/java/org/zstack/header/vm/APIReadVmInstanceMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/APIReadVmInstanceMetadataReply.javaheader/src/main/java/org/zstack/header/vm/APIRecoverVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataEvent.javaheader/src/main/java/org/zstack/header/vm/APIRegisterVmInstanceFromMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/APIReimageVmInstanceEvent.javaheader/src/main/java/org/zstack/header/vm/APIReimageVmInstanceEventDoc_zh_cn.groovyheader/src/main/java/org/zstack/header/vm/APIReimageVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/APIScanVmInstanceMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/APIScanVmInstanceMetadataReply.javaheader/src/main/java/org/zstack/header/vm/APISetVmBootOrderMsg.javaheader/src/main/java/org/z
stack/header/vm/APIUpdateVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/APIUpdateVmMetadataEvent.javaheader/src/main/java/org/zstack/header/vm/APIUpdateVmMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusMsg.javaheader/src/main/java/org/zstack/header/vm/BatchCheckMetadataStatusReply.javaheader/src/main/java/org/zstack/header/vm/ConsistencyCheckResult.javaheader/src/main/java/org/zstack/header/vm/DiffEntry.javaheader/src/main/java/org/zstack/header/vm/MetadataImpact.javaheader/src/main/java/org/zstack/header/vm/MetadataStatusResult.javaheader/src/main/java/org/zstack/header/vm/MetadataStorageHandler.javaheader/src/main/java/org/zstack/header/vm/PreCheckItem.javaheader/src/main/java/org/zstack/header/vm/RegisterVmFromMetadataInnerMsg.javaheader/src/main/java/org/zstack/header/vm/RegisterVmFromMetadataInnerReply.javaheader/src/main/java/org/zstack/header/vm/RepairMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/RepairMetadataReply.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataMsg.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataOnHypervisorMsg.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataOnHypervisorReply.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataOnPrimaryStorageMsg.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataOnPrimaryStorageReply.javaheader/src/main/java/org/zstack/header/vm/UpdateVmInstanceMetadataReply.javaheader/src/main/java/org/zstack/header/vm/VmInstanceConstant.javaheader/src/main/java/org/zstack/header/vm/VmInstanceMetadataCodec.javaheader/src/main/java/org/zstack/header/vm/VmInstanceMetadataConstants.javaheader/src/main/java/org/zstack/header/vm/VmInstanceMetadataDTO.javaheader/src/main/java/org/zstack/header/vm/VmInstanceMetadataRegistrationSpec.javaheader/src/main/java/org/zstack/header/vm/VmInstanceMetadataValidator.javaheader/src/main/java/org/zstack/header/vm/VmInstanceState.java
header/src/main/java/org/zstack/header/vm/VmMetadata.javaheader/src/main/java/org/zstack/header/vm/VmMetadataCanonicalEvents.javaheader/src/main/java/org/zstack/header/vm/VmMetadataCategory.javaheader/src/main/java/org/zstack/header/vm/VmMetadataConstants.javaheader/src/main/java/org/zstack/header/vm/VmMetadataDirtyVO.javaheader/src/main/java/org/zstack/header/vm/VmMetadataDirtyVO_.javaheader/src/main/java/org/zstack/header/vm/VmMetadataErrors.javaheader/src/main/java/org/zstack/header/vm/VmMetadataPathFingerprintVO.javaheader/src/main/java/org/zstack/header/vm/VmMetadataScanResult.javaheader/src/main/java/org/zstack/header/vm/VmUuidFromApiResolver.javaheader/src/main/java/org/zstack/header/vm/VolumeResourceMetadata.javaheader/src/main/java/org/zstack/header/vm/cdrom/APIDeleteVmCdRomMsg.javaheader/src/main/java/org/zstack/header/volume/APIAttachDataVolumeToVmMsg.javaheader/src/main/java/org/zstack/header/volume/APICreateVolumeSnapshotGroupMsg.javaheader/src/main/java/org/zstack/header/volume/APIDeleteDataVolumeMsg.javaheader/src/main/java/org/zstack/header/volume/APIDetachDataVolumeFromVmMsg.javaheader/src/main/java/org/zstack/header/volume/APIFlattenVolumeMsg.javaheader/src/main/java/org/zstack/header/volume/APIRecoverDataVolumeMsg.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHost.javaplugin/localstorage/src/main/java/org/zstack/storage/primary/local/APILocalStorageMigrateVolumeMsg.javaplugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.javaplugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.javaplugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.javaplugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.javaplugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.j
avaplugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.javaresourceconfig/src/main/java/org/zstack/resourceconfig/APIDeleteResourceConfigMsg.javaresourceconfig/src/main/java/org/zstack/resourceconfig/APIUpdateResourceConfigMsg.javasdk/src/main/java/org/zstack/sdk/RegisterVmInstanceAction.javasdk/src/main/java/org/zstack/sdk/RegisterVmInstanceResult.javastorage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.javatestlib/src/main/java/org/zstack/testlib/ApiHelper.groovytestlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy
🚧 Files skipped from review as they are similar to previous changes (9)
- header/src/main/java/org/zstack/header/storage/primary/ReadVmInstanceMetadataMsg.java
- compute/src/main/java/org/zstack/compute/vm/metadata/DefaultVmUuidFromApiResolver.java
- header/src/main/java/org/zstack/header/storage/primary/GetVmInstanceMetadataFromPrimaryStorageMsg.java
- compute/src/main/java/org/zstack/compute/vm/metadata/SnapshotBasedVmUuidFromApiResolver.java
- compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java
- header/src/main/java/org/zstack/header/storage/primary/RegisterVmInstanceException.java
- compute/src/main/java/org/zstack/compute/vm/VmInstanceUtils.java
- header/src/main/java/org/zstack/header/storage/primary/APIRegisterVmInstanceMsgDoc_zh_cn.groovy
- header/src/main/java/org/zstack/header/storage/primary/APIGetVmInstanceMetadataFromPrimaryStorageReply.java
```java
@Override
public void managementNodeReady() {
    startTask();
}

// =====================================================================
// Task management
// =====================================================================

private synchronized void startTask() {
    if (taskFuture != null) {
        taskFuture.cancel(false);
    }
    taskFuture = thdf.submitPeriodicTask(new PeriodicTask() {
        @Override
        public TimeUnit getTimeUnit() {
            return TimeUnit.SECONDS;
        }

        @Override
        public long getInterval() {
            return VmGlobalConfig.VM_METADATA_STALE_RECOVERY_INTERVAL.value(Long.class);
        }

        @Override
        public String getName() {
            return "vm-metadata-stale-recovery";
        }

        @Override
        public void run() {
            recoverStaleVms();
        }
    });
    logger.info("[MetadataStaleRecovery] task started (interval={}s)",
            VmGlobalConfig.VM_METADATA_STALE_RECOVERY_INTERVAL.value(Long.class));
}
```
This periodic task now runs redundantly on every management node.
managementNodeReady() starts the local periodic task unconditionally, but the subsequent scan carries no owner/sharding condition. In a cluster, multiple management nodes will concurrently process the same batch of lastFlushFailed=1 records: at best duplicate alarms and duplicate enqueues, at worst staleRecoveryCount fills up prematurely and the circuit breaker trips too early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/MetadataStaleRecoveryTask.java`
around lines 75 - 111, The periodic recovery task is started unconditionally in
managementNodeReady() causing every management node to run recoverStaleVms()
concurrently; modify startTask()/managementNodeReady() so only the elected/owner
management node runs the periodic task (or keep the task scheduled but add an
ownership/leader check at the start of the PeriodicTask.run), e.g. before
submitting via thdf.submitPeriodicTask or inside the run() wrapper verify node
is the cluster leader/owner and skip calling recoverStaleVms() when not,
ensuring taskFuture is started/cancelled only on the owner and the
VmGlobalConfig.VM_METADATA_STALE_RECOVERY_INTERVAL behavior remains unchanged.
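The ownership gate suggested above can be sketched generically. Note this is a minimal model, not ZStack code: `isOwner` is a hypothetical stand-in for whatever leader-election primitive the management-node cluster exposes, and the scheduler is left out so the gating logic itself stays testable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Sketch: every node may have the periodic task scheduled, but only the
// current owner actually executes the recovery scan.
public class LeaderGatedTask {
    private final BooleanSupplier isOwner;       // hypothetical leader-election hook
    private final Runnable recoverStaleVms;      // the real work (recoverStaleVms())
    private final AtomicInteger runs = new AtomicInteger();

    LeaderGatedTask(BooleanSupplier isOwner, Runnable recoverStaleVms) {
        this.isOwner = isOwner;
        this.recoverStaleVms = recoverStaleVms;
    }

    // Body of PeriodicTask.run(): standby nodes skip silently.
    void runOnce() {
        if (!isOwner.getAsBoolean()) {
            return;
        }
        recoverStaleVms.run();
        runs.incrementAndGet();
    }

    int runCount() { return runs.get(); }

    public static void main(String[] args) {
        AtomicInteger work = new AtomicInteger();
        LeaderGatedTask owner = new LeaderGatedTask(() -> true, work::incrementAndGet);
        LeaderGatedTask standby = new LeaderGatedTask(() -> false, work::incrementAndGet);
        for (int i = 0; i < 3; i++) { owner.runOnce(); standby.runOnce(); }
        System.out.println(work.get());          // only the owner did any work
        System.out.println(standby.runCount());
    }
}
```

Checking ownership inside run() (rather than conditionally scheduling) also survives leadership moving between nodes while the task is live.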
```java
bus.send(msg, new CloudBusCallBack(null) {
    @Override
    public void run(MessageReply reply) {
        if (reply.isSuccess()) {
            onFlushSuccess(vmUuid, snapshotVersion);
        } else {
            onFlushFailure(vmUuid, reply.getError());
        }
        chainNext.run();
```
Make sure chainNext.run() always executes.
If onFlushSuccess() / onFlushFailure() throws a runtime exception, chainNext.run() never runs: the per-VM chain stalls and globalFlushInFlight leaks. At minimum, wrap the tail call in try/finally.
💡 Suggested change

```diff
 bus.send(msg, new CloudBusCallBack(null) {
     @Override
     public void run(MessageReply reply) {
-        if (reply.isSuccess()) {
-            onFlushSuccess(vmUuid, snapshotVersion);
-        } else {
-            onFlushFailure(vmUuid, reply.getError());
-        }
-        chainNext.run();
+        try {
+            if (reply.isSuccess()) {
+                onFlushSuccess(vmUuid, snapshotVersion);
+            } else {
+                onFlushFailure(vmUuid, reply.getError());
+            }
+        } finally {
+            chainNext.run();
+        }
     }
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java`
around lines 647 - 655, The callback in bus.send (CloudBusCallBack.run) must
guarantee chainNext.run() executes even if onFlushSuccess(vmUuid,
snapshotVersion) or onFlushFailure(vmUuid, reply.getError()) throws; update the
run(...) method to invoke onFlushSuccess/onFlushFailure inside a try block and
call chainNext.run() in a finally block (optionally log any exception thrown),
so the per-VM chain and globalFlushInFlight cannot get stuck; locate the run
method in the anonymous CloudBusCallBack and change the structure to try { if
(reply.isSuccess()) onFlushSuccess(vmUuid, snapshotVersion); else
onFlushFailure(vmUuid, reply.getError()); } finally { chainNext.run(); }.
```java
// C-SR-05: mark lastFlushFailed=true on PathFingerprintVO
SQL.New("UPDATE VmMetadataPathFingerprintVO " +
        "SET lastFlushFailed = 1 WHERE vmInstanceUuid = :vmUuid")
        .param("vmUuid", vmUuid)
        .execute();

// Delete the dirty row (freeing poller resources); stale recovery is taken over by a separate task
SQL.New(VmMetadataDirtyVO.class)
        .eq(VmMetadataDirtyVO_.vmInstanceUuid, vmUuid)
        .delete();
```
A first-time flush failure deletes the recovery clue along with the dirty row.
The code UPDATEs the fingerprint table first, then unconditionally deletes the dirty row. For a VM that has never flushed successfully, VmMetadataPathFingerprintVO does not exist yet, so lastFlushFailed never actually lands in the database; once the dirty row is gone, stale recovery has no entry point left.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@compute/src/main/java/org/zstack/compute/vm/metadata/VmMetadataDirtyMarker.java`
around lines 739 - 748, The current flow updates VmMetadataPathFingerprintVO
(setting lastFlushFailed) then unconditionally deletes VmMetadataDirtyVO rows;
if the fingerprint row does not exist (first-ever flush) the UPDATE is a no-op
and the delete removes the only recovery clue. Change the logic in the code
around the SQL.New("UPDATE VmMetadataPathFingerprintVO ...").param("vmUuid",
vmUuid).execute() and the
SQL.New(VmMetadataDirtyVO.class).eq(VmMetadataDirtyVO_.vmInstanceUuid,
vmUuid).delete() call so that you either (a) ensure a
VmMetadataPathFingerprintVO is created/inserted with lastFlushFailed=true when
none exists before deleting dirty rows, or (b) skip deleting the
VmMetadataDirtyVO row unless the fingerprint row existed/was successfully
updated (i.e., check existence of VmMetadataPathFingerprintVO by vmInstanceUuid
or use the UPDATE result to decide). Ensure you reference
VmMetadataPathFingerprintVO, VmMetadataDirtyVO, lastFlushFailed and
vmInstanceUuid/vmUuid when making the change.
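Option (a) from the prompt above, upserting the fingerprint before dropping the dirty row, can be modeled with two in-memory maps standing in for the two DB tables (the real code would use SQL, not collections; names mirror the review).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: ensure a fingerprint row exists (with lastFlushFailed=true)
// BEFORE the dirty row is deleted, so stale recovery always has an entry point.
public class FlushFailureBookkeeping {
    final Map<String, Boolean> fingerprintLastFlushFailed = new HashMap<>();  // VmMetadataPathFingerprintVO
    final Set<String> dirtyRows = new HashSet<>();                            // VmMetadataDirtyVO

    void onFlushFailure(String vmUuid) {
        // upsert instead of a bare UPDATE: a first-ever flush has no row yet
        fingerprintLastFlushFailed.put(vmUuid, true);
        // only now is it safe to drop the dirty row; recovery can still find the VM
        dirtyRows.remove(vmUuid);
    }

    public static void main(String[] args) {
        FlushFailureBookkeeping db = new FlushFailureBookkeeping();
        db.dirtyRows.add("vm-1");        // VM that never flushed successfully
        db.onFlushFailure("vm-1");
        System.out.println(db.fingerprintLastFlushFailed.get("vm-1"));  // recovery clue persisted
        System.out.println(db.dirtyRows.contains("vm-1"));              // poller resource released
    }
}
```

Option (b), gating the delete on the UPDATE's affected-row count, gives the same invariant without inserting fingerprint rows early.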
```java
// 1. Build the payload (done by VmMetadataBuilder inside a @Transactional(readOnly=true) transaction)
String metadata = vmMetadataBuilder.buildVmInstanceMetadata(msg.getUuid());

// 2. Payload size guard
int payloadSize = metadata.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
if (payloadSize > org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD) {
    logger.error(String.format("metadata payload too large: %d bytes for vm[uuid:%s], rejecting",
            payloadSize, msg.getUuid()));
    MessageReply reply = new MessageReply();
    reply.setError(Platform.operr("metadata payload too large (%d bytes, limit %d) for vm[uuid=%s]",
            payloadSize, org.zstack.compute.vm.metadata.VmMetadataBuilder.REJECT_THRESHOLD, msg.getUuid()));
    bus.reply(msg, reply);
    return;
}
if (payloadSize > org.zstack.compute.vm.metadata.VmMetadataBuilder.WARN_THRESHOLD) {
    logger.warn(String.format("metadata payload large: %d bytes for vm[uuid:%s]",
            payloadSize, msg.getUuid()));
}

// 3. Send to primary storage
Tuple tuple = Q.New(VolumeVO.class).select(VolumeVO_.primaryStorageUuid, VolumeVO_.uuid)
        .eq(VolumeVO_.vmInstanceUuid, msg.getUuid()).eq(VolumeVO_.type, VolumeType.Root).findTuple();
String primaryStorageUuid = tuple.get(0, String.class);
String rootVolumeUuid = tuple.get(1, String.class);
```
Don't assume the metadata and the Root volume always exist.
VmMetadataBuilder.buildVmInstanceMetadata() can return null when the VM has been deleted, no longer qualifies, or the builder itself rejects the payload by threshold; the direct getBytes() at Line 9431 then NPEs before the reject branch is ever reached. Likewise, Line 9449 and Line 9450 assume findTuple() always hits; if a flush races with a deletion, the method throws and exits without ever replying. At minimum, add explicit noop / error branches for both metadata and tuple.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
9427 - 9450, The code assumes VmMetadataBuilder.buildVmInstanceMetadata(...)
always returns a non-null metadata and that Q.New(...).findTuple() always finds
a root volume; add null/empty checks: after calling
VmMetadataBuilder.buildVmInstanceMetadata(msg.getUuid()) verify metadata != null
and if null either reply with an error via bus.reply(msg, reply) (using
Platform.operr(...) like the existing reject branch) or treat it as a no-op and
reply success, depending on intended semantics; guard the payloadSize
calculation (metadata.getBytes(...)) behind that check and preserve the
REJECT_THRESHOLD/WARN_THRESHOLD logic. Likewise, after Tuple tuple =
Q.New(VolumeVO.class)...findTuple() verify tuple is not null and contains
expected elements before calling tuple.get(0, ...) / get(1, ...); if missing,
log and reply with an error (or no-op reply) so the code does not NPE or throw
when the root volume was concurrently deleted.
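The noop/error branching asked for above can be sketched with a stripped-down model; Optionals stand in for the nullable build result and the possibly-missing root-volume tuple, and the return strings stand in for the bus.reply paths (no ZStack types involved).

```java
import java.util.Optional;

// Guarded version of the two flagged lookups: both the metadata build and the
// root-volume query may legitimately come back empty when the VM is being
// deleted concurrently, and each case must still produce a reply.
public class GuardedLookup {
    static String flush(Optional<String> metadata, Optional<String> rootVolumePs) {
        if (metadata.isEmpty()) {
            return "noop: vm gone or metadata rejected";  // reply success / no-op
        }
        if (rootVolumePs.isEmpty()) {
            return "error: root volume not found";        // reply with operr(...)
        }
        return "flush to " + rootVolumePs.get();          // normal path
    }

    public static void main(String[] args) {
        System.out.println(flush(Optional.empty(), Optional.of("ps-1")));
        System.out.println(flush(Optional.of("{...}"), Optional.empty()));
        System.out.println(flush(Optional.of("{...}"), Optional.of("ps-1")));
    }
}
```

Whether the empty-metadata case should be a no-op reply or an error is a semantics decision the review leaves open; the key point is that every branch replies.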
```java
@RestRequest(
        path = "/vm-instances/{vmUuid}/metadata",
        method = HttpMethod.GET,
        responseClass = APIReadVmInstanceMetadataReply.class
)
public class APIReadVmInstanceMetadataMsg extends APISyncCallMessage implements VmInstanceMessage {
```
Missing @MetadataImpact, inconsistent with this PR's goal.
This is also a newly added APISyncCallMessage, yet the class carries no @MetadataImpact. Since this change is meant to make CI enforce the check on every APIMessage subclass, leaving the annotation off creates an immediate gap; a read-only API should be classified as non-mutating.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@header/src/main/java/org/zstack/header/vm/APIReadVmInstanceMetadataMsg.java`
around lines 9 - 14, add the missing `@MetadataImpact` annotation on the APIReadVmInstanceMetadataMsg class to indicate it is a read-only / non-mutating metadata API (marking this APISyncCallMessage / APIMessage subclass as non-mutating), in line with the CI check this change enforces for all APIMessage subclasses. Locate the APIReadVmInstanceMetadataMsg class and add `@MetadataImpact`(...) before the class definition, using the enum value that denotes "read-only / non-mutating" (i.e. classifying read APIs in the non-mutating category).
```java
} else if (msg instanceof APIRegisterVmInstanceMsg) {
    handle((APIRegisterVmInstanceMsg) msg);
} else if (msg instanceof APIGetVmInstanceMetadataFromPrimaryStorageMsg) {
    handle((APIGetVmInstanceMetadataFromPrimaryStorageMsg) msg);
```
Add local dispatch for GetVmInstanceMetadataFromPrimaryStorageMsg.
The API entry already wires up APIGetVmInstanceMetadataFromPrimaryStorageMsg, but handleLocalMessage() has no branch for the corresponding GetVmInstanceMetadataFromPrimaryStorageMsg. As it stands, once this message reaches the PrimaryStorage service it falls straight through to bus.dealWithUnknownMessage(msg) and the backend implementation is never invoked.
🔧 Suggested fix

```diff
@@
     } else if (msg instanceof GetVolumeSnapshotEncryptedOnPrimaryStorageMsg) {
         handle((GetVolumeSnapshotEncryptedOnPrimaryStorageMsg) msg);
+    } else if (msg instanceof GetVmInstanceMetadataFromPrimaryStorageMsg) {
+        handle((GetVmInstanceMetadataFromPrimaryStorageMsg) msg);
     } else if (msg instanceof DeleteVolumeChainOnPrimaryStorageMsg) {
         handle((DeleteVolumeChainOnPrimaryStorageMsg) msg);
     } else if (msg instanceof CleanUpStorageTrashOnPrimaryStorageMsg) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java`
around lines 951 - 954, PrimaryStorageBase's local dispatcher misses handling of
the internal GetVmInstanceMetadataFromPrimaryStorageMsg, so messages of that
type fall through to bus.dealWithUnknownMessage and the backend handler is never
invoked; in handleLocalMessage() add a branch like else if (msg instanceof
GetVmInstanceMetadataFromPrimaryStorageMsg) {
handle((GetVmInstanceMetadataFromPrimaryStorageMsg) msg); } (matching the
existing APIGetVmInstanceMetadataFromPrimaryStorageMsg -> handle(...) pattern)
so the internal message is routed to the existing
handle(GetVmInstanceMetadataFromPrimaryStorageMsg) method.
```java
@Override
public void fail(ErrorCode errorCode) {
    bus.publish(event);
    chain.next();
}
```
The failure branch does not return the error to the API caller.
fail() only publishes the event without calling event.setError(errorCode). On registration failure, the API layer may then receive a "successful but inventory-less" response, and the error is silently swallowed.
🔧 Suggested fix

```diff
 @Override
 public void fail(ErrorCode errorCode) {
+    event.setError(errorCode);
     bus.publish(event);
     chain.next();
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java`
around lines 1851 - 1855, the fail(ErrorCode errorCode) branch of this anonymous callback currently only calls bus.publish(event) and continues with chain.next(), without passing the error back to the API. Modify fail: call event.setError(errorCode) before publishing (and set event.setSuccess(false) if such a method exists per project convention), then bus.publish(event) and chain.next(), so the caller receives the failure information.
Resolves: ZSV-10000
Change-Id: I746a6d7266686d637a6d76756861636a74766b74
sync from gitlab !9328