Skip to content

fix(primaryStorage): rollback persisted records on controller build failure#3868

Open
zstack-robot-1 wants to merge 240 commits into5.5.6from
sync/jin.ma/fix/ZSTAC-84817
Open

fix(primaryStorage): rollback persisted records on controller build failure#3868
zstack-robot-1 wants to merge 240 commits into5.5.6from
sync/jin.ma/fix/ZSTAC-84817

Conversation

@zstack-robot-1
Copy link
Copy Markdown
Collaborator

Summary

AddExternalPrimaryStorage with malformed JSON config left dirty rows in DB, breaking PrimaryStorage service permanently (QueryPrimaryStorage 503).

Root Cause

ExternalPrimaryStorageFactory.createPrimaryStorage persists ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO before invoking saveControllerIfNeed → buildControllerSvc → ZbsStorageController.reloadDbInfo → JSONObjectUtil.toObject. A RuntimeException from JSON parsing left those rows persisted with no rollback. The dirty VO then made every subsequent buildPsController() throw the same parse error, so the PrimaryStorage service stayed unhealthy and QueryPrimaryStorage kept returning 503.

Changes

  • storage/ExternalPrimaryStorageFactory.java: wrap saveControllerIfNeed in try/catch; on Throwable, dbf.remove(ref) + dbf.remove(lvo), clear controllers/nodes map entries, then rethrow.
  • test/.../ZbsPrimaryStorageCase.groovy: new SubCase testAddExternalPrimaryStorageWithMalformedJsonShouldRollback asserts no leftover rows after malformed-JSON Add and that QueryPrimaryStorage still works.

Testing

  • mvn compile -pl storage -am -Dmaven.test.skip passes locally.
  • CI: ZbsPrimaryStorageCase runs the new SubCase.
  • Manual: tested via reporter on MN 172.24.227.139.

Resolves: ZSTAC-84817

sync from gitlab !9743

shan.wu and others added 30 commits December 22, 2025 18:07
…igration.

support starting vm without nic in other clusters.

Resolves/Related: ZSTAC-80468

Change-Id: I6370646f7a796265677a6a656c716f6867706d69
…isk offering

Resolves: ZSTAC-74683

Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP
call now times out after 30s instead of 5+ minutes, and automatically
retries the next available MDS via tryNext mechanism.

Resolves: ZSTAC-80595

Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report
the VM as Stopped. Without this transition, the state machine throws
an exception and the VM stays stuck in Destroying state forever.

Resolves: ZSTAC-80620

Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity

Resolves: ZSTAC-79709

Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
Resolves: ZSTAC-78989

Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153

Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback

Resolves: ZSTAC-81741

Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182

Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
<fix>[vm]: add Destroying->Stopped state transition

See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume

See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients

See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation

See merge request zstackio/zstack!9155
…talling

In dual management node scenarios, concurrent modifications to the
consistent hash ring from heartbeat reconciliation and canonical event
callbacks can cause NodeHash/Nodes inconsistency, leading to message
routing failures and task timeouts.

Fix: (1) synchronized all ResourceDestinationMakerImpl methods to
ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in
ManagementNodeManagerImpl to serialize heartbeat reconciliation with
event callbacks, (3) added two-round delayed confirmation before
removing nodes from hash ring to avoid race with NodeJoin events.

Resolves: ZSTAC-77711

Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
<fix>[network]: filter reserved IPs in getFreeIp

See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity

See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor

See merge request zstackio/zstack!9169
The mdsUrls field in ExternalPrimaryStorage config contains
user:password@host format credentials. Add desensitization to
mask credentials as ***@host in API/CLI output.

Resolves: ZSTAC-80664

Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to
CloudOperationsErrorCode for use in premium volumebackup module.

Resolves: ZSTAC-82195

Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition

See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally,
triggering ServiceLoader for ContextStorageProvider even when
telemetry is disabled. If sentry-opentelemetry-bootstrap jar is
on classpath, ServiceLoader fails with "not a subtype" due to
ClassLoader isolation in Tomcat, throwing ServiceConfigurationError
(extends Error) that escapes all catch(Exception) blocks.

1. Why is this change necessary?
MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because
Context.current() triggers ServiceLoader unconditionally in
SyncTaskFuture constructor, even when telemetry is disabled.

2. How does it address the problem?
Only call Context.current() when isTelemetryEnabled() returns
true, matching the existing guard pattern used in other
DispatchQueueImpl code paths (lines 351, 1069).

3. Are there any side effects?
None. When telemetry is disabled, parentContext was never used.

# Summary of changes (by module):
- core/thread: conditionalize Context.current() in SyncTaskFuture

Related: ZSTAC-82275
Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
<fix>[thread]: guard Context.current() with telemetry check

See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195

Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade

See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code

See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields

See merge request zstackio/zstack!9171
…plit-brain

When a management node departs, its VM skip-trace entries were
immediately removed. If VMs were still being started by kvmagent,
the next VM sync would falsely detect them as Stopped and trigger
HA, causing split-brain.

Fix: transfer departed MN skip-trace entries to an orphaned set with
10-minute TTL instead of immediate deletion. VMs in the orphaned set
remain skip-traced until the TTL expires or they are explicitly
continued, preventing false HA triggers during MN restart scenarios.

Resolves: ZSTAC-80821

Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
…anup

Resolves: ZSTAC-80821

Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-77544

Change-Id: I1f711bff9c1e87a8cbf6a2eb310ca6086f0f99ba
Resolves: ZSTAC-80821

Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
zhong.xian and others added 16 commits April 16, 2026 15:12
Add virtiofs-based model mounting capability for user VMs:
- Add VmModelMountVO database table and schema with hostUuid tracking
- Add AttachVirtiofsCmd/DetachVirtiofsCmd for KVM agent
- Add MountModelCenterCmd with storageUrl field
- Add SDK actions for mount/unmount/query operations
- Add error codes 10138-10149

Resolves: ZSTAC-83157
Change-Id: I746679736f7a7176646e646d797969766f697a76
Resolves: ZSTAC-83157

Change-Id: I62696a6d667468766a6575656763707374757277
- add dedicated container image pull error code mapping

Resolves: ZSTAC-84175

Change-Id: Iabcdef1234567890abcdef1234567890abcdef12
Resolves: ZSTAC-83157

Change-Id: I756c7073707a6468676c70696a7a746870767777
<fix>[conf]: update global-error

See merge request zstackio/zstack!9557
Revert the flat userdata hostname fallback so VM creation and
reboot do not inject derived hostname metadata into userdata
payloads when no hostname is configured.

Add CreateRebootVmKeepUserdataContentCase to capture flat
userdata apply requests during VM create and reboot, and assert
the userdata list remains unchanged while hostname metadata stays
absent.

Resolves: ZSTAC-84529

Change-Id: I6f6e6b6b796e6e7a73727673626b7569696a6172
Expose requestCpu/requestMemory on the generated PodInventory so
SDK callers can deserialize the fields returned by ZQL.

Upgrade script adds matching columns on PodVO and backfills
existing pods with request=limit for legacy-equivalent behaviour.

Resolves: ZSTAC-80103

Change-Id: I5b2a9d4e8c6f3b7a1d0e9c5b4a8f2d7e6c1b3a9f
<feature>[dpu-bm2]: support attaching novlan and vxlan network to baremetal2 instance

See merge request zstackio/zstack!9530
fix(network): keep userdata content immutable

See merge request zstackio/zstack!9686
… redeploy

Resolves: ZSTAC-84446

Change-Id: I8af5e3887a5bad286b43dda00c874c9de999e1cb
<fix>[kvm]: unify TLS cert IP collection to avoid reconnect-triggered redeploy

See merge request zstackio/zstack!9683
Resolves: ZSTAC-84751

Change-Id: I84751b1c2d3e4f5a678901234567890abcdef1235
<fix>[aios]: add i18n mappings for fatal image pull error

See merge request zstackio/zstack!9670
<fix>[sdk,db]: add PodInventory request fields

See merge request zstackio/zstack!9682
…0165

Resolves: ZSTAC-84677
Related: ZSTAC-80103

Two overlapping problems are addressed in the ai module's global error
codes; both root from the same author reusing codes that were already
allocated for other semantics.

(1) ZSTAC-84677: AIModelManagerImpl model-service startup-timeout path
    passed the string literal "ORG_ZSTACK_AI_10144" to operr(). That
    code is owned by VmModelMountManager (mount path conflict, 5 %s
    template); the timeout only supplies 3 formatArgs, so the UI
    rendered a garbled "挂载失败:路径..." message for a health-check
    timeout.

(2) ZSTAC-80103: error codes 10138-10143 had two conflicting owners:
    - AIModelManagerApiInterceptor: CPU / memory request validation
      (this is the canonical semantics per the AI error-code spec)
    - VmModelMountManager: VM/model not-found, VM-state checks,
      cross-account mount, ModelCenter not-found
    The mount feature wrote zh_CN i18n templates with mount semantics,
    so when the CPU/memory validation actually fired, users saw the
    unrelated mount text with %s slots filled by CPU/memory args
    (e.g. "虚拟机「1」(UUID: 4) 必须处于运行状态才能挂载模型,当前状态: 4"
    for a CPU-below-min error).

This commit (zstack side):
- Add 7 new error-code constants in CloudOperationsErrorCode:
  10159 (model service startup timeout)
  10160-10165 (relocated mount-feature errors that previously squatted
  on 10138-10143)
- Rewrite globalErrorCodeMapping entries for 10138-10143 to the
  canonical CPU/memory validation semantics
- Add localized templates for 10159-10165 across all 10 locales
  (zh_CN, en_US, zh_TW, ja-JP, ko-KR, de-DE, fr-FR, ru-RU, id-ID,
  th-TH) so non-Chinese users see localized messages

The companion premium MR switches:
- AIModelManagerImpl startup-timeout call site to 10159
- VmModelMountManager 6 call sites from 10138-10143 to 10160-10165

Change-Id: I076f8da0b935a47cd4956e039e1e0b05206d085e
<fix>[sdk]: sync OSPF neighbor adjacency fields

See merge request zstackio/zstack!9737
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

代码变更总结

Walkthrough

本次PR更新了版本号至5.5.16,实现了多项核心功能增强:主存储容错初始化、虚拟机静态IP和DNS管理、L3网络IPAM改进、KVM主机TLS迁移支持、错误代码国际化、外部服务配置API、以及广泛的数据库schema更新。

Changes

Cohort / File(s) Summary
主存储和外部存储
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java, storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java, storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java, header/src/main/java/org/zstack/header/storage/addon/primary/ExternalPrimaryStorageInventory.java, plugin/expon/src/main/java/org/zstack/expon/*, plugin/zbs/src/main/java/org/zstack/storage/zbs/*, plugin/cbd/src/main/java/org/zstack/cbd/kvm/KvmCbdNodeServer.java
容错初始化、配置验证、备份存储选择改进、敏感信息脱敏、多协议支持和防御性拷贝。
虚拟机生命周期和配置
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java, compute/src/main/java/org/zstack/compute/vm/MigrateVmLongJob.java, compute/src/main/java/org/zstack/compute/vm/VmCapabilitiesJudger.java, header/src/main/java/org/zstack/header/vm/VmInstanceState.java, header/src/main/java/org/zstack/header/vm/VmInstanceConstant.java, header/src/main/java/org/zstack/header/vm/VmGlobalConfig.java
虚拟机迁移、状态转换、DNS支持、PMU配置和存储容量计算改进。
虚拟机静态IP和网络
compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java, compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java, header/src/main/java/org/zstack/header/vm/APIChangeVmNicNetworkMsg.java, header/src/main/java/org/zstack/header/vm/APISetVmStaticIpMsg.java, header/src/main/java/org/zstack/header/vm/SetVmStaticIpMsg.java, header/src/main/java/org/zstack/header/vm/ChangeVmNicNetworkMsg.java, header/src/main/java/org/zstack/header/vm/VmSystemTags.java
静态DNS标签、IPv4/IPv6冲突检测、API验证和系统标签管理。
L3网络和IP分配
network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java, network/src/main/java/org/zstack/network/l3/L3NetworkApiInterceptor.java, network/src/main/java/org/zstack/network/l3/L3NetworkManagerImpl.java, network/src/main/java/org/zstack/network/l3/NormalIpRangeFactory.java, network/src/main/java/org/zstack/network/l3/IpRangeHelper.java, network/src/main/java/org/zstack/network/service/DhcpExtension.java, network/src/main/java/org/zstack/network/service/NetworkServiceManager.java, header/src/main/java/org/zstack/header/network/l3/UsedIpVO.java, header/src/main/java/org/zstack/header/network/l3/AllocateIpMsg.java
IP范围验证、DHCP过滤、DNS解析、prefixLen支持和ipRangeUuid级联行为。
错误代码国际化
core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nService.java, core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java, core/src/main/java/org/zstack/core/errorcode/LocaleUtils.java, header/src/main/java/org/zstack/header/errorcode/ErrorCode.java, core/src/main/java/org/zstack/core/Platform.java, rest/src/main/java/org/zstack/rest/RestServer.java, core/src/main/java/org/zstack/core/rest/RESTApiController.java
多语言错误消息、Locale解析、消息格式化和REST API本地化。
外部服务配置
header/src/main/java/org/zstack/header/core/external/service/APIAdd/Update/Delete/QueryExternalServiceConfigurationMsg.java, header/src/main/java/org/zstack/header/core/external/service/APIAdd/Update/DeleteExternalServiceConfigurationEvent.java, header/src/main/java/org/zstack/header/core/external/service/ExternalServiceConfiguration*.java, header/src/main/java/org/zstack/header/core/external/service/ApplyExternalServiceConfiguration*.java, core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java, core/src/main/java/org/zstack/core/externalservice/ExternalService.java, externalservice/src/main/java/org/zstack/externalservice/cronjob/CronJobImpl.java
完整的CRUD API、消息模型、库存类和配置管理。
核心基础设施
core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java, core/src/main/java/org/zstack/core/db/DatabaseFacade.java, core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java, core/src/main/java/org/zstack/core/thread/AbstractCoalesceQueue.java, core/src/main/java/org/zstack/core/thread/CoalesceQueue.java, core/src/main/java/org/zstack/core/thread/ReturnValueCoalesceQueue.java
生命周期回调、线程同步、哈希环管理和请求合并。
任务进度和长任务
header/src/main/java/org/zstack/header/core/progress/TaskProgressInventory.java, header/src/main/java/org/zstack/header/core/progress/LongJobProgressDetail.java, header/src/main/java/org/zstack/header/core/progress/LongJobProgressDetailBuilder.java, header/src/main/java/org/zstack/header/longjob/LongJobProgressNotificationMessage.java, core/src/main/java/org/zstack/core/progress/ProgressReportService.java
进度跟踪、详情解析和通知消息。
KVM主机和迁移
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMHostDeployArguments.java, plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java, plugin/kvm/src/main/java/org/zstack/kvm/hypervisor/*
TLS迁移、SAN IP检测、虚拟机启动命令扩展、孤立虚拟机跟踪和OS版本规范化。
网络服务
plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatDhcpBackend.java, plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatEipBackend.java, plugin/flatNetworkProvider/src/main/java/org/zstack/network/service/flat/FlatUserdataBackend.java, plugin/eip/src/main/java/org/zstack/network/service/eip/EipApiInterceptor.java, plugin/eip/src/main/java/org/zstack/network/service/eip/EipManagerImpl.java, plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java, plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java, plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingApiInterceptor.java, plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java
DHCP合并、EIP验证、负载均衡器过滤和端口转发约束。
身份和安全
identity/src/main/java/org/zstack/identity/AuthorizationManager.java, identity/src/main/java/org/zstack/identity/QuotaUtil.java, compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java, compute/src/main/java/org/zstack/compute/allocator/QuotaAllocatorFlow.java, compute/src/main/java/org/zstack/compute/vm/VmQuotaOperator.java, header/src/main/java/org/zstack/header/allocator/AllocateHostMsg.java, header/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.java
配额检查、会话验证、账户跟踪和返回值变体。
数据库迁移和Schema
conf/db/upgrade/V5.5.16__schema.sql, conf/db/upgrade/beforeMigrate.sql
新表(DPU、模型、配置)、列扩展(prefixLen、resourceType)、索引和数据回填。
配置和工具
VERSION, .gitignore, conf/tools/install.sh, core/src/main/java/org/zstack/core/CoreGlobalProperty.java, core/src/main/java/org/zstack/core/ansible/AnsibleFacadeImpl.java, core/src/main/java/org/zstack/core/ansible/CallBackNetworkChecker.java, core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java, core/src/main/java/org/zstack/core/telemetry/TelemetryGlobalProperty.java
版本更新、Python 3.11迁移、Ansible升级和遥测配置。
标签和存储协议
header/src/main/java/org/zstack/header/tag/TagPatternVO.java, header/src/main/java/org/zstack/header/tag/TagPatternInventory.java, header/src/main/java/org/zstack/header/volume/VolumeProtocol.java, header/src/main/java/org/zstack/header/volume/VolumeProtocolCapability.java, header/src/main/java/org/zstack/header/cluster/APICreateClusterMsg.java, header/src/main/java/org/zstack/header/zql/BeforeCallZWatchExtensionPoint.java, header/src/main/java/org/zstack/header/zwatch/ResourceMetricBindingExtensionPoint.java
资源类型范围、RBD协议和指标绑定扩展。
插件和存储
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageMonBase.java, plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java, plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageAllocatorFactory.java, plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java, plugin/virtualRouterProvider/src/main/java/org/zstack/network/service/virtualrouter/VirtualRouterManagerImpl.java, plugin/xinfini/src/main/java/org/zstack/xinfini/XInfiniStorageFactory.java
缓存清理、Ceph监控连接、iSCSI自护栏和防御性拷贝。
文档
docs/modules/network/nav.adoc, docs/modules/network/pages/networkResource/l3Ipam.adoc, docs/modules/network/pages/networkResource/networkResource.adoc
L3 IPAM指南和导航更新。
测试用例
test/src/test/groovy/org/zstack/test/integration/configuration/systemTag/TagPatternResourceTypeCase.groovy, test/src/test/groovy/org/zstack/test/integration/core/ErrorCodeI18nCase.groovy, test/src/test/groovy/org/zstack/test/integration/core/taskqueue/CoalesceQueueCase.groovy, test/src/test/groovy/org/zstack/test/integration/kvm/vm/VmPmuConfigCase.groovy, test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovy, test/src/test/groovy/org/zstack/test/integration/network/l2network/AttachL2NetworkCase.groovy
标签范围、国际化、合并队列、PMU配置、TLS迁移和L2网络验证。
代码生成
rest/src/main/resources/scripts/GoInventory.groovy, rest/src/main/resources/scripts/GoTestTemplate.groovy, runMavenProfile
Go SDK测试生成器和Maven本地仓库支持。
虚拟机和主机分配
compute/src/main/java/org/zstack/compute/allocator/DesignatedHostAllocatorFlow.java, compute/src/main/java/org/zstack/compute/allocator/HostAllocatorManagerImpl.java, compute/src/main/java/org/zstack/compute/host/HostBase.java, compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java, compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java, compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java, header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java, header/src/main/java/org/zstack/header/identity/RBACInfo.java, header/src/main/java/org/zstack/header/core/external/service/RBACInfo.java
主机选择规范化、内存溢出处理、裸金属DPU支持和RBAC权限。
安装和虚拟机相关
plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmConstant.java, plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmFacadeImpl.java, plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmNicTO.java, network/src/main/java/org/zstack/network/l2/L2NetworkApiInterceptor.java, search/src/main/java/org/zstack/query/QueryFacadeImpl.java, storage/src/main/java/org/zstack/storage/primary/ImageCacheCleaner.java, portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java, core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java, core/src/main/java/org/zstack/core/log/LogSafeGson.java
管理节点VIP、IPv6前缀支持、虚拟Linux网桥验证、生命周期同步和空值保护。

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 五点五一六版本来,
IP分配更妥贴,TLS迁移真精彩,
国际化错误码,网络工程展宏才,
存储容错心里安,测试覆盖全周才!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/jin.ma/fix/ZSTAC-84817

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 729-753: Replace the Chinese comment block above the test method
testAddExternalPrimaryStorageWithMalformedJsonShouldRollback with an English
comment that explains: this test verifies AddExternalPrimaryStorage rolls back
when config contains malformed JSON (JSONObjectUtil.toObject throws), and no
ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
records remain to avoid dirty VO causing buildPsController failures and
QueryPrimaryStorage 503s; update any inline Chinese in the same comment region
to concise, correct English while keeping references to ExternalPrimaryStorageVO
and the failure expectations intact.
- Around line 733-755: The test only checks ExternalPrimaryStorageVO but misses
asserting rollback for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO;
capture counts for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO (using
Q.New(PrimaryStorageVO.class).count() and
Q.New(PrimaryStorageOutputProtocolRefVO.class).count()) before the failing
addExternalPrimaryStorage call, then after the expect(AssertionError.class)
block assert those counts are unchanged and also assert no rows exist with name
"zbs-bad-json" in PrimaryStorageVO and no protocol refs tied to that primary
storage in PrimaryStorageOutputProtocolRefVO, so failures don't leave residual
rows across those related tables.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 1e13331a-5bab-4deb-bbc0-e3ac9ea06519

📥 Commits

Reviewing files that changed from the base of the PR and between cf48ab8 and f0dc8d3.

📒 Files selected for processing (2)
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy

huhu0316 and others added 3 commits April 27, 2026 18:28
Resolves: ZSTAC-84446

Change-Id: I9bed31c0cefddd6ed11f59cd13e36eb1c2abc029
<fix>[conf]: add ORG_ZSTACK_AI_10159 for model service startup timeout

See merge request zstackio/zstack!9734
<fix>[kvm]: decouple TLS cert detection from libvirtd restart toggle

See merge request zstackio/zstack!9741
@MatheMatrix MatheMatrix force-pushed the sync/jin.ma/fix/ZSTAC-84817 branch from f0dc8d3 to 04a51b2 Compare April 28, 2026 03:51
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)

80-92: ⚠️ Potential issue | 🟠 Major

缓存未命中时这里会返回 null

Line 88-89 把 Map.put() 的返回值赋给了 info,但 put() 返回的是旧值。这样首次从 DB 加载节点时,getNodeInfo() 实际会返回 null,而不是刚创建的 NodeInfo;当前改动里的调用方会因此落到异常分支,其他调用方也可能直接触发 NPE。

🔧 建议修复
         if (info == null) {
             ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
             if (vo == null) {
                 throw new ManagementNodeNotFoundException(nodeUuid);
             }

             nodeHash.add(nodeUuid);
-            info = nodes.put(nodeUuid, new NodeInfo(vo));
+            info = new NodeInfo(vo);
+            nodes.put(nodeUuid, info);
         }

         return info;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java`
around lines 80 - 92, getNodeInfo currently assigns info = nodes.put(nodeUuid,
new NodeInfo(vo)) which stores the new NodeInfo but returns the previous value
(null on first insert), causing getNodeInfo to return null; change the logic in
getNodeInfo so that after fetching ManagementNodeVO via dbf.findByUuid and
creating new NodeInfo(vo), you put that new NodeInfo into the nodes map and then
set info to that newly created NodeInfo (or retrieve it with
nodes.get(nodeUuid)), ensuring the method returns the actual NodeInfo instance;
ensure references to nodeHash.add(nodeUuid) remain correct and preserve throwing
ManagementNodeNotFoundException when vo is null.
conf/tools/install.sh (1)

143-151: ⚠️ Potential issue | 🟠 Major

升级路径下会继续复用旧的 zstack-dashboard 虚拟环境。

Line 145 这里只判断目录是否存在;如果主机上已经有旧的 Python 2 zstack-dashboard venv,这里不会重建 3.11 环境,后面的 activatepip install 就还会落到旧解释器里。前面的 zstack-cli/zstack-ctl/zstack-sys 已经统一走了 ensure_python3_venv(),这里也应该保持一致。

🛠️ 建议修改
 elif [ $tool = 'zstack-dashboard' ]; then
     UI_VIRENV_PATH=/var/lib/zstack/virtualenv/zstack-dashboard
     [ ! -z $force ] && rm -rf $UI_VIRENV_PATH
-    if [ ! -d "$UI_VIRENV_PATH" ]; then
-        python3.11 -m venv $UI_VIRENV_PATH
-        if [ $? -ne 0 ]; then
-            rm -rf $UI_VIRENV_PATH
-            exit 1
-        fi
-    fi
+    ensure_python3_venv "$UI_VIRENV_PATH"
     . $UI_VIRENV_PATH/bin/activate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/tools/install.sh` around lines 143 - 151, 当前逻辑只检查 UI_VIRENV_PATH
目录存在性,可能复用旧的 Python2 venv;请改为使用与其他组件一致的 ensure_python3_venv() 或在此处验证虚拟环境解释器版本:调用
$UI_VIRENV_PATH/bin/python(或 bin/python3)检查其版本是否为 Python 3.11,若不存在或版本不符则删除并重新创建
venv(referencing UI_VIRENV_PATH and ensure_python3_venv()); 保证后续 activate 与 pip
安装都使用新建的 Python 3.11 环境。
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

3182-3195: ⚠️ Potential issue | 🟠 Major

不要把迁移 TLS 绑定到 RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE

Line 3194-3195 这里会在 LIBVIRT_TLS_ENABLED=trueRECONNECT_HOST_RESTART_LIBVIRTD_SERVICE=false 时直接关闭迁移 TLS。后者是“重连时是否允许重启 libvirtd”的策略,不是“迁移链路是否加密”的开关;这样会悄悄改变已有集群行为,并把原本应走 TLS 的迁移降级成非 TLS。

💡 建议修改
-                        cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)
-                                && rcf.getResourceConfigValue(KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE, self.getUuid(), Boolean.class));
+                        cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class));

As per coding guidelines "向后兼容原则 - 之前的代码产生的行为不要直接去改动,即使是 bug 也可能用户有依赖,不应直接改动,可以考虑升级用户一套行为、新安装用户一套行为,开关控制等".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3182 -
3195, The code incorrectly binds migration TLS to
KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE when setting
cmd.setUseTls in KVMHost.java; remove the dependency on
RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE and decide TLS solely from the
TLS-related config (e.g. KVMGlobalConfig.LIBVIRT_TLS_ENABLED) or a dedicated
migration-TLS config if one exists—update the setUseTls(...) call to use only
KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class) (or the proper
migration-specific resource/config via rcf.getResourceConfigValue if such a key
exists) and remove the rcf.getResourceConfigValue(..., self.getUuid(),
Boolean.class) check that references RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE.
network/src/main/java/org/zstack/network/l3/NormalIpRangeFactory.java (1)

73-95: ⚠️ Potential issue | 🟠 Major

回滚路径未恢复 UsedIpVO.ipRangeUuid,失败后会留下脏引用。

Line 73-95 会把 ipRangeUuid == nullUsedIpVO 绑定到新建 IP Range,但 Line 112-116 的回滚只删了 IpRangeVO,没有把这些 UsedIpVO 复原。后续流程失败时会产生不一致数据。

建议修复(记录并在 rollback 中恢复)
@@
-                    List<UsedIpVO> updateVos = new ArrayList<>();
+                    List<UsedIpVO> updateVos = new ArrayList<>();
+                    List<String> reboundUsedIpUuids = new ArrayList<>();
@@
                                 ipvo.setIpRangeUuid(vo.getUuid());
                                 updateVos.add(ipvo);
+                                reboundUsedIpUuids.add(ipvo.getUuid());
@@
                                 ipvo.setIpRangeUuid(vo.getUuid());
                                 updateVos.add(ipvo);
+                                reboundUsedIpUuids.add(ipvo.getUuid());
@@
                     if (!updateVos.isEmpty()) {
                         dbf.updateCollection(updateVos);
+                        List<String> allReboundUuids = (List<String>) data.computeIfAbsent("ReboundUsedIpUuids", k -> new ArrayList<String>());
+                        allReboundUuids.addAll(reboundUsedIpUuids);
                     }
@@
             public void rollback(FlowRollback trigger, Map data) {
+                List<String> reboundUsedIpUuids = (List<String>) data.get("ReboundUsedIpUuids");
+                if (reboundUsedIpUuids != null && !reboundUsedIpUuids.isEmpty()) {
+                    SQL.New("update UsedIpVO u set u.ipRangeUuid = null where u.uuid in (:uuids)")
+                            .param("uuids", reboundUsedIpUuids)
+                            .execute();
+                }
                 List<IpRangeVO> vos = (List<IpRangeVO>) data.get("IpRangeVO");
                 dbf.removeCollection(vos, IpRangeVO.class);
                 trigger.rollback();
             }

Also applies to: 112-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@network/src/main/java/org/zstack/network/l3/NormalIpRangeFactory.java` around
lines 73 - 95, The code in NormalIpRangeFactory updates UsedIpVO.ipRangeUuid for
matching records (see UsedIpVO, NetworkUtils/IPv6NetworkUtils checks and
dbf.updateCollection) but the rollback path only deletes the IpRangeVO and does
not restore those UsedIpVO entries, leaving dirty references; modify the create
path to record the list of modified UsedIpVO UUIDs and their previous
ipRangeUuid values (or at least mark which were changed), and in the rollback
handler (the block that currently deletes IpRangeVO) iterate those recorded
UUIDs to restore ipRangeUuid to null (or their prior value) via
dbf.updateCollection/DAO so the UsedIpVO state is reverted on failure.
🟡 Minor comments (10)
header/src/main/java/org/zstack/header/vm/VmInstanceState.java-174-175 (1)

174-175: ⚠️ Potential issue | 🟡 Minor

请确认注释中的工单号是否写错。

Line 174 注释为 ZSTAC-80898,与当前 PR 目标中的 ZSTAC-84817 不一致;如果不是同一问题,后续排障和追溯会被误导。建议改为正确编号或更通用的原因描述。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@header/src/main/java/org/zstack/header/vm/VmInstanceState.java` around lines
174 - 175, 注释中工单号可能写错:在 VmInstanceState 类的注释里将 “ZSTAC-80898” 与本 PR 目标
“ZSTAC-84817” 不一致;请打开 VmInstanceState
的注释并将工单号改为正确的编号(ZSTAC-84817)或替换为通用描述(例如“Expunging safety net — if expunge fails,
allow recovery”),以确保追溯无歧义并保留原意。
test/src/test/groovy/org/zstack/test/integration/configuration/systemTag/TagPatternResourceTypeCase.groovy-12-22 (1)

12-22: ⚠️ Potential issue | 🟡 Minor

类注释第 3 点与实际断言不一致,建议修正。

当前用例没有验证“在 prepareDbInitialValue() 后回填 resourceType = "ModelVO"”这一行为。建议补上对应测试步骤与断言,或删除/改写该条说明,避免误导。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/groovy/org/zstack/test/integration/configuration/systemTag/TagPatternResourceTypeCase.groovy`
around lines 12 - 22, The class-level doc's third bullet claims old AI tags are
backfilled with resourceType = "ModelVO" on the next prepareDbInitialValue() run
but the test does not exercise or assert that; update the test or the comment:
either add a test step that calls prepareDbInitialValue() (or the helper that
triggers DB migration) and assert TagPatternVO.resourceType for upgraded tags
becomes "ModelVO" (use the existing test utilities and query methods in
TagPatternResourceTypeCase to locate the TagPatternVO), or remove/rewrite the
third bullet to reflect the actual assertions present; reference
prepareDbInitialValue(), TagPatternVO.resourceType and the test class
TagPatternResourceTypeCase.groovy when making the change.
search/src/main/java/org/zstack/query/QueryFacadeImpl.java-381-381 (1)

381-381: ⚠️ Potential issue | 🟡 Minor

建议保留完整异常上下文,避免仅返回 e.getMessage()

这里改成只拼接 e.getMessage() 后抛出,会丢失堆栈信息;线上查询失败时很难定位根因。建议至少在此处记录完整异常堆栈(同时保留当前对外错误语义)。

可选修复示例
         } catch (Exception e) {
-            throw new OperationFailureException(inerr("failed to query: %s", e.getMessage()));
+            logger.warn(String.format("failed to query for API[%s]", msg.getClass().getName()), e);
+            throw new OperationFailureException(inerr("failed to query: %s",
+                    Optional.ofNullable(e.getMessage()).orElse(e.getClass().getSimpleName())));
         }
As per coding guidelines "在必要时,应继续抛出异常,使上层业务处理者可以转换为用户友好的错误提示。"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search/src/main/java/org/zstack/query/QueryFacadeImpl.java` at line 381, The
throw in QueryFacadeImpl that currently uses throw new
OperationFailureException(inerr("failed to query: %s", e.getMessage())) loses
the original stacktrace; change it to preserve the full exception context by
attaching the caught exception as the cause (or logging the full stacktrace)
when creating the OperationFailureException so the original Throwable e is
available for debugging; locate the throw site in QueryFacadeImpl (the block
catching the query exception) and pass e into the OperationFailureException
constructor or call processLogger/error with the full exception before
rethrowing.
core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java-67-79 (1)

67-79: ⚠️ Potential issue | 🟡 Minor

不要将 ERROR 级别降级为 WARN。

Line 67 和 Line 76 当前把 ERROR 记录为 warn,会影响日志分级检索和告警准确性。建议 ERROR -> logger.errorWARNING -> logger.warn

建议修改
-                                    case ERROR: logger.warn("[Sentry] " + formatted); break;
+                                    case ERROR: logger.error("[Sentry] " + formatted); break;
                                     case WARNING: logger.warn("[Sentry] " + formatted); break;
                                     default: logger.info("[Sentry] " + formatted); break;
-                                    case ERROR: logger.warn("[Sentry] " + message, throwable); break;
+                                    case ERROR: logger.error("[Sentry] " + message, throwable); break;
                                     case WARNING: logger.warn("[Sentry] " + message, throwable); break;
                                     default: logger.info("[Sentry] " + message, throwable); break;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` around
lines 67 - 79, In SentryInitHelper update the log mappings so ERROR is logged at
error level instead of warn: in both overridden methods log(SentryLevel level,
String formatted) and log(SentryLevel level, String message, Throwable
throwable) change the case for ERROR from logger.warn(...) to logger.error(...),
keep WARNING mapped to logger.warn(...) and the default to logger.info(...);
this preserves correct severity for Sentry ERROR events.
core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java-931-932 (1)

931-932: ⚠️ Potential issue | 🟡 Minor

日志文案语法有误,建议修正。

Line 931 当前 "not entity events will be fired" 语法不正确,建议改为 "no entity events will be fired",便于排障时准确理解。

建议修改
-            logger.warn(String.format("cannot find EntityInfo for the class[%s], not entity events will be fired",
+            logger.warn(String.format("cannot find EntityInfo for the class[%s], no entity events will be fired",
                     entity.getClass()));

As per coding guidelines "代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java` around lines
931 - 932, The logger.warn call in DatabaseFacadeImpl (the warning for missing
EntityInfo) contains a grammatical error: replace the message text in the
logger.warn(String.format(...)) that currently ends with "not entity events will
be fired" to read "no entity events will be fired" so the log becomes clear and
grammatically correct when EntityInfo for the class (entity.getClass()) is
missing.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java-42-44 (1)

42-44: ⚠️ Potential issue | 🟡 Minor

修正这里的告警日志占位内容。

日志文案写的是“fallback to mgmtIp”,但第二个 %s 实际传入的是 r.getExitErrorMessage()。线上排查时会把错误原因误看成回退 IP。

建议修改
-            logger.warn(String.format(
-                    "ssh-collect host IPs failed on host[uuid:%s], fallback to mgmtIp: %s",
-                    hostUuid, r.getExitErrorMessage()));
+            logger.warn(String.format(
+                    "ssh-collect host IPs failed on host[uuid:%s], reason: %s, fallback to managementIp: %s",
+                    hostUuid, r.getExitErrorMessage(), managementIp));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java` around lines 42 -
44, The log message in KVMHostUtils.java incorrectly labels the second %s as
"fallback to mgmtIp" while passing r.getExitErrorMessage(); update the
logger.warn call so the placeholder matches the actual argument—either pass the
mgmtIp variable if you meant to log the fallback IP, or change the text to
something like "ssh-collect host IPs failed on host[uuid:%s], error: %s" to
correctly show r.getExitErrorMessage(); ensure hostUuid and
r.getExitErrorMessage() (or hostUuid and mgmtIp if you choose that) are the two
arguments to String.format in the logger.warn invocation.
plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java-371-376 (1)

371-376: ⚠️ Potential issue | 🟡 Minor

Line 375 对空 IP 放行会返回不可绑定 NIC

这里建议把 nicIp == null 直接过滤掉,避免 attachable 列表出现后续无法成功绑定的候选项。

♻️ 建议修改
-                filtered = filtered.stream().filter(nic -> {
-                    String nicIp = nic.getIp();
-                    String nicL3 = nic.getL3NetworkUuid();
-                    return nicIp == null || IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3);
-                }).collect(Collectors.toList());
+                filtered = filtered.stream().filter(nic -> {
+                    String nicIp = nic.getIp();
+                    String nicL3 = nic.getL3NetworkUuid();
+                    return nicIp != null && IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3);
+                }).collect(Collectors.toList());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`
around lines 371 - 376, The current filter in PortForwardingManagerImpl that
builds the attachable NIC list lets through NICs with null IPs (nic.getIp() ==
null), which later cannot be bound; change the predicate to exclude null IPs
instead of allowing them by removing the nicIp == null branch so only NICs with
a non-null nic.getIp() that satisfy IpRangeHelper.isIpInL3NetworkCidr(nicIp,
nicL3) remain; update the stream/filter logic where "filtered =
filtered.stream().filter(nic -> { ... }).collect(...)" to drop null-IP NICs.
header/src/main/java/org/zstack/header/core/external/service/APIUpdateExternalServiceConfigurationMsgDoc_zh_cn.groovy-10-20 (1)

10-20: ⚠️ Potential issue | 🟡 Minor

补全这里的 API 文档描述。

desc 现在还是占位文本,而且 request 的描述也是空的;这会直接生成一份几乎不可用的接口文档。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/core/external/service/APIUpdateExternalServiceConfigurationMsgDoc_zh_cn.groovy`
around lines 10 - 20, Replace the placeholder desc and empty request description
with meaningful Chinese documentation: update the top-level desc to briefly
explain what APIUpdateExternalServiceConfigurationMsg does (e.g.,
更新外部服务的配置信息并返回更新结果), and populate rest.request.desc to describe the HTTP
method/URL semantics, required header (Authorization: OAuth the-session-uuid),
the path parameter {uuid}, and the request body fields as defined on
APIUpdateExternalServiceConfigurationMsg.class (list required vs optional
fields, expected formats and an example payload); ensure you keep the existing
url, header and clz entries but provide clear usage, validation rules, and a
short success/failure behavior summary.
compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java-409-425 (1)

409-425: ⚠️ Potential issue | 🟡 Minor

这里最好先 trim 并回写 DNS,再做校验。

dnsAddresses 是用户输入,当前实现直接校验原值;像 " 8.8.8.8 " 这种输入会被误判,后续写 system tag 时也拿不到干净值。既然这里已经是统一校验入口,建议先做 trim() 和回写。

💡 可参考的修正方向
     private void validateDnsAddresses(List<String> dnsAddresses) {
         if (dnsAddresses == null || dnsAddresses.isEmpty()) {
             return;
         }
 
         if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) {
@@
-        for (String dns : dnsAddresses) {
-            if (!NetworkUtils.isIpv4Address(dns) && !IPv6NetworkUtils.isIpv6Address(dns)) {
+        ListIterator<String> iterator = dnsAddresses.listIterator();
+        while (iterator.hasNext()) {
+            String dns = iterator.next();
+            String trimmedDns = dns == null ? null : dns.trim();
+            iterator.set(trimmedDns);
+            if (!NetworkUtils.isIpv4Address(trimmedDns) && !IPv6NetworkUtils.isIpv6Address(trimmedDns)) {
                 throw new ApiMessageInterceptionException(argerr(
-                        ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns));
+                        ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns));
             }
         }
     }

As per coding guidelines, “注意检查来自 Message 的参数是否做过 trim,用户可能在浏览器上复制粘贴的数据带有空格、换行符等。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`
around lines 409 - 425, The validateDnsAddresses method should trim and
overwrite entries in the dnsAddresses list before any validation so inputs like
" 8.8.8.8 " are normalized; modify validateDnsAddresses to iterate dnsAddresses
(e.g., using index or ListIterator), replace each element with its trimmed value
(and remove empty strings if appropriate) prior to the size checks and calls to
NetworkUtils.isIpv4Address / IPv6NetworkUtils.isIpv6Address, and then throw
ApiMessageInterceptionException via argerr as currently done when a trimmed
value is invalid.
compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java-662-669 (1)

662-669: ⚠️ Potential issue | 🟡 Minor

StaticIpOperator 中对 IPv6 tag 编码方法的调用不一致。

IPv6NetworkUtils 工具类定义了两个方法:ipv6AddessToTagValue(行 376,拼写错误)和 ipv6AddressToTagValue(行 391,拼写正确)。StaticIpOperator 同文件中混用了两个版本:行 201、245 调用拼写错误的版本,而行 669 调用拼写正确的版本。应统一使用 ipv6AddressToTagValue(正确拼写)来保持一致性,符合编码规范中的英文拼写要求。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java` around
lines 662 - 669, In StaticIpOperator, replace all uses of the misspelled
IPv6NetworkUtils.ipv6AddessToTagValue with the correctly spelled
IPv6NetworkUtils.ipv6AddressToTagValue to ensure consistent encoding of IPv6 tag
values; specifically update the calls at the locations where
ipv6AddessToTagValue is referenced so they match the existing correct call used
near the IPv6_GATEWAY instantiation and remove any lingering references to the
misspelled symbol.
🧹 Nitpick comments (21)
header/src/main/java/org/zstack/header/identity/RBACInfo.java (1)

15-15: 请确认 APIQueryAccountMsg 改为管理员专属是否为预期(存在兼容性风险)

Line 15APIQueryAccountMsg 放入 adminOnlyAPIs 会直接收紧现有 API 权限,可能导致非管理员查询流程/自动化脚本回归。若这不是明确的权限策略变更,建议撤回该行;若是预期变更,请补充对应测试与变更说明。

可选修正(若非预期权限收紧)
-                        APIQueryAccountMsg.class,

As per coding guidelines, “向后兼容原则:之前的代码产生的行为不要直接去改动…应当对原状态做好记录,做好二次确认,做好回退准备。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@header/src/main/java/org/zstack/header/identity/RBACInfo.java` at line 15, 将
APIQueryAccountMsg 加入 adminOnlyAPIs(在 RBACInfo 中)会收紧该 API
权限并可能破坏向后兼容;如果这不是有意的权限变更,请从 adminOnlyAPIs 中移除 APIQueryAccountMsg
的条目以恢复原行为;如果这是预期的策略改动,保留 APIQueryAccountMsg 并补充回归/权限测试、在变更说明中记录该兼容性破坏(并在
RBACInfo 处明确注明变更理由),同时准备回退步骤以便必要时撤回该修改。
compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java (1)

105-106: 建议消除新增魔法值,改为命名常量再组装跳过集合

这里新增了两个超管类型字符串字面量,建议提取为独立常量(或复用已有类型常量)后再组装集合,后续扩展和排错更稳。

♻️ 可选改法
-    private static final List<String> SKIP_ARCH_CHECK_HYPERVISOR_TYPES = Arrays.asList("baremetal2", "baremetal2Dpu");
+    private static final String HYPERVISOR_TYPE_BAREMETAL2 = "baremetal2";
+    private static final String HYPERVISOR_TYPE_BAREMETAL2_DPU = "baremetal2Dpu";
+    private static final Set<String> SKIP_ARCH_CHECK_HYPERVISOR_TYPES = new HashSet<>(
+            Arrays.asList(HYPERVISOR_TYPE_BAREMETAL2, HYPERVISOR_TYPE_BAREMETAL2_DPU)
+    );

As per coding guidelines, “避免使用魔法值(Magic Value):直接使用未经定义的数值或字符串应替换为枚举或常量。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java` around
lines 105 - 106, Replace the two inline hypervisor string literals with named
constants and use those constants to build SKIP_ARCH_CHECK_HYPERVISOR_TYPES;
specifically, declare descriptive constants (e.g., BAREMETAL2_HYPERVISOR_TYPE
and BAREMETAL2_DPU_HYPERVISOR_TYPE or reuse existing type constants), then
initialize SKIP_ARCH_CHECK_HYPERVISOR_TYPES via
Arrays.asList(BAREMETAL2_HYPERVISOR_TYPE, BAREMETAL2_DPU_HYPERVISOR_TYPE) so the
magic values are centralized for easier reuse and debugging (refer to
SKIP_ARCH_CHECK_HYPERVISOR_TYPES to locate the usage).
header/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.java (1)

169-170: 建议在 setAccountUuid 内统一做 trim 归一化。

该值来自 Message,若携带首尾空白,后续按 UUID 做比较/查询时可能出现隐性不一致。建议在 setter 一处收敛处理。

建议修改
 public void setAccountUuid(String accountUuid) {
-    this.accountUuid = accountUuid;
+    this.accountUuid = accountUuid == null ? null : accountUuid.trim();
 }

As per coding guidelines: 注意检查来自 Message 的参数是否做过 trim,用户可能在浏览器上复制粘贴的数据带有空格、换行符等。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@header/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.java`
around lines 169 - 170, The setter setAccountUuid in HostAllocatorSpec should
normalize incoming values by trimming whitespace to avoid mismatches when later
comparing or querying UUIDs; update setAccountUuid(String accountUuid) to assign
a null-safe trimmed value (e.g., set this.accountUuid = accountUuid == null ?
null : accountUuid.trim()) so leading/trailing spaces or newlines from Message
inputs are removed consistently.
compute/src/main/java/org/zstack/compute/host/HostBase.java (1)

1445-1445: 建议使用稳定标识替代 "unknown" 作为兜底去重键。

空指针保护是正确的;但统一回退到 "unknown" 可能让不同主机在极端场景下共用同一个去重键。建议回退到实例级稳定标识(如 id)以保持去重粒度。

可选修改示例
-                return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid());
+                return String.format("connect-host-%s", self == null ? id : self.getUuid());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/host/HostBase.java` at line 1445,
The fallback literal "unknown" in the connect-host key can cause different hosts
to share the same dedupe key; update the return in HostBase that builds the
connect key so the fallback uses a stable instance-level identifier instead of
the string "unknown" (e.g. use self.getId() or this.id when available) while
preserving the existing null check around self and still formatting as
"connect-host-%s"; adjust the ternary or surrounding logic in the method that
contains the String.format(...) to return "connect-host-" + (self == null ?
<instance-stable-id> : self.getUuid()) so each host has a unique fallback key.
plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java (1)

1181-1185: 这里可以顺手合并成一次判断。

Line 1181 和 Line 1184 的条件主体与异常信息完全相同,只是分别判断 ingress/egress 数量;合成一次 toCreateIngressRuleCount > 0 || toCreateEgressRuleCount > 0 的检查会更直观,也能减少后续漏改风险。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java`
around lines 1181 - 1185, Combine the two identical priority checks in
SecurityGroupApiInterceptor into a single if that tests msg.getPriority()
against
SecurityGroupGlobalConfig.SECURITY_GROUP_RULES_NUM_LIMIT.value(Integer.class)
and also checks (toCreateIngressRuleCount > 0 || toCreateEgressRuleCount > 0);
if true, throw the ApiMessageInterceptionException once (use the existing error
message and one of the existing error codes) so the logic references
msg.getPriority(),
SecurityGroupGlobalConfig.SECURITY_GROUP_RULES_NUM_LIMIT.value(Integer.class),
toCreateIngressRuleCount and toCreateEgressRuleCount and prevents duplicated
conditions.
test/src/test/groovy/org/zstack/test/integration/network/l2network/AttachL2NetworkCase.groovy (1)

107-135: 建议补齐 VLAN 空白字符串用例,并收紧失败断言。

当前 VLAN 只覆盖了“不传 physicalInterface”,建议再补一个空白字符串(如 " """)场景;另外如果测试框架支持,建议断言具体错误码 ORG_ZSTACK_NETWORK_L2_10021,避免被非目标异常“误通过”。

🧪 可补充的最小用例
         // creating L2 Vlan network without physicalInterface should fail when vSwitchType is LinuxBridge
         expect(AssertionError.class) {
             createL2VlanNetwork {
                 name = "test-vlan-no-physical-interface"
                 zoneUuid = zone.uuid
                 vlan = 100
             }
         }
+
+        // creating L2 Vlan network with blank physicalInterface should also fail
+        expect(AssertionError.class) {
+            createL2VlanNetwork {
+                name = "test-vlan-blank-physical-interface"
+                zoneUuid = zone.uuid
+                vlan = 101
+                physicalInterface = "   "
+            }
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/groovy/org/zstack/test/integration/network/l2network/AttachL2NetworkCase.groovy`
around lines 107 - 135, Add a new VLAN test case in
testCreateL2NetworkWithoutPhysicalInterface to cover empty-string
physicalInterface (e.g., physicalInterface = "" or "   ") when calling
createL2VlanNetwork, and change the broad expect(AssertionError.class) checks
for createL2NoVlanNetwork/createL2VlanNetwork to assert the specific error code
ORG_ZSTACK_NETWORK_L2_10021 (or the framework's way to match error codes) so the
test only fails for the intended validation error.
plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (2)

247-248: 999 提取成有语义的常量

999 这里是新的策略参数,但从上下文看不出它表示什么边界或配额。建议提取成命名常量,避免后续修改时误用。

As per coding guidelines, "避免使用魔法值(Magic Value)。"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 247 - 248, The magic value 999 passed to ExternalHostIdGetter(...)
is unclear; replace it with a well-named constant (e.g., NEW_POLICY_PARAM_LIMIT
or DEFAULT_HOST_ID_STRATEGY) declared near the class level in KvmIscsiNodeServer
(or a related constants holder) and use that constant in the call to
ExternalHostIdGetter.getOrAllocateHostIdRef(param.getHostUuid(),
param.getPrimaryStorage().getUuid()); ensure the constant name documents what
the value represents and add a brief comment describing the boundary/quota
meaning.

240-243: 只读取 hostId 字段即可

这里后续只用到了 hostId,没必要把整条 ExternalPrimaryStorageHostRefVO 实体加载出来。建议把查询收窄到单列,回退分支再从新分配的 ref 里取值即可,能少一次不必要的实体装载。

As per coding guidelines, "仅查询实际需要的列,避免加载整个实体。"

Also applies to: 257-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`
around lines 240 - 243, The code currently loads the full
ExternalPrimaryStorageHostRefVO via Q.New(...).eq(...).find() even though only
hostId is used; change the query to select only the hostId column (use
ExternalPrimaryStorageHostRefVO_.hostId) and fetch the scalar value (e.g., via
the Q select/findValue API) or map it to a lightweight ref object, then use that
hostId; apply the same change to the second occurrence around
ExternalPrimaryStorageHostRefVO usage at the other location.
core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java (1)

58-58: 避免在 Line 58 使用魔法值 1000

建议将该值改为可配置项(例如新增 Telemetry.sentryMaxQueueSize),避免后续调优需要改代码发布。

As per coding guidelines,直接使用未经定义的数值属于 Magic Value,应替换为常量或配置项。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` at line
58, Line uses magic value 1000 in options.setMaxQueueSize(1000); update
SentryInitHelper to replace the literal with a configurable property (e.g.,
Telemetry.sentryMaxQueueSize) or a constant: add a new Telemetry configuration
entry (sentryMaxQueueSize) with a sensible default and read it in
SentryInitHelper to pass into options.setMaxQueueSize(...); ensure
validation/defaulting when the config is absent and reference the Telemetry
property name in logs/comments.
core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java (1)

911-915: uninstallEntityLifeCycleCallback 对未知实体类静默忽略,容易掩盖调用错误。

Line 911-915 若 clz 不存在直接 no-op,而 installEntityLifeCycleCallback 是断言失败;建议保持一致,至少 Assertwarn 一次,避免回调泄漏问题被吞掉。

建议修改
     public void uninstallEntityLifeCycleCallback(Class clz, EntityEvent evt, EntityLifeCycleCallback cb) {
         if (clz != null) {
             EntityInfo info = entityInfoMap.get(clz);
-            if (info != null) {
-                info.uninstallLifeCycleCallback(evt, cb);
-            }
+            DebugUtils.Assert(info != null, String.format("cannot find EntityInfo for the class[%s]", clz));
+            info.uninstallLifeCycleCallback(evt, cb);
         } else {
             for (EntityInfo info : entityInfoMap.values()) {
                 info.uninstallLifeCycleCallback(evt, cb);
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java` around lines
911 - 915, In DatabaseFacadeImpl, make uninstallEntityLifeCycleCallback mirror
installEntityLifeCycleCallback’s error handling: when clz is null or
entityInfoMap.get(clz) returns null, do not silently no-op—either assert or at
minimum log a warning including the evt, cb and clz class name to surface
misuse; then only call info.uninstallLifeCycleCallback(evt, cb) when info is
non-null (EntityInfo and entityInfoMap are the referenced symbols).
core/src/main/java/org/zstack/core/db/DatabaseFacade.java (1)

88-88: 为新增接口方法补充 Javadoc,明确行为边界。

Line 88 新增了公共接口能力,但缺少方法注释;建议至少说明 entityClass == null 时是否作用于全部实体、以及未命中回调时的行为。

建议修改
     void installEntityLifeCycleCallback(Class entityClass, EntityEvent evt, EntityLifeCycleCallback cb);

+    /**
+     * Unregister an entity lifecycle callback.
+     * `@param` entityClass target entity class; if null, unregister from all entity classes
+     * `@param` evt lifecycle event
+     * `@param` cb callback instance to remove
+     */
     void uninstallEntityLifeCycleCallback(Class entityClass, EntityEvent evt, EntityLifeCycleCallback cb);

As per coding guidelines "接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/db/DatabaseFacade.java` at line 88, Add a
Javadoc to the DatabaseFacade#uninstallEntityLifeCycleCallback method that
documents its exact behavior: state that this interface method has no extra
access modifier (do not add `public`), explain that passing entityClass == null
means the uninstall applies to all entity types, specify whether evt can be null
and how it is treated (e.g., match all events or require non-null), and describe
the no-op behavior when the specified callback cb is not found (e.g., silently
return without error). Also document thread-safety expectations and any
return/exception behavior so callers know the boundary cases.
test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovy (1)

146-173: try/finally 恢复 GlobalConfig,避免场景间串扰。

这些场景都在同一个 env.create {} 里串行跑;如果中间某个断言提前失败,后面的场景会继承错误的全局配置,排查时会出现级联失败。

Also applies to: 178-203, 212-236, 241-272

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovy`
around lines 146 - 173, The test modifies global config keys
(KVMGlobalConfig.LIBVIRT_TLS_ENABLED and
KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE) but does not restore
them on failure; wrap the config changes in each test (e.g.,
testMigrateWithTlsEnabled) with a try/finally: save the original values, update
to the desired values before the scenario, run the test body, and always restore
the originals in the finally block so other scenarios in the same env.create
block don’t inherit mutated globals; apply the same try/finally restore pattern
to the other TLS-related tests in this file referenced in the comment.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java (1)

140-146: 避免把三个布尔开关直接暴露在公共方法签名里。

调用点很容易把 allowRestartLibvirtdisNewAdded 传反;后面如果再加条件,这个签名会继续变得难读。更稳妥的是收敛成一个小的上下文对象,或者把“是否允许重启”抽成更明确的枚举/策略。

As per coding guidelines, “避免使用布尔型参数造成含义不明确。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java` around lines 140 -
146, The public method KVMHostUtils.shouldForceTlsRedeploy exposes three
ambiguous boolean flags; refactor its signature to accept a small context object
(e.g., TlsRedeployContext) or a clear enum/strategy instead of
allowRestartLibvirtd and isNewAdded so callers cannot accidentally swap flags
and future conditions remain readable; update the method to read from
context.isNewlyAdded(), context.allowsLibvirtdRestart() (or a RedeployPolicy
enum) and change callers to construct that context/enum before calling
shouldForceTlsRedeploy.
plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java (1)

1058-1063: 把这段 NIC CIDR 过滤抽成单一 helper。

同样的过滤逻辑现在在两个 candidate API 里各写了一遍。前一处修完后,这里很容易继续保留旧语义,最终让两个候选接口返回不一致的结果。抽成一个公共 helper 后,后续只需要维护一处。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java`
around lines 1058 - 1063, Extract the NIC CIDR filtering logic used in
LoadBalancerBase into a single reusable helper method to avoid duplicate logic
across candidate APIs: create a private/static method (e.g., filterNicsByL3Cidr
or isNicInL3Cidr) referenced from LoadBalancerBase that encapsulates the stream
filter currently applied to "nics" (the lambda using nic.getIp(),
nic.getL3NetworkUuid() and IpRangeHelper.isIpInL3NetworkCidr), replace the
inline filter with a call to this helper, and update the other candidate API to
call the same helper so both candidate-returning paths share identical behavior.
network/src/main/java/org/zstack/network/service/NetworkServiceManager.java (1)

21-29: 这个接口名和注释会误导调用方。

getVmNicDns 看起来像是按 vmNicUuid 查询,但签名实际是 vmUuid + l3NetworkUuid,Javadoc 也写成了 “for a VM NIC”。这是 public interface,后续实现和调用都很容易按错语义。建议至少把注释改成 “for a VM on the specified L3 network”;如果现在还没被广泛依赖,方法名也最好一起收敛到与签名一致。 As per coding guidelines, "命名应尽量用完整的单词组合表达意图,并在名称中体现数据类型或用途".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@network/src/main/java/org/zstack/network/service/NetworkServiceManager.java`
around lines 21 - 29, The method getVmNicDns(vmUuid, l3NetworkUuid) and its
Javadoc are misleading because the name implies a vmNicUuid but the signature
takes a VM UUID and an L3 network UUID; rename the method to reflect the actual
inputs (for example getVmDnsForL3Network or getVmDnsForVmOnL3Network) and update
the Javadoc to read something like “Get DNS servers for a VM on the specified L3
network. Priority: VM NIC system tag > L3 Network DNS”; then refactor all
implementations and callers of getVmNicDns to the new method name and adjust any
unit tests and docs accordingly to keep behavior unchanged.
plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmNicTO.java (1)

38-50: 不要在 ApplianceVm 的 IPv6 NIC 上兜底成 SLAAC

ApplianceVm 的 NIC 按约束始终应绑定到 ipRangeUuid。这里新增的 prefixLen + SLAAC 兜底会把本该暴露的数据异常静默转成一个看似合法但可能错误的 addressMode;一旦 NormalIpRangeVO 查不到,agent 侧拿到的就是错误的 IPv6 配置。更稳妥的是继续把 NormalIpRangeVO 作为 addressMode 的唯一真值源,ipRangeUuid == null 时直接显式失败。

Based on learnings In VirtualRouterManagerImpl.java (plugin/virtualRouterProvider/src/main/java/org/zstack/network/service/virtualrouter/VirtualRouterManagerImpl.java), ApplianceVMs (including virtual router VMs) are by design only allowed to have IPs within an IP range (ipRangeUuid is always non-null for appliance VM NICs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmNicTO.java`
around lines 38 - 50, The code currently defaults addressMode to
IPv6Constants.SLAAC when uip.getPrefixLen() is present, which can silently
produce incorrect IPv6 configs for ApplianceVm NICs; change ApplianceVmNicTO
handling so that ipRangeUuid is the authoritative source: remove the fallback
that sets addressMode = IPv6Constants.SLAAC in the uip.getPrefixLen() branch,
only set prefixLength/addressMode from NormalIpRangeVO when uip.getIpRangeUuid()
is non-null (use ipRangeVO.getPrefixLen() and ipRangeVO.getAddressMode()), and
if uip.getIpRangeUuid() is null return/throw an explicit error (fail fast)
instead of silently defaulting; reference symbols: ApplianceVmNicTO, uip,
prefixLength, addressMode, IPv6Constants.SLAAC, NormalIpRangeVO, ipRangeUuid.
header/src/main/java/org/zstack/header/core/external/service/APIQueryExternalServiceConfigurationMsg.java (1)

16-18: 查询路径建议改成复数资源名。

这是新 API,/external/service/configuration 现在还是单数形式,建议趁还没稳定前改成复数资源路径,避免后续对外兼容成本。

As per coding guidelines, path: 针对资源使用复数形式。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/core/external/service/APIQueryExternalServiceConfigurationMsg.java`
around lines 16 - 18, The REST path uses singular resource names; update the
RestRequest annotation on the APIQueryExternalServiceConfigurationMsg class to
use plural resource names (e.g. change path="/external/service/configuration" to
a plural form like "/external/services/configurations" and update optionalPaths
accordingly such as "/external/services/configurations/{uuid}") so the API
follows the plural-noun resource convention; ensure only the annotation values
are changed and any references to the old path in tests or routing are updated
to the new plural paths.
header/src/main/java/org/zstack/header/core/progress/TaskProgressInventory.java (1)

21-22: 建议在 TaskProgressInventory(TaskProgressVO vo) 里同步填充 progressDetail

现在这个新字段只能依赖调用方额外 setProgressDetail(),而同类的 VO 映射字段已经在构造函数里统一完成了。这样会让直接走该构造函数的路径始终拿不到 progressDetail,同一个 inventory 在不同构造路径下输出不一致。

♻️ 可选修改
     public TaskProgressInventory(TaskProgressVO vo) {
         taskUuid = vo.getTaskUuid();
         parentUuid = vo.getParentUuid();
         type = vo.getType().toString();
         if (vo.getOpaque() != null) {
             opaque = JSONObjectUtil.toObject(vo.getOpaque(), LinkedHashMap.class);
         }
         taskName = vo.getTaskName();
         time = vo.getTime();
         arguments = vo.getArguments();
+        progressDetail = LongJobProgressDetailBuilder.fromTaskProgressVO(vo);
     }

Also applies to: 111-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/core/progress/TaskProgressInventory.java`
around lines 21 - 22, The TaskProgressInventory(TaskProgressVO vo) constructor
currently doesn't populate the new progressDetail field, forcing callers to call
setProgressDetail() separately; update that constructor to parse vo.getOpaque()
(handling nulls and parse errors) into a LongJobProgressDetail instance and
assign it to the progressDetail field so inventories built from VO contain the
same typed detail as other constructors; use the same parsing utility or factory
used elsewhere for LongJobProgressDetail (or catch/log JSON/parse exceptions and
leave progressDetail null) and apply the same change to the other VO-based
mapping code paths noted around the lines referenced (the existing
setProgressDetail()/progressDetail references).
plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java (1)

660-682: 批量 NIC 校验里有明显的 N+1 查库。

这两段都是先按 nicUuid 逐个 findByUuid(),再访问 getUsedIps();批量加后端时会把拦截器路径放大成多轮数据库往返。更稳妥的做法是一次性按 vmNicUuid 查询 UsedIpVO,再按 NIC 分组做 CIDR 校验。

As per coding guidelines "禁止循环里套查询,避免嵌套查询带来的性能问题。"

Also applies to: 1614-1637

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java`
around lines 660 - 682, The interceptor LoadBalancerApiInterceptor performs an
N+1 DB lookup by calling dbf.findByUuid for each nicUuid and then accessing
VmNicVO.getUsedIps(); fix it by batching the UsedIpVO fetch for all
msg.getVmNicUuids at once (query UsedIpVO where vmNicUuid IN (...)), group the
results by vmNicUuid, then iterate the grouped UsedIpVOs to run the existing
IpRangeHelper.isIpOutsideL3NetworkCidr checks against lbVO (vipUuid/ipv6VipUuid)
and throw the same ApiMessageInterceptionException when needed; replace per-nic
dbf.findByUuid + getUsedIps logic with this batched query and grouping to avoid
looped DB calls.
core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java (1)

102-106: 默认 locale 不要再硬编码成 en_US

getTemplate()localizeErrorCode() 现在各自维护了一套默认 locale 来源;后面只要有一边改了,回退行为就会悄悄分叉。这里直接复用 LocaleUtils.DEFAULT_LOCALE 会更稳。

建议修改
-        // fallback to en_US
-        if (!"en_US".equals(locale)) {
-            Map<String, String> enMessages = localeMessages.get("en_US");
+        // fallback to default locale
+        if (!LocaleUtils.DEFAULT_LOCALE.equals(locale)) {
+            Map<String, String> enMessages = localeMessages.get(LocaleUtils.DEFAULT_LOCALE);
             if (enMessages != null) {
                 return enMessages.get(globalErrorCode);
             }
         }

As per coding guidelines 避免使用魔法值(Magic Value)。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java`
around lines 102 - 106, The fallback hard-coded "en_US" literal should be
replaced with the shared default locale constant to avoid divergent behavior;
update GlobalErrorCodeI18nServiceImpl (specifically the localizeErrorCode()
fallback) to use LocaleUtils.DEFAULT_LOCALE instead of "en_US" so getTemplate()
and localizeErrorCode() rely on the same default source and eliminate the magic
value.
test/src/test/groovy/org/zstack/test/integration/core/ErrorCodeI18nCase.groovy (1)

146-180: 补一条“已加载映射”的断言会更稳。

这几条 case 都是直接 new GlobalErrorCodeI18nServiceImpl(),没有走 start(),所以目前只覆盖到了 message 的兜底路径。建议再补一条初始化后的用例,至少命中一次真实的 globalErrorCode -> localized template 映射,并顺手覆盖 ErrorCodeList.causes 的递归分支。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/src/test/groovy/org/zstack/test/integration/core/ErrorCodeI18nCase.groovy`
around lines 146 - 180, 补充一个初始化后的测试用例:new 一个 GlobalErrorCodeI18nServiceImpl 后调用
start() 以加载映射,然后构造至少一个 ErrorCode(例如 code 在全局映射中存在)并给它设置 causes(递归链上的
ErrorCode),调用 localizeErrorCode(mid, "en_US"),断言 mid.getMessage() 返回映射到的本地化模板而不是
description/details,同时断言链上每个 cause 的 getMessage()
都按映射或兜底规则被正确填充;参考符号:GlobalErrorCodeI18nServiceImpl.start(),
GlobalErrorCodeI18nServiceImpl.localizeErrorCode(),
ErrorCode.setCause()/getMessage() 和 ErrorCodeList.causes。

Comment on lines +260 to 269
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
AnsibleConstant.ROOT_DIR);
} else if (AnsibleGlobalProperty.DEBUG_MODE) {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
} else {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

在 Ansible 官方文档里,ansible/ansible-playbook-B 选项是什么意思?它是否能替代原本放置 playbook 路径的位置参数?

💡 Result:

在 Ansible 官方文档中,ansible 和 ansible-playbook 的 -B 选项含义不同。 对于 ansible 命令(ad hoc 模式),-B 或 --background SECONDS 表示异步运行操作,并在指定秒数后失败(默认 N/A)。它与 -P(轮询间隔)一起使用,用于异步任务。 对于 ansible-playbook,官方文档中没有 -B 选项。-b 或 --become 用于以 become(特权提升)运行操作,不暗示密码提示。 -B 不能替代 playbook 路径的位置参数。playbook 路径是 ansible-playbook 的必需位置参数,而 -B 是 ansible 的异步选项,不适用于 playbook 执行,且两者语法不同。

Citations:


移除 -B 选项,playBookPath 应作为位置参数传递。

Ansible 官方文档确认:-Bansible 命令(ad hoc 模式)的异步超时参数,ansible-playbook 命令中不存在该选项。playbook 路径是 ansible-playbook 的必需位置参数。第 260、264、268 行将 playBookPath 传递给 -B %s 导致语法错误——playbook 参数丢失,所有三个分支都会在运行时失败。应移除 -B,直接将 playbook 路径作为可执行文件后的位置参数。

修改方案
-                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
+                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
                                 AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
                                 AnsibleConstant.ROOT_DIR);
                     } else if (AnsibleGlobalProperty.DEBUG_MODE) {
-                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
+                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
                                 AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
                                 AnsibleConstant.ROOT_DIR);
                     } else {
-                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s --private-key %s -e '\\''%s'\\'",
+                        output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
                                 AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
                                 AnsibleConstant.ROOT_DIR);
                     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
AnsibleConstant.ROOT_DIR);
} else if (AnsibleGlobalProperty.DEBUG_MODE) {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
} else {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
AnsibleConstant.ROOT_DIR);
} else if (AnsibleGlobalProperty.DEBUG_MODE) {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
} else {
output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/ansible/AnsibleFacadeImpl.java` around
lines 260 - 269, In AnsibleFacadeImpl update the three ShellUtils.run format
strings to stop using the invalid "-B %s" option and instead pass playBookPath
as a positional argument to the ansible-playbook command; specifically remove
the "-B %s" token from the format templates (including the LOG_PATH, DEBUG_MODE
and default branches) and place %s for playBookPath immediately after the
executable variable in each String.format call, adjusting the format arguments
order so JSONObjectUtil.dumpPretty(arguments) and other params remain correct.

Comment on lines +111 to +125
} catch (Throwable t) {
logger.warn(String.format("[%s] failed to calculate result for item %s", getName(), req.item), t);
req.notifyFailure(operr(ORG_ZSTACK_CORE_THREAD_10003, "failed to calculate result: %s", t.getMessage()));
}
}
cleanup(syncSignature);
chain.next();
}

protected final void handleFailure(String syncSignature, List<PendingRequest> requests, ErrorCode errorCode, SyncTaskChain chain) {
for (PendingRequest req : requests) {
req.notifyFailure(errorCode);
}
cleanup(syncSignature);
chain.next();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

单个 completion.fail() 抛异常会把整条 chain 卡死。

handleFailure() 以及 handleSuccess() 的降级分支里,req.notifyFailure(...) 没有隔离。任一回调抛出后,后续请求不会继续通知,cleanup(syncSignature)chain.next() 也执行不到。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/zstack/core/thread/AbstractCoalesceQueue.java` around
lines 111 - 125, The failure is that a thrown exception from any
req.notifyFailure / req.notifySuccess can abort the loop and prevent
cleanup(syncSignature) and chain.next() from running; update methods
handleFailure and handleSuccess (and any downgrade branches where
PendingRequest.notifyFailure/notifySuccess or completion.fail is called) to wrap
each per-request callback invocation in its own try/catch so a single bad
callback cannot stop notifying remaining requests, and ensure
cleanup(syncSignature) and chain.next() are executed afterwards (use a finally
or guarantee after the loop) so the sync chain always advances.

Comment on lines +501 to +507
// createIfAbsent: DB has no record → write; DB has record → return DB value
JsonLabelInventory caInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca);
JsonLabelInventory keyInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key);

// Use DB as source of truth — overwrite local files (HA: MN2 uses MN1's CA from DB)
FileUtils.writeStringToFile(caFile, caInv.getLabelValue());
FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CA 与私钥分两次 createIfAbsent() 会在多 MN 并发启动时写出错配对。

libvirtTLSCAlibvirtTLSPrivateKey 是两个独立键,而且这里分别调用两次 createIfAbsent()。如果两个管理节点同时首次初始化,完全可能出现“CA 来自 MN1、私钥来自 MN2”的交叉写入,随后所有节点都会把这组不匹配的 PEM 覆盖到本地,TLS 会直接失效。这里需要把 CA/私钥作为一个原子对象持久化,或至少放进同一个事务/同一条 label 里。

一个更安全的方向
-    private static final String LIBVIRT_TLS_CA_KEY = "libvirtTLSCA";
-    private static final String LIBVIRT_TLS_PRIVATE_KEY = "libvirtTLSPrivateKey";
+    private static final String LIBVIRT_TLS_CA_BUNDLE_KEY = "libvirtTLSCABundle";

-    JsonLabelInventory caInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca);
-    JsonLabelInventory keyInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key);
-    FileUtils.writeStringToFile(caFile, caInv.getLabelValue());
-    FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue());
+    String bundleValue = JSONObjectUtil.toJsonString(new LibvirtTlsCaBundle(ca, key));
+    JsonLabelInventory bundleInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_BUNDLE_KEY, bundleValue);
+    LibvirtTlsCaBundle bundle = JSONObjectUtil.toObject(bundleInv.getLabelValue(), LibvirtTlsCaBundle.class);
+    FileUtils.writeStringToFile(caFile, bundle.ca);
+    FileUtils.writeStringToFile(keyFile, bundle.privateKey);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 501
- 507, The two separate createIfAbsent calls for LIBVIRT_TLS_CA_KEY and
LIBVIRT_TLS_PRIVATE_KEY in KVMHostFactory risk cross-writes under concurrent MN
startup; instead persist CA+private-key atomically by writing a single label
(e.g., combine ca and key into one JSON/PEM payload) via one
JsonLabel.createIfAbsent call or perform both writes inside a single DB
transaction/synchronized upsert so both values are inserted or read together;
then read that single label and split to caFile and keyFile (replace references
to separate LIBVIRT_TLS_CA_KEY and LIBVIRT_TLS_PRIVATE_KEY usages with the
combined label access).

@MatheMatrix MatheMatrix force-pushed the sync/jin.ma/fix/ZSTAC-84817 branch from 04a51b2 to 29ef261 Compare April 28, 2026 06:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`:
- Around line 183-190: The catch block after calling saveControllerIfNeed(vo)
can leave a half-initialized state because saveControllerIfNeed() inserts into
controllers before fully creating nodes; update the error handling so that on
any Throwable you remove the partially inserted entries for that UUID from both
controllers and nodes (use vo.getUuid()) to avoid skipping future retries, or
preferably refactor saveControllerIfNeed() to fully construct controller and
node services off-map and only put them into the controllers and nodes maps in
one atomic step once both are successfully created; modify
ExternalPrimaryStorageFactory's catch to perform the cleanup if you choose the
minimal fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: d80f4d12-31b4-4ae0-8d9c-eb9818881cc8

📥 Commits

Reviewing files that changed from the base of the PR and between 04a51b2 and 29ef261.

📒 Files selected for processing (1)
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java

Comment on lines +183 to +190
if (controllers.containsKey(vo.getUuid())) {
continue;
}
try {
saveControllerIfNeed(vo);
} catch (Throwable t) {
logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
vo.getUuid(), vo.getIdentity()), t);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

失败分支需要清理半初始化的 controllers / nodes 状态。

saveControllerIfNeed() 会先把对象放进 controllers,再去构建/写入 nodes。如果异常发生在这两步之间,这里的 catch 只记录日志,不回滚内存状态;而 Line 183 下次又会因为 controllers.containsKey() 直接跳过重建,最终把这个 PS 卡在“有 controller、无 nodeSvc”的半初始化状态。

建议至少在这里同步移除两个 map 中的该 UUID;更稳妥的是把 saveControllerIfNeed() 改成先完整构建两个 svc,再一次性发布到 map。

💡 最小修复示例
         try {
             saveControllerIfNeed(vo);
         } catch (Throwable t) {
+            controllers.remove(vo.getUuid());
+            nodes.remove(vo.getUuid());
             logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
                     vo.getUuid(), vo.getIdentity()), t);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`
around lines 183 - 190, The catch block after calling saveControllerIfNeed(vo)
can leave a half-initialized state because saveControllerIfNeed() inserts into
controllers before fully creating nodes; update the error handling so that on
any Throwable you remove the partially inserted entries for that UUID from both
controllers and nodes (use vo.getUuid()) to avoid skipping future retries, or
preferably refactor saveControllerIfNeed() to fully construct controller and
node services off-map and only put them into the controllers and nodes maps in
one atomic step once both are successfully created; modify
ExternalPrimaryStorageFactory's catch to perform the cleanup if you choose the
minimal fix.

@MatheMatrix MatheMatrix force-pushed the sync/jin.ma/fix/ZSTAC-84817 branch from 29ef261 to 6a403bf Compare April 28, 2026 08:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java (1)

183-190: ⚠️ Potential issue | 🟠 Major

启动期失败分支仍会留下半初始化的 controller 状态。

saveControllerIfNeed() 先写 controllers,再补 nodes。如果在这两步之间抛异常,Line 183-185 的新 guard 之后会一直 continue,这个 UUID 就不会再重建成功了。这里至少要在 catch 里同步清掉两个 map;更稳妥的是把两个 svc 都构造完成后再一次性发布到 map。

💡 最小修复示例
         try {
             saveControllerIfNeed(vo);
         } catch (Throwable t) {
+            controllers.remove(vo.getUuid());
+            nodes.remove(vo.getUuid());
             logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
                     vo.getUuid(), vo.getIdentity()), t);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`
around lines 183 - 190, The current saveControllerIfNeed() flow can leave a
partially-initialized controller in the controllers map if an exception occurs
between writing controllers and nodes, causing the early guards
(controllers.containsKey(vo.getUuid())) to permanently skip reconstruction; fix
by making publication atomic: change the logic in ExternalPrimaryStorageFactory
so you fully construct the controller and its node(s) off-map and only put them
into the controllers and nodes maps together in one step, or if you prefer a
minimal fix, ensure the catch block removes any partial entries from both
controllers and nodes for vo.getUuid() before continuing; reference
saveControllerIfNeed(), controllers, nodes and the catch block around the try in
ExternalPrimaryStorageFactory.
🧹 Nitpick comments (1)
search/src/main/java/org/zstack/zql/ZQL.java (1)

120-124: 考虑捕获 Exception 而非 Throwable

当前捕获 Throwable 会同时捕获 Error 类型(如 OutOfMemoryErrorStackOverflowError),这些严重错误通常应当向上传播而非被静默处理。建议改为捕获 Exception 以保持对严重系统错误的敏感性。

♻️ 建议修改
-            } catch (Throwable t) {
+            } catch (Exception e) {
                 logger.warn(String.format("skipping a VO that failed inventory conversion in query[target:%s, vo:%s]; the rest of the result is unaffected",
                         astResult != null && astResult.inventoryMetadata != null ? astResult.inventoryMetadata.fullInventoryName() : "?",
-                        identifyVoForLog(vo)), t);
+                        identifyVoForLog(vo)), e);
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search/src/main/java/org/zstack/zql/ZQL.java` around lines 120 - 124, The
catch currently swallows all Throwables; change the catch clause in ZQL (the
block that handles inventory conversion failures) from "catch (Throwable t)" to
"catch (Exception e)" and update the logger call to use the new exception
variable (e) so only non-fatal exceptions are logged/handled while Errors can
propagate. Ensure the surrounding code (the try that contains
identifyVoForLog(vo) and astResult usage) remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`:
- Around line 183-190: The current saveControllerIfNeed() flow can leave a
partially-initialized controller in the controllers map if an exception occurs
between writing controllers and nodes, causing the early guards
(controllers.containsKey(vo.getUuid())) to permanently skip reconstruction; fix
by making publication atomic: change the logic in ExternalPrimaryStorageFactory
so you fully construct the controller and its node(s) off-map and only put them
into the controllers and nodes maps together in one step, or if you prefer a
minimal fix, ensure the catch block removes any partial entries from both
controllers and nodes for vo.getUuid() before continuing; reference
saveControllerIfNeed(), controllers, nodes and the catch block around the try in
ExternalPrimaryStorageFactory.

---

Nitpick comments:
In `@search/src/main/java/org/zstack/zql/ZQL.java`:
- Around line 120-124: The catch currently swallows all Throwables; change the
catch clause in ZQL (the block that handles inventory conversion failures) from
"catch (Throwable t)" to "catch (Exception e)" and update the logger call to use
the new exception variable (e) so only non-fatal exceptions are logged/handled
while Errors can propagate. Ensure the surrounding code (the try that contains
identifyVoForLog(vo) and astResult usage) remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: b2ff67e7-c3d9-446e-8080-769ae7c05512

📥 Commits

Reviewing files that changed from the base of the PR and between 29ef261 and 6a403bf.

📒 Files selected for processing (3)
  • search/src/main/java/org/zstack/zql/ZQL.java
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java

… marshal time

Per ZSTAC-84817, malformed JSON in APIAddExternalPrimaryStorage.config slipped
past the API boundary and surfaced as SYS.1000 InternalError from a deep Gson
parse, after the message had already reached the cloud-bus handler.

Fix:

* PrimaryStorageApiInterceptor now validates msg.getConfig() with org.json
  (strict — rejects unquoted keys / missing colons that Gson lenient mode
  silently tolerates) and throws ApiMessageInterceptionException with
  ORG_ZSTACK_STORAGE_PRIMARY_10053 if it isn't a valid JSON object.
* conf/serviceConfig/externalPrimaryStorage.xml wires
  PrimaryStorageApiInterceptor at service level so the interceptor actually
  runs for APIAddExternalPrimaryStorageMsg (the existing wiring in
  primaryStorage.xml does not cover messages declared in this file).
* CloudOperationsErrorCode.ORG_ZSTACK_STORAGE_PRIMARY_10053 constant added.

Test: ZbsPrimaryStorageCase gains
testAddExternalPrimaryStorageWithMalformedJsonRejectedByInterceptor — sends
"{this is not valid json" via API, asserts the call fails AND no
PrimaryStorageVO/ExternalPrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
row leaks. Verified locally: Tests run: 1, Failures: 0, Errors: 0,
Time elapsed: 92.788 sec, BUILD SUCCESS.

Resolves: ZSTAC-84817

Change-Id: Icf6c648133d7866edf35940d56a28f74f4c64817
@MatheMatrix MatheMatrix force-pushed the sync/jin.ma/fix/ZSTAC-84817 branch from 6a403bf to f8e039a Compare April 28, 2026 08:52
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java (1)

91-102: 建议在 Line [92]-Line [97] 先标准化 config 并回写消息体。

当前只在判空时用 trim();建议统一 trimmsg.setConfig(config),减少后续处理链对首尾空白的重复处理与歧义。

♻️ 建议修改
 private void validate(APIAddExternalPrimaryStorageMsg msg) {
     String config = msg.getConfig();
-    if (config == null || config.trim().isEmpty()) {
+    if (config == null) {
+        return;
+    }
+    config = config.trim();
+    if (config.isEmpty()) {
         return;
     }
+    msg.setConfig(config);
     try {
         new JSONObject(config);
     } catch (JSONException e) {
         throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_STORAGE_PRIMARY_10053,
                 "config is not a valid JSON object: %s", e.getMessage()));
     }
 }

As per coding guidelines **/*Interceptor.java: 注意检查来自 Message 的参数是否做过 trim,用户可能在浏览器上复制粘贴的数据带有空格、换行符等。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java`
around lines 91 - 102, In validate(APIAddExternalPrimaryStorageMsg msg) trim and
normalize the incoming config before further checks: obtain config, if non-null
then do config = config.trim(); call msg.setConfig(config) to write the
normalized value back into the message, then proceed with the existing empty
check and JSON parse (in the validate method of PrimaryStorageApiInterceptor) so
downstream handlers see the trimmed config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java`:
- Around line 91-102: In validate(APIAddExternalPrimaryStorageMsg msg) trim and
normalize the incoming config before further checks: obtain config, if non-null
then do config = config.trim(); call msg.setConfig(config) to write the
normalized value back into the message, then proceed with the existing empty
check and JSON parse (in the validate method of PrimaryStorageApiInterceptor) so
downstream handlers see the trimmed config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: b80f17df-d3b3-42e6-90af-a6110ea2696e

📥 Commits

Reviewing files that changed from the base of the PR and between 6a403bf and f8e039a.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/externalPrimaryStorage.xml is excluded by !**/*.xml
📒 Files selected for processing (1)
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.