Skip to content

[ZSTAC-86445] Backport ZSTAC-85281 to 5.4.10#4380

Open
MatheMatrix wants to merge 4 commits into
5.4.10from
sync/shixin.ruan/shixin-ZSTAC-86445
Open

[ZSTAC-86445] Backport ZSTAC-85281 to 5.4.10#4380
MatheMatrix wants to merge 4 commits into
5.4.10from
sync/shixin.ruan/shixin-ZSTAC-86445

Conversation

@MatheMatrix

Copy link
Copy Markdown
Owner

Clone Jira: ZSTAC-86445

Original Jira: ZSTAC-85281

Source fix MR: zstack!9929

Cherry-picked commit(s): 8939994, e44d41a

Target branch: 5.4.10

Validation: cherry-pick completed on 5.4.10 and git diff --check passed. Full regression was not run locally.

sync from gitlab !10340

ruansteve and others added 2 commits June 30, 2026 17:22
Change-Id: If07dce918c4d7d9f62445184ff5984e30b83ce64
(cherry picked from commit 8939994)
Trigger TF port sync from TF controller ping success instead of MN startup. Keep the original one day interval and port cleanup algorithm, while scoping sync to the current controller.

Resolves: ZSTAC-85281

Change-Id: I84a88a34135cd47fc79b3019d2f0fc2953f06ac9
(cherry picked from commit e44d41a)
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 42 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b55fe662-f24e-4045-9483-962eefa1709e

📥 Commits

Reviewing files that changed from the base of the PR and between cb15bc2 and 9a49ced.

📒 Files selected for processing (2)
  • plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java
  • test/src/test/groovy/org/zstack/test/integration/network/sdnController/SugonSdnControllerCase.groovy

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Could not fetch remote config from http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml: TimeoutError: The operation was aborted due to timeout
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

TfZstackPortSync 从基于 ManagementNodeReadyExtensionPoint 的周期任务改为由 SdnControllerPingMsg 按控制器 UUID 显式触发,引入时间节流与并发去重;TfPortClient/TfPortService 新增基于 UUID 的重载,集成测试验证 Ping 触发同步行为。

Changes

Ping 触发端口同步重构

Layer / File(s) Summary
TfZstackPortSync 重构为按需触发
plugin/.../network/TfZstackPortSync.java
移除 ManagementNodeReadyExtensionPoint 实现;新增 triggerSyncIfDue(sdnControllerUuid),通过 lastSyncTime/runningSyncs 实现 1 天节流与并发去重;SyncPortPeriodicTask 改为 Task<Void>getPortToDelete 改为基于 sdnControllerUuid 查询且返回空集合替代 null
TfPortClient 与 TfPortService 按 UUID 重载
plugin/.../neutronClient/TfPortClient.java, plugin/.../network/TfPortService.java
TfPortClient 新增接受 sdnControllerUuid 的构造器,校验 vendorType 后调用 initTfPortService 新增 getTfPortsDetail(uuid)deleteTfPort(uuid, portUuid) 重载。
SugonSdnController Ping 消息处理
plugin/.../controller/SugonSdnController.java
handleMessage 新增 SdnControllerPingMsg 分支,handMessage 通过 client.getDomain() 验证控制器默认域,成功时调用 triggerSyncIfDue,失败或异常时返回错误回复;注入 TfZstackPortSync 字段。
集成测试:Ping 触发同步验证
test/.../sdnController/SugonSdnControllerCase.groovy
新增 testTfPortSyncTriggeredByPingSuccess,注册 TF_CREATE_VMI 模拟回调统计触发次数,通过 sendTfPing 发送 Ping 消息,断言同步仅触发一次且不重复触发。

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant SugonSdnController
  participant TfZstackPortSync
  participant TfPortService
  Client->>SugonSdnController: SdnControllerPingMsg
  SugonSdnController->>SugonSdnController: client.getDomain() 验证默认域
  SugonSdnController->>TfZstackPortSync: triggerSyncIfDue(sdnControllerUuid)
  TfZstackPortSync->>TfZstackPortSync: 检查 lastSyncTime & runningSyncs
  TfZstackPortSync->>TfPortService: getTfPortsDetail(sdnControllerUuid)
  TfPortService-->>TfZstackPortSync: VMI 列表
  TfZstackPortSync->>TfPortService: deleteTfPort(sdnControllerUuid, portUuid)
  TfZstackPortSync->>TfZstackPortSync: finally 清理 runningSyncs
  SugonSdnController-->>Client: SdnControllerPingReply(success)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Ping 一声,兔子醒来跑!
不再守钟按时同步跳,
按 UUID 节流又去重,
域存在才触发,安全稳又好。
测试断言次数一,绝不多扰。

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确描述了将 ZSTAC-85281 回迁到 5.4.10 的主要变更。
Description check ✅ Passed 描述与回迁、cherry-pick 和目标分支信息一致,和实际变更相关。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/shixin.ruan/shixin-ZSTAC-86445

Comment @coderabbitai help to get the list of available commands.

Remove DHCP message handlers and generated error code reference
unavailable on 5.4.10.

Change-Id: I1a15fb1eeeab60ca9c6fa239904bda89839b64e5

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java (1)

123-126: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

把新增重载里的 UUID 变量名改成 lowerCamelCase。

portUUidtfPortUUid 的大小写不符合新增 Java 代码的命名要求,建议改为 portUuid / tfPortUuid

建议调整
-    public TfPortResponse deleteTfPort(String sdnControllerUuid, String portUUid) {
-        String tfPortUUid = StringDSL.transToTfUuid(portUUid);
+    public TfPortResponse deleteTfPort(String sdnControllerUuid, String portUuid) {
+        String tfPortUuid = StringDSL.transToTfUuid(portUuid);
         TfPortClient tfPortClient = new TfPortClient(sdnControllerUuid);
-        return tfPortClient.deletePort(tfPortUUid);
+        return tfPortClient.deletePort(tfPortUuid);
     }

As per path instructions, “方法名、参数名、成员变量和局部变量:使用 lowerCamelCase 风格。”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java`
around lines 123 - 126, In TfPortService.deleteTfPort, rename the UUID
parameters and locals to lowerCamelCase to match the Java naming convention:
change the method argument and the translated UUID variable from the current
mixed-case forms to portUuid and tfPortUuid, and update the deletePort call
accordingly. Keep the change confined to deleteTfPort and any directly related
references so the overload remains consistent with the rest of the class.

Source: Path instructions

plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java (1)

65-71: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

handMessage 重命名为表达明确的处理方法。

handMessage 像是 handleMessage 的拼写遗漏,建议改为 handlePingMessage,避免后续维护时混淆。

建议重命名
-            handMessage((SdnControllerPingMsg) msg);
+            handlePingMessage((SdnControllerPingMsg) msg);
@@
-    void handMessage(SdnControllerPingMsg msg) {
+    void handlePingMessage(SdnControllerPingMsg msg) {

As per path instructions,命名应尽量用完整的单词组合表达意图。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java`
around lines 65 - 71, `handMessage` in `SugonSdnController` looks like a typo
and should be renamed to a clearer, intent-revealing method name. Update the
call site in the message dispatch path and rename the method itself to
`handlePingMessage` so the `SdnControllerPingMsg` handling is explicit and
consistent with the rest of the controller’s message-handling flow.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java`:
- Around line 63-69: The new SdnController message entrypoint in
SugonSdnController.handleMessage needs the same MessageSafe protection as other
ZStack handlers so unexpected exceptions are converted into standard error
replies. Add `@MessageSafe` to the handleMessage(SdnControllerMessage) method,
keeping the existing SdnControllerPingMsg dispatch and
bus.dealWithUnknownMessage fallback unchanged.

In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfZstackPortSync.java`:
- Around line 42-64: `TfZstackPortSync.triggerSyncIfDue` is recording
`lastSyncTime` before `SyncPort` actually succeeds, and the current
read-check-add sequence around `lastSyncTime`/`runningSyncs` is not atomic.
Update the throttling so the sync timestamp is written only after a successful
`SyncPort.call` path, and make failures in `getPortToDelete`/`call` leave the
entry eligible for retry instead of being treated as a completed sync. Also
guard the `lastSyncTime.get(...)` plus `runningSyncs.add(...)` decision with a
single atomic section (or equivalent concurrent primitive) so concurrent
triggers cannot bypass `runningSyncs` and submit duplicate syncs.

In
`@test/src/test/groovy/org/zstack/test/integration/network/sdnController/SugonSdnControllerCase.groovy`:
- Around line 86-93: 第二次 Ping 仍可能落在第一次异步同步未结束的窗口里,导致测试只验证了 runningSyncs
去重而不是一天节流逻辑。请在 SugonSdnControllerCase 的 sendTfPing / retryInSecs
这段测试里,把等待第一次同步完成的逻辑放到第二次 ping 之前,或增加一个能确认 SyncPort 已结束的确定性等待点,再发起第二次
sendTfPing;这样才能确保后续断言真正覆盖节流行为。

---

Nitpick comments:
In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java`:
- Around line 65-71: `handMessage` in `SugonSdnController` looks like a typo and
should be renamed to a clearer, intent-revealing method name. Update the call
site in the message dispatch path and rename the method itself to
`handlePingMessage` so the `SdnControllerPingMsg` handling is explicit and
consistent with the rest of the controller’s message-handling flow.

In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java`:
- Around line 123-126: In TfPortService.deleteTfPort, rename the UUID parameters
and locals to lowerCamelCase to match the Java naming convention: change the
method argument and the translated UUID variable from the current mixed-case
forms to portUuid and tfPortUuid, and update the deletePort call accordingly.
Keep the change confined to deleteTfPort and any directly related references so
the overload remains consistent with the rest of the class.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9140d990-ee78-427e-856b-6faca6c106b6

📥 Commits

Reviewing files that changed from the base of the PR and between 784c703 and cb15bc2.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/sugonSdnController.xml is excluded by !**/*.xml
📒 Files selected for processing (5)
  • plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java
  • plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/neutronClient/TfPortClient.java
  • plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfPortService.java
  • plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfZstackPortSync.java
  • test/src/test/groovy/org/zstack/test/integration/network/sdnController/SugonSdnControllerCase.groovy

Comment on lines +42 to 64
public void triggerSyncIfDue(String sdnControllerUuid) {
if (sdnControllerUuid == null) {
logger.warn("Port_Sync_Task: skip sync because sdn controller uuid is null.");
return;
}
trackerThread = thdf.submitPeriodicTask(new SyncPort());
}

private class SyncPort implements PeriodicTask {
long now = System.currentTimeMillis();
Long lastSync = lastSyncTime.get(sdnControllerUuid);
if (lastSync != null && now - lastSync < SYNC_INTERVAL_MILLIS) {
return;
}
if (!runningSyncs.add(sdnControllerUuid)) {
return;
}

@Override
public TimeUnit getTimeUnit() {
return TimeUnit.DAYS;
lastSyncTime.put(sdnControllerUuid, now);
try {
thdf.submit(new SyncPort(sdnControllerUuid));
} catch (RuntimeException e) {
lastSyncTime.remove(sdnControllerUuid);
runningSyncs.remove(sdnControllerUuid);
throw e;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

修正同步节流状态的竞态和失败标记。

当前 lastSyncTime 在任务成功前写入;而 getPortToDelete/call 会吞掉异常,导致一次失败同步也会被 Line 50 节流一天。另外,Line 49 的读取和 Line 57 的写入不是原子操作,并发 Ping 下可能绕过 runningSyncs 重复提交。

建议调整
-    public void triggerSyncIfDue(String sdnControllerUuid) {
+    public synchronized void triggerSyncIfDue(String sdnControllerUuid) {
         if (sdnControllerUuid == null) {
             logger.warn("Port_Sync_Task: skip sync because sdn controller uuid is null.");
             return;
         }
 
@@
         if (!runningSyncs.add(sdnControllerUuid)) {
             return;
         }
 
-        lastSyncTime.put(sdnControllerUuid, now);
         try {
             thdf.submit(new SyncPort(sdnControllerUuid));
         } catch (RuntimeException e) {
-            lastSyncTime.remove(sdnControllerUuid);
             runningSyncs.remove(sdnControllerUuid);
             throw e;
         }
@@
             } catch (Exception e) {
                 logger.error(String.format("Port_Sync_Task: Fetch tf VirtualMachineInterface failed: %s.", e));
-                return new HashSet<>();
+                throw new RuntimeException("Fetch tf VirtualMachineInterface failed", e);
             }
@@
         public Void call() {
             logger.info("Port_Sync_Task: begin.");
+            boolean syncSucceeded = false;
             try {
                 HashSet<String> portsToDelete = getPortToDelete();
                 int maxDeleteCount = MAX_DELETE_COUNT;
                 for (String portUuid: portsToDelete) {
@@
                         break;
                     }
                 }
+                syncSucceeded = true;
             } catch (Exception e) {
                 logger.error(String.format("Port_Sync_Task failed: %s.", e));
             } finally {
+                if (syncSucceeded) {
+                    lastSyncTime.put(sdnControllerUuid, System.currentTimeMillis());
+                }
                 runningSyncs.remove(sdnControllerUuid);
             }
             return null;
         }

Also applies to: 104-139

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/network/TfZstackPortSync.java`
around lines 42 - 64, `TfZstackPortSync.triggerSyncIfDue` is recording
`lastSyncTime` before `SyncPort` actually succeeds, and the current
read-check-add sequence around `lastSyncTime`/`runningSyncs` is not atomic.
Update the throttling so the sync timestamp is written only after a successful
`SyncPort.call` path, and make failures in `getPortToDelete`/`call` leave the
entry eligible for retry instead of being treated as a completed sync. Also
guard the `lastSyncTime.get(...)` plus `runningSyncs.add(...)` decision with a
single atomic section (or equivalent concurrent primitive) so concurrent
triggers cannot bypass `runningSyncs` and submit duplicate syncs.

Add MessageSafe to the SDN message entrypoint. Move the second ping
after the first async sync window in the focused test.

Resolves: ZSTAC-86445

Change-Id: I4d7748a0de3b5d6a052a47966d2265249b2779e2
@zstack-robot-2

Copy link
Copy Markdown
Collaborator

Comment on plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnController.java:

Comment from shixin.ruan:

已处理。

Defect class: MessageSafe / message error contract.

Root cause: 这次 backport 在 SugonSdnController.handleMessage 新增了 SdnControllerPingMsg 处理入口,但没有补 @MessageSafe,异常路径不符合 ZStack message handler 的标准保护语义。

Fix: commit 9a49ced343SugonSdnController.handleMessage 增加 @MessageSafe

Verification:

  • mvn install -DskipTests -Djacoco.skip=true -pl plugin/sugonSdnController,testlib -am -rf :sugonSdnController: PASS
  • mvn test-compile -Djacoco.skip=true -DskipTests -Dtest=SugonSdnControllerCase in test/: PASS

Remaining risk: 完整 case 在本地会因测试框架初始化阶段 JMX ThreadFacade 重复注册退出,未跑到业务断言;该问题和本次改动路径无关。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants