Skip to content

fix(openclaw): prevent duplicate viewer instances on gateway restart#1477

Open
fancyboi999 wants to merge 2 commits intoMemTensor:mainfrom
fancyboi999:fix/1471-prevent-duplicate-instances
Open

fix(openclaw): prevent duplicate viewer instances on gateway restart#1477
fancyboi999 wants to merge 2 commits intoMemTensor:mainfrom
fancyboi999:fix/1471-prevent-duplicate-instances

Conversation

@fancyboi999
Copy link
Copy Markdown
Contributor

Description

OpenClaw 在 gateway restart 或 deferred reload 时会重新调用插件的 register(),但旧的 viewer HTTP server 没有关闭。每次重注册都会泄漏一个新的 server 到下一个端口(18799 → 18800 → 18801……),#1471 里的日志就是这样。

问题出在 serviceStartedviewer 都是 register() 闭包里的局部变量。第二次调 register() 会创建新闭包,旧闭包里的 viewer 就没人管了。

改法:在模块层面用 Map 追踪活跃实例,以 stateDir 为 key。register() 进来先查有没有同 stateDir 的旧实例,有的话在 startServiceCore() 里先 await 把旧 viewer 关掉(等端口真正释放),再启动新的。

ViewerServer.stop() 改成返回 Promise,这样调用方能等 server.close() 回调完成,不会出现端口还没释放就尝试 rebind 的竞态。

用 stateDir 做 key 而不是全局单例,是因为集成测试里会用不同 stateDir 同时跑多个 register(),全局单例会误杀别人的实例。

Related Issue: Fixes #1471

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Unit Test
  • Test Script Or Test Steps (please provide)

跑了三层验证:

  1. npm run build — TypeScript 编译通过
  2. npx vitest run — 182 个既有测试全绿(排除 main 上已有的 2 个 flaky test)
  3. 新增 e2e-duplicate-instance.test.ts — 用真实 HTTP server 验证:找一个空闲端口,对同一 stateDir 调两次 register() + start(),检查:
    • 旧 viewer 被检测并停止
    • 新 viewer 绑到同一个端口(不是 port+1)
    • 不存在泄漏的第二个 server
    • stop 后端口完全释放

Checklist

  • I have performed a self-review of my own code | 我已自行检查了自己的代码
  • I have commented my code in hard-to-understand areas | 我已在难以理解的地方对代码进行了注释
  • I have added tests that prove my fix is effective or that my feature works | 我已添加测试以证明我的修复有效或功能正常
  • I have linked the issue to this PR (if applicable) | 我已将 issue 链接到此 PR(如果适用)

When OpenClaw calls register() again (deferred reload / gateway restart),
the old viewer HTTP server was not stopped. Each re-registration leaked
a new server on an incrementing port (18799 → 18800 → 18801…).

Root cause: serviceStarted and viewer were closure-scoped inside
register(), so a second call created a fresh closure with no knowledge
of the previous one.

Fix: track active instances at module level, keyed by stateDir.  On
re-registration with the same stateDir, startServiceCore() now awaits
the previous viewer's port release before binding the new one.

Also makes ViewerServer.stop() return a Promise so the caller can
properly wait for the HTTP server to close.

Closes MemTensor#1471
Copilot AI review requested due to automatic review settings April 13, 2026 09:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a resource leak where repeated register() calls (e.g., gateway restart / deferred reload) would leave the previous Memory Viewer HTTP server running, causing port increments and multiple viewer instances.

Changes:

  • Add a module-level activeInstances map keyed by stateDir to detect and replace existing instances on re-registration.
  • Make ViewerServer.stop() async so callers can await server.close() and avoid rebind races.
  • Add an end-to-end test to verify re-registration stops the previous viewer and reuses the same port.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
apps/memos-local-openclaw/index.ts Tracks active instances per stateDir, stops previous instance during startup, awaits viewer shutdown on service stop.
apps/memos-local-openclaw/src/viewer/server.ts Changes stop() to return a Promise and resolves when the HTTP server is fully closed.
apps/memos-local-openclaw/tests/shutdown-lifecycle.test.ts Updates mock viewer to match new async stop() signature.
apps/memos-local-openclaw/tests/e2e-duplicate-instance.test.ts New integration test ensuring duplicate viewer instances aren’t leaked across re-registration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Add stale-instance guard in self-start setTimeout: skip if this
  instance was already replaced by a newer register() call.
- ViewerServer.stop(): add 3s timeout fallback and Node 18.0.x compat
  (closeIdleConnections + unref when closeAllConnections is missing).
- Use explicit error formatting in cleanup catch block.
- Bind error handler before listen() in test helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: memOS一直启动多个实例

2 participants