[memos-local-openclaw] embedLocal leaks native ONNX memory per call; old-space fills until OOM in ~15-30s #1863

@wbbbanan

Symptom

With EMBEDDING_PROVIDER=local and EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2, the OpenClaw gateway crashes every ~15-30 seconds under normal conversation load (a Claude/agent turn every few seconds is enough). The V8 stderr ends with:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 00007FF702D3AE1F node::OnFatalError+1343
 2: 00007FF7039837B7 v8::Function::NewInstance+423
 ...
 [31624:000002312686B000]    16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB

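Exit code reported by NSSM is 3221225786. Note that 3221225786 is 0xC000013A (STATUS_CONTROL_C_EXIT) in hex, whereas 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN, V8's fast-fail for OOM) is 3221226505 in decimal, so one of these two values needs re-checking against the NSSM log.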

Each Mark-Compact pass reclaims only ~2 MB out of ~387 MB (377.5 MB -> 375.2 MB in the trace above). The 387 MB ceiling appears to be V8's incremental heap-growth cap; we also configured NODE_OPTIONS=--max-old-space-size=4096, but V8 never gets a chance to grow past ~387 MB because GC fails first.
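One way to check whether the --max-old-space-size setting actually reached the gateway process is to read V8's own limits at runtime. The following is a minimal sketch using Node's built-in node:v8 module; the function names heapLimitMb and oldSpaceUsedMb are illustrative and not part of the plugin.

```typescript
import * as v8 from "node:v8";

const MB = 1024 * 1024;

// Configured V8 heap ceiling for this process, in MB.
export function heapLimitMb(): number {
  return v8.getHeapStatistics().heap_size_limit / MB;
}

// Current old-space usage in MB, or undefined if V8 reports no "old_space" entry.
export function oldSpaceUsedMb(): number | undefined {
  const old = v8
    .getHeapSpaceStatistics()
    .find((space) => space.space_name === "old_space");
  return old ? old.space_used_size / MB : undefined;
}

console.log(
  `heap limit: ${heapLimitMb().toFixed(0)} MB, ` +
    `old space used: ${oldSpaceUsedMb()?.toFixed(1)} MB`,
);
```

If the reported limit is far below 4096 MB, the NODE_OPTIONS value may not be reaching the service environment. That is only a possibility to rule out, not something this report has confirmed.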

Repro

  1. Install memos-local-openclaw-plugin with the local embedding model (Xenova/all-MiniLM-L6-v2, dtype=q8)
  2. Enable allowConversationAccess=true so the agent_end hook actually runs (otherwise no embeds fire and the leak is masked)
  3. Run a normal session — a single user/assistant exchange is enough to seed the leak; sustained conversation makes it crash repeatedly
  4. Watch ~/.openclaw/logs/nssm-gateway-stderr.log: the Mark-Compact lines keep climbing to 387 MB, then FATAL
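To follow step 4 without reading the log by eye, the Mark-Compact lines can be parsed into numbers. This is a rough sketch: the regular expression is inferred from the single trace line quoted above (plus the optional "(reduce)" tag V8 sometimes prints), and parseMarkCompact is a hypothetical helper, not something the plugin ships.

```typescript
export interface MarkCompactSample {
  timeMs: number;      // process uptime at the GC event
  beforeMb: number;    // heap used before the pass
  afterMb: number;     // heap used after the pass
  reclaimedMb: number; // beforeMb - afterMb
}

// Matches e.g. "16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB".
const MARK_COMPACT =
  /(\d+) ms: Mark-Compact (?:\(reduce\) )?([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB/;

export function parseMarkCompact(line: string): MarkCompactSample | null {
  const m = MARK_COMPACT.exec(line);
  if (!m) return null;
  const beforeMb = Number(m[2]);
  const afterMb = Number(m[4]);
  return {
    timeMs: Number(m[1]),
    beforeMb,
    afterMb,
    reclaimedMb: beforeMb - afterMb,
  };
}
```

Feeding each line of the stderr log through parseMarkCompact and plotting afterMb against timeMs should show the climb toward the ceiling described above.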

Suspected root cause

apps/memos-local-openclaw/src/embedding/local.ts calls ext(text, { pooling: "mean", normalize: true }) in a tight loop. The @huggingface/transformers feature-extraction pipeline retains intermediate ONNX runtime tensors in its session state between calls. V8 Mark-Compact only sees the JS-side slices (Array.from(output.data).slice(...)) — the native-backed tensor arena stays referenced via the ONNX session and is not collectable. So heap grows monotonically until OOM.

Observed leak rate: ~24 MB/s of old-space growth in our deployment, sustained.
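For anyone trying to reproduce the rate, here is a small sketch of how it can be measured from inside the process with process.memoryUsage(). Comparing heapUsed against external and rss may also help tell JS-heap growth apart from native growth. All names below are illustrative. As a sanity check on the numbers in this report: at ~24 MB/s a ~387 MB ceiling is reached in about 16 s from an empty heap, which is in line with the 15-30 s crash interval.

```typescript
export interface MemSample {
  tSec: number;       // wall-clock time of the sample, in seconds
  heapUsedMb: number; // V8 heap in use
  externalMb: number; // memory of C++ objects bound to JS objects
  rssMb: number;      // resident set size of the whole process
}

const MB = 1024 * 1024;

export function takeSample(): MemSample {
  const m = process.memoryUsage();
  return {
    tSec: Date.now() / 1000,
    heapUsedMb: m.heapUsed / MB,
    externalMb: m.external / MB,
    rssMb: m.rss / MB,
  };
}

// Average growth of one field between two samples, in MB per second.
export function growthRateMbPerSec(
  first: MemSample,
  last: MemSample,
  field: "heapUsedMb" | "externalMb" | "rssMb",
): number {
  const dt = last.tSec - first.tSec;
  if (dt <= 0) throw new RangeError("samples must be in increasing time order");
  return (last[field] - first[field]) / dt;
}

// Time left before `limitMb` is reached at a constant growth rate.
export function secondsUntilLimit(
  currentMb: number,
  limitMb: number,
  rateMbPerSec: number,
): number {
  if (rateMbPerSec <= 0) return Infinity;
  return Math.max(0, limitMb - currentMb) / rateMbPerSec;
}
```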

Workaround in our fork

We patched src/embedding/local.ts to (1) explicitly dispose() the output tensor and null its .data after each call, and (2) call extractor.dispose() + reload the pipeline every N calls (default 50, env-tunable via MEMOS_EMBED_RESET_AFTER). After the patch, leak rate dropped from ~24 MB/s to ~0.25 MB/s and the gateway stays up indefinitely.

// Inside embedLocal, after slicing into results:
try { (output as any).data = null; } catch {}
try { (output as any).dispose?.(); } catch {}

// After every N calls:
if (state.calls >= RESET_AFTER_CALLS) {
  try { await (ext as any).dispose?.(); } catch {}
  state = null; // next call triggers fresh load; model files are cached so reload is ~1-2s
}
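For readers who want to try the same idea outside our fork, the two steps above can be folded into one self-contained wrapper. This is a sketch, not the actual patch: TensorLike, ExtractorLike and createRecyclingEmbedder are hypothetical names, the loader is injected so that the example does not depend on a particular @huggingface/transformers version, and it assumes embed calls are serialized (concurrent callers would need a queue or lock).

```typescript
// Hypothetical minimal shapes standing in for the library's tensor and pipeline types.
interface TensorLike {
  data: ArrayLike<number>;
  dims: number[];
  dispose?: () => void;
}
interface ExtractorLike {
  (text: string, opts: { pooling: "mean"; normalize: boolean }): Promise<TensorLike>;
  dispose?: () => Promise<void> | void;
}

export function createRecyclingEmbedder(
  load: () => Promise<ExtractorLike>, // e.g. a function that builds the feature-extraction pipeline
  resetAfter = 50,                    // same default as MEMOS_EMBED_RESET_AFTER in our fork
) {
  let ext: ExtractorLike | null = null;
  let calls = 0;

  return async function embed(text: string): Promise<number[]> {
    let current = ext;
    if (!current) {
      current = await load(); // first call, or first call after a reset
      ext = current;
      calls = 0;
    }

    const output = await current(text, { pooling: "mean", normalize: true });
    const vector = Array.from(output.data); // copy out before releasing the tensor
    try {
      output.dispose?.();
    } catch {
      // best effort: a failed dispose must not fail the embed call
    }

    calls += 1;
    if (calls >= resetAfter) {
      ext = null; // next call triggers a fresh load
      try {
        await current.dispose?.();
      } catch {
        // best effort, as above
      }
    }
    return vector;
  };
}
```

The wrapper keeps the reload cost (~1-2 s with cached model files, per the note above) off every call except the one following a reset.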

Suggested upstream fix

  1. Batch multiple texts per ext() call so the ONNX session reuses its activation buffer instead of allocating a new one each time. The feature-extraction pipeline does accept arrays of strings and returns batched output. If the leak is per call rather than per text, this should cut it by roughly a factor of the batch size.
  2. Add an internal periodic pipeline.dispose() / reload mechanism inside @huggingface/transformers itself, or expose a hook so consumers can trigger it on idle.
  3. At minimum, document that the feature-extraction pipeline leaks under repeated single-text invocation and that callers must manage session lifecycle.
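Suggestion 1 could look roughly like the sketch below. It assumes that a pooled, batched call returns one flat buffer with dims [batchSize, hiddenSize]; that shape, and the names BatchTensorLike, BatchExtractorLike, chunk, splitRows and embedBatched, are assumptions for illustration. Whether batching actually reduces the leak is still unverified.

```typescript
// Hypothetical minimal shapes standing in for the library's types.
interface BatchTensorLike {
  data: ArrayLike<number>;
  dims: number[];
  dispose?: () => void;
}
type BatchExtractorLike = (
  texts: string[],
  opts: { pooling: "mean"; normalize: boolean },
) => Promise<BatchTensorLike>;

// Split a list into consecutive groups of at most `size` items.
export function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Turn a flat [rows * width] buffer into `rows` plain arrays of length `width`.
export function splitRows(data: ArrayLike<number>, dims: number[]): number[][] {
  const rows = dims[0];
  const width = dims[1];
  if (dims.length !== 2 || rows === undefined || width === undefined) {
    throw new Error("expected a 2-D output of shape [batch, hidden]");
  }
  const flat = Array.from(data);
  if (flat.length !== rows * width) throw new Error("data length does not match dims");
  const out: number[][] = [];
  for (let r = 0; r < rows; r++) out.push(flat.slice(r * width, (r + 1) * width));
  return out;
}

// One pipeline call per batch instead of one per text.
export async function embedBatched(
  ext: BatchExtractorLike,
  texts: string[],
  batchSize = 16,
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    const output = await ext(batch, { pooling: "mean", normalize: true });
    results.push(...splitRows(output.data, output.dims));
    try {
      output.dispose?.();
    } catch {
      // best effort: releasing the tensor must not fail the batch
    }
  }
  return results;
}
```

With a batch size of 16, a backlog of 160 texts would need 10 pipeline calls rather than 160.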

Environment

  • memos-local-openclaw-plugin v1.0.6-beta.11
  • @huggingface/transformers: whichever version ships with that release
  • Node 24.13.0 (V8 13.x)
  • Windows 11, Session 0 service via NSSM

Labels: ai-task, bug, help wanted, plugin
