Symptom
With EMBEDDING_PROVIDER=local and EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2, the OpenClaw gateway crashes every ~15-30 seconds under normal conversation load (a Claude/agent turn every few seconds is enough). The V8 stderr ends with:
```
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 00007FF702D3AE1F node::OnFatalError+1343
 2: 00007FF7039837B7 v8::Function::NewInstance+423
...
[31624:000002312686B000] 16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB
```
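Exit code reported by NSSM is 3221225786. Note that this decimal value is 0xC000013A (STATUS_CONTROL_C_EXIT) in hex; 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN, the code raised by a fast-fail abort) would be 3221226505, so either the decimal or the hex code here needs re-checking against the NSSM log.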
The last Mark-Compact pass reclaims only ~2 MB (377.5 -> 375.2 MB) of a ~387 MB heap. The 387 MB ceiling appears to be V8's incremental heap-growth cap; we also configured NODE_OPTIONS=--max-old-space-size=4096, but V8 never gets a chance to grow past ~387 MB because GC fails first.
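One way to rule out the possibility that the --max-old-space-size=4096 setting never reaches the gateway process (a Windows service may not see the environment you expect) is to log V8's effective heap limit at startup. This is a minimal sketch using Node's built-in v8 module; the function name heapLimitMb and the log format are illustrative and not part of the plugin.

```typescript
import * as v8 from "node:v8";

/** Returns V8's effective heap size limit for this process, in MB. */
function heapLimitMb(): number {
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

// Log once at startup so the value ends up in the NSSM stderr log.
console.error(
  `[heap] limit=${heapLimitMb().toFixed(0)} MB ` +
    `NODE_OPTIONS=${process.env.NODE_OPTIONS ?? "(unset)"}`
);
```

If the reported limit is far below 4096 MB, the flag is not being applied to this process and the ~387 MB ceiling may simply be the effective limit.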
Repro
- Install memos-local-openclaw-plugin with the local embedding model (Xenova/all-MiniLM-L6-v2, dtype=q8).
- Enable allowConversationAccess=true so the agent_end hook actually runs (otherwise no embeds fire and the leak is masked).
- Run a normal session. A single user/assistant exchange is enough to seed the leak; sustained conversation makes it crash repeatedly.
- Watch ~/.openclaw/logs/nssm-gateway-stderr.log: the Mark-Compact lines keep climbing to 387 MB, then FATAL.
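To make the last repro step less manual, the Mark-Compact lines can be parsed out of the stderr log. The sketch below assumes the trace line format shown in the Symptom section; the names GcSample, parseMarkCompact and reclaimedMb are made up for this example, and other V8 versions may print slightly different lines.

```typescript
interface GcSample {
  timeMs: number;   // milliseconds since process start
  beforeMb: number; // used heap before the collection
  afterMb: number;  // used heap after the collection
}

// Matches e.g. "16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB"
const MARK_COMPACT =
  /(\d+)\s+ms:\s+Mark-Compact\s+(\d+(?:\.\d+)?)\s+\([\d.]+\)\s+->\s+(\d+(?:\.\d+)?)\s+\([\d.]+\)\s+MB/;

/** Parses one log line; returns null if it is not a Mark-Compact trace line. */
function parseMarkCompact(line: string): GcSample | null {
  const m = MARK_COMPACT.exec(line);
  if (!m) return null;
  return { timeMs: Number(m[1]), beforeMb: Number(m[2]), afterMb: Number(m[3]) };
}

/** How much used heap a single collection freed, in MB. */
function reclaimedMb(sample: GcSample): number {
  return sample.beforeMb - sample.afterMb;
}

const sample = parseMarkCompact(
  "[31624:000002312686B000] 16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB"
);
if (sample) {
  console.log(`${sample.timeMs} ms: reclaimed ${reclaimedMb(sample).toFixed(1)} MB`);
}
```

Feeding every line of the log through parseMarkCompact gives a time series of post-GC heap sizes, which makes the monotonic climb easy to plot or diff between builds.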
Suspected root cause
apps/memos-local-openclaw/src/embedding/local.ts calls ext(text, { pooling: "mean", normalize: true }) in a tight loop. The @huggingface/transformers feature-extraction pipeline retains intermediate ONNX runtime tensors in its session state between calls. V8 Mark-Compact only sees the JS-side slices (Array.from(output.data).slice(...)) — the native-backed tensor arena stays referenced via the ONNX session and is not collectable. So heap grows monotonically until OOM.
Observed leak rate: ~24 MB/s of old-space growth in our deployment, sustained.
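As a sanity check, ~24 MB/s fills a ~387 MB heap in roughly 16 seconds, which is consistent with the 15-30 second crash interval. A figure like this can be measured in-process by sampling V8's old space. The following is a rough sketch using Node's v8.getHeapSpaceStatistics(); the helper names and the one-second interval are assumptions for illustration, not how the number above was necessarily obtained.

```typescript
import * as v8 from "node:v8";

interface HeapSample {
  atMs: number;      // wall-clock timestamp of the sample
  usedBytes: number; // old-space bytes in use at that time
}

/** Current used size of V8's old space in bytes (0 if the space is not reported). */
function oldSpaceUsedBytes(): number {
  const space = v8.getHeapSpaceStatistics().find((s) => s.space_name === "old_space");
  return space ? space.space_used_size : 0;
}

/** Average growth in MB/s between the first and the last sample. */
function growthMbPerSec(samples: HeapSample[]): number {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const seconds = (last.atMs - first.atMs) / 1000;
  if (seconds <= 0) return 0;
  return (last.usedBytes - first.usedBytes) / (1024 * 1024) / seconds;
}

/** Samples old space every intervalMs and logs the rate over the last ~60 samples. */
function startHeapGrowthLogger(intervalMs = 1000) {
  const samples: HeapSample[] = [];
  return setInterval(() => {
    samples.push({ atMs: Date.now(), usedBytes: oldSpaceUsedBytes() });
    if (samples.length > 60) samples.shift();
    console.error(`[heap] old-space growth: ${growthMbPerSec(samples).toFixed(2)} MB/s`);
  }, intervalMs);
}
```

Calling startHeapGrowthLogger() once at gateway startup (and clearInterval on the returned handle at shutdown) is enough to compare the rate before and after a patch.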
Workaround in our fork
We patched src/embedding/local.ts to (1) explicitly dispose() the output tensor and null its .data after each call, and (2) call extractor.dispose() + reload the pipeline every N calls (default 50, env-tunable via MEMOS_EMBED_RESET_AFTER). After the patch, leak rate dropped from ~24 MB/s to ~0.25 MB/s and the gateway stays up indefinitely.
```ts
// Inside embedLocal, after slicing into results:
try { (output as any).data = null; } catch {}
try { (output as any).dispose?.(); } catch {}

// After every N calls:
if (state.calls >= RESET_AFTER_CALLS) {
  try { await (ext as any).dispose?.(); } catch {}
  state = null; // next call triggers fresh load; model files are cached so reload is ~1-2s
}
```
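For readers who want to try the same idea without our fork, here is a self-contained sketch of the dispose-and-reload pattern. It is not the actual patch: TensorLike, ExtractorLike, parseResetAfter and createRecyclingEmbedder are hypothetical names, the interfaces describe only the minimal shape this sketch relies on, and the loader is injected so the wrapper does not depend on any particular library version.

```typescript
// Minimal shapes assumed by this sketch (not the library's full types).
interface TensorLike {
  data: ArrayLike<number>;
  dispose?: () => void;
}
interface ExtractorLike {
  (text: string, opts: { pooling: "mean"; normalize: boolean }): Promise<TensorLike>;
  dispose?: () => Promise<void> | void;
}

/** Reads the reset threshold from an env value, falling back to a default. */
function parseResetAfter(raw: string | undefined, fallback = 50): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

/** Wraps a pipeline loader so the pipeline is disposed and reloaded every resetAfter calls. */
function createRecyclingEmbedder(
  load: () => Promise<ExtractorLike>,
  resetAfter: number
): (text: string) => Promise<number[]> {
  let ext: ExtractorLike | null = null;
  let calls = 0;

  return async function embed(text: string): Promise<number[]> {
    if (!ext) ext = await load(); // lazy (re)load
    const current = ext;
    const output = await current(text, { pooling: "mean", normalize: true });
    const vector = Array.from(output.data); // copy out before releasing the tensor
    try { output.dispose?.(); } catch { /* best effort */ }

    calls += 1;
    if (calls >= resetAfter) {
      try { await current.dispose?.(); } catch { /* best effort */ }
      ext = null; // the next call triggers a fresh load
      calls = 0;
    }
    return vector;
  };
}
```

In the plugin, load would presumably wrap the existing pipeline creation and resetAfter would come from parseResetAfter(process.env.MEMOS_EMBED_RESET_AFTER). The sketch assumes embed calls are serialized; with concurrent callers, a reset could dispose a pipeline that another call is still using, so a queue or lock would be needed.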
Suggested upstream fix
- Batch multiple texts per ext() call so the ONNX session reuses its activation buffer instead of allocating a new one each time. The feature-extraction pipeline does accept arrays of strings and returns batched output; if the leak is per call, this would cut it by roughly a factor of the batch size.
- Add an internal periodic pipeline.dispose() / reload mechanism inside @huggingface/transformers itself, or expose a hook so consumers can trigger it on idle.
- At minimum, document that the feature-extraction pipeline leaks under repeated single-text invocation and that callers must manage session lifecycle.
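The batching suggestion could look roughly like the sketch below. It assumes the pooled output of a batched call exposes dims as [batchSize, hiddenSize] and a flat row-major data array, which should be verified against the installed library version; BatchTensorLike, BatchExtractorLike and embedInBatches are hypothetical names, and the batch size of 16 is arbitrary.

```typescript
// Minimal shapes assumed by this sketch (not the library's full types).
interface BatchTensorLike {
  dims: number[];          // assumed [batchSize, hiddenSize] after pooling
  data: ArrayLike<number>; // flat, row-major
  dispose?: () => void;
}
type BatchExtractorLike = (
  texts: string[],
  opts: { pooling: "mean"; normalize: boolean }
) => Promise<BatchTensorLike>;

/** Embeds texts in chunks of batchSize, making one extractor call per chunk. */
async function embedInBatches(
  ext: BatchExtractorLike,
  texts: string[],
  batchSize = 16
): Promise<number[][]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const results: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const output = await ext(batch, { pooling: "mean", normalize: true });
    try {
      const [rows, width] = output.dims;
      const flat = Array.from(output.data);
      // Split the flat buffer back into one vector per input text.
      for (let r = 0; r < rows; r++) {
        results.push(flat.slice(r * width, (r + 1) * width));
      }
    } finally {
      try { output.dispose?.(); } catch { /* best effort */ }
    }
  }
  return results;
}
```

With 50 texts and a batch size of 16 this makes 4 extractor calls instead of 50, so any per-call retention should shrink accordingly. It does not remove the need for the periodic reset if the session still retains memory per call.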
Environment
- memos-local-openclaw-plugin v1.0.6-beta.11
- @huggingface/transformers: whatever ships with that release
- Node 24.13.0 (V8 13.6)
- Windows 11, Session 0 service via NSSM