[memos-local-openclaw] embedLocal leaks native ONNX memory per call; old-space fills until OOM in ~15-30s #1863

## Symptom

With `EMBEDDING_PROVIDER=local` and `EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2`, the OpenClaw gateway crashes every ~15-30 seconds under normal conversation load (a Claude/agent turn every few seconds is enough). The V8 stderr ends with:

```
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 00007FF702D3AE1F node::OnFatalError+1343
 2: 00007FF7039837B7 v8::Function::NewInstance+423
 ...
 [31624:000002312686B000]    16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB
```

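The exit code reported by NSSM is `3221225786`, which is `0xC000013A` (`STATUS_CONTROL_C_EXIT`) in hex. The status we expected for a V8 OOM fast-fail, `0xC0000409` (`STATUS_STACK_BUFFER_OVERRUN`), corresponds to decimal `3221226505`, so the two values do not match and the logged code should be rechecked.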

The Mark-Compact pass shown above reclaims only ~2 MB (377.5 MB -> 375.2 MB used, with ~387 MB committed). The ~387 MB ceiling appears to be V8's incremental heap-growth cap: we also configured `NODE_OPTIONS=--max-old-space-size=4096`, but V8 never seems to grow past ~387 MB because GC fails first.
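One way to confirm whether the `--max-old-space-size=4096` setting actually reached the gateway process (for example, whether `NODE_OPTIONS` is present in the service environment) is to log V8's effective heap limit at startup. This is a minimal sketch using Node's built-in `node:v8` module; the `heapLimitMb` helper name is hypothetical and not part of the plugin.

```typescript
import { getHeapStatistics } from "node:v8";

/** Returns V8's effective heap size limit for this process, in MB. */
export function heapLimitMb(): number {
  return getHeapStatistics().heap_size_limit / (1024 * 1024);
}

// Log once at startup so the value shows up in the service's stderr log.
console.error(
  `[heap] limit=${heapLimitMb().toFixed(0)} MB, ` +
    `NODE_OPTIONS=${process.env.NODE_OPTIONS ?? "(unset)"}`,
);
```

If the logged limit is far below 4096 MB, the flag is probably not being applied to the process that crashes.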

## Repro

1. Install `memos-local-openclaw-plugin` with the local embedding model (Xenova/all-MiniLM-L6-v2, dtype=q8)
2. Enable `allowConversationAccess=true` so the `agent_end` hook actually runs (otherwise no embeds fire and the leak is masked)
3. Run a normal session — a single user/assistant exchange is enough to seed the leak; sustained conversation makes it crash repeatedly
4. Watch `~/.openclaw/logs/nssm-gateway-stderr.log` — `Mark-Compact` lines keep climbing to 387 MB then FATAL
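To make step 4 less manual, the `Mark-Compact` lines can be parsed out of the stderr log. The sketch below assumes the line format shown in the Symptom section; the `GcSample` type and the `parseMarkCompact` / `parseLog` helpers are hypothetical names, and other V8 versions may format these trace lines differently.

```typescript
export interface GcSample {
  atMs: number;        // process uptime at the GC, in ms
  beforeMb: number;    // used heap before the pass
  afterMb: number;     // used heap after the pass
  committedMb: number; // committed heap after the pass
  reclaimedMb: number; // beforeMb - afterMb
}

// Matches e.g. "16813 ms: Mark-Compact 377.5 (387.9) -> 375.2 (387.0) MB".
const MARK_COMPACT =
  /(\d+) ms: Mark-Compact(?: \(reduce\))? ([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB/;

/** Parses one log line; returns null when it is not a Mark-Compact line. */
export function parseMarkCompact(line: string): GcSample | null {
  const m = MARK_COMPACT.exec(line);
  if (!m) return null;
  const beforeMb = Number(m[2]);
  const afterMb = Number(m[4]);
  return {
    atMs: Number(m[1]),
    beforeMb,
    afterMb,
    committedMb: Number(m[5]),
    reclaimedMb: beforeMb - afterMb,
  };
}

/** Extracts every Mark-Compact sample from a whole log file's text. */
export function parseLog(text: string): GcSample[] {
  return text
    .split(/\r?\n/)
    .map(parseMarkCompact)
    .filter((s): s is GcSample => s !== null);
}
```

Feeding the log contents to `parseLog` gives a series of samples; a run where `afterMb` keeps rising while `reclaimedMb` stays near zero matches the behaviour described here.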

## Suspected root cause

`apps/memos-local-openclaw/src/embedding/local.ts` calls `ext(text, { pooling: "mean", normalize: true })` in a tight loop. The `@huggingface/transformers` feature-extraction pipeline retains intermediate ONNX runtime tensors in its session state between calls. V8 Mark-Compact only sees the JS-side slices (`Array.from(output.data).slice(...)`) — the native-backed tensor arena stays referenced via the ONNX session and is not collectable. So heap grows monotonically until OOM.

Observed leak rate: ~24 MB/s of old-space growth in our deployment, sustained.
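Since the hypothesis distinguishes JS-heap growth from native-backed memory, it may help to measure the growth rate per category rather than from GC traces alone. The following is a rough sketch built on Node's `process.memoryUsage()`; the `MemSample` type and the `sampleMemory` / `growthPerSecond` helpers are hypothetical and not taken from the plugin.

```typescript
export interface MemSample {
  atMs: number;
  heapUsedMb: number;     // V8 heap in use
  externalMb: number;     // memory of C++ objects bound to JS objects
  arrayBuffersMb: number; // ArrayBuffer / Buffer backing stores
  rssMb: number;          // whole-process resident set
}

const MB = 1024 * 1024;

/** Takes one snapshot of the current process's memory counters. */
export function sampleMemory(now: number = Date.now()): MemSample {
  const m = process.memoryUsage();
  return {
    atMs: now,
    heapUsedMb: m.heapUsed / MB,
    externalMb: m.external / MB,
    arrayBuffersMb: m.arrayBuffers / MB,
    rssMb: m.rss / MB,
  };
}

/** Growth in MB/s between two snapshots, per category. */
export function growthPerSecond(a: MemSample, b: MemSample) {
  const seconds = (b.atMs - a.atMs) / 1000;
  if (seconds <= 0) throw new Error("samples must be in chronological order");
  return {
    heapUsed: (b.heapUsedMb - a.heapUsedMb) / seconds,
    external: (b.externalMb - a.externalMb) / seconds,
    arrayBuffers: (b.arrayBuffersMb - a.arrayBuffersMb) / seconds,
    rss: (b.rssMb - a.rssMb) / seconds,
  };
}
```

Taking a sample before and after a burst of embed calls and comparing `heapUsed` against `external`, `arrayBuffers`, and `rss` should indicate which side of the JS/native boundary is actually growing.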

## Workaround in our fork

We patched `src/embedding/local.ts` to (1) explicitly `dispose()` the output tensor and null its `.data` after each call, and (2) call `extractor.dispose()` + reload the pipeline every N calls (default 50, env-tunable via `MEMOS_EMBED_RESET_AFTER`). After the patch, leak rate dropped from ~24 MB/s to ~0.25 MB/s and the gateway stays up indefinitely.

```ts
// Inside embedLocal, after slicing into results:
try { (output as any).data = null; } catch {}
try { (output as any).dispose?.(); } catch {}

// After every N calls:
if (state.calls >= RESET_AFTER_CALLS) {
  try { await (ext as any).dispose?.(); } catch {}
  state = null; // next call triggers fresh load; model files are cached so reload is ~1-2s
}
```
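For reference, the same idea can be written as a self-contained wrapper. This is a sketch rather than the exact patch: `TensorLike`, `Extractor`, and `createRecyclingEmbedder` are hypothetical names, the structural types only mirror how the extractor is used above, and the pipeline loader is injected so the snippet does not depend on `@huggingface/transformers`. It also assumes embed calls are serialized; concurrent callers would need a queue so one call does not dispose a pipeline another call is still using.

```typescript
// Hypothetical structural types modelled on the usage shown in this issue.
interface TensorLike {
  data: ArrayLike<number>;
  dispose?: () => void;
}
interface Extractor {
  (text: string, opts: { pooling: "mean"; normalize: boolean }): Promise<TensorLike>;
  dispose?: () => Promise<void> | void;
}

/**
 * Wraps a pipeline loader so that the output tensor is released after every
 * call and the whole pipeline is disposed and reloaded every `resetAfter` calls.
 */
export function createRecyclingEmbedder(
  load: () => Promise<Extractor>,
  resetAfter = 50,
): (text: string) => Promise<number[]> {
  let ext: Extractor | null = null;
  let calls = 0;

  return async function embed(text: string): Promise<number[]> {
    // Lazily (re)load the pipeline; model files are cached, so reloads are cheap.
    const current = ext ?? (ext = await load());
    const output = await current(text, { pooling: "mean", normalize: true });

    // Copy the values out before releasing the tensor.
    const vector = Array.from(output.data);
    try { output.dispose?.(); } catch { /* best effort */ }

    calls += 1;
    if (calls >= resetAfter) {
      try { await current.dispose?.(); } catch { /* best effort */ }
      ext = null; // the next call triggers a fresh load
      calls = 0;
    }
    return vector;
  };
}
```

In the fork, `resetAfter` would come from the environment, for example `Number(process.env.MEMOS_EMBED_RESET_AFTER ?? 50)`.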

## Suggested upstream fix

1. Batch multiple texts per `ext()` call so the ONNX session reuses its activation buffer instead of allocating a new one each time. The feature-extraction pipeline does accept arrays of strings and returns batched output. If the leak is per call rather than per text, this should cut it roughly by a factor of the batch size.
2. Add an internal periodic `pipeline.dispose()` / reload mechanism inside `@huggingface/transformers` itself, or expose a hook so consumers can trigger it on idle.
3. At minimum, document that the feature-extraction pipeline leaks under repeated single-text invocation and that callers must manage session lifecycle.
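The batching idea in item 1 could look roughly like the sketch below. It assumes that, with pooling enabled, the batched output is a flat row-major buffer whose `dims` are `[batchSize, hiddenSize]`; that layout, along with the `BatchTensorLike` / `BatchExtractor` types and the `splitRows` / `embedBatched` helpers, is an assumption for illustration and should be checked against the real pipeline output.

```typescript
// Hypothetical structural types; the extractor is injected by the caller.
interface BatchTensorLike {
  data: ArrayLike<number>;
  dims: number[]; // assumed to be [rows, width] after pooling
  dispose?: () => void;
}
type BatchExtractor = (
  texts: string[],
  opts: { pooling: "mean"; normalize: boolean },
) => Promise<BatchTensorLike>;

/** Splits a flat row-major buffer into `rows` vectors of length `width`. */
export function splitRows(data: ArrayLike<number>, rows: number, width: number): number[][] {
  if (data.length !== rows * width) {
    throw new Error(`expected ${rows * width} values, got ${data.length}`);
  }
  const out: number[][] = [];
  for (let r = 0; r < rows; r++) {
    out.push(Array.from({ length: width }, (_, c) => data[r * width + c]));
  }
  return out;
}

/** Embeds texts in chunks of `batchSize`, one extractor call per chunk. */
export async function embedBatched(
  ext: BatchExtractor,
  texts: string[],
  batchSize = 16,
): Promise<number[][]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const chunk = texts.slice(i, i + batchSize);
    const output = await ext(chunk, { pooling: "mean", normalize: true });
    const [rows, width] = output.dims;
    vectors.push(...splitRows(output.data, rows, width));
    try { output.dispose?.(); } catch { /* best effort */ }
  }
  return vectors;
}
```

With a batch size of 16, embedding 160 texts takes 10 extractor calls instead of 160, which is where the expected reduction would come from if the leak is proportional to the number of calls.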

## Environment

- `memos-local-openclaw-plugin` v1.0.6-beta.11
- `@huggingface/transformers`: whichever version ships with that release
- Node 24.13.0 (V8 13.x)
- Windows 11, Session 0 service via NSSM
