
fix(test): prevent EBUSY on Windows from failing embedding-regression cleanup (#1353)#1356

Merged
carlos-alm merged 8 commits into main from fix/embedding-regression-ebusy-1353
Jun 7, 2026
Conversation

@carlos-alm

Contributor

Summary

  • Add flushDeferredClose() call before temp dir deletion to close any pending deferred DB handles
  • Wrap rmSync with maxRetries: 10 / retryDelay: 200 and catch remaining errors so a transient EBUSY lock never propagates as a test failure (all assertions already passed before cleanup)
  • Register process.once('exit') as a safety net — temp dir is cleaned up at process exit if the WAL lock outlasts the retry budget
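The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `cleanupTempDir` is a hypothetical name, `flushDeferredClose` is passed in as a stand-in for the project's helper, and the `recursive` / `force` options on the primary `rmSync` call are assumptions (only `maxRetries: 10`, `retryDelay: 200`, and `force: true` in the exit handler are stated in this PR).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical wrapper; the real test does this inline in afterAll.
function cleanupTempDir(dir: string, flushDeferredClose: () => void): void {
  // 1. Close any pending deferred DB handles before deleting their files.
  flushDeferredClose();

  // 2. Safety net: capture the path now and try again at process exit.
  //    force: true makes this a no-op if the directory is already gone.
  const capturedDir = dir;
  process.once("exit", () => {
    try {
      fs.rmSync(capturedDir, { recursive: true, force: true });
    } catch {
      // Best-effort: nothing useful can be done this late.
    }
  });

  // 3. Primary attempt with Node's built-in retry options.
  try {
    fs.rmSync(dir, {
      recursive: true, // assumption: needed to remove a non-empty directory
      force: true,     // assumption: not stated for the primary call
      maxRetries: 10,
      retryDelay: 200,
    });
  } catch (err) {
    // Swallow only lock-style errors; anything else should still fail the run.
    const code = (err as NodeJS.ErrnoException).code;
    if (code !== "EBUSY" && code !== "EPERM") throw err;
  }
}

// Usage: create a throwaway directory and clean it up.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "embedding-regression-"));
cleanupTempDir(tmpDir, () => {
  /* the real flushDeferredClose() would run here */
});
```

Registering the exit handler before the primary attempt means the fallback is in place even if the first `rmSync` throws.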

Root cause

SQLite WAL mode on Windows holds OS-level file locks for hundreds of ms after db.close() returns. The plain fs.rmSync in afterAll had no retry budget, so the lock on graph.db caused intermittent EBUSY: resource busy or locked, unlink failures in CI even though all 2878 test assertions passed.

Test plan

  • CI Windows-latest green on embedding-regression.test.ts
  • macOS/Linux unaffected (cleanup still runs immediately; errors are swallowed only for EBUSY or EPERM)

Fixes #1353

… cleanup

SQLite WAL checkpoint holds OS-level file locks for hundreds of ms after
db.close() returns on Windows. The plain rmSync in afterAll has no retry
budget and fails intermittently with EBUSY: resource busy or locked.

Three-part fix:
- Call flushDeferredClose() to close any pending deferred DB handles first
- Register process.once('exit') safety net so the temp dir is cleaned up
  even if the immediate attempt is blocked by a WAL lock
- Wrap rmSync in try/catch with maxRetries:10/retryDelay:200 so a transient
  EBUSY never propagates as a test failure (all assertions already passed)

Fixes #1353
@greptile-apps

greptile-apps Bot commented Jun 6, 2026

Contributor

Greptile Summary

This PR fixes intermittent EBUSY: resource busy or locked failures during afterAll cleanup on Windows CI caused by SQLite WAL-mode file locks outlasting db.close(). The fix adds a layered cleanup strategy: retry with Node's built-in maxRetries/retryDelay, swallow only EBUSY/EPERM if all retries are exhausted, and register a process.once('exit') safety net so the temp directory is cleaned up even if the WAL lock persists through the test run.

  • Narrowed error handling: the previous bare catch {} has been replaced with a typed guard that re-throws anything that is not EBUSY or EPERM, so genuine failures (quota, path corruption, etc.) still surface.
  • Safety-net registration: a process.once('exit') handler captures the temp dir path before the primary rmSync attempt, meaning cleanup is attempted a second time at process exit with force: true, which silently no-ops if the directory was already deleted successfully.
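The narrowed error handling described above can be illustrated with a small guard. This is a sketch under assumptions: `isTransientLockError` and `bestEffort` are hypothetical names chosen for illustration, and the PR's actual guard may be written differently.

```typescript
// Error codes treated as transient file-lock failures in this PR.
const TRANSIENT_LOCK_CODES = new Set(["EBUSY", "EPERM"]);

// Hypothetical helper: true only for errno-style errors with a transient code.
function isTransientLockError(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("code" in err)) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_LOCK_CODES.has(code);
}

// Hypothetical helper: run a cleanup step, swallowing only transient lock errors.
function bestEffort(step: () => void): void {
  try {
    step();
  } catch (err) {
    if (!isTransientLockError(err)) throw err; // genuine failures still surface
  }
}

// Usage: a simulated EBUSY is swallowed; any other code would be re-thrown.
bestEffort(() => {
  throw Object.assign(new Error("resource busy or locked"), { code: "EBUSY" });
});
```

Compared with a bare `catch {}`, this keeps errors such as a full disk or a malformed path visible as test failures.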

Confidence Score: 5/5

The change only touches test cleanup logic; all test assertions are unaffected and the new cleanup path is purely best-effort.

The fix is narrowly scoped to afterAll cleanup. It correctly captures tmpDir before the primary delete attempt, re-throws non-EBUSY/EPERM errors, and uses force: true in the exit handler so a double-delete is a no-op. No test assertions or production code are touched.

No files require special attention.

Important Files Changed

Filename: tests/search/embedding-regression.test.ts
Overview: Adds a process.once('exit') safety-net cleanup handler and narrows the catch block to EBUSY/EPERM only, fixing intermittent Windows CI failures from SQLite WAL locks.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[afterAll fires] --> B{tmpDir set?}
    B -- No --> Z[return early]
    B -- Yes --> C[flushDeferredClose]
    C --> D[Register process.once exit handler\ncapturedDir = tmpDir]
    D --> E[fs.rmSync with maxRetries:10\nretryDelay:200]
    E -- success --> F[Directory deleted\nexit handler runs rmSync again\nforce:true = no-op on ENOENT]
    E -- throws EBUSY or EPERM --> G[Swallow error\nexit handler is safety net]
    E -- throws other error --> H[Re-throw — surfaces as test failure]
    G --> I[process exit event fires]
    I --> J[fs.rmSync capturedDir force:true]
    J -- dir already gone --> K[No-op]
    J -- dir still locked --> L[Catch bare — best-effort]

Reviews (11): Last reviewed commit: "Merge branch 'main' into fix/embedding-r..."

Comment thread tests/search/embedding-regression.test.ts
@carlos-alm

Contributor Author

@greptileai

@carlos-alm

Contributor Author

@greptileai

@carlos-alm

Contributor Author

Added the process.once('exit') safety net in commit 16ab06f. The captured dir is now registered for best-effort cleanup at process exit, so temp directories won't silently accumulate on Windows CI when the WAL lock outlasts all 10 retries.

@carlos-alm

Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit e3855d3 into main Jun 7, 2026
23 checks passed
@carlos-alm carlos-alm deleted the fix/embedding-regression-ebusy-1353 branch June 7, 2026 19:46
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 7, 2026
Development

Successfully merging this pull request may close these issues.

fix: embedding-regression test flaky on Windows (EBUSY: resource busy or locked on graph.db)
