feat: split database adapter architecture by techiejd · Pull Request #30 · techiejd/payloadcms-vectorize

techiejd · 2026-02-07T04:16:41Z

Re-opening #27 after reverting the premature merge.


* feat(cf-adapter): add Cloudflare Vectorize adapter
* feat(cf-adapter): enhance Cloudflare Vectorize integration with config-based bindings and add tests
* feat(cf-adapter): refactor Cloudflare Vectorize integration to use config-based bindings and update tests
* chore: update pnpm-lock.yaml


* Deduplicate shared logic across plugin and adapter packages

  Extract repeated production patterns (chunk validation, delete embeddings, task slug constants) into shared utilities exported from the root plugin. Consolidate test helpers via vitest path aliases so adapter tests import from the canonical root dev/ copies. Remove CF adapter dead test code (unused utils, constants, helpers). Fix chunkRichText join bug in CF adapter tests (was joining child nodes without spaces).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* cf adapter limitation acknowledgement and more DRY

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
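For readers unfamiliar with the chunkRichText join bug mentioned above, here is a minimal sketch of the kind of fix described. The `TextNode` shape and the `nodeToText` helper are hypothetical and are not taken from the plugin's source; the only point illustrated is that child nodes are joined with a space rather than concatenated directly.

```typescript
// Hypothetical minimal node shape; the real rich-text node types are not shown in this PR.
interface TextNode {
  text?: string;
  children?: TextNode[];
}

// Collect the text of a node, joining the text of child nodes with a single space.
// Joining with "" instead would fuse adjacent nodes ("Hello" + "world" -> "Helloworld").
function nodeToText(node: TextNode): string {
  if (typeof node.text === "string") return node.text;
  return (node.children ?? [])
    .map(nodeToText)
    .filter((part) => part.length > 0) // skip empty nodes so no stray spaces appear
    .join(" ");
}
```

A real chunker would likely also need to handle leaf text that already carries its own whitespace; this sketch ignores that.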

techiejd · 2026-02-18T11:04:05Z

beta.3 out now.


* adds should embed (#38)
* adds should embed
* Ups version to get ready for release
* splits the job into one per batch (#41)
* splits the job into one per batch
* fix: remove waitUntil delay and persist failedChunkData on batch records

  - Remove 30s waitUntil delay from per-batch task re-queue (was causing test timeouts since the original code had no such delay)
  - Add failedChunkData JSON field to batch collection so per-batch tasks can store chunk-level failure data independently
  - Aggregate failedChunkData from batch records in finalizeRunIfComplete() instead of relying on in-memory accumulation from the old single-task flow

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add batchLimit to CollectionVectorizeOption with coordinator/worker architecture

  Splits prepare-bulk-embedding into coordinator + per-collection workers. Each worker processes one page of one collection, queuing a continuation job before processing to ensure crash safety. Default batchLimit is 1000 when not explicitly set.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rewrite batchLimit test 2 to reuse same Payload instance

  The second test was creating a separate Payload instance sharing the same DB and job queues, causing two crons to compete for jobs. This led to double-execution and mock state inconsistency (expected 4 to be 2). Now both tests use the single beforeAll instance with cleanup between.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add payload.destroy() in afterAll to prevent OOM from leaked crons

  Every test file that creates a Payload instance now calls payload.destroy() in afterAll (or try/finally for in-test instances). This stops background cron jobs from accumulating across tests, which was causing heap exhaustion in CI.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Trying to not destroy our heap
* Runs tests in parallel now that each test gets its own db
* WIP
* fix: fix OOM, polling test assertions, and add diagnostic logging

  - Add --max-old-space-size=8192 to test:int NODE_OPTIONS (cross-env was overriding the CI env var, so the heap limit never took effect)
  - Fix polling.spec.ts queueSpy assertions: coordinator/worker adds an extra queue call, so poll-or-complete-single-batch is now call 3 and 4 instead of 2 and 3
  - Add extensive [vectorize-debug] console.log throughout task handlers (coordinator, worker, poll-single, finalize, streamAndBatchDocs) to diagnose any remaining CI hangs
  - Remove redundant NODE_OPTIONS from CI workflow (now in the script)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: remove poll-or-complete-bulk-embedding task and aggregate incrementally

  Remove the backward-compatible fan-out task since the per-batch architecture hasn't been released yet. Refactor finalizeRunIfComplete to aggregate batch counts incrementally during pagination instead of collecting all batch objects into memory.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump to 0.5.5, update changelog, remove debug logging

  - Bump version 0.5.4 → 0.5.5
  - Add 0.5.5 entry to CHANGELOG.md (coordinator/worker, batchLimit, per-batch polling)
  - Document batchLimit in README CollectionVectorizeOption section
  - Remove all diagnostic console.log statements from bulkEmbedAll.ts

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Adds upgrade note

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version to 0.6.0-beta.5

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve 4 CI test failures from merge

  - chunkers.spec.ts: remove getPayload() call that crashes on dummy db, pass SanitizedConfig directly to chunkRichText
  - batchLimit.spec.ts: add missing dbAdapter (createMockAdapter) required by split_db_adapter architecture
  - extensionFieldsVectorSearch.spec.ts: pass adapter as second arg to createVectorSearchHandlers (new signature from split_db_adapter)
  - versionBump.spec.ts: destroy payload0 before creating payload1 to prevent cron worker race condition between two instances

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Cleans a nit double line
* Undoes a weird test fix done by the bot
* fix: harden versionBump test with sequential steps and queue isolation

  - Use test.step() to enforce sequential execution of each phase
  - Add separate realtimeQueueName per payload instance to prevent cron worker cross-talk on the default queue
  - Use dynamic Date.now() keys to avoid cached state interference
  - Increase waitForBulkJobs timeout to 30s for CI

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent waitForBulkJobs from returning prematurely

  waitForBulkJobs could return early in the coordinator/worker fan-out pattern when there's a brief window with 0 pending jobs between job transitions. Now it also checks the bulk embeddings run status: it only returns when both no pending jobs exist AND no runs are in queued/running state.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove test.step() — not available in Vitest

  test.step() is a Playwright API, not Vitest. Reverted to flat sequential code with phase comments for readability.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rewrite versionBump test with single Payload instance

  Instead of creating two Payload instances (which caused cron cross-talk, timeout, and queue isolation issues on CI), use one instance and mutate the knowledgePools config version between bulk embed runs. Tests the same code path (versionMismatch in streamAndBatchDocs) without the multi-instance fragility.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
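To make the coordinator/worker description above easier to follow, here is a rough sketch of the two ideas it relies on: a worker that queues its continuation job before processing its page, and a completion check that requires both an empty queue and no active run. Everything here is an assumption for illustration (`WorkerJob`, `fetchPage`, `runWorker`, `runAll`, `isBulkWorkDone`, the in-memory queue, and the `succeeded`/`failed` status names); the plugin's real tasks run on Payload's job queue and are not reproduced here. Only the default of 1000 and the queued/running states come from the commit messages.

```typescript
// All names below are hypothetical; this is not the plugin's actual code.
type Doc = { id: number };
type WorkerJob = { collection: string; page: number };
type RunStatus = "queued" | "running" | "succeeded" | "failed";

const DEFAULT_BATCH_LIMIT = 1000; // default stated in the commit message

// In-memory stand-in for a paginated collection query (pages are 1-indexed).
function fetchPage(docs: Doc[], page: number, limit: number): { docs: Doc[]; hasNextPage: boolean } {
  const start = (page - 1) * limit;
  return { docs: docs.slice(start, start + limit), hasNextPage: start + limit < docs.length };
}

// One worker handles exactly one page of one collection.
function runWorker(
  job: WorkerJob,
  allDocs: Doc[],
  queue: WorkerJob[],
  processed: number[],
  batchLimit: number = DEFAULT_BATCH_LIMIT,
): void {
  const { docs, hasNextPage } = fetchPage(allDocs, job.page, batchLimit);
  // Queue the continuation BEFORE processing, so a crash while handling
  // this page does not drop the pages that come after it.
  if (hasNextPage) queue.push({ collection: job.collection, page: job.page + 1 });
  for (const doc of docs) processed.push(doc.id); // stand-in for embedding work
}

// Stand-in for the job runner: seed the first page, then drain the queue.
function runAll(collection: string, allDocs: Doc[], batchLimit?: number): number[] {
  const queue: WorkerJob[] = [{ collection, page: 1 }];
  const processed: number[] = [];
  let job: WorkerJob | undefined;
  while ((job = queue.shift()) !== undefined) {
    runWorker(job, allDocs, queue, processed, batchLimit);
  }
  return processed;
}

// Completion check in the spirit of the waitForBulkJobs fix: an empty queue
// alone is not enough, because there can be a gap between job transitions.
function isBulkWorkDone(pendingJobs: number, runStatuses: RunStatus[]): boolean {
  const hasActiveRun = runStatuses.some((s) => s === "queued" || s === "running");
  return pendingJobs === 0 && !hasActiveRun;
}
```

With five documents and a `batchLimit` of 2, `runAll` would walk three pages in order; `isBulkWorkDone(0, ["running"])` stays false until the run itself leaves the queued/running states.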

techiejd and others added 17 commits February 1, 2026 16:23

WIP

7c42c1a

WIP WIP WIP Uses mock adapter WIP WIP WIP WIP WIP WIP

fix: ignore node_modules everywhere

bfbe191

Adds split_db_adapter to CI run

5e28697

Preparing for automated publishes. This one beta will be done by hand but hopefully the rest will be done automatically

edfdd25

Bumps version since we added deleteEmbeddings. Also runs tsc so that we can be sure the whole project compiles

ce35542

Adds the type check to ci

c1db547

fixes type check

d79cc0f

removes silly double checking on split_db_adapter for push

4a0ee51

Adds root pnpm workspace

d737cbe

feat(cf-adapter): update query parameters and method for deleting embeddings in Cloudflare Vectorize integration (#31)

79f686e

Bumps version to release

b1fc554

Better typings (#34)

2a141dc

Adds better id tracking for deletion and does only one search instead of many for querying (#35)

227947a

Removes dead code (#37)

f83a8c9

Bumps version for rollout

11280c7

Merge main (#40)

303c939

* adds should embed (#38)
* adds should embed - merged

techiejd linked an issue Feb 23, 2026 that may be closed by this pull request

Export issues #32

Open

techiejd wants to merge 19 commits into main from split_db_adapter
