
feat: support deferred embedding via update/RecordPatch (#88) #93

Open
dcfocus wants to merge 1 commit into lance-format:main from dcfocus:feat/issue-88-deferred-embedding

dcfocus (Collaborator) commented Jun 15, 2026

Summary

Lets callers append raw records first and enrich them with embeddings later. This is the common pattern for bulk ingestion, where source chunks must be persisted immediately but embeddings are computed asynchronously (for example with large documents, or with rate-limited or remote embedding providers).

Adds an embedding field to the record patch path so a vector can be attached or replaced by id or external_id after the original write. A record without an embedding is durably stored but excluded from vector search until enriched (search skips null-embedding records).
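To make the "stored but not yet searchable" state concrete, here is a minimal in-memory sketch of the raw-first, enrich-later lifecycle. It does not use the project's API: `MemoryStore`, `Record`, and their method signatures are hypothetical stand-ins, and cosine similarity is assumed as the search metric purely for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical record shape, for illustration only.
    id: int
    text: str
    external_id: Optional[str] = None
    embedding: Optional[list[float]] = None  # None until enriched


def cosine(a: list[float], b: list[float]) -> float:
    # Assumed similarity metric; returns 0.0 for zero-length vectors.
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class MemoryStore:
    """Hypothetical in-memory stand-in for the real store (illustration only)."""

    def __init__(self) -> None:
        self._records: dict[int, Record] = {}
        self._next_id = 0

    def add(self, text: str, external_id: Optional[str] = None) -> int:
        # Raw write: the record is stored immediately, without an embedding.
        rid = self._next_id
        self._next_id += 1
        self._records[rid] = Record(rid, text, external_id)
        return rid

    def _resolve(self, id: Optional[int], external_id: Optional[str]) -> Record:
        # Look the record up by internal id first, otherwise by external_id.
        if id is not None:
            return self._records[id]
        if external_id is None:
            raise ValueError("pass id or external_id")
        for rec in self._records.values():
            if rec.external_id == external_id:
                return rec
        raise KeyError(external_id)

    def update(self, *, id: Optional[int] = None, external_id: Optional[str] = None,
               embedding: Optional[list[float]] = None) -> None:
        # Enrichment: attach (or replace) the vector on an existing record.
        rec = self._resolve(id, external_id)
        if embedding is not None:
            rec.embedding = list(embedding)

    def search(self, query: list[float], k: int = 5) -> list[int]:
        # Records whose embedding is still None are skipped rather than erroring.
        scored = [
            (cosine(query, r.embedding), r.id)
            for r in self._records.values()
            if r.embedding is not None
        ]
        scored.sort(reverse=True)
        return [rid for _, rid in scored[:k]]


store = MemoryStore()
a = store.add("alpha chunk", external_id="doc-1")
b = store.add("beta chunk", external_id="doc-2")
print(store.search([1.0, 0.0]))  # [] (nothing is enriched yet)
store.update(id=a, embedding=[1.0, 0.0])
store.update(external_id="doc-2", embedding=[0.0, 1.0])
print(store.search([1.0, 0.1]))  # [0, 1]
```

Both records exist from the first write; they only become visible to `search` once an embedding is attached, one by id and one by external_id.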

Closes #88.

What changed

  • core: RecordPatch.embedding, applied in update_visible_record.
  • api: RecordPatchDto.embedding, copied through in both the core and server patch_from_dto.
  • python: embedding parameter on the PyO3 update and Context.update().
  • Rust client: inherits the field automatically via the UpdateRecordRequest JSON body.
  • README: documents the raw-first → enrich-later pattern.
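The core and api changes amount to threading one optional field through the patch path. A minimal Python model of that idea follows. `Record`, `RecordPatch`, `patch_from_dto`, and `apply_patch` here are hypothetical simplifications of the Rust types named above, and treating `None` as "leave unchanged" is an assumption of this sketch rather than something the PR states.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    # Hypothetical stored record, for illustration only.
    id: int
    text: str
    embedding: Optional[list[float]] = None


@dataclass
class RecordPatch:
    # Hypothetical mirror of the core patch; None means "leave this field unchanged".
    text: Optional[str] = None
    embedding: Optional[list[float]] = None


def patch_from_dto(dto: dict[str, Any]) -> RecordPatch:
    # Every DTO field has to be copied across here. If `embedding` were left
    # out, a vector sent in the request body would be silently dropped.
    return RecordPatch(text=dto.get("text"), embedding=dto.get("embedding"))


def apply_patch(record: Record, patch: RecordPatch) -> Record:
    # Rough analogue of an update step like update_visible_record:
    # only fields present in the patch overwrite the stored record.
    if patch.text is not None:
        record.text = patch.text
    if patch.embedding is not None:
        record.embedding = list(patch.embedding)
    return record


rec = Record(id=0, text="raw chunk")
apply_patch(rec, patch_from_dto({"embedding": [0.1, 0.2, 0.3]}))
print(rec)  # Record(id=0, text='raw chunk', embedding=[0.1, 0.2, 0.3])
```

One consequence of the None-means-unchanged choice is that this sketch can attach or replace an embedding but cannot clear one; whether the real RecordPatch supports clearing is not covered by this PR.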

Tests

  • Rust core round-trip (enrich by id and by external_id).
  • Python e2e: raw-first → enrich, including add_many bulk ingestion.
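The bulk path exercised by the Python e2e test (add_many first, enrich afterwards) maps naturally onto asynchronous enrichment with a cap on in-flight calls to the embedding provider. The sketch below models that with asyncio. `Store`, its `add_many`/`update` signatures, `fake_embed`, and `enrich_all` are all hypothetical, and the semaphore-based throttling is an illustrative choice, not something this PR implements.

```python
from __future__ import annotations

import asyncio


class Store:
    """Hypothetical dict-backed store keyed by external_id (illustration only)."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def add_many(self, items: list[tuple[str, str]]) -> None:
        # Raw-first: persist every (external_id, text) chunk now, unembedded.
        for external_id, text in items:
            self.records[external_id] = {"text": text, "embedding": None}

    def update(self, external_id: str, embedding: list[float]) -> None:
        # Enrich-later: attach the vector to the already-stored record.
        self.records[external_id]["embedding"] = embedding

    def searchable(self) -> list[str]:
        # Only enriched records would be visible to vector search.
        return [k for k, r in self.records.items() if r["embedding"] is not None]


async def fake_embed(text: str) -> list[float]:
    # Stand-in for a remote, rate-limited provider; returns a toy 2-d vector.
    await asyncio.sleep(0)
    return [float(len(text)), float(text.count(" "))]


async def enrich_all(store: Store, max_in_flight: int = 4) -> int:
    sem = asyncio.Semaphore(max_in_flight)  # cap concurrent provider calls

    async def enrich(external_id: str) -> None:
        async with sem:
            vec = await fake_embed(store.records[external_id]["text"])
        store.update(external_id, vec)

    # Only records still missing an embedding are (re)processed.
    pending = [k for k, r in store.records.items() if r["embedding"] is None]
    await asyncio.gather(*(enrich(k) for k in pending))
    return len(pending)


store = Store()
store.add_many([(f"doc-{i}", f"chunk number {i}") for i in range(10)])
print(len(store.searchable()))  # 0: stored but not yet searchable
print(asyncio.run(enrich_all(store, max_in_flight=3)))  # 10
print(len(store.searchable()))  # 10
```

In this sketch the raw write and the enrichment are decoupled, so a failed or throttled embedding call leaves the record stored and simply unsearchable; rerunning `enrich_all` picks up only the records whose embedding is still missing.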

🤖 Generated with Claude Code


Append raw records first and enrich them with embeddings later. Adds an
`embedding` field to the record patch path so a vector can be attached or
replaced by id or external_id after the original write.

- core: RecordPatch.embedding, applied in update_visible_record
- api: RecordPatchDto.embedding; copied in core + server patch_from_dto
- python: embedding param on PyO3 update and Context.update()
- Rust client inherits the field via the UpdateRecordRequest JSON body

A record without an embedding is durably stored but excluded from vector
search until enriched, since search skips null-embedding records.

Tests: Rust core round-trip (by id + external_id) and Python e2e
(raw-first -> enrich, incl. add_many bulk). README documents the pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Support deferred embedding workflows for bulk ingestion
