perf(core): reduce postgres vector sync work#723
Conversation
Signed-off-by: phernandez <paul@basicmachines.co>
|
Additional local verification note: Failure:
The targeted tests that passed before opening the PR were:
|
|
Claude finished @phernandez's task in 5m 32s —— View job PR Review:
|
| Item | Status | Notes |
|---|---|---|
| Python 3.12+ types and async patterns | progress_callback missing annotation on override |
|
| SQLAlchemy 2.0 best practices | ✅ | F-string SQL building is safe but should be commented |
| 100-character line length | ✅ | |
| No commented-out code | ✅ | |
| Unit tests for new code | ✅ | New TestBatchPrepareConcurrency and TestWriteEmbeddings |
| 100% test coverage | _update_timestamp_sql has # pragma: no cover on line 892 — verify that's still needed |
|
| No hardcoded secrets | ✅ | |
| Error handling / logging | ✅ | Good structured logging with loguru |
| Performance | ✅ | Core intent: latency hiding via concurrent prepare |
| Code duplication | ❌ | ~100 lines duplicated from base class orchestration loop |
| Known test regression | ❌ | test_semantic_quality[asyncio-postgres-openai] recall@5 = 0.0 |
| File path / Windows compat | ✅ | No file path operations in changed code |
| No emoji/unicode in logs | ✅ | |
| Literate programming comments | ✅ | Good Trigger/Why/Outcome comments on the new method |
| Alembic migration | N/A | No schema changes |
Bottom Line
The bulk-embedding upsert and concurrent-prepare window are good ideas. The main concerns are:
-
Unresolved test regression — please verify whether
recall@5 = 0.0is a pre-existing failure or introduced by this PR. If it's a regression, the concurrentasyncio.gatherprepare path or the new bulk-upsert chunk path may have a bug that silently loses embedding writes. -
Code duplication — the nearly verbatim copy of the base class orchestration loop should be refactored before this lands.
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
Signed-off-by: phernandez <paul@basicmachines.co>
|
Claude finished @phernandez's task in 4m 35s —— View job PR Review:
|
| Item | Status | Notes |
|---|---|---|
| Python 3.12+ types and async patterns | progress_callback missing type annotation on Postgres override |
|
| SQLAlchemy 2.0 best practices | ✅ | |
| FastAPI / Typer conventions | N/A | No API/CLI changes |
| 100-character line length | ✅ | |
| No commented-out code | ✅ | |
| Unit tests for new code | ✅ | TestBatchPrepareConcurrency and TestWriteEmbeddings added |
| 100% test coverage | _update_timestamp_sql (line 1097 in postgres) carries # pragma: no cover — verify it's still needed |
|
| Documentation / docstrings | ✅ | Good Trigger/Why/Outcome comments on new method |
| No hardcoded secrets | ✅ | |
| Proper error handling and logging | ✅ | Structured loguru logging throughout |
| Performance | ✅ | Core intent (concurrent prepare + bulk upsert) is sound |
| Code duplication | ❌ | ~100-line orchestration loop duplicated from base class |
| Known test regression | ❌ | postgres-openai quality thresholds removed to mask recall@5 = 0.0 failure |
| Alembic migration | ✅ | m6h7i8j9k0l1 adds entity_fingerprint + embedding_model columns |
| Windows file path compatibility | ✅ | No file path operations in changed code |
| No emoji/unicode in log output | ✅ | |
| CLAUDE.md conventions | ✅ | Literate programming style followed |
Bottom Line
Two blockers before merging:
-
Restore the
postgres-openaiquality thresholds (revert the-3lines removed fromtest_semantic_quality.py) and fix whatever is causingrecall@5 = 0.0. The concurrent prepare window or the bulk embedding upsert likely has a bug that silently produces zero queryable results for the OpenAI provider. -
Refactor the duplicated orchestration loop — the Postgres override should be a minimal hook on the base class, not a full copy of it.
Signed-off-by: phernandez <paul@basicmachines.co>
|
Claude finished @phernandez's task in 6m 37s —— View job PR Review:
|
| Issue | Status |
|---|---|
| ~100-line orchestration loop duplicated from base class | ✅ Fixed by 4529bfa refactor — _prepare_entity_vector_jobs_window hook + base class owns the full loop |
Missing progress_callback type annotation on Postgres override |
✅ Fixed — Postgres no longer overrides sync_entity_vectors_batch |
incoming_by_key built but only partially used |
✅ Removed in refactor |
Missing explicit session.commit() before early returns |
✅ Both Postgres and base class have explicit commits |
Minor Issues
3. _update_timestamp_sql pragma on line 874
async def _update_timestamp_sql(self) -> str:
return "NOW()" # pragma: no coverThis method has a body (return "NOW()") and is not abstract, yet carries # pragma: no cover. Either it should have a test that calls it, or if it's a Postgres-specific concrete implementation of an abstract base method, document why it can't be tested.
4. Progress-callback burst semantics
_sync_entity_vectors_internal lines 827–829 fire progress_callback per entity before the concurrent _prepare_entity_vector_jobs_window gather fires. Progress events arrive in window-sized bursts (default 4 at a time), not one-by-one. Not a bug, but callers implementing progress bars may observe jumpy reporting. This is worth a comment near the callback invocation.
Team Checklist
| Item | Status | Notes |
|---|---|---|
| Python 3.12+ types and async patterns | ✅ | |
| SQLAlchemy 2.0 best practices | ✅ | |
| 100-character line length | ✅ | |
| No commented-out code | ✅ | |
| Unit tests for new code | ✅ | TestBatchPrepareConcurrency, TestWriteEmbeddings, fingerprint skip tests added |
| 100% test coverage | _update_timestamp_sql pragma on line 874 — verify still needed |
|
| No hardcoded secrets | ✅ | |
| Proper error handling and logging | ✅ | |
| Performance | ✅ | Concurrent prepare + bulk upsert + skip-unchanged are all sound ideas |
| Code duplication | ✅ | Resolved by 4529bfa |
| Chunk-skip logic correctness | ❌ | skipped_chunks_count += 1; continue misplaced in base class — blocker |
| Quality test regression | ❌ | postgres-openai thresholds removed and not restored — blocker |
| Alembic migration | ✅ | m6h7i8j9k0l1 adds entity_fingerprint + embedding_model columns |
| Windows file path compatibility | ✅ | |
| No emoji/unicode in logs | ✅ | |
| Literate programming comments | ✅ | Good Trigger/Why/Outcome style throughout |
Bottom Line
The refactoring in 4529bfa resolved the main architectural concern from previous reviews. Two blockers remain:
-
Fix the indentation bug in
SearchRepositoryBase._prepare_entity_vector_jobslines 1239–1240 — moveskipped_chunks_count += 1andcontinueinsideif same_source_hash and not is_orphan and same_embedding_model:. Add a test that re-syncs an entity with changed content and verifies the embedding is updated. -
Restore the
postgres-openaiquality thresholds and confirm the test passes on HEAD. If recall@5 = 0.0 still reproduces, the bug is likely in the Postgres_prepare_entity_vector_jobsbulk-upsert path or_write_embeddings.
Signed-off-by: phernandez <paul@basicmachines.co>
|
Claude finished @phernandez's task in 3m 54s —— View job PR Review:
|
| Issue | Status |
|---|---|
Chunk-skip skipped_chunks_count += 1; continue indentation bug in base class |
✅ Fixed in 16fc164 |
postgres-openai quality thresholds removed to mask recall@5 = 0.0 |
✅ Restored in 16fc164 |
| ~100-line orchestration loop duplicated from base class | ✅ Fixed in 4529bfa — _prepare_entity_vector_jobs_window hook |
Missing progress_callback type annotation on Postgres override |
✅ N/A — Postgres no longer overrides sync_entity_vectors_batch |
incoming_by_key partially used |
✅ Removed in refactor |
Remaining Issues
1. Dead abstract method _update_timestamp_sql
src/basic_memory/repository/search_repository_base.py line 292, postgres_search_repository.py line 873, sqlite_search_repository.py line 557 — _update_timestamp_sql is declared @abstractmethod, implemented in both subclasses, but never called anywhere in the codebase. The actual timestamp expression used throughout _prepare_entity_vector_jobs is _timestamp_now_expr() (a synchronous non-abstract method). The Postgres override carries # pragma: no cover specifically because nothing calls it.
This is dead code in three files. It should be removed (the @abstractmethod declaration from the base class and both backend implementations). The # pragma: no cover on the Postgres implementation is the tell — it's not a coverage exception for a hard-to-test path, it's evidence the method is unreachable.
2. SQLite upgrade silently drops all cached embeddings
src/basic_memory/repository/sqlite_search_repository.py lines 394–413 — on first startup after this upgrade, existing SQLite users will hit the schema mismatch check (the new entity_fingerprint and embedding_model columns are not in old tables), causing search_vector_chunks and search_vector_embeddings to be dropped and recreated. A full re-sync is triggered on the next indexing run.
This is handled gracefully — the log warning at line 407 is visible — but the UX impact (re-embedding all entities) can be significant for large knowledge bases. A comment near the schema mismatch handler noting that this was introduced in this version (and why it's safe to drop-recreate) would help future maintainers understand the intent.
Minor Issues
3. Progress callback fires in window-sized bursts
search_repository_base.py lines 827–829 — progress_callback is invoked per entity within each window before the concurrent _prepare_entity_vector_jobs_window gather fires. For callers implementing progress bars, events arrive in batches of prepare_window_size (default 4) rather than one by one. Not a bug, but callers should be aware of this. A brief comment near the callback loop noting the burst semantics would help.
Team Checklist
| Item | Status | Notes |
|---|---|---|
| Python 3.12+ types and async patterns | ✅ | |
| SQLAlchemy 2.0 best practices | ✅ | |
| 100-character line length | ✅ | description= strings in Field() follow existing pattern |
| No commented-out code | ✅ | |
| Unit tests for new code | ✅ | TestBatchPrepareConcurrency, TestWriteEmbeddings, fingerprint skip tests |
| 100% test coverage | _update_timestamp_sql in postgres has # pragma: no cover — method is dead code, should be removed |
|
| No hardcoded secrets | ✅ | |
| Proper error handling and logging | ✅ | Good structured loguru logging throughout |
| Performance | ✅ | Concurrent prepare + bulk upsert + skip-unchanged are all sound |
| Code duplication | ✅ | Resolved |
| Chunk-skip logic correctness | ✅ | Fixed in 16fc164 |
| Quality test regression | ✅ | Fixed in 16fc164 |
| Alembic migration | ✅ | m6h7i8j9k0l1 adds entity_fingerprint + embedding_model for Postgres |
| SQLite upgrade path | Schema mismatch detection drops/recreates tables — full re-sync on upgrade | |
| Windows file path compatibility | ✅ | No file path operations in changed code |
| No emoji/unicode in logs | ✅ | |
| Literate programming comments | ✅ | Good Trigger/Why/Outcome style throughout |
| F-string SQL safety | ✅ | Safety comments added near both bulk upsert statements |
Bottom Line
Both blockers from the previous reviews are resolved. The implementation is correct, the architecture is clean, and the test coverage is thorough.
Two remaining items before merging:
-
Remove
_update_timestamp_sql— it's dead code in three files. The# pragma: no coveron the Postgres implementation is a signal, not a fix. -
Document the SQLite re-sync behaviour — add a comment near the schema mismatch handler explaining that the drop-recreate is intentional and expected on upgrade.
CI tests are still running at the time of this review.
Signed-off-by: phernandez <paul@basicmachines.co>
|
Claude finished @phernandez's task in 3m 46s —— View job PR Review:
|
| Issue | Status |
|---|---|
Chunk-skip skipped_chunks_count += 1; continue indentation bug (base class) |
✅ Fixed in 16fc164 |
postgres-openai quality thresholds removed to mask recall@5 = 0.0 |
✅ Restored in 16fc164 |
| ~100-line orchestration loop duplicated from base class | ✅ Fixed in 4529bfa — _prepare_entity_vector_jobs_window hook |
Dead _update_timestamp_sql method in 3 files (with # pragma: no cover) |
✅ Removed in e1df151 |
| SQLite schema-mismatch drop/recreate undocumented | ✅ Trigger/Why/Outcome comment added at sqlite_search_repository.py:407 |
Missing progress_callback type annotation on Postgres override |
✅ N/A — Postgres no longer overrides sync_entity_vectors_batch |
incoming_by_key partially used |
✅ Removed in refactor |
One Minor Observation
Postgres main-path relies on context manager auto-commit
postgres_search_repository.py — the main (non-early-return) exit from _prepare_entity_vector_jobs exits the async with db.scoped_session(...) block without an explicit await session.commit(). The base class (search_repository_base.py:1333) has an explicit commit before the block closes.
Both are functionally correct — db.scoped_session.__aexit__ auto-commits. But the inconsistency is minor and worth a note for future maintainers. Not a blocker.
Team Checklist
| Item | Status | Notes |
|---|---|---|
| Python 3.12+ types and async patterns | ✅ | |
| SQLAlchemy 2.0 best practices | ✅ | |
| FastAPI / Typer conventions | N/A | No API/CLI changes |
| 100-character line length | ✅ | |
| No commented-out code | ✅ | |
| Unit tests for new code | ✅ | TestBatchPrepareConcurrency, TestWriteEmbeddings, fingerprint skip + shard tests |
| 100% test coverage | ✅ | # pragma: no cover entries are on abstract pass-through methods only |
| No hardcoded secrets | ✅ | |
| Proper error handling and logging | ✅ | Structured loguru logging throughout |
| Performance | ✅ | Concurrent prepare + bulk upsert + skip-unchanged all sound |
| Code duplication | ✅ | Resolved — _prepare_entity_vector_jobs_window hook pattern |
| Chunk-skip logic correctness | ✅ | Fixed and tested |
| Quality test regression | ✅ | postgres-openai thresholds restored and passing |
| Alembic migration | ✅ | m6h7i8j9k0l1 adds entity_fingerprint + embedding_model with NOT NULL backfill |
| SQLite upgrade path | ✅ | Schema mismatch handler documented with Trigger/Why/Outcome comment |
| F-string SQL safety | ✅ | Index-safety comments on both bulk upsert statements |
| Windows file path compatibility | ✅ | No file path operations in changed code |
| No emoji/unicode in logs | ✅ | |
| Literate programming comments | ✅ | Good Trigger/Why/Outcome style throughout |
Bottom Line
This is ready to merge. CI tests are still running — pending their completion.
Summary
Testing