Skip to content

feat(core): add note_content tenant schema primitive#719

Merged
phernandez merged 2 commits intomainfrom
feat/note-content-tenant-schema
Apr 5, 2026
Merged

feat(core): add note_content tenant schema primitive#719
phernandez merged 2 commits intomainfrom
feat/note-content-tenant-schema

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

  • add the shared tenant note_content model, migration, and repository for cloud-facing note materialization state
  • rename the new primitive from note_state to note_content across schema, ORM, repository, and tests
  • fix the Alembic async fallback so running-loop migration paths do not leak an un-awaited coroutine warning

Testing

  • uv run ruff check src/basic_memory/models/knowledge.py src/basic_memory/models/__init__.py src/basic_memory/repository/__init__.py src/basic_memory/repository/note_content_repository.py src/basic_memory/alembic/versions/l5g6h7i8j9k0_add_note_content_table.py tests/repository/test_note_content_repository.py tests/test_note_content_migration.py
  • LOGFIRE_IGNORE_NO_CONFIG=1 uv run pytest -q tests/test_alembic_env.py tests/repository/test_note_content_repository.py tests/test_note_content_migration.py
  • BASIC_MEMORY_TEST_POSTGRES=1 LOGFIRE_IGNORE_NO_CONFIG=1 uv run pytest -q tests/repository/test_note_content_repository.py

- add the shared note_content model, migration, and repository

- cover CRUD, migration, and Postgres-backed repository behavior

- fix the Alembic async fallback coroutine warning

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 5, 2026

Claude finished @phernandez's task in 2m 42s —— View job


PR Review: feat(core): add note_content tenant schema primitive

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture compliance
  • Review security & performance
  • Post final review summary

Summary

This is a well-structured primitive. The NoteContent model, migration, repository, and Alembic async fix are all clean. One coverage gap needs addressing before merge per the project's 100% coverage requirement.


Issues

🔴 Missing test: upsert insert path (coverage gap)

NoteContentRepository.upsert has two branches — update (when row exists) and insert (when it doesn't). The test test_upsert_updates_existing_note_content only exercises the update path (it calls create first). The insert path (if existing is Nonesession.add(note_content)) is never reached by any test.

note_content_repository.py:111-119:

if existing is None:        # ← never reached in tests
    session.add(note_content)
    await session.flush()
    created = await self.select_by_id(session, note_content.entity_id)
    if created is None:  # pragma: no cover
        raise ValueError(...)
    return created

A minimal fix is a new test:

@pytest.mark.asyncio
async def test_upsert_inserts_when_no_existing_row(session_maker, test_project, sample_entity):
    """Upsert should insert when no row exists yet."""
    repository = NoteContentRepository(session_maker, project_id=test_project.id)
    created = await repository.upsert(build_note_content_payload(sample_entity.id))
    assert created.entity_id == sample_entity.id
    assert created.db_version == 1

Fix this →

🟡 update_state_fields has an undocumented identity-realignment side effect

update_state_fields calls _align_identity_fields internally, meaning a call like update_state_fields(entity_id, file_write_status="failed") also silently refreshes file_path, external_id, and project_id from the owning entity. The docstring only says "Update mutable sync and materialization fields" with no mention of this side effect.

The test test_update_state_fields_realigns_identity_with_entity documents this via its name, but the method docstring should be explicit:

note_content_repository.py:134:

async def update_state_fields(self, entity_id: int, **updates: Any) -> Optional[NoteContent]:
    """Update mutable sync and materialization fields for a note_content row.

    Also re-aligns the mirrored identity fields (project_id, external_id, file_path)
    from the owning entity, so renames are picked up without a separate call.
    """

Fix this →

🟡 include_object in env.py shadows Python built-in object

alembic/env.py:69:

def include_object(object, name, type_, reflected, compare_to):

The parameter object shadows the object built-in. ruff may flag this with A002 (argument-shadows-builtin). Rename to obj or schema_object to be safe.

Fix this →

🟢 Nit: Non-standard Alembic revision ID

l5g6h7i8j9k0_add_note_content_table.py:16:

revision: str = "l5g6h7i8j9k0"

Alembic auto-generates 12-char hex IDs (e.g., a1b2c3d4e5f6). Using a letter-sequence pattern is functional but non-standard and could cause confusion. Not a blocker.


Checklist

Code Quality & Standards

  • Follows coding conventions in CLAUDE.md
  • Python 3.12+ type annotations and async patterns (Mapped, mapped_column, async_sessionmaker)
  • SQLAlchemy 2.0 best practices
  • FastAPI/Typer conventions: N/A (no API/CLI changes)
  • 100-character line length maintained
  • No commented-out code blocks
  • Windows-compatible paths (Path(...).as_posix() used in _align_identity_fields)

Testing & Documentation

  • Unit tests for new functions/methods
  • Integration tests: N/A (no new MCP tools)
  • Test coverage for edge cases (cascade, cross-project scoping, file path rename)
  • 100% test coverageupsert insert path untested (see above)
  • # pragma: no cover used appropriately on __repr__ and impossible post-flush branches

Basic Memory Architecture

  • Database changes include Alembic migration with proper upgrade()/downgrade()
  • FK cascade ondelete="CASCADE" for both entity_id and project_id
  • CheckConstraint on file_write_status enforced at DB level
  • _align_identity_fields keeps denormalized fields consistent — good pattern
  • NOTE_CONTENT_MUTABLE_FIELDS frozenset guards update_state_fields against invalid keys
  • Preserves local-first architecture (no cloud coupling in this primitive)

Security & Performance

  • No hardcoded secrets or credentials
  • No sensitive data in logs
  • Indexes on project_id, file_path, and external_id (unique) — correct for expected access patterns
  • BigInteger for version counters — correct choice for high-frequency increment fields

Strengths

The Alembic async fix is particularly well done — creating the coroutine before asyncio.run(), catching RuntimeError, and closing the coroutine before re-raising is the correct pattern to prevent RuntimeWarning: coroutine 'run_async_migrations' was never awaited. The thread-based fallback for nested-loop environments (pytest-asyncio) is clean. Tests in test_alembic_env.py provide good regression coverage for each branch.

The _align_identity_fields approach for keeping NoteContent's denormalized identity fields (external_id, file_path, project_id) in sync with the owning Entity is a solid design that prevents drift — calling it from both create and update_state_fields ensures it can't be bypassed.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f13742abf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +121 to +124
for column_name in NoteContent.__table__.columns.keys():
if column_name == "entity_id":
continue
setattr(existing, column_name, getattr(note_content, column_name))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve existing fields when upserting partial payloads

This update path overwrites every column on the existing row with attributes from a transient NoteContent object, but _coerce_note_content only requires entity_id for mapping inputs. If callers pass a partial payload (which the current signature allows), omitted fields resolve to None and non-null columns like markdown_content/db_checksum/file_write_status get clobbered, causing integrity failures or accidental data loss instead of a safe upsert.

Useful? React with 👍 / 👎.

Comment on lines +84 to +85
query = self.select().where(NoteContent.file_path == Path(file_path).as_posix())
return await self.find_one(query)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid single-row lookup on non-unique file paths

This method relies on find_one() (which uses one_or_none()), so it will raise MultipleResultsFound if more than one row matches a path. The new schema does not enforce (project_id, file_path) uniqueness for note_content, and mirrored paths can drift when an entity path changes before note content is realigned, so duplicate paths are possible in a project and this lookup can fail at runtime.

Useful? React with 👍 / 👎.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 5, 2026

Claude finished @phernandez's task in 1m 39s —— View job


Re-review: feat(core): add note_content tenant schema primitive

  • Read latest commit changes (5a24c46)
  • Verify previous review issues were addressed
  • Check for new issues
  • Post final review summary

All Previous Issues Resolved ✅

Commit 5a24c46 addresses every issue from the prior review plus both Codex P1/P2 concerns.

🔴 → ✅ Upsert insert path now tested

test_upsert_inserts_when_no_existing_row was added and correctly calls upsert without a prior create, exercising the if existing is None branch at note_content_repository.py:135.

🟡 → ✅ update_state_fields docstring updated

Now reads: "Update sync fields and re-align project_id, external_id, and file_path from entity." — the side effect is explicit.

🟡 → ✅ include_object builtin shadow fixed

env.py:69 parameter renamed from objectobj.

Codex P1 → ✅ Partial upsert no longer clobbers existing fields

_coerce_note_content now returns (NoteContent, set[str]) tracking explicitly provided fields. The update loop at note_content_repository.py:145-151 applies only provided_fields - {"entity_id"} plus the three identity fields — omitted columns are left untouched. test_upsert_preserves_existing_fields_for_partial_payload verifies this.

Codex P2 → ✅ get_by_file_path handles non-unique paths

Replaced find_one() (which raises on multiple results) with a JOIN+ORDER BY+LIMIT-1 strategy that prefers the row whose current entity.file_path still matches, then falls back to most recently updated. test_note_content_file_path_lookup_prefers_entity_with_current_path covers the drift scenario directly.


Additional Test Coverage Added

Beyond the minimum requested fix, the commit adds 8 more tests covering:

  • test_create_requires_entity_id — fail-fast on missing identity
  • test_create_rejects_missing_entity — FK integrity at the repo level
  • test_create_rejects_entity_from_another_project — cross-project boundary guard
  • test_update_state_fields_rejects_invalid_fields — mutable-field frozenset guard
  • test_update_state_fields_returns_none_for_missing_note_content — clean None response
  • test_delete_by_entity_id_returns_false_when_missing — missing row returns False
  • test_note_content_file_path_lookup_prefers_entity_with_current_path — path drift preference

Checklist

Code Quality & Standards

  • Follows Basic Memory coding conventions (CLAUDE.md)
  • Python 3.12+ type annotations and async patterns
  • SQLAlchemy 2.0 best practices
  • 100-character line length maintained
  • No commented-out code blocks
  • Windows-compatible paths (Path(...).as_posix())

Testing & Documentation

  • Unit tests for all new functions/methods
  • 100% test coverage — insert path, partial upsert, path drift, boundary guards all covered
  • # pragma: no cover used appropriately on post-flush impossibility branches
  • Docstrings accurately describe side effects

Basic Memory Architecture

  • Alembic migration with proper upgrade()/downgrade()
  • FK cascade ondelete="CASCADE" for entity_id and project_id
  • CheckConstraint on file_write_status enforced at DB level
  • _align_identity_fields pattern prevents drift across all write paths
  • NOTE_CONTENT_MUTABLE_FIELDS frozenset guards update_state_fields
  • Preserves local-first architecture (no cloud coupling)

Security & Performance

  • No hardcoded secrets or credentials
  • No sensitive data in logs
  • Indexes on project_id, file_path, external_id — correct for access patterns
  • BigInteger for version counters

Minor Nit (not a blocker)

The revision ID l5g6h7i8j9k0 is still a letter-sequence pattern rather than hex. This was noted as non-blocking in the prior review and remains unchanged — fine to leave as-is.


This PR is ready to merge. All critical and medium issues have been fully addressed with solid test coverage.

@phernandez phernandez merged commit 88c8f18 into main Apr 5, 2026
41 checks passed
@phernandez phernandez deleted the feat/note-content-tenant-schema branch April 5, 2026 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant