AdvancedSQLiteSession.add_items can report success after structure metadata failure #3348

@Aphroq

Describe the bug

AdvancedSQLiteSession.add_items() writes rows to the base messages table, commits them, and only then writes message_structure rows. If the structure metadata write fails, add_items() logs the error and returns without raising.

This means callers can observe a successful add_items() call even though AdvancedSQLiteSession.get_items() cannot read the saved messages because advanced reads join through message_structure. If orphan cleanup also fails, the base table keeps invisible orphan rows.

Impact: a session save can appear successful while the new messages are missing from later reads, and retry logic cannot reliably know whether to retry or stop.

Debug information

  • Agents SDK version: main @ 4c3de2d
  • Latest release boundary checked: v0.17.0
  • Python version: Python 3.12.1

Repro steps

Run this minimal script from the repo root:

import asyncio
import sqlite3

from agents.extensions.memory import AdvancedSQLiteSession


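class BrokenMetadataSession(AdvancedSQLiteSession):
    # Fault injection: make the message_structure write fail after the base rows are committed.
    def _insert_structure_metadata(self, conn: sqlite3.Connection, items):
        raise RuntimeError("metadata write failed")


class BrokenMetadataAndCleanupSession(BrokenMetadataSession):
    # Fault injection: also make orphan cleanup fail, so the committed base rows stay behind.
    def _cleanup_orphaned_messages_sync(self, conn: sqlite3.Connection) -> int:
        raise RuntimeError("cleanup failed")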


async def run_case(session_cls, label):
    session = session_cls(session_id=label, create_tables=True)
    await session.add_items([{"role": "user", "content": "hello"}])
    visible = await session.get_items()

    with session._locked_connection() as conn:
        raw_count = conn.execute(
            f"SELECT COUNT(*) FROM {session.messages_table} WHERE session_id = ?",
            (session.session_id,),
        ).fetchone()[0]
        structure_count = conn.execute(
            "SELECT COUNT(*) FROM message_structure WHERE session_id = ?",
            (session.session_id,),
        ).fetchone()[0]

    print(label, "visible=", visible, "raw_count=", raw_count, "structure_count=", structure_count)
    session.close()


async def main():
    await run_case(BrokenMetadataSession, "cleanup_succeeds")
    await run_case(BrokenMetadataAndCleanupSession, "cleanup_fails")


asyncio.run(main())

Actual result:

Failed to add structure metadata for session cleanup_succeeds: metadata write failed
Failed to add structure metadata for session cleanup_fails: metadata write failed
Failed to cleanup orphaned messages: cleanup failed
cleanup_succeeds visible= [] raw_count= 0 structure_count= 0
cleanup_fails visible= [] raw_count= 1 structure_count= 0

add_items() returns without raising in both cases. In the first case the just-added message is removed by cleanup, so it is not visible to get_items(). In the second case the message remains in the base table but has no message_structure row, so advanced reads still cannot see it.
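To make the second case easier to spot, here is a minimal sketch of an orphan check using only the standard sqlite3 module. The table layout is a simplified assumption (a `messages` table with an `id` column and a `message_structure` table with a `message_id` column), and `find_orphan_message_ids` is a hypothetical helper, not part of the SDK.

```python
import sqlite3


def find_orphan_message_ids(conn: sqlite3.Connection, session_id: str) -> list[int]:
    """Return ids of base rows that have no matching message_structure row."""
    rows = conn.execute(
        """
        SELECT m.id
        FROM messages AS m
        LEFT JOIN message_structure AS s ON s.message_id = m.id
        WHERE m.session_id = ? AND s.message_id IS NULL
        ORDER BY m.id
        """,
        (session_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Assumed, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, message_data TEXT);
    CREATE TABLE message_structure (message_id INTEGER, session_id TEXT);
    INSERT INTO messages (id, session_id, message_data) VALUES (1, 's', '{}'), (2, 's', '{}');
    INSERT INTO message_structure (message_id, session_id) VALUES (1, 's');
    """
)

# Message 2 has a base row but no structure row, so a joined read would skip it.
print(find_orphan_message_ids(conn, "s"))  # prints [2]
```

A non-empty result corresponds to the cleanup_fails case above: raw_count is greater than structure_count for the session.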

Expected behavior

AdvancedSQLiteSession.add_items() should write the base message rows and message_structure rows in a single transaction. If structure metadata cannot be written, the base message rows should be rolled back and the original error should be visible to the caller. A retry after the failure should persist the batch exactly once.
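The expected behavior can be sketched with the standard sqlite3 module alone. This is an illustration under stated assumptions, not the SDK's implementation: the two-table schema is simplified, and `add_items_atomic`, `write_structure`, `broken_writer`, and `count_rows` are hypothetical names. It relies on the documented behavior of `with conn:`, which commits when the block succeeds and rolls back when it raises.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumed, simplified tables; the real schema has more columns.
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, message_data TEXT)"
    )
    conn.execute(
        "CREATE TABLE message_structure ("
        "message_id INTEGER, session_id TEXT, message_type TEXT)"
    )
    conn.commit()


def write_structure(conn, session_id, message_ids, items):
    # Hypothetical stand-in for the structure metadata write.
    conn.executemany(
        "INSERT INTO message_structure (message_id, session_id, message_type) "
        "VALUES (?, ?, ?)",
        [(mid, session_id, item.get("role", "unknown"))
         for mid, item in zip(message_ids, items)],
    )


def broken_writer(conn, session_id, message_ids, items):
    # Fault injection, mirroring the repro script.
    raise RuntimeError("metadata write failed")


def add_items_atomic(conn, session_id, items, structure_writer=write_structure):
    # One transaction for both tables: commit on success, roll back and
    # re-raise the original error if either write fails.
    with conn:
        message_ids = []
        for item in items:
            cur = conn.execute(
                "INSERT INTO messages (session_id, message_data) VALUES (?, ?)",
                (session_id, json.dumps(item)),
            )
            message_ids.append(cur.lastrowid)
        structure_writer(conn, session_id, message_ids, items)


def count_rows(conn, table, session_id):
    # `table` comes from a fixed set of names in this sketch.
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE session_id = ?", (session_id,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
create_schema(conn)
items = [{"role": "user", "content": "hello"}]

try:
    add_items_atomic(conn, "s1", items, structure_writer=broken_writer)
except RuntimeError as exc:
    print("add failed:", exc)

# The base row was rolled back together with the failed metadata write.
print(count_rows(conn, "messages", "s1"), count_rows(conn, "message_structure", "s1"))  # 0 0

# A retry with a working writer persists the batch exactly once.
add_items_atomic(conn, "s1", items)
print(count_rows(conn, "messages", "s1"), count_rows(conn, "message_structure", "s1"))  # 1 1
```

Because the failure surfaces as an exception and leaves no partial rows, a caller can retry the same batch without risking duplicates or invisible orphans.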
