This document describes the changes made to enable this codebase to work with AI agent evaluation frameworks directly as an SDK instead of over API proxy.
What Changed:
- Added
PydanticMixinto all base classes
What Changed:
- Created
BoxOperationsclass that encapsulates session management - AI agents don't need to pass
sessionto every function call - Fixed parameter names to match actual operations.py signatures
Files Created:
src/services/box/database/typed_operations.py- Type-safe wrapper for Box operations
Usage:
from services.box.database.typed_operations import BoxOperations
from sqlalchemy.orm import Session
# Initialize once with session
ops = BoxOperations(session)
# AI agents call methods without passing session
user = ops.create_user(
name="John Doe",
login="john@example.com",
job_title="Engineer"
)
folder = ops.create_folder(
name="Reports",
parent_id="0",
user_id="user-123"
)
# All returned models have Pydantic serialization
folder_data = folder.model_dump() # Returns dict for AI agentFor AI Agent SDKs:
# Anthropic Claude SDK
tools = [
{
"name": "create_folder",
"description": "Create a new folder in Box",
"input_schema": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Folder name"},
"parent_id": {"type": "string", "description": "Parent folder ID"},
"user_id": {"type": "string", "description": "Owner user ID"}
},
"required": ["name", "parent_id", "user_id"]
}
}
]
# Tool implementation
def execute_tool(tool_name, arguments):
ops = BoxOperations(session)
if tool_name == "create_folder":
folder = ops.create_folder(**arguments)
return folder.model_dump() # Serialize for agentWhat Changed:
- Created unified interface for seeding, clearing, and diffing database state
- Utilities work across all services
Files Created:
src/eval_platform/eval_utilities.py- State management for evals
Usage:
from eval_platform.eval_utilities import create_snapshot, get_diff
# Before agent runs
before_snapshot = create_snapshot(session, "test_env", "before")
# Agent executes actions...
ops = BoxOperations(session)
folder = ops.create_folder(name="Reports", parent_id="0", user_id="user-123")
# After agent runs
after_snapshot = create_snapshot(session, "test_env", "after")
# Get diff
diff = get_diff(session_manager, "test_env", "before", "after")
# Analyze changes
print(f"Inserts: {len(diff.inserts)}") # New rows
print(f"Updates: {len(diff.updates)}") # Modified rows
print(f"Deletes: {len(diff.deletes)}") # Deleted rows
# Each change includes __table__ to identify origin
for insert in diff.inserts:
print(f"Created {insert['__table__']}: {insert['name']}")from eval_platform.eval_utilities import EvalContext
with EvalContext(session_manager, "test_env") as ctx:
# Agent executes here
ops = BoxOperations(session)
folder = ops.create_folder(name="Reports", parent_id="0", user_id="user-123")
# Get diff automatically
assert len(ctx.inserts) == 1
assert ctx.inserts[0]['name'] == 'Reports'
assert ctx.inserts[0]['__table__'] == 'box_folders'
# Snapshots automatically cleaned upfrom eval_platform.eval_utilities import clear_environment
# Clear all data (keeps schema structure)
clear_environment(session, "test_env")- Status: Fully migrated
- Typed Operations:
BoxOperationswith all CRUD operations - Tests: 5 passing SQLite integration tests
- Files:
src/services/box/database/typed_operations.py - Key Operations:
create_user,create_folder,create_file,update_folder,delete_folder
- Status: Fully migrated
- Typed Operations:
CalendarOperationswith calendar, event, and user operations - Tests: 6 passing SQLite integration tests
- Files:
src/services/calendar/database/typed_operations.py - Key Operations:
create_user,create_calendar,create_event,list_events,update_event - Notes: Requires
user_idfor all event operations (permissions checking)
- Status: Fully migrated
- Typed Operations:
SlackOperationswith team, channel, message operations - Tests: 5 passing SQLite integration tests
- Files:
src/services/slack/database/typed_operations.py - Key Operations:
create_team,create_user,create_channel,send_message,add_emoji_reaction - Notes: Message text field is
message_textin database model
- Status: Comprehensive CRUD API complete with 80+ operations
- Typed Operations:
LinearOperationswith full CRUD for 14 major entity types - Files:
src/services/linear/database/typed_operations.py- 80+ operations (1,246 lines)src/services/linear/database/entity_defaults.py- Default factories (360 lines)examples/linear_crud_demo.py- Usage demonstration (280 lines)LINEAR_CRUD_SUMMARY.md- Detailed implementation notes
- Key Operations (80+ methods total):
- Organization (2):
create_organization,get_organization - User (3):
create_user,get_user,get_user_by_email - Team (2):
create_team,get_team - Issue (4): Full CRUD with
create,get,update,delete - Comment (4): Full CRUD operations
- WorkflowState (2): Create and get workflow states
- Project (5): Full CRUD +
list_projects - ProjectMilestone (4): Full CRUD for milestones
- Cycle (4): Sprint/cycle management
- Initiative (4): High-level initiative tracking
- Document (4): Documentation management
- Attachment (3): File attachment handling
- IssueLabel (4): Label management with colors
- IssueRelation (4): Issue dependency tracking
- Organization (2):
- Design Pattern: Entity defaults factory pattern
- Challenge: Linear's auto-generated GraphQL schema has extreme complexity
- Organization: 102 required fields
- Team: 50+ required fields
- User: 30+ required fields
- Solution:
entity_defaults.pyprovides factory functions# Instead of manually setting 102 fields: defaults = organization_defaults(name, url_key) org = Organization(**defaults)
- All defaults support
**kwargsfor customization
- Challenge: Linear's auto-generated GraphQL schema has extreme complexity
- AI Agent Tool Design:
- Clear type hints on all 80+ operations
- Comprehensive docstrings for each method
- Pydantic serialization:
.model_dump(),.model_dump_json() - Optional return types for proper None handling
- Production Use:
- Pre-seed baseline entities (org, teams, states) in setup
- Use typed operations for standard CRUD workflows
- Complete entity defaults incrementally as needed
- See
LINEAR_CRUD_SUMMARY.mdfor full details
All services include comprehensive SQLite integration tests demonstrating:
- Basic CRUD operations
- State management with snapshots
- Environment clearing
- Complete agent evaluation workflows
- Multiple operations with diff tracking
Run all working tests:
cd backend
DATABASE_URL=sqlite:///dummy.db PYTHONPATH=src uv run pytest tests/test_sqlite_integration.py tests/test_calendar_sqlite_integration.py tests/test_slack_sqlite_integration.py -vTest Results (as of migration):
- ✅ Box: 5/5 passing
- ✅ Calendar: 6/6 passing
- ✅ Slack: 5/5 passing
- ✅ Linear: Comprehensive CRUD operations implemented (80+ methods)
Total: 16/16 tests passing for Box, Calendar, and Slack services Linear: Full CRUD API implemented with entity defaults pattern
All services now support both PostgreSQL and SQLite:
- Replaced PostgreSQL-specific
JSONBwith database-agnosticJSONacross all schemas - Eval utilities auto-detect database dialect and use appropriate SQL
- Integration tests use SQLite for fast, isolated testing
- Production uses PostgreSQL with full schema isolation