Skip to content

feat(agent): persist and restore conversation history across runs#1963

Open
jaswu51 wants to merge 7 commits intodimensionalOS:devfrom
jaswu51:jaswu51/feat/agent-session-restore
Open

feat(agent): persist and restore conversation history across runs#1963
jaswu51 wants to merge 7 commits intodimensionalOS:devfrom
jaswu51:jaswu51/feat/agent-session-restore

Conversation

@jaswu51
Copy link
Copy Markdown

@jaswu51 jaswu51 commented May 3, 2026

Problem

Agent _history is lost every time dimos run exits. Users have no way to continue a previous conversation across runs. Closes #1898

Solution

  • Added dimos/core/session_store.py: saves/loads/restores conversation history as JSON under $XDG_STATE_HOME/dimos/sessions/<blueprint>/ (defaults to ~/.local/state/dimos/ on Linux)
  • Agent and McpClient now restore history on startup and save on stop
  • Added --restore-session=<run_id> CLI flag to restore a specific session
  • Added --no-restore CLI flag to start with a clean history
  • Default behavior: auto-restore latest session for the same blueprint

Session files track parent_session_id to maintain the chain across runs.

Breaking Changes

None

How to Test

# Run agent, send a message, stop with Ctrl+C
dimos run demo-agent
dimos topic send /human_input '"hello"'
# Ctrl+C

# Verify session saved
ls "${XDG_STATE_HOME:-$HOME/.local/state}/dimos/sessions/demo-agent/"

# Run again — history is restored automatically
dimos run demo-agent

# Start fresh
dimos run demo-agent --no-restore

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 3, 2026

Greptile Summary

This PR adds cross-run conversation history persistence to McpClient by introducing session_store.py (save/load/restore/find-latest helpers), wiring restore into on_system_modules and save into stop, and exposing restore_session/no_restore settings in GlobalConfig. Several issues from prior rounds (timestamp-only filename collision, import-time STATE_DIR, narrow exception scope, missing CLI flag wiring) remain open; the one new finding in this round is a redundant double-write of DIMOS_RUN_ID introduced by the PR.

Confidence Score: 4/5

Safe to merge with minor open issues; prior P1 findings (CLI flag wiring, filename collision) should be tracked.

The highest unresolved finding from prior rounds is P1 (session filename collision and missing CLI flag wiring); no new P0/P1 bugs were found in this round. Score is capped at 4 by the prior P1s.

dimos/core/session_store.py (filename collision, import-time STATE_DIR, exception scope) and dimos/robot/cli/dimos.py (missing --restore-session/--no-restore typer params).

Important Files Changed

Filename Overview
dimos/core/session_store.py New module for session persistence; has known issues with timestamp-only filenames (collision risk) and import-time STATE_DIR resolution flagged in prior review rounds.
dimos/agents/mcp/mcp_client.py Adds restore on on_system_modules and save on stop; logic is sound — history is correctly passed to state_graph.stream via self._history.
dimos/robot/cli/dimos.py Sets DIMOS_RUN_ID twice (redundant duplicate introduced by this PR); documented --restore-session/--no-restore CLI flags are missing from the typer signature (flagged in prior round).
dimos/core/global_config.py Adds restore_session and no_restore fields to GlobalConfig; straightforward pydantic BaseSettings additions with no issues.
dimos/core/test_session_store.py Good round-trip, find-latest, and restore tests; test_returns_empty_on_corrupt_session correctly exercises JSONDecodeError handling (file placement is accurate).

Sequence Diagram

sequenceDiagram
    participant CLI as dimos CLI (run)
    participant Env as os.environ
    participant MCP as McpClient
    participant SS as session_store
    participant Disk as ~/.local/state/dimos/sessions/

    CLI->>Env: DIMOS_RUN_ID, DIMOS_BLUEPRINT, DIMOS_STARTED_AT
    CLI->>MCP: on_system_modules()
    MCP->>SS: restore_session(blueprint, restore_session_id, no_restore)
    SS->>Disk: find_latest_session() or load by run_id
    Disk-->>SS: JSON session file
    SS-->>MCP: (history, parent_session_id)
    MCP->>MCP: _history = restored messages
    Note over MCP: agent runs, appends to _history
    CLI->>MCP: stop()
    MCP->>SS: save_session(run_id, blueprint, model, history, ...)
    SS->>Disk: write {timestamp}.json
Loading

Reviews (3): Last reviewed commit: "fix(test): use correct session filename ..." | Re-trigger Greptile

Comment thread dimos/core/session_store.py Outdated
Comment on lines +59 to +61
def _session_path(run_id: str) -> Path:
timestamp, blueprint = _parse_run_id(run_id)
return SESSIONS_DIR / blueprint / f"{timestamp}.json"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Session file collision on same-second runs

_session_path builds the filename from only the timestamp portion of run_id, discarding the blueprint suffix. Two runs of the same blueprint started within the same second will produce the same path — the later save_session call silently overwrites the earlier one, destroying its conversation history. The filename should use the full run_id (e.g., f"{run_id}.json") to guarantee uniqueness.

Comment on lines +118 to +122
try:
history, metadata = load_session(session_id)
return history, str(metadata["run_id"])
except FileNotFoundError:
return [], None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Narrow exception scope crashes agent startup on corrupt sessions

restore_session only catches FileNotFoundError. If the stored JSON is malformed, json.JSONDecodeError propagates from load_session; if the saved file is missing the "run_id" key, a KeyError propagates from str(metadata["run_id"]). Either exception will crash the agent during on_system_modules with no graceful fallback. The catch block should be broadened (e.g., except Exception) and log a warning before returning ([], None), matching the robustness level of find_latest_session.

Comment on lines +133 to +137
try:
data = json.loads(files[-1].read_text())
return str(data["run_id"])
except Exception:
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 find_latest_session silently swallows all exceptions with a bare except Exception: return None. A permission error, decode error, or I/O failure is indistinguishable from "no sessions found", making failures very hard to diagnose. Log a warning before returning.

Suggested change
try:
data = json.loads(files[-1].read_text())
return str(data["run_id"])
except Exception:
return None
try:
data = json.loads(files[-1].read_text())
return str(data["run_id"])
except Exception:
logger.warning("Failed to read latest session file.", path=str(files[-1]))
return None

Comment thread dimos/core/session_store.py Outdated
return base / "sessions"


SESSIONS_DIR = _get_sessions_dir()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 SESSIONS_DIR evaluated at import time — XDG_STATE_HOME race

SESSIONS_DIR is resolved once when the module is first imported. If XDG_STATE_HOME is set after the module is loaded (common in test environments or when .env is loaded lazily), all save/load paths will silently use the wrong base directory. Consider computing the path lazily inside each function, or accepting the directory as a parameter in tests.

Comment thread dimos/core/session_store.py Outdated

def _get_sessions_dir() -> Path:
xdg = os.environ.get("XDG_STATE_HOME")
base = Path(xdg) / "dimos" if xdg else Path.home() / ".local" / "state" / "dimos"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this variable as STATE_DIR in dimos/constants.py. Please use that.

Note, that we develop from the dev branch, and some changes to that file haven't be merged to main yet. You'll have to rebase your branch on dev and change this PR to go into dev instead of main.

Comment thread dimos/agents/agent.py Outdated
if self._thread.is_alive():
self._thread.join(timeout=2.0)

run_id = os.environ.get("DIMOS_RUN_ID", "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file no longer exists in dev. You need to rebase your branch on dev on remove changes to this file.

Sorry, we haven't released to main in a while and it's gotten a bit out of date with dev.

@paul-nechifor
Copy link
Copy Markdown
Contributor

Hi, @jaswu51 . This looks really good!

The issue is that we work from the dev branch. Could you please rebase on dev?

Also, it would be nice to add some tests, or maybe a single integration test.

@jaswu51 jaswu51 force-pushed the jaswu51/feat/agent-session-restore branch from 09e539d to fe89bc9 Compare May 3, 2026 23:23
@jaswu51 jaswu51 changed the base branch from main to dev May 3, 2026 23:24
Comment on lines +104 to +114
def test_returns_empty_on_corrupt_session(self, tmp_path) -> None:
bp_dir = tmp_path / "sessions" / "demo-agent"
bp_dir.mkdir(parents=True)
(bp_dir / "20260503-120000-demo-agent.json").write_text("not json")
history, parent = restore_session(
blueprint="demo-agent",
restore_session_id="20260503-120000-demo-agent",
no_restore=False,
)
assert history == []
assert parent is None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 test_returns_empty_on_corrupt_session exercises FileNotFoundError, not JSONDecodeError

restore_session calls load_session("20260503-120000-demo-agent"), which builds the path via _session_pathSTATE_DIR / "sessions" / "demo-agent" / "20260503-120000.json". The corrupt file the test places at "20260503-120000-demo-agent.json" is never read — load_session raises FileNotFoundError (missing file) rather than JSONDecodeError (corrupt content). The test passes but does not validate that restore_session handles malformed JSON gracefully; a narrowed except FileNotFoundError in restore_session would make this test pass while leaving JSON decode errors unhandled.

@jaswu51
Copy link
Copy Markdown
Author

jaswu51 commented May 3, 2026

Hi @paul-nechifor, rebased on dev and addressed all review comments:

  • Removed agent.py changes (not in dev branch)
  • Replaced _get_sessions_dir() with STATE_DIR from constants.py
  • Broadened exception handling in restore_session
  • Added unit tests for session_store

Ready for re-review! Many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent Restoration

2 participants