feat(agent): persist and restore conversation history across runs#1963
feat(agent): persist and restore conversation history across runs#1963jaswu51 wants to merge 7 commits intodimensionalOS:devfrom
Conversation
Greptile SummaryThis PR adds cross-run conversation history persistence to Confidence Score: 4/5Safe to merge with minor open issues; prior P1 findings (CLI flag wiring, filename collision) should be tracked. The highest unresolved finding from prior rounds is P1 (session filename collision and missing CLI flag wiring); no new P0/P1 bugs were found in this round. Score is capped at 4 by the prior P1s. dimos/core/session_store.py (filename collision, import-time STATE_DIR, exception scope) and dimos/robot/cli/dimos.py (missing --restore-session/--no-restore typer params). Important Files Changed
Sequence DiagramsequenceDiagram
participant CLI as dimos CLI (run)
participant Env as os.environ
participant MCP as McpClient
participant SS as session_store
participant Disk as ~/.local/state/dimos/sessions/
CLI->>Env: DIMOS_RUN_ID, DIMOS_BLUEPRINT, DIMOS_STARTED_AT
CLI->>MCP: on_system_modules()
MCP->>SS: restore_session(blueprint, restore_session_id, no_restore)
SS->>Disk: find_latest_session() or load by run_id
Disk-->>SS: JSON session file
SS-->>MCP: (history, parent_session_id)
MCP->>MCP: _history = restored messages
Note over MCP: agent runs, appends to _history
CLI->>MCP: stop()
MCP->>SS: save_session(run_id, blueprint, model, history, ...)
SS->>Disk: write {timestamp}.json
Reviews (3): Last reviewed commit: "fix(test): use correct session filename ..." | Re-trigger Greptile |
| def _session_path(run_id: str) -> Path: | ||
| timestamp, blueprint = _parse_run_id(run_id) | ||
| return SESSIONS_DIR / blueprint / f"{timestamp}.json" |
There was a problem hiding this comment.
Session file collision on same-second runs
_session_path builds the filename from only the timestamp portion of run_id, discarding the blueprint suffix. Two runs of the same blueprint started within the same second will produce the same path — the later save_session call silently overwrites the earlier one, destroying its conversation history. The filename should use the full run_id (e.g., f"{run_id}.json") to guarantee uniqueness.
| try: | ||
| history, metadata = load_session(session_id) | ||
| return history, str(metadata["run_id"]) | ||
| except FileNotFoundError: | ||
| return [], None |
There was a problem hiding this comment.
Narrow exception scope crashes agent startup on corrupt sessions
restore_session only catches FileNotFoundError. If the stored JSON is malformed, json.JSONDecodeError propagates from load_session; if the saved file is missing the "run_id" key, a KeyError propagates from str(metadata["run_id"]). Either exception will crash the agent during on_system_modules with no graceful fallback. The catch block should be broadened (e.g., except Exception) and log a warning before returning ([], None), matching the robustness level of find_latest_session.
| try: | ||
| data = json.loads(files[-1].read_text()) | ||
| return str(data["run_id"]) | ||
| except Exception: | ||
| return None |
There was a problem hiding this comment.
find_latest_session silently swallows all exceptions with a bare except Exception: return None. A permission error, decode error, or I/O failure is indistinguishable from "no sessions found", making failures very hard to diagnose. Log a warning before returning.
| try: | |
| data = json.loads(files[-1].read_text()) | |
| return str(data["run_id"]) | |
| except Exception: | |
| return None | |
| try: | |
| data = json.loads(files[-1].read_text()) | |
| return str(data["run_id"]) | |
| except Exception: | |
| logger.warning("Failed to read latest session file.", path=str(files[-1])) | |
| return None |
| return base / "sessions" | ||
|
|
||
|
|
||
| SESSIONS_DIR = _get_sessions_dir() |
There was a problem hiding this comment.
SESSIONS_DIR evaluated at import time — XDG_STATE_HOME race
SESSIONS_DIR is resolved once when the module is first imported. If XDG_STATE_HOME is set after the module is loaded (common in test environments or when .env is loaded lazily), all save/load paths will silently use the wrong base directory. Consider computing the path lazily inside each function, or accepting the directory as a parameter in tests.
|
|
||
| def _get_sessions_dir() -> Path: | ||
| xdg = os.environ.get("XDG_STATE_HOME") | ||
| base = Path(xdg) / "dimos" if xdg else Path.home() / ".local" / "state" / "dimos" |
There was a problem hiding this comment.
We have this variable as STATE_DIR in dimos/constants.py. Please use that.
Note, that we develop from the dev branch, and some changes to that file haven't be merged to main yet. You'll have to rebase your branch on dev and change this PR to go into dev instead of main.
| if self._thread.is_alive(): | ||
| self._thread.join(timeout=2.0) | ||
|
|
||
| run_id = os.environ.get("DIMOS_RUN_ID", "") |
There was a problem hiding this comment.
This file no longer exists in dev. You need to rebase your branch on dev on remove changes to this file.
Sorry, we haven't released to main in a while and it's gotten a bit out of date with dev.
|
Hi, @jaswu51 . This looks really good! The issue is that we work from the Also, it would be nice to add some tests, or maybe a single integration test. |
09e539d to
fe89bc9
Compare
| def test_returns_empty_on_corrupt_session(self, tmp_path) -> None: | ||
| bp_dir = tmp_path / "sessions" / "demo-agent" | ||
| bp_dir.mkdir(parents=True) | ||
| (bp_dir / "20260503-120000-demo-agent.json").write_text("not json") | ||
| history, parent = restore_session( | ||
| blueprint="demo-agent", | ||
| restore_session_id="20260503-120000-demo-agent", | ||
| no_restore=False, | ||
| ) | ||
| assert history == [] | ||
| assert parent is None |
There was a problem hiding this comment.
test_returns_empty_on_corrupt_session exercises FileNotFoundError, not JSONDecodeError
restore_session calls load_session("20260503-120000-demo-agent"), which builds the path via _session_path → STATE_DIR / "sessions" / "demo-agent" / "20260503-120000.json". The corrupt file the test places at "20260503-120000-demo-agent.json" is never read — load_session raises FileNotFoundError (missing file) rather than JSONDecodeError (corrupt content). The test passes but does not validate that restore_session handles malformed JSON gracefully; a narrowed except FileNotFoundError in restore_session would make this test pass while leaving JSON decode errors unhandled.
|
Hi @paul-nechifor, rebased on dev and addressed all review comments:
Ready for re-review! Many thanks! |
Problem
Agent
_historyis lost every timedimos runexits. Users have no way to continue a previous conversation across runs. Closes #1898Solution
dimos/core/session_store.py: saves/loads/restores conversation history as JSON under$XDG_STATE_HOME/dimos/sessions/<blueprint>/(defaults to~/.local/state/dimos/on Linux)AgentandMcpClientnow restore history on startup and save on stop--restore-session=<run_id>CLI flag to restore a specific session--no-restoreCLI flag to start with a clean historySession files track
parent_session_idto maintain the chain across runs.Breaking Changes
None
How to Test