fix(ui): recover legacy sessions with export fallback#444

Closed
pascalandr wants to merge 4 commits into NeuralNomadsAI:dev from Pagecran:fix/opencode-large-session-load-clean

@pascalandr
Contributor

@pascalandr pascalandr commented May 14, 2026

Summary

  • recover legacy OpenCode sessions that still appear in session.list but fail to open in CodeNomad because the transcript endpoint rejects their historical message payload
  • add a narrow recovery path that falls back to opencode export only for that known legacy validation failure
  • keep normal sessions on the existing session.messages flow and harden the fallback so it does not become a new source of hangs or memory spikes

Problem

The broken sessions are not missing from OpenCode entirely.

They still show up in session.list, so CodeNomad can discover them and render them in the session sidebar. The failure happens later, when CodeNomad tries to hydrate the transcript by calling the OpenCode v2 session.messages endpoint.

For some historical sessions, at least one assistant message does not satisfy the current v2 response schema anymore. The concrete legacy shape we reproduced is a missing info.agent field inside a transcript message. When that happens, the OpenCode endpoint rejects the entire session.messages response during validation and returns a 400 body-validation error instead of returning partial transcript data.

That means CodeNomad never receives a message list to hydrate at all. This is why the session can still be visible in the UI while opening it fails.

The error we reproduced on the affected sessions is:

Missing key
  at [1]["info"]["agent"]

Why This Is Not Just A UI Hydration Fix

A UI-only compatibility layer would only help if CodeNomad received the old transcript payload and merely failed to render or normalize it.

That is not what happens here.

The blocking failure occurs inside the OpenCode session.messages API path itself, before CodeNomad can hydrate anything. Since the endpoint returns a validation error instead of message data, CodeNomad cannot repair the payload in memory because it never gets access to the transcript through the normal API route.

That is why this PR uses opencode export as a read fallback for a very specific legacy failure mode rather than trying to reinterpret session.messages client-side.

Recovery Strategy

The new behavior is:

  1. CodeNomad first uses the normal session.messages request path.
  2. If that call succeeds, nothing changes.
  3. If that call fails with the specific legacy body-validation signature for missing info.agent, CodeNomad falls back to opencode export <sessionId>.
  4. The exported transcript is then used only as a replacement read source for hydration.
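
A minimal sketch of the four steps above, assuming the primary loader, the export loader, and the legacy-error predicate are injected as functions. The names `loadMessagesWithFallback`, `LoadPrimary`, `LoadExport`, and `IsLegacyError` are hypothetical and are not taken from the PR; the actual loader in session-message-source.ts may be shaped differently.

```typescript
// Hypothetical shapes: the real message and client types in the PR may differ.
type Message = { info: Record<string, unknown>; parts: unknown[] };
type LoadPrimary = () => Promise<Message[]>;
type LoadExport = () => Promise<Message[]>;

// Assumed predicate: true only for the reproduced missing info.agent validation error.
type IsLegacyError = (error: unknown) => boolean;

async function loadMessagesWithFallback(
  loadPrimary: LoadPrimary,
  loadExport: LoadExport,
  isLegacyError: IsLegacyError,
): Promise<{ source: "api" | "export"; messages: Message[] }> {
  try {
    // Steps 1 and 2: healthy sessions never leave the normal request path.
    return { source: "api", messages: await loadPrimary() };
  } catch (error) {
    // Step 3: anything other than the known legacy signature is rethrown unchanged.
    if (!isLegacyError(error)) throw error;
    // Step 4: the export is used only as a replacement read source.
    return { source: "export", messages: await loadExport() };
  }
}
```

Returning the `source` alongside the messages is a design choice of this sketch; it makes it easy to log or test which path served a given session.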

Important scope limits:

  • this does not export and re-import sessions
  • this does not mutate or migrate session data on disk
  • this does not bypass normal API loading for healthy sessions
  • this does not swallow arbitrary schema failures; it is restricted to the known legacy missing-agent validation path
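
To illustrate what "restricted to the known legacy missing-agent validation path" could look like, here is a sketch of a strict matcher. The error shape (`status`, `body.name`, `body.data.message`, `body.data.kind`) is modelled on the 400 payload quoted later in this thread; the names `ValidationErrorLike` and `isLegacyMissingAgentError` are hypothetical, and the real matcher in session-message-fallback.ts may inspect different fields.

```typescript
// Hypothetical error shape, modelled on the 400 body-validation payload quoted in this thread.
interface ValidationErrorLike {
  status?: number;
  body?: { name?: string; data?: { message?: string; kind?: string } };
}

// Matches "Missing key" followed by a path ending in ["info"]["agent"],
// for example: Missing key\n at [1]["info"]["agent"]
const MISSING_AGENT = /Missing key\s+at (?:\[\d+\])?\["info"\]\["agent"\]\s*$/;

function isLegacyMissingAgentError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const candidate = error as ValidationErrorLike;
  // Only a 400 body-validation failure qualifies.
  if (candidate.status !== 400) return false;
  if (candidate.body?.data?.kind !== "Body") return false;
  const message = candidate.body?.data?.message;
  return typeof message === "string" && MISSING_AGENT.test(message);
}
```

Anchoring the pattern at the end of the message means a failure on a different key, such as `["info"]["role"]`, does not match and still surfaces as a normal error.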

Hardening In This PR

Because the affected transcripts can be very large, the fallback path is also hardened so the recovery mechanism does not introduce a second failure mode.

Specifically:

  • the fallback matcher is restricted to the reproduced legacy validation path (["info"]["agent"]), so unrelated schema regressions still surface normally
  • the server exports to a temporary file and streams that file back instead of buffering the full export payload in memory
  • the export subprocess is abortable and protected by a timeout so a stuck export cannot hang the request forever
  • temporary export files are cleaned up when the request finishes or is aborted
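
The hardening points above can be sketched roughly as follows, using only Node.js built-ins. This is an illustration under assumptions, not the helper from manager.ts: `exportToTempFile`, `ExportOptions`, and `ExportResult` are hypothetical names, and the sketch assumes the export command writes the transcript to stdout. The command and arguments are parameters so the sketch does not depend on any particular CLI flags.

```typescript
import { spawn } from "node:child_process";
import { createWriteStream } from "node:fs";
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface ExportOptions {
  command: string; // assumed, e.g. "opencode"
  args: string[]; // assumed, e.g. ["export", sessionId]
  cwd: string;
  timeoutMs: number;
  signal?: AbortSignal;
}

interface ExportResult {
  filePath: string;
  cleanup: () => Promise<void>;
}

async function exportToTempFile(options: ExportOptions): Promise<ExportResult> {
  // A request that is already aborted never spawns a subprocess.
  if (options.signal?.aborted) throw new Error("Session export aborted");

  // A unique directory per export; the filename inside it is fixed.
  const dir = await mkdtemp(join(tmpdir(), "session-export-"));
  const filePath = join(dir, "session.json");
  const cleanup = () => rm(dir, { recursive: true, force: true });

  try {
    await new Promise<void>((resolve, reject) => {
      const child = spawn(options.command, options.args, {
        cwd: options.cwd,
        stdio: ["ignore", "pipe", "ignore"],
      });
      // Stream stdout straight to disk instead of buffering it in memory.
      const out = createWriteStream(filePath);
      child.stdout.pipe(out);
      let settled = false;

      const onAbort = () => fail(new Error("Session export aborted"));
      const timer = setTimeout(
        () => fail(new Error("Session export timed out")),
        options.timeoutMs,
      );
      options.signal?.addEventListener("abort", onAbort, { once: true });

      function settle(error?: Error): void {
        if (settled) return;
        settled = true;
        clearTimeout(timer);
        options.signal?.removeEventListener("abort", onAbort);
        if (error) reject(error);
        else resolve();
      }

      function fail(error: Error): void {
        child.kill();
        out.destroy();
        settle(error);
      }

      out.once("error", fail);
      child.once("error", fail);
      child.once("close", (code) => {
        if (settled) return;
        if (code !== 0) return fail(new Error(`Session export exited with code ${code}`));
        // Resolve only after the file has been fully flushed.
        if (out.writableFinished) settle();
        else out.once("finish", () => settle());
      });
    });
    return { filePath, cleanup };
  } catch (error) {
    // Failed, timed-out, and aborted exports leave nothing behind.
    await cleanup();
    throw error;
  }
}
```

On success the caller streams `filePath` back to the client and calls `cleanup()` once the response finishes or is aborted.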

Files Changed

  • packages/ui/src/stores/session-message-source.ts
    • adds the client-side message-loading fallback entry point
  • packages/ui/src/stores/session-message-fallback.ts
    • isolates the strict legacy error matcher
  • packages/server/src/workspaces/manager.ts
    • adds the server-side export-to-temp-file helper with timeout and cleanup
  • packages/server/src/server/routes/workspaces.ts
    • exposes the export stream endpoint used only by the fallback path
  • packages/ui/src/stores/session-api.ts
    • routes transcript hydration through the guarded loader
  • packages/ui/src/stores/session-state.ts
    • reuses the same guarded loader for deep blank-session inspection
  • packages/ui/src/lib/api-client.ts
    • adds the typed server API call for the export endpoint

Validation

  • npm run typecheck --workspace @codenomad/ui
  • npx tsx --test packages/ui/src/stores/session-message-fallback.test.ts
  • npm run build --workspace @codenomad/tauri-app

Notes

  • npm run typecheck --workspace @neuralnomads/codenomad currently fails on the moving dev base because of unrelated server dependency/type drift, not because of this change
  • touched large files worth future refactor attention:
    • packages/server/src/workspaces/manager.ts
    • packages/ui/src/stores/session-api.ts
    • packages/ui/src/stores/session-state.ts

Some historical OpenCode sessions still appear in session.list but fail to open in CodeNomad because the strict v2 session.messages response validation rejects transcripts that are missing assistant metadata such as info.agent. This left the session visible in the UI while the transcript pane stayed empty.

When that specific schema-validation failure happens, CodeNomad now falls back to an opencode export for the affected session and hydrates the transcript from the CLI-compatible export payload instead of abandoning the load. The fallback is scoped to malformed legacy transcripts so normal session loads still use the standard API path.

Validation: npm run typecheck --workspace @codenomad/ui; npm run build --workspace @codenomad/tauri-app. Note: npm run typecheck --workspace @neuralnomads/codenomad currently fails on this origin/dev base because of unrelated server dependency/type drift.
@shantur
Collaborator

shantur commented May 14, 2026

Hey @pascalandr

  • recover legacy OpenCode sessions that appear in session.list but fail transcript hydration because session.messages rejects malformed historical assistant metadata

Interesting. Is it failing on the OpenCode end or the CodeNomad end?

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

Only in CodeNomad (any version on Windows); OpenCode has no problem with the malformed metadata.

@shantur
Collaborator

shantur commented May 14, 2026

If I understand correctly, this tries to export and import the old-style session.
If that's true, I would expect session hydration to support the old session data instead of export / import.

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

If I understand correctly, this tries to export and import the old-style session. If that's true, I would expect session hydration to support the old session data instead of export / import.

Right, I'm still working on it, but there is no import, only export, because that is the only approach that works:
The UI cannot hydrate these sessions directly because the OpenCode session.messages endpoint itself rejects the legacy payload, so this uses opencode export as a read fallback. I have updated the description with a lot more detail.

The initial legacy-session recovery path worked, but it was still too broad and too heavy for very large transcripts. Tighten the fallback to the exact missing-agent validation signature, stream export output through a temp file instead of buffering the full payload in memory, and abort stalled exports so the recovery path cannot hang indefinitely.

Also add focused coverage for the legacy error matcher so unrelated schema regressions keep surfacing normally instead of being hidden behind the fallback.
@pascalandr
Contributor Author

@CodeNomadBot

I want you to perform the gatekeeper checks for latest PR state

check for regressions
check for better possible implementations
Be the PR gatekeeper
Be ruthless about code quality

@codenomadbot
Contributor

codenomadbot Bot commented May 14, 2026

Gatekeeper review for latest PR state.

I understood the request as: review PR #444 ruthlessly for regressions and implementation quality, especially around the legacy session.messages failure fallback. Assumption: this is review-only; I did not change code.

Blocking finding

  1. packages/ui/src/stores/session-api.ts:739-764, packages/ui/src/stores/session-message-source.ts:21-22, packages/ui/src/lib/api-client.ts:219-220, packages/server/src/workspaces/manager.ts:133-135 - the fallback drops the worktree context. The normal load path correctly resolves worktreeSlug and creates a worktree-scoped OpenCode client, but fetchSessionMessages() only receives instanceId and the fallback calls /api/workspaces/:id/sessions/:sessionId/export. The server then runs opencode export with cwd: workspace.path, i.e. the root workspace. For any legacy session mapped to a non-root worktree, the primary session.messages call is sent to the right worktree proxy, but the recovery export is executed from the wrong directory. That can return the wrong transcript, fail to find the session, or make legacy worktree sessions unrecoverable. This should stay scoped exactly like the normal request path: pass the resolved worktree slug into the fallback and expose an export route that resolves the worktree directory before spawning opencode export.

Non-blocking but should fix before merge if touching this area

  1. packages/ui/src/stores/session-message-source.ts:22-23 - fallback response shape failures are silently converted into an empty transcript. If opencode export succeeds but returns a different JSON shape, partial data, or a future schema where messages is absent/non-array, the UI returns []; loadMessages() then marks the session as loaded and displays it as blank. For a recovery path whose whole purpose is avoiding transcript loss, this is too forgiving. Validate the export response and throw/log a hard fallback failure when messages is not an array instead of presenting an empty session.
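
A sketch of the stricter validation suggested here, assuming the export payload is JSON with a top-level `messages` array. `parseExportedMessages` and `ExportedMessage` are hypothetical names; the point is only that a missing or non-array `messages` throws instead of becoming `[]`.

```typescript
// Hypothetical message shape; the real export payload may carry more fields.
type ExportedMessage = { info: Record<string, unknown>; parts: unknown[] };

function parseExportedMessages(raw: string): ExportedMessage[] {
  let payload: unknown;
  try {
    payload = JSON.parse(raw);
  } catch {
    throw new Error("Session export fallback returned invalid JSON");
  }
  const messages =
    typeof payload === "object" && payload !== null
      ? (payload as { messages?: unknown }).messages
      : undefined;
  // An absent or non-array `messages` is a hard failure, not an empty transcript.
  if (!Array.isArray(messages)) {
    throw new Error("Session export fallback did not return a messages array");
  }
  return messages as ExportedMessage[];
}
```

An explicitly empty array still passes, so a session that really has no messages is distinguishable from a malformed export.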

Implementation direction

The export fallback itself is reasonable as a narrow workaround if session.messages is genuinely rejecting before CodeNomad can hydrate. The missing piece is making it mirror the existing worktree proxy semantics. Minimal shape I’d expect:

  • fetchSessionMessages(instanceId, sessionId, client, worktreeSlug) or equivalent context object.
  • serverApi.exportSessionData(instanceId, worktreeSlug, sessionId).
  • Server route under a worktree-aware path, or a route/query parameter that validates and resolves worktreeSlug using the same logic as the proxy.
  • exportSessionDataToFile() should spawn with the resolved directory, not always workspace.path.
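
One possible shape for the server-side resolution step, as a sketch only. The `Workspace` and `Worktree` interfaces and the `resolveExportCwd` name are hypothetical, and the sketch assumes that an omitted slug means the workspace root; the proxy's actual resolution logic is not shown in this thread and may differ.

```typescript
// Hypothetical data model: the real workspace and worktree records may differ.
interface Worktree {
  slug: string;
  directory: string;
}

interface Workspace {
  path: string;
  worktrees: Worktree[];
}

// Resolve the directory the export subprocess should run in.
// Assumption of this sketch: no slug means the workspace root.
function resolveExportCwd(workspace: Workspace, worktreeSlug?: string): string {
  if (worktreeSlug === undefined) return workspace.path;
  const match = workspace.worktrees.find((worktree) => worktree.slug === worktreeSlug);
  // Unknown slugs are rejected rather than silently falling back to the root.
  if (!match) throw new Error(`Unknown worktree: ${worktreeSlug}`);
  return match.directory;
}
```

Looking the slug up in a known list, instead of joining it into a path, also keeps a client-supplied value from selecting an arbitrary directory.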

Validation attempted

  • npm run typecheck --workspace @codenomad/ui failed in this worktree because dependencies/types are not installed or resolvable (solid-js, @opencode-ai/sdk/v2/client, node:test, etc.).
  • npm run typecheck --workspace @neuralnomads/codenomad failed immediately with missing @types/node.

Those validation failures look environmental/base-dependency related rather than conclusively caused by this PR, but they mean I could not prove the branch clean locally.

Verdict

Request changes. The fallback is too close to merge to ignore the worktree-context regression; it risks fixing root legacy sessions while leaving worktree sessions broken or incorrectly hydrated.

--
Yours,
CodeNomadBot

The legacy-session export fallback was correctly recovering malformed root transcripts, but it could still read the wrong session data for worktree-backed sessions because the recovery export always ran from the workspace root. Make the fallback follow the same worktree context as the normal transcript request so legacy worktree sessions resolve against the correct directory.

Tighten the recovery path further by failing hard when the export payload does not contain a transcript array instead of silently presenting an empty session, and keep the matcher/tests focused on the reproduced missing-agent validation failure. Validation: npx tsx --test packages/ui/src/stores/session-message-fallback.test.ts; npm run typecheck --workspace @codenomad/ui; npm run build --workspace @codenomad/tauri-app
@github-actions

PR builds are available as GitHub Actions artifacts:

https://github.com/NeuralNomadsAI/CodeNomad/actions/runs/25870026173

Artifacts expire in 7 days.
Artifacts:

  • pr-444-f7e58381766b80cb2301f75a7196a586f268a108-tauri-macos-arm64
  • pr-444-f7e58381766b80cb2301f75a7196a586f268a108-electron-windows
  • pr-444-f7e58381766b80cb2301f75a7196a586f268a108-electron-linux

@pascalandr
Contributor Author

@CodeNomadBot
same review

@codenomadbot
Contributor

codenomadbot Bot commented May 14, 2026

Gatekeeper review for the updated PR state.

I understood “same review” as: re-run the ruthless PR gatekeeper pass after the latest updates, focused on regressions and implementation quality. Assumption: review-only; I did not change code.

Findings

No blocking findings in the latest state.

The previous blocking issue I raised is addressed: the fallback now carries worktreeSlug from the normal message-load path, calls a worktree-scoped export endpoint, resolves the actual worktree directory server-side, and spawns opencode export with that directory as cwd.

Non-blocking quality notes

  1. packages/server/src/workspaces/manager.ts:177-184 - the export helper does not check options.signal.aborted before registering the abort listener/spawning. If the request is already aborted during the route’s workspace/worktree resolution phase, addEventListener("abort", ...) will not fire retroactively and the export can still start, only stopping on timeout or process exit. This is not a correctness blocker for returned data, but for the “no hangs/no wasted subprocesses” goal, add an early if (options?.signal?.aborted) throw new Error("Session export aborted") before spawn().

  2. packages/server/src/workspaces/manager.ts:140 - the temp output filename is derived directly from sessionId. OpenCode session IDs are likely safe in practice, but this is still a local file path boundary in a server route. A future odd ID or manually-hit route containing path separators could make cleanup/streaming behavior less predictable. Safer minimal implementation: write to a fixed filename inside the temp dir, e.g. session.json, since the temp directory is already unique.
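
Both notes can be demonstrated in isolation. The sketch below uses hypothetical helper names (`listenerFiresRetroactively`, `assertNotAborted`, `naiveExportPath`, `fixedExportPath`) and `path.posix` so the path results are the same on every platform; it is not the code from manager.ts.

```typescript
import { posix } from "node:path";

// Note 1: an "abort" listener registered after the signal has already fired is never called.
function listenerFiresRetroactively(): boolean {
  const controller = new AbortController();
  controller.abort();
  let fired = false;
  controller.signal.addEventListener("abort", () => {
    fired = true;
  });
  return fired;
}

// The suggested guard: check the flag explicitly before spawning anything.
function assertNotAborted(signal?: AbortSignal): void {
  if (signal?.aborted) throw new Error("Session export aborted");
}

// Note 2: a filename derived from the raw ID can leave the temp directory.
function naiveExportPath(tempDir: string, sessionId: string): string {
  return posix.join(tempDir, `${sessionId}.json`);
}

// A fixed filename cannot, because the unique temp directory already isolates each export.
function fixedExportPath(tempDir: string): string {
  return posix.join(tempDir, "session.json");
}
```

For example, `naiveExportPath("/tmp/export-abc", "../../etc/x")` normalizes to a path outside `/tmp/export-abc`, while `fixedExportPath` always stays inside it.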

Implementation assessment

The current approach is now appropriately narrow:

  • normal sessions stay on session.messages
  • fallback only triggers on the reproduced missing info.agent body-validation signature
  • export is streamed from a temp file rather than buffered wholesale
  • malformed export responses now throw instead of silently marking a transcript blank
  • worktree context is preserved in both UI and server layers

I do not see a better lower-risk implementation unless OpenCode exposes a tolerant/raw transcript endpoint. Given the stated failure occurs before CodeNomad receives message data, client-side hydration compatibility alone would not solve this class of legacy sessions.

Validation attempted

  • npx tsx --test packages/ui/src/stores/session-message-fallback.test.ts passed: 3 tests.
  • npm run typecheck --workspace @codenomad/ui failed in this worktree because dependencies/types are not resolvable (solid-js, @opencode-ai/sdk/v2/client, node:test, etc.). This matches the local environment issue seen in the prior review, so I could not use it to prove or disprove this PR.
  • npm run typecheck --workspace @neuralnomads/codenomad failed immediately with missing @types/node, also consistent with the local environment/base dependency state.

Verdict

No blocking regressions found in the updated PR. I would still tighten the pre-aborted signal handling and temp filename before merge if you want the fallback hardening to be fully boring, but I would not block on either unless the project treats local route hardening as mandatory for this release.

--
Yours,
CodeNomadBot

@shantur
Collaborator

shantur commented May 14, 2026

hey @pascalandr

3. If that call fails with the specific legacy body-validation signature for missing info.agent, CodeNomad falls back to opencode export <sessionId>.

  1. Do we get this error from OpenCode?
  2. How does OpenCode handle the same session? You mentioned it works with OpenCode; is that still the case?
  3. If even OpenCode isn't supporting those old sessions, why should we?

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

To clarify: OpenCode itself still reads these sessions fine; the incompatibility is in the SDK used by CodeNomad, where session.messages rejects some session payloads that the main OpenCode client still accepts.
I need this fix because a quarter of the sessions I use are broken in CodeNomad without it.
These are not particularly old sessions; they are sessions whose payload is still accepted by OpenCode itself but rejected by the newer SDK path used by CodeNomad.

@shantur
Collaborator

shantur commented May 14, 2026

You may be highlighting a deeper issue here: OpenCode Desktop uses the same SDK that CodeNomad does.
If it works there, it means CodeNomad is either

a. doing something incorrect, or
b. using a wrong or older version of the SDK.

I think that instead of trying a band-aid fix, we should investigate why it fails in CodeNomad.

Would you be able to

  1. Try loading the same session in OpenCode desktop with Developer Tools to capture the complete request and response.
  2. Try loading the same session in CodeNomad and compare the request and response.

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

Oh, I never mentioned OpenCode Desktop, did you?

@shantur
Collaborator

shantur commented May 14, 2026

TUI also uses the same SDK, IIRC

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

Well, as I already said, these sessions open with zero issues or messages in the OpenCode CLI and do not open at all in CodeNomad. That is all I can say; the agent found a solution by testing against the SDK. #444 (comment)

@shantur
Collaborator

shantur commented May 14, 2026

The CLI / TUI also works via the SDK, so if it can work there, the issue is definitely in CodeNomad.
The agent might not know that and is just trying one way to make this work.

Would you be able to send me one of your problematic sessions by exporting?
I can figure out the cause and try to fix it properly.

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

Once the session is exported, it isn't broken anymore in CodeNomad. So I agree that we don't fully understand what is happening here. I'm working on another solution that repairs the sessions instead of supporting them.

@shantur
Collaborator

shantur commented May 14, 2026

We can understand the issue better if you load the session in OpenCode Desktop and keep the dev tools open to capture the request / response. Once we have that, we can see how CodeNomad's request / response differs.

A curl request can also be used to query the session messages to understand what's going on.

@pascalandr
Contributor Author

OK, so I'm going to install OpenCode Desktop.

@pascalandr
Contributor Author

Error: Missing key
    at [1]["info"]["agent"]
    at wrapClientError (oc://renderer/assets/main-DlaiLIQY.js:62826:12)
    at request (oc://renderer/assets/main-DlaiLIQY.js:59067:28)
    at async retry (oc://renderer/assets/main-DlaiLIQY.js:63575:14)
    at async fetchMessages (oc://renderer/assets/main-DlaiLIQY.js:65394:24)
    at async loadMessages (oc://renderer/assets/main-DlaiLIQY.js:65418:7)
    at async Promise.all (index 1)
    at async oc://renderer/assets/main-DlaiLIQY.js:65576:13

────────────────────────────────────────
Caused by:
{
  "body": {
    "name": "BadRequest",
    "data": {
      "message": "Missing key\n at [1][\"info\"][\"agent\"]",
      "kind": "Body"
    }
  },
  "status": 400
}

@pascalandr
Contributor Author

pascalandr commented May 14, 2026

And still no issue in the TUI, so I guess the SDK is ultimately not used the same way in each of them.

@pascalandr
Contributor Author

Superseded by #450

@pascalandr pascalandr closed this May 15, 2026