Skip to content

Conversation

@FUDCo
Copy link
Contributor

@FUDCo FUDCo commented Jan 21, 2026

  • Persist pending messages to storage so queued messages survive kernel restarts
  • Persist relevant RemoteHandle state (nextSendSeq, highestReceivedSeq, startSeq)
  • Implement crash-safe write ordering so the state of retransmission is not made inconsistent by system crashes

Closes #691


Note

Implements durable, crash-safe sequencing for remote messaging and adds store APIs to support it.

  • Persist RemoteHandle pending messages (as strings) and sequence state (nextSendSeq, highestReceivedSeq, startSeq) via KernelStore; include persistence of highestReceivedSeq on receive
  • Restore state on startup and recover from crashes (repair nextSendSeq/startSeq, handle orphan messages); retransmit pending from storage with retry/timeout logic and queue capacity checks
  • Use crash-safe write/dequeue ordering (persist message → startSeq → nextSendSeq; ACK updates startSeq before deleting messages) and lazy cleanup for orphaned entries
  • Add KernelStore methods: get/setRemoteNextSendSeq, setRemoteHighestReceivedSeq, setRemoteStartSeq, get/set/deletePendingMessage, getRemoteSeqState, deleteRemotePendingState; ensure cleanup when deleting a remote
  • Extensive tests for persistence, recovery, ACK handling, and store methods

Written by Cursor Bugbot for commit 1d44443. This will update automatically on new commits. Configure here.

@FUDCo FUDCo marked this pull request as ready for review January 21, 2026 01:48
@FUDCo FUDCo requested a review from a team as a code owner January 21, 2026 01:48
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

// Start ACK timeout if this is the first pending message
if (this.#pendingMessages.length === 1) {
this.#kernelStore.setRemoteStartSeq(this.remoteId, seq);
// Start ACK timeout for the first pending message
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crash-safe ordering persists startSeq too late

Medium Severity

The crash-safe write ordering persists setRemoteNextSendSeq before setRemoteStartSeq. If a crash occurs between these two writes when sending the first message, startSeq will default to 0 on recovery while nextSendSeq is correctly restored. The recovery code only repairs nextSendSeq (checking for message at nextSendSeq + 1), not startSeq. This causes #getPendingCount() to return an inflated value (e.g., 2 instead of 1), potentially causing premature rejection at MAX_PENDING_MESSAGES.

Additional Locations (1)

Fix in Cursor Fix in Web

FUDCo and others added 3 commits January 20, 2026 20:01
Remove in-memory pendingMessages array in favor of deriving queue state
from persisted startSeq/nextSendSeq values. Messages are stored as plain
strings without sendTimestamp. Implements crash-safe persistence for
remote message queues (issue #691).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes crash-safe write ordering bug where startSeq was persisted after
nextSendSeq. Now: message → startSeq (if first) → nextSendSeq.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Scan for message at seq 1 even when getRemoteSeqState returns undefined.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@FUDCo FUDCo force-pushed the chip/persistent-message-queues branch from abec9bf to 1d44443 Compare January 21, 2026 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remote comms: Persistent message queue

2 participants