This document assesses our current remote messaging system against the ideals of the Ken protocol, as described in HP Labs Tech Report HPL-2010-155: "Output-Valid Rollback-Recovery" by Kelly, Karp, Stiegler, Close, and Cho.
The Ken protocol provides these guarantees for fault-tolerant distributed computing:
- Exactly-once delivery in process-pairwise FIFO order
- Output validity: Outputs could have resulted from failure-free execution
- Transactional turns: One message delivered → processing → checkpoint → transmit outputs
- Consistent frontier: Most-recent per-process checkpoints always form a recovery line
- Local recovery: Crashes cause only local rollbacks, no domino effect
- Sender-based message logging: Messages persisted in sender's output queue until ACKed
- Deferred transmission: Outputs buffered during turn, transmitted only after checkpoint
A "turn" in Ken is the fundamental unit of computation:
turn_start(deliver exactly one message to processing_function)
→ processing_function executes
→ outputs buffered in Q_out (not transmitted yet)
turn_end:
→ atomically persist(turn, app_state, Q_out, Done)
→ THEN transmit buffered messages
Key aspects:
- Only one message delivered per turn
- All outputs buffered until end of turn
- Atomic checkpoint includes application state AND output queue
- Transmission happens only after checkpoint completes
Donetable tracks which messages have been processed to completion
| Property | Status | Implementation |
|---|---|---|
| Sender-based logging | ✓ | Messages persisted at remotePending.${remoteId}.${seq} |
| Sequence numbers | ✓ | seq on outgoing, highestReceivedSeq for incoming |
| Cumulative ACK | ✓ | Piggyback ACKs acknowledge all messages up to seq |
| Retransmission | ✓ | Timeout-based retransmit until ACK or max retries |
| Crash-safe persistence | ✓ | Write message first, then update nextSendSeq |
| Local recovery | ✓ | Restore seq state, restart ACK timeout |
Ken's model:
turn_start(deliver one message)
→ processing_function executes
→ outputs buffered in Q_out
turn_end:
→ atomically persist(turn, app_state, Q_out, Done)
→ THEN transmit buffered messages
Our model:
message received → kernel processes → sends outputs immediately
each output: persist message → update seq → transmit
The kernel's "crank" mechanism may provide turn-like boundaries, but RemoteHandle doesn't coordinate with it. Messages are transmitted immediately after being persisted, not deferred until end of turn.
Ken maintains a Done table ensuring:
- Each message delivered to application at most once
- FIFO ordering enforced via
next_ready()considering seq + sender ID
We track highestReceivedSeq but only for ACK purposes. We don't have explicit duplicate detection for incoming messages. If the remote retransmits a message we already processed (but before we ACKed), we could deliver it twice.
Ken guarantees outputs could have resulted from failure-free execution because:
- Outputs are buffered during a turn
- A crash during processing loses all outputs from that turn
- Only committed outputs escape to the outside world
Our system transmits immediately after persisting, so a crash mid-crank could result in:
- Some messages transmitted to remote
- But kernel state not yet committed
- On recovery, kernel re-executes and sends different/duplicate messages
Ken atomically checkpoints (turn, app_state, Q_out, Done) together at end of turn.
Our system persists messages individually as sent. There's no atomic boundary coordinating kernel state with outgoing message state.
Ken: persist checkpoint → THEN transmit
Ours: persist message → transmit immediately
Ken's approach ensures the "send" is recorded in checkpoint before any transmission. This is crucial for the consistent frontier property.
Ken can opportunistically persist incoming messages before delivery. On crash, the input queue is reconstructed from sender retransmissions.
We don't persist incoming messages. On crash, we rely entirely on senders to retransmit.
| Ken Property | Our System | Notes |
|---|---|---|
| Exactly-once delivery | Partial | At-least-once with no duplicate detection |
| Output validity | No | Immediate transmission, no turn boundaries |
| Transactional turns | No | No coordination with kernel cranks |
| Consistent frontier | Partial | No atomic checkpoint across kernel+remote state |
| Local recovery | Yes | Crashes don't affect other processes |
| Sender-based logging | Yes | Messages persisted until ACKed |
| FIFO ordering | Partial | Per-sender seq, but no enforcement on receive side |
Buffer outgoing messages during crank execution, persist and transmit only at crank commit. This would require:
RemoteHandleto be aware of crank boundaries- Outgoing messages buffered in memory during crank
- Batch persist + transmit at crank commit
Track processed message IDs, deduplicate on receive:
- Persist
Donetable entries for processed messages - On receive, check if message already in
Donebefore delivering - ACK messages in
Donewithout re-delivering
Persist kernel state and output queue together:
- Single atomic write at end of crank
- Include: kernel state, outgoing messages, Done table updates
- Requires coordination between kernel store and remote message persistence
Transmit only after checkpoint completes:
- Buffer messages during turn
- After atomic checkpoint succeeds, release messages for transmission
- This ensures "send" is recorded before any transmission occurs
Hold out-of-order messages until predecessors processed:
- Track expected next seq per sender
- Buffer messages that arrive out of order
- Deliver in sequence order only
The most significant change would be integrating RemoteHandle with the kernel's crank/commit cycle. Currently:
Kernel Crank:
process message → syscalls may send to remote
RemoteHandle (independent):
persist each outgoing message → transmit immediately
Ken-style architecture:
Kernel Crank:
process message → syscalls buffer outputs
Crank Commit (atomic):
persist(kernel_state, buffered_outputs, done_table)
Post-Commit:
transmit buffered outputs
This would require the kernel to control when RemoteHandle actually transmits, rather than RemoteHandle transmitting independently.
- HP Labs Tech Report HPL-2010-155: "Output-Valid Rollback-Recovery"
- Ken project: https://web.eecs.umich.edu/~tpkelly/Ken/
- Waterken (Ken implementation in Java): http://waterken.sourceforge.net/