docs: add WAL vs changeset replication tradeoffs analysis#54
WillPapper wants to merge 3 commits into main
Evaluates Litestream and WAL-based replication as alternatives to SyndDB's changeset-based approach. Documents why changesets are preferred for validator verification while acknowledging WAL's strengths for disaster recovery.
Pull request overview
This PR adds comprehensive documentation analyzing the tradeoffs between WAL-based and changeset-based SQLite replication approaches for SyndDB. The document explains why SyndDB uses the SQLite Session Extension for changeset-based replication instead of WAL-based tools like Litestream, while acknowledging that WAL-based approaches have strengths for disaster recovery scenarios.
Key points:
- Compares physical (WAL) vs logical (changeset) replication approaches
- Evaluates Litestream as a WAL-based alternative
- Documents five key requirements that make changesets the better choice for SyndDB's validator verification architecture
Restructure as a discussion-focused design doc with:
- Clear options (A/B/C/D) for decision making
- Key decision factors with specific questions
- Effort estimates and risk levels
- 10 open questions for team discussion
- Suggested experiments to validate assumptions
- Tentative recommendation with rationale
|
LOL, I've been reading the docs for SQLite WAL mode, the Session Extension, and Litestream this morning, and these are similar to the points I was going to bring up. I'll add my thoughts in the next ~hour.
|
My resources
My thoughts BEFORE reading Claude's docs above:
Big questions for me
I think we need to zoom out and define the SyndDB product requirements - who SyndDB is for and not for, and what they should and should not be able to do with it. Both the WAL and Session Extension approaches are perfectly viable, but each has tradeoffs we need to understand in order to decide how to proceed |
|
Thoughts on what Claude wrote
This is an interesting point. I am not sure what "certain operations" it refers to that aren't captured. We would need to determine if this is acceptable for us.
I don't think the payload size matters. IIRC, Ar.io (Arweave network for permanent storage) told us that data uploads up to 100 KB are free (subsidized), and GCS definitely handles files of that size.
I didn't know this and it wasn't explicitly discussed in the docs. It seems like an unpleasant experience to have to specify a validator machine type. I took a brief look into this further (https://www.sqlite.org/wal.html), and I am not sure the WAL file actually differs across platforms; if it does, I didn't find conclusive evidence from reading the docs and other Googling. Claude may have hallucinated this.
The other decision factors appear relevant and echo what I said in my comment above.
We are developing a product called SyndDB. We had better become experts in SQLite, otherwise debugging issues will be impossible. We need to develop specialized expertise to have any chance of offering a quality product.
IMO the value proposition of offering "no changes needed" integration is huge! The "session extension way" requires some non-trivial changes in the application code.
I don't see a reason we particularly care about this. The only thing the storage layer is useful for is letting validators reconstruct the state and sign withdrawals, and that's not a latency-sensitive part of the system. (For reference, Litestream does WAL checkpoints/backups every second by default.)
They should assert constraints over the data. Overall, reconstructing the DB state and making checks over table data should be a simple and robust way to do it.
WAL pages may differ across:
- Endianness (big vs little endian)
- Alignment/padding
- Page size configuration
- SQLite compile options
this is factually incorrect. https://www.sqlite.org/fileformat.html#the_write_ahead_log
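For anyone who wants to check this locally, here is a minimal sketch that creates a throwaway database in WAL mode and decodes the WAL header. Per the linked file format page, the header is 32 bytes holding eight 32-bit unsigned integers stored big-endian regardless of host platform; the low bit of the magic number only selects the byte order used for checksum calculation. The file name `demo.db`, the `prices` table, and the helper names are made up for illustration.

```python
import os
import sqlite3
import struct
import tempfile

WAL_MAGIC_LE_CHECKSUM = 0x377F0682  # checksums computed over little-endian words
WAL_MAGIC_BE_CHECKSUM = 0x377F0683  # checksums computed over big-endian words

HEADER_FIELDS = (
    "magic", "format_version", "page_size", "checkpoint_seq",
    "salt1", "salt2", "checksum1", "checksum2",
)


def read_wal_header(wal_path):
    """Decode the 32-byte WAL header as eight big-endian uint32 fields."""
    with open(wal_path, "rb") as f:
        raw = f.read(32)
    return dict(zip(HEADER_FIELDS, struct.unpack(">8I", raw)))


def demo():
    # Hypothetical scratch database, only used for this check.
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, value REAL)")
    conn.execute("INSERT INTO prices (value) VALUES (1.0)")
    conn.commit()
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    # Read the header while the connection is still open: closing the last
    # connection checkpoints the WAL and removes the -wal file.
    header = read_wal_header(db_path + "-wal")
    conn.close()
    return header, page_size


header, page_size = demo()
print(header)
```

Running this on machines with different architectures and comparing the decoded fields would be one cheap way to settle the portability question. Page size is still a per-database setting, so a validator does need to know it, but it is recorded in the header itself.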
Rough comparison for a single-column UPDATE:
- WAL: 4KB page (minimum)
- Changeset: ~50-200 bytes (column value + metadata)

**Question:** Is bandwidth/storage cost a significant concern?
this looks like gibberish. How were these values calculated?
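The WAL side of that comparison can at least be measured instead of estimated. Below is a rough sketch, assuming a tiny hypothetical `prices` table, that records how many bytes the `-wal` file grows by for one single-column UPDATE. Each WAL frame is a 24-byte frame header plus one full page, so the growth should be a whole number of frames. The changeset side is not measured here, because the standard library `sqlite3` module does not expose the Session Extension; that number would have to come from the C API or a binding that wraps it.

```python
import os
import sqlite3
import tempfile

WAL_FRAME_HEADER_BYTES = 24  # fixed per-frame header in the WAL format


def wal_growth_for_single_update():
    """Return (bytes appended to the WAL by one UPDATE, database page size)."""
    # Hypothetical scratch database and schema, for measurement only.
    db_path = os.path.join(tempfile.mkdtemp(), "measure.db")
    wal_path = db_path + "-wal"
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Disable auto-checkpointing so frames stay in the WAL while we measure.
    conn.execute("PRAGMA wal_autocheckpoint=0")
    conn.execute(
        "CREATE TABLE prices (id INTEGER PRIMARY KEY, symbol TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO prices (symbol, value) VALUES (?, ?)",
        [("ETH", 1.0), ("BTC", 2.0)],
    )
    conn.commit()
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]

    before = os.path.getsize(wal_path)
    conn.execute("UPDATE prices SET value = 3.0 WHERE symbol = 'ETH'")
    conn.commit()
    after = os.path.getsize(wal_path)
    conn.close()
    return after - before, page_size


growth, page_size = wal_growth_for_single_update()
frame_bytes = WAL_FRAME_HEADER_BYTES + page_size
print(f"WAL grew by {growth} bytes ({growth // frame_bytes} frame(s) of {frame_bytes} bytes)")
```

Larger rows, indexes, or updates that touch several pages will produce more frames, so the numbers for a realistic SyndDB workload should come from running this kind of measurement against a representative schema.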
---

## Hybrid Architecture (Option C)
This is the worst of both worlds: we won't be able to recover changesets from WAL backups, so the storage layer data is borked anyway in case of a failure.
yeah agree, imo there's no disaster recovery need here
I don't think that's true. Session Extension is a native API on SQLite, so changesets should be listened to and sent in the order that they're written to on the DB. I don't think we are relying on the app for DB ordering.
I think this is still true if there is an issue in the WAL version of the application we build |
|
@jorgemmsilva having listened to your explanation in standup today, I think what you are really against is the FFI client library approach where we depend on the app using the client to
I think both Session Extension and WAL are viable for the reasons discussed above. To address your concern, with the Session Extension approach we could write a standalone "listener" node outside of the App but still in "VM 1" that would solely forward changesets. This would introduce another node with associated network calls and failure modes, though.
https://sqlite.org/sessionintro.html#capturing_a_changeset
we're relying on the way the application is written to finish sessions and send changesets ordered correctly.
Not true, we can reconcile WAL frames with the DB / storage layer so no information is ever missing |
Is this even possible to do? We also can't guarantee that the client will call "sqlite3session_attach() with a NULL argument". It just feels like this strategy leaves a lot of implementation detail burden on the application, which I really don't like when the alternative is "no custom implementation necessary". There is a reason Litestream uses the WAL strategy.
|
I see what you are saying. I think you are correct that it is a question of where to store the implementation complexity. The Session Extension approach means that the customer needs to use our library correctly, while the WAL approach means that we need to configure the [DB + Syndicate-Litestream-app] and make sure that works correctly. The latter is going to be more work for us but is probably the more robust approach.
I searched "session extension" in the Litestream GitHub and there are some interesting results. One of them, benbjohnson/litestream#129, is marked as "wont fix." I think this indicates that the WAL approach is orthogonal to saving logical changesets, if such a feature is complicated to support in Litestream.
|
I'm glad we're getting on the same page 🙂
That issue is about potentially using a "session extension format" (changesets) to share changes to a downstream client as litestream replicates WAL data. It's quite different from using the "session extension" as the replication mechanism itself. I feel like the session extension tech would be useful for CRDTs and offline apps, where one could, for example: do changes to a TODO list mobile app while offline, save those changes as a changeset and then relay the info over to a server once the device comes online again. But yeah, all logic must be purpose-built on the app side. |
Add section 6 covering changeset inversion as a key differentiator:
- Document sqlite3changeset_invert() API and how it transforms operations
- Explain why WAL cannot support inversion (forward-only, checkpointing)
- List SyndDB use cases: validator rollback, dispute resolution, optimistic execution
- Add inversion row to comparison table
- Update recommendation rationale

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
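To make the inversion rules concrete for the discussion, here is a conceptual model in plain Python. It is not the Session Extension API and not the binary changeset format: the `Change` class and both functions are hypothetical, and they only mirror the three transformations the SQLite documentation lists for `sqlite3changeset_invert()` (INSERT becomes DELETE, DELETE becomes INSERT, and UPDATE has its old and new values exchanged).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """Hypothetical in-memory stand-in for one row change in a changeset."""
    op: str                      # "INSERT", "UPDATE", or "DELETE"
    old: Optional[Tuple] = None  # row values before the change, if any
    new: Optional[Tuple] = None  # row values after the change, if any


def invert_change(change: Change) -> Change:
    """Return the change that undoes `change`."""
    if change.op == "INSERT":
        return Change("DELETE", old=change.new)
    if change.op == "DELETE":
        return Change("INSERT", new=change.old)
    if change.op == "UPDATE":
        return Change("UPDATE", old=change.new, new=change.old)
    raise ValueError(f"unknown operation: {change.op}")


def invert_changeset(changes: List[Change]) -> List[Change]:
    """Invert every change; applying the result undoes the original set."""
    return [invert_change(c) for c in changes]


original = [
    Change("INSERT", new=(1, "ETH", 1.0)),
    Change("UPDATE", old=(2, "BTC", 2.0), new=(2, "BTC", 2.5)),
    Change("DELETE", old=(3, "SOL", 0.5)),
]
inverted = invert_changeset(original)
for change in inverted:
    print(change)
```

Inverting twice gives back the original changes, which is the property a validator rollback would rely on. Whether validators actually need it is still the open product question.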
|
This is a very helpful discussion! Thank you for all of the thoughtful input @jorgemmsilva @daniilrrr. I agree that placing this discussion in the context of product goals will help us make a decision. One blind spot in our current understanding is how validators will work. Do validators need the diffs provided by changesets, or are forward-only WAL updates suitable? That product question (validator flexibility) would affect the DX and security of SyndDB more than the WAL vs changeset approach. A hard-to-write, brittle validator is more painful than a required client library or set of initialization instructions for an application. I built out some example use cases (price oracle and prediction market) that involve custom validator rules in #58 (also fixed some client bugs and some nuances in changeset handling while I was at it). That will help guide the discussion after break |

Opening this PR to allow for easy discussion @jorgemmsilva @daniilrrr. We can iterate on PR comments (and if we'd like, have Claude Code make PR updates in response to our questions)
Some of this recreates the original logic from the earlier WAL vs changeset debate, but in a more structured way
Summary
Evaluates WAL-based replication (Litestream) as an alternative to changeset-based
replication. Documents why changesets fit SyndDB's validator verification model better.
Key findings:
- sqlite3changeset_invert() for surgical rollback; WAL is forward-only with no undo capability
Discussion points