Skip to content

feat(payload): positional-cursor /check paging; remove the spool#52

Open
andinux wants to merge 14 commits into
codex/chunked-payloads-networkfrom
codex/positional-cursor-chunks
Open

feat(payload): positional-cursor /check paging; remove the spool#52
andinux wants to merge 14 commits into
codex/chunked-payloads-networkfrom
codex/positional-cursor-chunks

Conversation

@andinux

@andinux andinux commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Summary

Replace the server-side download spool with a positional cursor on
cloudsync_payload_chunks, so the /check job can page a tenant's change window
one chunk per round-trip with no temporary table. This is the extension half of the
change; the companion server PR is sqliteai/cloudsync#52.

This branch also carries three client send-path fixes found while benchmarking
the change — see Send-path fixes.

Why

The spool stages a whole /check window into a cloudsync_payload_spool table on
the tenant, then pages it out. That has three operational costs this PR removes:

  1. Privilegespool_fill issues CREATE TABLE, which the tenant's default
    readwrite apikey is not authorized for → failures.check: database_permission_denied
    (the original bug; the bench polled forever). Positional needs no DDL.
  2. Disk — the spool temporarily ~doubles tenant DB space per requesting client.
    Positional writes nothing to the tenant.
  3. Ops — no provisioning / migration / dbadmin-fallback to keep the spool table
    creatable.

How it works

cloudsync_payload_chunks gains an optional positional cursor:

  • inputs resume_db_version, resume_seq, resume_frag_offset — start the scan
    at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset;
  • outputs next_db_version, next_seq, next_frag_offset, is_final — where the
    emitted chunk stopped.

The server pages with … LIMIT 1, threading each chunk's next_* back as the next
resume_* under a pinned until watermark, until is_final. Tiling is exact
(db_version, seq) is a unique total order, so each change is emitted in exactly one
chunk with no overlap and no reliance on changes-level idempotency. The first fetch
uses the legacy exclusive since (drain-start "exclusive-after" rule).

The /check client wire protocol is unchanged — the positional cursor is
server-internal (job ↔ node); the client still pages by an integer artifact cursor.
The positional cursor itself touches no client code; the send-path fixes below are a
separate, unrelated change that does modify network.c.

Feature comparison

positional cursor (new) spool (old)
Tenant CREATE TABLE privilege none required (the bug)
Tenant disk during /check none ~doubles per client
Provisioning / migration none spool table must exist
Node-side state none (cursor is the job's loop var) spool rows per stream
Client wire protocol unchanged unchanged
Drain cost O(N²) (see below) O(N)

Performance (test/chunk_bench.c, local SQLite, end-to-end paging)

window (multi-version) spool O(N) positional O(N²) ratio
100 chunks 164 ms 872 ms 5.3×
200 chunks 350 ms 3,439 ms 9.8×

Chunk counts/bytes match exactly (positional is correct, just slower); the ratio
doubles as the window doubles ⇒ positional is O(N²). Root cause: each resume runs
… FROM cloudsync_changes WHERE (db_version,seq) ≥ cursor ORDER BY db_version, seq,
and cloudsync_changes has no (db_version, seq) index, so it re-scans + re-sorts
the window every call.

Accepted trade-off: /check is an async background job, so the O(N²) is node CPU
on cold-start/large windows — fine for typical windows and mitigatable. The real O(N)
fix is a follow-up (indexed (db_version, seq) seek on cloudsync_changes);
test/chunk_bench.c is kept as the guard to confirm it flattens the curve.

Changes

  • cloudsync_payload_chunks positional cursor — SQLite vtab (cloudsync_sqlite.c)
    and PostgreSQL SRF (cloudsync_postgresql.c + SQL signatures). Legacy callers
    (send path) unchanged; columns appended, positional branch only active when
    resume_db_version is bound.
  • Removed cloudsync_payload_spool table + spool_fill/spool_drop/drop_chunk
    on both engines, the SQLite Payload Download Spool unit test, and the spool
    sections of the PG chunk test.
  • New tests on both engines: byte-for-byte tiling incl. a mid-fragment resume, plus
    an end-to-end drain→apply→content-hash round-trip.
  • test/chunk_bench.c — local positional-drain benchmark / O(N²)→O(N) guard.

Send-path fixes (independent of the positional cursor)

Three pre-existing bugs in the unreleased 1.1 chunked send path, surfaced while
benchmarking. No CHANGELOG entries (introduced and fixed within the same unreleased
version).

  1. Send announces the covered window, not the raw change range.
    cloudsync_network_send_changes announced each chunk's actual [db_version_min, db_version_max] to /apply. When the local db_version clock has run ahead of the
    site's own changes — after merging applied remote changes, or any transaction that
    produced no local-site change — the skipped versions are absent from the sent
    ranges, so the server's per-site coverage (ComputeGaps walks it from version 1)
    reports them as gaps and never advances lastOptimisticVersion to the local
    version; the send then churns and never reaches synced. Now it announces the
    covered window [min(running_lo, chunk_min) … chunk_max], running_lo chained
    from send_checkpoint+1, tiling [send_checkpoint+1 … watermark] contiguously.
    The min(…, chunk_min) clamp keeps min ≤ max for consecutive chunks that share
    a db_version (a value fragmented across chunks); a partial-fragment failure stays
    safe because the receiver stages incomplete fragments (0 applied rows), so coverage
    is recorded only once a value is whole.

  2. lastOptimisticVersion taken latest-valid, not monotonic-max.
    network_sync_state_update_from_response kept the maximum optimistic/confirmed
    version across a multi-chunk send. The server moves the optimistic version
    backward when a later chunk fails (rollback), and that value becomes the durable
    send checkpoint — so the max masked the rollback and advanced the checkpoint past
    rolled-back changes (silent data loss). Now it takes the latest valid (>= 0)
    value, matching how gaps is already handled.

  3. Client fail-fast on a non-retryable failures.check — surfaces a permanent
    error immediately instead of polling until timeout.

New network-layer unit-test harness (make network-unittest, test/network_unit.c),
built with networking compiled in so network.c's internal functions run on in-memory
buffers with no server — the layer previously had none. Plus an e2e repro
test_send_gap_from_clock_hole that forces a clock hole with cloudsync_db_version_next()
and asserts serverVersion reaches localVersion (red before fix 1, green after).

Verification

  • SQLite unit suite passes; PG chunk tests (52/53/54/55) pass on PostgreSQL 17 with
    the spool removed.
  • make network-unittest passes (version fold, announce-window tiling — leading
    hole / inter-chunk hole / same-version fragment — and status computation).
  • Full basic e2e suite green against a live server, including the new
    test_send_gap_from_clock_hole repro.

Follow-ups (not blocking)

  • Indexed (db_version, seq) seek on cloudsync_changes → positional drain O(N).
  • /check job-duration metric/alert for pathological large windows.

🤖 Generated with Claude Code

andinux and others added 7 commits June 26, 2026 18:08
cloudsync_payload_chunks could only resume a window from since>db_version,
so a chunk boundary that lands inside a single committed db_version (or
inside a fragmented oversized value) was not addressable — which is the
whole reason the server stages the stream into a cloudsync_payload_spool
table to page it out. This makes the vtab resumable by position instead.

Add an optional positional cursor: hidden inputs resume_db_version,
resume_seq, resume_frag_offset start the scan at (db_version, seq)
inclusive and re-enter a mid-value fragment at a byte offset; new outputs
next_db_version, next_seq, next_frag_offset and is_final report where the
emitted chunk stopped. A stateless /check can then page the whole window
with an O(1) seek per call (vs O(N^2) replay-from-since), no spool table,
no server-side state. Legacy since>db_version callers (send path, spool
fill) are unchanged: columns are appended and the positional branch only
activates when resume_db_version is bound.

Tiling is exact, not idempotent-overlap: (db_version, seq) is a unique
total order (changes rowid = (db_version<<30)|seq), next_* names the row
that did not fit or the exact byte already emitted, and the fragment plan
is a deterministic function of the row, so a resumed fragment tiles
identically. The drain-start cursor must seek exclusive-after the last
applied change (see docs/internal design note) so the protocol never
relies on changes-level idempotency to absorb a re-sent row.

New unit test Payload Chunks Positional Resume drives a window mixing a
db_version split across chunks with a value larger than the chunk budget,
pages it one chunk per call via the cursor, and asserts byte-identity
with a full-window scan (including a mid-fragment resume).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… SRF

Mirror the SQLite vtab's positional cursor on the PostgreSQL SRF so the
/check job can page the node one chunk per round-trip without the
cloudsync_payload_spool table. Adds inputs resume_db_version, resume_seq,
resume_frag_offset (start the scan at (db_version, seq) inclusive and
re-enter a mid-value fragment at a byte offset) and outputs
next_db_version, next_seq, next_frag_offset, is_final (where the emitted
chunk stopped).

The positional query reuses the (db_version > $ OR (db_version = $ AND
seq >= $)) shape already used by payload_blob_checked's estimate, with an
inclusive seq to match the vtab's exact tiling. Fragment setup is factored
into payload_chunks_pg_begin_fragment so a streamed and a resumed fragment
build identically; the next cursor is read from the buffered source row
(or the same row mid-fragment), peeking one row ahead to set is_final.

Legacy callers (send path, spool fill) are unchanged: new args default to
NULL so the positional branch only activates when resume_db_version is
bound, and the existing output columns keep their positions.

New test 55_payload_chunks_positional_resume.sql resumes at every
non-final chunk's reported cursor and asserts the fetched chunk equals the
next chunk of a full-window scan byte-for-byte, including a mid-fragment
resume. Verified against PostgreSQL 17; existing chunk/spool/fragment
tests (52-54) still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Part 1 of test 55 proves the positional cursor produces byte-identical
chunks to a full scan. Part 2 now closes the loop end-to-end: a plpgsql
helper drains the whole window the way the /check job will (legacy
exclusive since=0, then the positional cursor one chunk per call,
ORDER BY chunk_index LIMIT 1 so each value-per-call SRF runs to
completion), the drained stream is applied to a fresh database, and the
receiver's table content is hashed and compared to the source.

This validates the real path (positional drain -> apply -> faithful
replica), not just byte-identity against a baseline. Verified on
PostgreSQL 17: 501 rows reproduced, hashes match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the PostgreSQL test's end-to-end check on the SQLite side: after
asserting the positional drain reproduces the full-scan chunks
byte-for-byte, apply that drained stream to a fresh receiver database and
assert its split_test content matches the source via
test_split_tables_equal. Chunks are applied in reverse drain order so the
apply path must reassemble v3 fragments and merge rows independent of
transport order. Validates positional drain -> apply -> faithful replica,
the real path the /check job will use. All unit tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chunk_bench builds a window of N chunks and times paging the whole window
one chunk per call via the positional cursor on cloudsync_payload_chunks,
reporting wall time, per-chunk cost and throughput. Local-only (loads the
built dylib, no network). Env-tunable rows/row_bytes/txns/repeats/chunk_size
(TXNS splits the rows across db_versions). Lets the drain's computational
growth be tracked — e.g. to confirm a future indexed (db_version, seq) seek
flattens it from O(N^2) to O(N).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ions

The /check job now pages the tenant via the positional cursor on
cloudsync_payload_chunks, so the server-side download spool is dead.
Remove the cloudsync_payload_spool table and the
cloudsync_payload_spool_fill / _drop / _drop_chunk functions on both
engines (SQLite: sql_sqlite.c constants, sql.h decls, cloudsync_sqlite.c
functions + registration; PostgreSQL: cloudsync.sql.in and the 1.0->1.1
migration), plus the SQLite "Payload Download Spool" unit test and the
spool sections of the PostgreSQL chunk test.

The client-side cursor paging (network.c) is unchanged: the wire protocol
still pages chunks by an integer cursor; only the server's node-side
staging mechanism changed. All remaining chunk/fragment/positional tests
pass on both engines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@andinux andinux changed the title feat(payload): positional-cursor resume for chunk vtab (prototype) feat(payload): positional-cursor /check paging; remove the spool Jun 30, 2026
andinux and others added 7 commits June 30, 2026 11:28
… max

network_sync_state_update_from_response kept the maximum optimistic/confirmed
version seen across a multi-chunk send. But the server can move
lastOptimisticVersion BACKWARD when a later chunk fails (rollback), and that
value becomes the durable send checkpoint (CLOUDSYNC_KEY_SEND_DBVERSION) — so
masking a decrease advanced the checkpoint past the rolled-back changes, which
were then never re-sent (silent data loss). Take the latest valid (>= 0) value
instead, matching how `gaps` is already handled; a missing/unparseable field
(-1) still leaves the value untouched. Pre-existing since the chunked-payload
transport work, unreleased.

Add a network-layer unit test harness (the layer had none): network_unit.c is
built with networking compiled in (T_CFLAGS minus OMIT_NETWORK, linked against
curl) so the internal parsing/state functions can be exercised on in-memory
NETWORK_RESULT buffers with no server. `make network-unittest` runs it (also
wired into `make test`). The new test reproduces the rollback (50→100→50 must
end at 50, not 100) and covers the missing-field no-op and network_compute_status;
network_sync_state_update_from_response / network_compute_status are de-static'd
for it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The network-unittest target (added in 01f6d25) links dist/network_unit as an
executable from NT_LDFLAGS, derived from the platform LDFLAGS. Linux, Android and
Windows put -shared in LDFLAGS (the extension is a .so there); the filter only
stripped macOS's -dynamiclib, so -shared leaked into the executable link and ld
tried to emit a shared object from non-PIC objects:
  relocation R_X86_64_PC32 ... can not be used when making a shared object
Strip -shared too so it links as a plain executable. macOS is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e range

cloudsync_network_send_changes announced each chunk's actual [db_version_min,
db_version_max] to /apply. When the local db_version clock has advanced past the
site's own changes — e.g. after merging applied remote changes, or any transaction
that produced no local-site change — those skipped versions are absent from the
sent ranges, so the server's per-site coverage (ComputeGaps walks it from version
1) reports them as gaps and never advances lastOptimisticVersion to the local
version. The send then churns and never reaches "synced".

Announce the covered window instead: min = a running window lower bound
(send_checkpoint+1, chained to the previous chunk's max+1), max = the chunk's own
max, tiling [send_checkpoint+1 .. watermark] contiguously. network_announce_min()
clamps the min to the chunk's own min so consecutive chunks sharing a db_version
(a value fragmented across chunks) keep min<=max; a partial-fragment failure stays
safe because the receiver stages incomplete fragments (0 applied rows), so coverage
is only recorded once the value is whole. Pre-existing since the chunked-payload
transport (f1ace81), unreleased.

Tests: network-unit case for the tiling (leading hole / inter-chunk hole /
same-version fragment pair); e2e repro test_send_gap_from_clock_hole forces a clock
jump with cloudsync_db_version_next() and asserts serverVersion reaches
localVersion (red before this change, green after).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e binary

The Android CI runs test binaries on the emulator from a command script generated by
`make test -n` (it pushes the workspace and executes the script via adb). The
network-unittest recipe linked dist/network_unit *inside the recipe*, so `make -n`
emitted the cross-compiler link command into that on-device script; the emulator has
no host NDK toolchain, so it failed with
  x86_64-linux-android26-clang: not found  (EXIT_CODE=127).
Move the link to a file-target rule (like dist/unit) and keep only the run in the
network-unittest recipe, so once the host build is done `make -n` emits just
`./dist/network_unit`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add step-by-step comments through the benchmark flow and, in trace builds,
print the raw JSON returned by the cloudsync_network_* functions: route the
pre-measure-sync and cleanup-send calls through a trace-aware db_exec_network
helper, and surface the send/check response JSON from timed_request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Android CI host runs `make test`, which aborts when `unittest` runs the
cross-built dist/unit (an Android binary on the Linux host). dist/network_unit is
not in TEST_TARGET (network_unit.c is filtered out of TEST_SRC), so it was built
only via the network-unittest prerequisite — ordered after unittest, i.e. never,
once make aborted. `make test -n` then saw the binary missing and emitted the NDK
link into the on-emulator command script (2028cc3's file-rule split was necessary
but not sufficient). List dist/network_unit as a direct `test` prerequisite before
unittest so it builds during the file-build phase, exactly like dist/unit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The send path's status/version reporting had no e2e coverage — both the spurious
gap and the optimistic-version regression moved data correctly while reporting the
wrong status, so data-only assertions missed them. Add two helpers:

- db_send_ok(): run a send expected to succeed and assert it did not fail at the
  protocol level (status != "error", no send.lastFailure) — catches a
  server-reported apply/check failure that does not raise a SQL error, which a
  plain db_exec of the send ignored. Applied at the basic and chunked primary send
  sites (not the two failure-path tests, nor the best-effort in-loop sends).
- db_send_await_converge(): send, then poll until send.serverVersion reaches
  send.localVersion (durably covered, no gap), robust to the server's async apply.
  test_send_gap_from_clock_hole now uses it instead of a single synchronous assert.

Full e2e suite (incl. chunked) green against a live server.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant