feat(sync-service): record per-transaction fragment wall-time #4504
erik-the-implementer wants to merge 8 commits
Add a pg_txn.fragments_wall_duration_µs attribute to the pg_txn.replication_client.transaction_received span, set on the commit fragment. It measures the wall-clock time from a transaction's begin to its commit as received from Postgres. Because the replication stream is consumed on demand (e.g. paused while database connections are scaled down), this includes idle gaps between fragments and can be far larger than the per-fragment processing time. It is the signal for transactions whose fragments span a shape consumer's suspend threshold.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude Code Review
Summary
Iteration 5. Since iteration 4, one new commit landed.
What's Working Well
Issues Found
Critical (Must Fix)
1. Build is red: the missed fixture from iteration 4 was never stamped.
File:
This is the same issue flagged as Important in iteration 4, now confirmed failing in CI rather than predicted. The local
    commit: %Changes.Commit{}, # tx_started_at defaults to nil
The "restores latest offset" test feeds that fragment through
Codecov confirms exactly this:
Fix (stamp it like every other collector-fed fixture):
    # shape_cache_test.exs:1533
    commit: %Changes.Commit{tx_started_at: System.monotonic_time()},
I re-swept the whole test tree for bare
Important (Should Fix)
None new.
Suggestions (Nice to Have)
Carry-overs, all non-blocking and unchanged by
(Suggestion #4 from iteration 4, assert the emission, is now resolved by the new test.)
Issue Conformance
Still no linked issue (
Previous Review Status
Monorepo / Cross-Package Notes
sync-service-internal telemetry attribute; no HTTP-contract or TypeScript-client impact.
Review iteration: 5 | 2026-06-05
❌ 1 Tests Failed:
…ll-time
tx_started_at_mono is set together with txn_fragment on Begin, so a
commit always has it; a begin-less commit would raise on the
`%{fragment | commit}` map-update regardless. Compute the duration
inline at commit and read it directly in the ShapeLogCollector,
removing the defensive nil branches and the misleading "nil after
reconnect" comment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for electric-next ready!
…gment processing
Carry the begin monotonic time on the commit and compute the wall-clock duration in the ShapeLogCollector after the commit fragment is processed, so processing time is included. Mirrors the existing receive_lag pattern (stored mono time + delta computed later, reported in ms).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xtures
Changes.Commit is only built by MessageConverter, which always sets tx_started_at on Begin (a begin-less commit raises on the fragment map-update before exiting), so in regular execution the field is always present. Drop the is_integer guard and stamp tx_started_at on synthetic test commits instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lti-shape txn
Drives a single complete transaction that the EventRouter reslices to two shapes, and asserts the total_processing_time span attribute lands exactly once (on the original incoming commit), guarding against future EventRouter changes that might set it per-shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds a `total_processing_time` attribute to the `pg_txn.replication_client.transaction_received` span, set on the commit fragment. It records the wall-clock time taken to process all fragments of a single transaction, from when the begin was received to when the commit fragment finishes processing.

Today our spans only measure per-fragment processing time (~ms). They can't tell us how long a transaction's fragments are spread out in wall-clock terms, which is the quantity that determines whether a shape consumer can idle past its suspend threshold mid-transaction (see #4501 / #4503). Because the replication stream is consumed on demand (e.g. paused while DB connections are scaled down), this can be far larger than the processing time, as it includes the idle gaps between fragments.
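
As a rough illustration of the measurement described above, here is a minimal sketch of the "stamp a monotonic time on begin, compute the delta after the commit is processed" idea. It is not the code in this PR: the `WallTimeSketch` module, its nested `Commit` struct, and the function names are hypothetical stand-ins; only the `tx_started_at` field name and the millisecond unit come from the description. It relies on the standard `System.monotonic_time/0` and `System.convert_time_unit/3`.

```elixir
defmodule WallTimeSketch do
  # Hypothetical stand-in for the real commit struct; only the
  # tx_started_at field named in the PR description is modelled.
  defmodule Commit do
    defstruct tx_started_at: nil
  end

  # On begin: remember the monotonic clock reading (native time units).
  def on_begin(now \\ System.monotonic_time()) do
    %{tx_started_at: now}
  end

  # On commit: carry the begin timestamp over onto the commit.
  def on_commit(%{tx_started_at: started_at}) do
    %Commit{tx_started_at: started_at}
  end

  # After the commit fragment has been processed: take the delta against
  # the current monotonic time and report it in milliseconds.
  def total_processing_time_ms(%Commit{tx_started_at: started_at}, now \\ System.monotonic_time()) do
    System.convert_time_unit(now - started_at, :native, :millisecond)
  end
end

# Usage: any idle gap between begin and commit is included in the result.
state = WallTimeSketch.on_begin()
Process.sleep(20)
commit = WallTimeSketch.on_commit(state)
IO.inspect(WallTimeSketch.total_processing_time_ms(commit), label: "total_processing_time (ms)")
```

Using the monotonic clock rather than the system clock keeps the delta immune to wall-clock adjustments, and taking the reading after processing means the commit fragment's own processing time is counted as well.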
Changes
`MessageConverter` stamps the begin monotonic time (`tx_started_at`) on `Begin` and carries it on the `Commit` struct.
`ShapeLogCollector.do_handle_event/2` sets `total_processing_time` (ms) on the span after the commit fragment is processed, as `now - tx_started_at`. Computing it post-processing (rather than precomputing in the converter at commit-receive time) folds in the processing time too, mirroring the existing `receive_lag` pattern (`replication_client.ex`): a stored monotonic time plus a delta computed later, reported in ms.
Test
`message_converter_test.exs` asserts the begin timestamp is recorded on the commit. Converter (18) and ShapeLogCollector (29) suites pass.
🤖 Generated with Claude Code