fix(diff): normalise scanned values in both engines so typed columns …#136
Conversation
|
Warning Review limit reached
Next review available in: 49 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR adds scanned-value normalization for diff and mtree processing, extends repair conversion for legacy time and money inputs, and adds unit and integration coverage for typed-column diff and repair. ChangesScanned value normalization
Compact metadata: 5 files changed; core logic added in Related issues: Not specified in the provided data. Related PRs: Not specified in the provided data. Suggested labels: enhancement, tests, bugfix Suggested reviewers: Reviewers familiar with pgx value handling and the diff/repair pipeline. Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 12 |
| Duplication | 1 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/utils.go`:
- Around line 1255-1268: The JSON/JSONB handling in the switch is using a
GetStatus assertion that can panic on valid scanned values. Update the
pgtype.JSON and pgtype.JSONB branches in pkg/common/utils.go to handle them
separately and check the exposed Status field directly before attempting
AssignTo and marshaling, keeping the existing nil fallback behavior.
- Around line 1190-1204: The NormalizeScannedValue handling for
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date currently returns only
v.Time, which strips the InfinityModifier and loses infinity/-infinity
information. Update the cases in NormalizeScannedValue to preserve those values
by returning their text form when the scanned pgtype value is present, so
round-tripping through repair still matches PostgreSQL. Use the existing
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date branches as the place to
make this change.
- Around line 1165-1167: The package-level pgxV5TypeMap is shared state and
pgtype.Map.PlanEncode memoizes encode plans, so concurrent calls can race.
Update NormalizeScannedValue and the other Encode path that uses pgxV5TypeMap to
avoid sharing the same map across goroutines, either by guarding access with a
mutex or by creating a fresh pgxv5type.Map per call. Reference the pgxV5TypeMap
helper and both encode call sites when applying the fix.
In `@tests/integration/mtree_typed_repair_test.go`:
- Around line 83-87: The integration test is picking up stale mtree diff reports
from earlier runs instead of the report generated by the current DiffMtree()
call. Update the test around the filepath.Glob/sort.Strings/diffFile selection
to ignore old matches by either deleting pre-existing matching files before
calling DiffMtree() or filtering matches to only files created after the test’s
start time; keep the logic anchored to DiffMtree() and the diffFile selection so
the repair step always uses the current run’s report.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1abd0f24-dd99-48d3-a350-1475f30eb75c
📒 Files selected for processing (5)
internal/consistency/diff/table_diff.gointernal/consistency/mtree/merkle.gopkg/common/utils.gopkg/common/utils_test.gotests/integration/mtree_typed_repair_test.go
cceb109 to
e8e1a0d
Compare
|
@CodeRabbit review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/utils.go`:
- Around line 1215-1219: `NormalizeScannedValue` is missing an explicit `[]byte`
handling path before the generic fallback, which causes `bytea` payloads to be
stringified incorrectly; update the switch in `NormalizeScannedValue` to return
raw byte slices unchanged before reaching `fmt.Sprint`, alongside the existing
`pgtype.Bytea` branch, so scanned binary values preserve their original
contents.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 651b9186-ac55-4c22-be0b-f7634adef40e
📒 Files selected for processing (5)
internal/consistency/diff/table_diff.gointernal/consistency/mtree/merkle.gopkg/common/utils.gopkg/common/utils_test.gotests/integration/mtree_typed_repair_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/consistency/diff/table_diff.go
- internal/consistency/mtree/merkle.go
- pkg/common/utils_test.go
…repair
A diff produced by the Merkle-tree engine wrote raw pgx driver structs into
the report: a time column serialised as {"Microseconds":...,"Valid":...}
instead of "HH:MM:SS", so table-repair aborted on the type conversion and no
rows were fixed. UUIDs and intervals had the same latent problem (a JSON
array of 16 bytes, an interval struct). The classic engine was unaffected
because its row fetch normalised every scanned value inline before the report
was written -- the two engines produced incompatible diff files.
Extract that normalisation into a shared helper, NormalizeScannedValue, and
route both engines' row materialisation through it, so a diff file is
repairable regardless of which engine produced it. Beyond the classic
engine's behaviour the helper adds a pgx v5 interval case (encoded with a
per-call type map: pgtype.Map memoizes encode plans without locking, so a
shared instance would race under the concurrent diff workers), preserves
infinity/-infinity timestamps and dates as their Postgres text forms instead
of collapsing them to a zero time, and reads the v1 JSON/JSONB status field
directly rather than through an interface assertion that could panic. Raw
[]byte bytea payloads pass through untouched (base64 in the report, decoded
by the repair-side bytea conversion) instead of being stringified into an
unrepairable form.
Repair-side, ConvertToPgxType now handles money explicitly -- both engines
emit it as its locale-formatted string, which Postgres parses directly -- so
the per-row "unknown or complex pgType" warning is gone. It also accepts the
legacy {"Microseconds","Valid"} map form for time so diff files written by
older builds remain repairable after upgrading.
Unit tests cover the normaliser round-trips (time, uuid, interval, infinity,
v1 JSON, passthrough), race-freedom of concurrent normalisation, and the
converter's money and legacy-time forms; an integration test drives the
reported scenario end to end: 100 divergent rows with time, timetz, money,
uuid, interval and numeric columns, Merkle-tree diff, then repair from the
report that diff wrote (never a stale one), asserting the nodes converge to
identical rows.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
e8e1a0d to
63b5ef3
Compare
…repair
A diff produced by the Merkle-tree engine wrote raw pgx driver structs into the report: a time column serialised as {"Microseconds":...,"Valid":...} instead of "HH:MM:SS", so table-repair aborted on the type conversion and no rows were fixed. UUIDs and intervals had the same latent problem (a JSON array of 16 bytes, an interval struct). The classic engine was unaffected because its row fetch normalised every scanned value inline before the report was written -- the two engines produced incompatible diff files.
Extract that normalisation into a shared helper, NormalizeScannedValue, and route both engines' row materialisation through it, so a diff file is repairable regardless of which engine produced it. The helper preserves the classic engine's behaviour verbatim and adds a pgx v5 interval case the old switch lacked.
Repair-side, ConvertToPgxType now handles money explicitly -- both engines emit it as its locale-formatted string, which Postgres parses directly -- so the per-row "unknown or complex pgType" warning is gone. It also accepts the legacy {"Microseconds","Valid"} map form for time so diff files written by older builds remain repairable after upgrading.
Unit tests cover the normaliser round-trips (time, uuid, interval, passthrough) and the converter's money and legacy-time forms; an integration test drives the reported scenario end to end: 100 divergent rows with time, timetz, money, uuid, interval and numeric columns, Merkle-tree diff, then repair, asserting the nodes converge to identical rows.