fix(diff): normalise scanned values in both engines so typed columns … by danolivo · Pull Request #136 · pgEdge/ace

danolivo · 2026-07-02T10:37:04Z

…repair

A diff produced by the Merkle-tree engine wrote raw pgx driver structs into the report: a time column serialised as {"Microseconds":...,"Valid":...} instead of "HH:MM:SS", so table-repair aborted on the type conversion and no rows were fixed. UUIDs and intervals had the same latent problem (a JSON array of 16 bytes, an interval struct). The classic engine was unaffected because its row fetch normalised every scanned value inline before the report was written -- the two engines produced incompatible diff files.

Extract that normalisation into a shared helper, NormalizeScannedValue, and route both engines' row materialisation through it, so a diff file is repairable regardless of which engine produced it. The helper preserves the classic engine's behaviour verbatim and adds a pgx v5 interval case the old switch lacked.

Repair-side, ConvertToPgxType now handles money explicitly -- both engines emit it as its locale-formatted string, which Postgres parses directly -- so the per-row "unknown or complex pgType" warning is gone. It also accepts the legacy {"Microseconds","Valid"} map form for time so diff files written by older builds remain repairable after upgrading.

Unit tests cover the normaliser round-trips (time, uuid, interval, passthrough) and the converter's money and legacy-time forms; an integration test drives the reported scenario end to end: 100 divergent rows with time, timetz, money, uuid, interval and numeric columns, Merkle-tree diff, then repair, asserting the nodes converge to identical rows.

coderabbitai · 2026-07-02T10:37:29Z

Warning

Review limit reached

@danolivo, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 49 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d7b4f1e-d076-45e4-8e11-2730102ac814

📥 Commits

Reviewing files that changed from the base of the PR and between e8e1a0d and 63b5ef3.

📒 Files selected for processing (5)

internal/consistency/diff/table_diff.go
internal/consistency/mtree/merkle.go
pkg/common/utils.go
pkg/common/utils_test.go
tests/integration/mtree_typed_repair_test.go

📝 Walkthrough

Walkthrough

This PR adds scanned-value normalization for diff and mtree processing, extends repair conversion for legacy time and money inputs, and adds unit and integration coverage for typed-column diff and repair.

Changes

Scanned value normalization

Layer / File(s)	Summary
Normalization utility and repair conversion `pkg/common/utils.go`	Adds `NormalizeScannedValue` to canonicalize scanned pg and pgx driver values, and extends `ConvertToPgxType` for legacy map-shaped time inputs and money values.
Diff and mtree row normalization `internal/consistency/diff/table_diff.go`, `internal/consistency/mtree/merkle.go`	Routes scanned row values through `NormalizeScannedValue` and removes the unused pgtype imports from diff processing.
Normalization unit tests `pkg/common/utils_test.go`	Adds tests for time, UUID, interval, infinity, JSON v1, concurrency, passthrough, and legacy repair conversion cases.
Typed-column diff and repair integration `tests/integration/mtree_typed_repair_test.go`	Adds an integration test that seeds divergent typed rows, runs mtree diff, repairs from the generated report, and verifies convergence by fingerprint comparison.

Compact metadata: 5 files changed; core logic added in pkg/common/utils.go (+158/-0); consumers updated in table_diff.go (+1/-120) and merkle.go (+4/-1); tests added in utils_test.go and mtree_typed_repair_test.go.

Related issues: Not specified in the provided data.

Related PRs: Not specified in the provided data.

Suggested labels: enhancement, tests, bugfix

Suggested reviewers: Reviewers familiar with pgx value handling and the diff/repair pipeline.

Poem

A rabbit peeks at rows anew,
And folds their shapes to tidy glue.
Times and UUIDs hop in line,
With money too, all quick and fine.
Diff and repair now nuzzle through —
Thump! The table matches true. 🐇

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly reflects the main change: normalizing scanned values for typed columns in both diff engines.
Description check	✅ Passed	The description matches the changeset and explains the diff repair, normalization helper, money handling, and tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/ACE-201/typed-repair

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codacy-production · 2026-07-02T10:38:11Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 12 complexity · 1 duplication

Metric Results

Complexity 12

Duplication 1

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/utils.go`:
- Around line 1255-1268: The JSON/JSONB handling in the switch is using a
GetStatus assertion that can panic on valid scanned values. Update the
pgtype.JSON and pgtype.JSONB branches in pkg/common/utils.go to handle them
separately and check the exposed Status field directly before attempting
AssignTo and marshaling, keeping the existing nil fallback behavior.
- Around line 1190-1204: The NormalizeScannedValue handling for
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date currently returns only
v.Time, which strips the InfinityModifier and loses infinity/-infinity
information. Update the cases in NormalizeScannedValue to preserve those values
by returning their text form when the scanned pgtype value is present, so
round-tripping through repair still matches PostgreSQL. Use the existing
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date branches as the place to
make this change.
- Around line 1165-1167: The package-level pgxV5TypeMap is shared state and
pgtype.Map.PlanEncode memoizes encode plans, so concurrent calls can race.
Update NormalizeScannedValue and the other Encode path that uses pgxV5TypeMap to
avoid sharing the same map across goroutines, either by guarding access with a
mutex or by creating a fresh pgxv5type.Map per call. Reference the pgxV5TypeMap
helper and both encode call sites when applying the fix.

In `@tests/integration/mtree_typed_repair_test.go`:
- Around line 83-87: The integration test is picking up stale mtree diff reports
from earlier runs instead of the report generated by the current DiffMtree()
call. Update the test around the filepath.Glob/sort.Strings/diffFile selection
to ignore old matches by either deleting pre-existing matching files before
calling DiffMtree() or filtering matches to only files created after the test’s
start time; keep the logic anchored to DiffMtree() and the diffFile selection so
the repair step always uses the current run’s report.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1abd0f24-dd99-48d3-a350-1475f30eb75c

📥 Commits

Reviewing files that changed from the base of the PR and between f085feb and cceb109.

📒 Files selected for processing (5)

internal/consistency/diff/table_diff.go
internal/consistency/mtree/merkle.go
pkg/common/utils.go
pkg/common/utils_test.go
tests/integration/mtree_typed_repair_test.go

danolivo · 2026-07-02T12:36:44Z

@CodeRabbit review

coderabbitai · 2026-07-02T12:36:53Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/utils.go`:
- Around line 1215-1219: `NormalizeScannedValue` is missing an explicit `[]byte`
handling path before the generic fallback, which causes `bytea` payloads to be
stringified incorrectly; update the switch in `NormalizeScannedValue` to return
raw byte slices unchanged before reaching `fmt.Sprint`, alongside the existing
`pgtype.Bytea` branch, so scanned binary values preserve their original
contents.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 651b9186-ac55-4c22-be0b-f7634adef40e

📥 Commits

Reviewing files that changed from the base of the PR and between cceb109 and e8e1a0d.

📒 Files selected for processing (5)

internal/consistency/diff/table_diff.go
internal/consistency/mtree/merkle.go
pkg/common/utils.go
pkg/common/utils_test.go
tests/integration/mtree_typed_repair_test.go

🚧 Files skipped from review as they are similar to previous changes (3)

internal/consistency/diff/table_diff.go
internal/consistency/mtree/merkle.go
pkg/common/utils_test.go

…repair A diff produced by the Merkle-tree engine wrote raw pgx driver structs into the report: a time column serialised as {"Microseconds":...,"Valid":...} instead of "HH:MM:SS", so table-repair aborted on the type conversion and no rows were fixed. UUIDs and intervals had the same latent problem (a JSON array of 16 bytes, an interval struct). The classic engine was unaffected because its row fetch normalised every scanned value inline before the report was written -- the two engines produced incompatible diff files. Extract that normalisation into a shared helper, NormalizeScannedValue, and route both engines' row materialisation through it, so a diff file is repairable regardless of which engine produced it. Beyond the classic engine's behaviour the helper adds a pgx v5 interval case (encoded with a per-call type map: pgtype.Map memoizes encode plans without locking, so a shared instance would race under the concurrent diff workers), preserves infinity/-infinity timestamps and dates as their Postgres text forms instead of collapsing them to a zero time, and reads the v1 JSON/JSONB status field directly rather than through an interface assertion that could panic. Raw []byte bytea payloads pass through untouched (base64 in the report, decoded by the repair-side bytea conversion) instead of being stringified into an unrepairable form. Repair-side, ConvertToPgxType now handles money explicitly -- both engines emit it as its locale-formatted string, which Postgres parses directly -- so the per-row "unknown or complex pgType" warning is gone. It also accepts the legacy {"Microseconds","Valid"} map form for time so diff files written by older builds remain repairable after upgrading. Unit tests cover the normaliser round-trips (time, uuid, interval, infinity, v1 JSON, passthrough), race-freedom of concurrent normalisation, and the converter's money and legacy-time forms; an integration test drives the reported scenario end to end: 100 divergent rows with time, timetz, money, uuid, interval and numeric columns, Merkle-tree diff, then repair from the report that diff wrote (never a stale one), asserting the nodes converge to identical rows. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread pkg/common/utils.go Outdated

Comment thread pkg/common/utils.go

Comment thread pkg/common/utils.go Outdated

Comment thread tests/integration/mtree_typed_repair_test.go Outdated

danolivo force-pushed the fix/ACE-201/typed-repair branch from cceb109 to e8e1a0d Compare July 2, 2026 11:29

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread pkg/common/utils.go

danolivo force-pushed the fix/ACE-201/typed-repair branch from e8e1a0d to 63b5ef3 Compare July 2, 2026 12:47

mason-sharp approved these changes Jul 2, 2026

View reviewed changes

mason-sharp merged commit 7d9bd39 into main Jul 2, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(diff): normalise scanned values in both engines so typed columns …#136

fix(diff): normalise scanned values in both engines so typed columns …#136
mason-sharp merged 1 commit into
mainfrom
fix/ACE-201/typed-repair

danolivo commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Uh oh!

codacy-production Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danolivo commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

danolivo commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Uh oh!

codacy-production Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danolivo commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

codacy-production Bot commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading