Skip to content

fix(diff): normalise scanned values in both engines so typed columns …#136

Merged
mason-sharp merged 1 commit into
mainfrom
fix/ACE-201/typed-repair
Jul 2, 2026
Merged

fix(diff): normalise scanned values in both engines so typed columns …#136
mason-sharp merged 1 commit into
mainfrom
fix/ACE-201/typed-repair

Conversation

@danolivo

@danolivo danolivo commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

…repair

A diff produced by the Merkle-tree engine wrote raw pgx driver structs into the report: a time column serialised as {"Microseconds":...,"Valid":...} instead of "HH:MM:SS", so table-repair aborted on the type conversion and no rows were fixed. UUIDs and intervals had the same latent problem (a JSON array of 16 bytes, an interval struct). The classic engine was unaffected because its row fetch normalised every scanned value inline before the report was written -- the two engines produced incompatible diff files.

Extract that normalisation into a shared helper, NormalizeScannedValue, and route both engines' row materialisation through it, so a diff file is repairable regardless of which engine produced it. The helper preserves the classic engine's behaviour verbatim and adds a pgx v5 interval case the old switch lacked.

Repair-side, ConvertToPgxType now handles money explicitly -- both engines emit it as its locale-formatted string, which Postgres parses directly -- so the per-row "unknown or complex pgType" warning is gone. It also accepts the legacy {"Microseconds","Valid"} map form for time so diff files written by older builds remain repairable after upgrading.

Unit tests cover the normaliser round-trips (time, uuid, interval, passthrough) and the converter's money and legacy-time forms; an integration test drives the reported scenario end to end: 100 divergent rows with time, timetz, money, uuid, interval and numeric columns, Merkle-tree diff, then repair, asserting the nodes converge to identical rows.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@danolivo, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 49 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d7b4f1e-d076-45e4-8e11-2730102ac814

📥 Commits

Reviewing files that changed from the base of the PR and between e8e1a0d and 63b5ef3.

📒 Files selected for processing (5)
  • internal/consistency/diff/table_diff.go
  • internal/consistency/mtree/merkle.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • tests/integration/mtree_typed_repair_test.go
📝 Walkthrough

Walkthrough

This PR adds scanned-value normalization for diff and mtree processing, extends repair conversion for legacy time and money inputs, and adds unit and integration coverage for typed-column diff and repair.

Changes

Scanned value normalization

Layer / File(s) Summary
Normalization utility and repair conversion
pkg/common/utils.go
Adds NormalizeScannedValue to canonicalize scanned pg and pgx driver values, and extends ConvertToPgxType for legacy map-shaped time inputs and money values.
Diff and mtree row normalization
internal/consistency/diff/table_diff.go, internal/consistency/mtree/merkle.go
Routes scanned row values through NormalizeScannedValue and removes the unused pgtype imports from diff processing.
Normalization unit tests
pkg/common/utils_test.go
Adds tests for time, UUID, interval, infinity, JSON v1, concurrency, passthrough, and legacy repair conversion cases.
Typed-column diff and repair integration
tests/integration/mtree_typed_repair_test.go
Adds an integration test that seeds divergent typed rows, runs mtree diff, repairs from the generated report, and verifies convergence by fingerprint comparison.

Compact metadata: 5 files changed; core logic added in pkg/common/utils.go (+158/-0); consumers updated in table_diff.go (+1/-120) and merkle.go (+4/-1); tests added in utils_test.go and mtree_typed_repair_test.go.

Related issues: Not specified in the provided data.

Related PRs: Not specified in the provided data.

Suggested labels: enhancement, tests, bugfix

Suggested reviewers: Reviewers familiar with pgx value handling and the diff/repair pipeline.

Poem

A rabbit peeks at rows anew,
And folds their shapes to tidy glue.
Times and UUIDs hop in line,
With money too, all quick and fine.
Diff and repair now nuzzle through —
Thump! The table matches true. 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly reflects the main change: normalizing scanned values for typed columns in both diff engines.
Description check ✅ Passed The description matches the changeset and explains the diff repair, normalization helper, money handling, and tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ACE-201/typed-repair

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codacy-production

codacy-production Bot commented Jul 2, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 12 complexity · 1 duplication

Metric Results
Complexity 12
Duplication 1

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/utils.go`:
- Around line 1255-1268: The JSON/JSONB handling in the switch is using a
GetStatus assertion that can panic on valid scanned values. Update the
pgtype.JSON and pgtype.JSONB branches in pkg/common/utils.go to handle them
separately and check the exposed Status field directly before attempting
AssignTo and marshaling, keeping the existing nil fallback behavior.
- Around line 1190-1204: The NormalizeScannedValue handling for
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date currently returns only
v.Time, which strips the InfinityModifier and loses infinity/-infinity
information. Update the cases in NormalizeScannedValue to preserve those values
by returning their text form when the scanned pgtype value is present, so
round-tripping through repair still matches PostgreSQL. Use the existing
pgtype.Timestamp, pgtype.Timestamptz, and pgtype.Date branches as the place to
make this change.
- Around line 1165-1167: The package-level pgxV5TypeMap is shared state and
pgtype.Map.PlanEncode memoizes encode plans, so concurrent calls can race.
Update NormalizeScannedValue and the other Encode path that uses pgxV5TypeMap to
avoid sharing the same map across goroutines, either by guarding access with a
mutex or by creating a fresh pgxv5type.Map per call. Reference the pgxV5TypeMap
helper and both encode call sites when applying the fix.

In `@tests/integration/mtree_typed_repair_test.go`:
- Around line 83-87: The integration test is picking up stale mtree diff reports
from earlier runs instead of the report generated by the current DiffMtree()
call. Update the test around the filepath.Glob/sort.Strings/diffFile selection
to ignore old matches by either deleting pre-existing matching files before
calling DiffMtree() or filtering matches to only files created after the test’s
start time; keep the logic anchored to DiffMtree() and the diffFile selection so
the repair step always uses the current run’s report.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1abd0f24-dd99-48d3-a350-1475f30eb75c

📥 Commits

Reviewing files that changed from the base of the PR and between f085feb and cceb109.

📒 Files selected for processing (5)
  • internal/consistency/diff/table_diff.go
  • internal/consistency/mtree/merkle.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • tests/integration/mtree_typed_repair_test.go

Comment thread pkg/common/utils.go Outdated
Comment thread pkg/common/utils.go
Comment thread pkg/common/utils.go Outdated
Comment thread tests/integration/mtree_typed_repair_test.go Outdated
@danolivo danolivo force-pushed the fix/ACE-201/typed-repair branch from cceb109 to e8e1a0d Compare July 2, 2026 11:29
@danolivo

danolivo commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/utils.go`:
- Around line 1215-1219: `NormalizeScannedValue` is missing an explicit `[]byte`
handling path before the generic fallback, which causes `bytea` payloads to be
stringified incorrectly; update the switch in `NormalizeScannedValue` to return
raw byte slices unchanged before reaching `fmt.Sprint`, alongside the existing
`pgtype.Bytea` branch, so scanned binary values preserve their original
contents.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 651b9186-ac55-4c22-be0b-f7634adef40e

📥 Commits

Reviewing files that changed from the base of the PR and between cceb109 and e8e1a0d.

📒 Files selected for processing (5)
  • internal/consistency/diff/table_diff.go
  • internal/consistency/mtree/merkle.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • tests/integration/mtree_typed_repair_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/consistency/diff/table_diff.go
  • internal/consistency/mtree/merkle.go
  • pkg/common/utils_test.go

Comment thread pkg/common/utils.go
…repair

A diff produced by the Merkle-tree engine wrote raw pgx driver structs into
the report: a time column serialised as {"Microseconds":...,"Valid":...}
instead of "HH:MM:SS", so table-repair aborted on the type conversion and no
rows were fixed. UUIDs and intervals had the same latent problem (a JSON
array of 16 bytes, an interval struct). The classic engine was unaffected
because its row fetch normalised every scanned value inline before the report
was written -- the two engines produced incompatible diff files.

Extract that normalisation into a shared helper, NormalizeScannedValue, and
route both engines' row materialisation through it, so a diff file is
repairable regardless of which engine produced it. Beyond the classic
engine's behaviour the helper adds a pgx v5 interval case (encoded with a
per-call type map: pgtype.Map memoizes encode plans without locking, so a
shared instance would race under the concurrent diff workers), preserves
infinity/-infinity timestamps and dates as their Postgres text forms instead
of collapsing them to a zero time, and reads the v1 JSON/JSONB status field
directly rather than through an interface assertion that could panic. Raw
[]byte bytea payloads pass through untouched (base64 in the report, decoded
by the repair-side bytea conversion) instead of being stringified into an
unrepairable form.

Repair-side, ConvertToPgxType now handles money explicitly -- both engines
emit it as its locale-formatted string, which Postgres parses directly -- so
the per-row "unknown or complex pgType" warning is gone. It also accepts the
legacy {"Microseconds","Valid"} map form for time so diff files written by
older builds remain repairable after upgrading.

Unit tests cover the normaliser round-trips (time, uuid, interval, infinity,
v1 JSON, passthrough), race-freedom of concurrent normalisation, and the
converter's money and legacy-time forms; an integration test drives the
reported scenario end to end: 100 divergent rows with time, timetz, money,
uuid, interval and numeric columns, Merkle-tree diff, then repair from the
report that diff wrote (never a stale one), asserting the nodes converge to
identical rows.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@danolivo danolivo force-pushed the fix/ACE-201/typed-repair branch from e8e1a0d to 63b5ef3 Compare July 2, 2026 12:47
@mason-sharp mason-sharp merged commit 7d9bd39 into main Jul 2, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants