Skip to content

Add upgrade-drift test: detect schema drift between fresh install and sequential updates #20

@jnasbyupgrade

Description

@jnasbyupgrade

Overview

This issue tracks the implementation of a Layer 3 upgrade-drift CI test. The goal is to catch cases where ALTER EXTENSION cat_tools UPDATE (from 0.2.2 to 0.3.0) produces a different schema than a fresh CREATE EXTENSION cat_tools. Any such difference is a bug in the upgrade script.

Implementation Spec

Procedure

  1. make install PGUSER=postgres (installs all SQL files)
  2. Create two databases:
    • fresh: CREATE EXTENSION cat_tools (gets 0.3.0 directly)
    • upgraded: CREATE EXTENSION cat_tools VERSION '0.2.2' then ALTER EXTENSION cat_tools UPDATE (arrives at 0.3.0 via upgrade)
  3. For each database: run unmark-extension.sql to remove all objects from extension membership (so pg_dump includes them as regular objects)
  4. pg_dump --schema-only --no-owner --no-privileges on each database
  5. Normalize both dumps (strip noise, sort object blocks)
  6. Diff — any difference is an upgrade drift bug
  7. CI job passes if diff is empty; fails with the diff shown

Unmarking Extension Objects

/*
 * Generate ALTER EXTENSION cat_tools DROP ... statements for every
 * object owned by the extension, so pg_dump includes them as regular objects.
 */
SELECT format(
    'ALTER EXTENSION cat_tools DROP %s %s;',
    (pg_identify_object(classid, objid, 0)).type,
    (pg_identify_object(classid, objid, 0)).identity
)
FROM pg_depend
WHERE refobjid = (SELECT oid FROM pg_extension WHERE extname = 'cat_tools')
  AND deptype = 'e'
  AND classid != 'pg_extension'::regclass;

pg_identify_object returns (type, schema, name, identity); the identity field is suitable for use in ALTER EXTENSION DROP. Available PG 9.3+. Some object types may need special handling — test on PG 11, PG 12, and PG 18.

Dump Normalization Script

The normalization script should:

  1. Strip pg_dump header boilerplate (lines before first SET or -- section)
  2. Strip SET statements (search_path etc.)
  3. Strip -- Name: ...; Type: ...; Schema: ... section comment lines
  4. Split remaining content into blocks on blank-line boundaries
  5. Within each block, normalize whitespace (collapse runs, trim line ends)
  6. Sort blocks lexicographically
  7. Rejoin and diff the two outputs

File Layout

test/upgrade-drift/
  PLAN.md                    <- Design doc (write first)
  unmark-extension.sql       <- SQL to generate DROP statements
  run-drift-test.sh          <- Orchestrator: creates DBs, runs full test
  normalize-dump.pl          <- Normalization script (or .sh)

CI Job

upgrade-drift-test:
  strategy:
    matrix:
      pg: [11, 12, 18]
  name: Upgrade drift test on PostgreSQL ${{ matrix.pg }}
  runs-on: ubuntu-latest
  container: pgxn/pgxn-tools
  steps:
    - name: Start PostgreSQL ${{ matrix.pg }}
      run: pg-start ${{ matrix.pg }}
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Install rsync
      run: apt-get install -y rsync
    - name: Install cat_tools
      run: make install PGUSER=postgres
    - name: Run upgrade drift test
      run: test/upgrade-drift/run-drift-test.sh

Known Edge Cases to Address

  • Objects intentionally different between fresh/upgraded (allowlist approach — start with empty test/upgrade-drift/expected-diffs.txt)
  • PG version differences in pg_dump output format
  • The _cat_tools private schema (included or excluded?)
  • Table data (schema-only dump ignores it — note this limitation)
  • prosrc field in pg_dump output (function bodies) will be byte-for-byte identical if the upgrade script copies them correctly — whitespace differences are signal, not noise

Prior Analysis Findings

  1. pg_identify_object(classid, objid, 0).identity gives the right string for ALTER EXTENSION DROP without needing to handle each object type separately — but test this assumption for edge cases.
  2. Paragraph mode (splitting on blank lines) aligns well with pg_dump's output format.
  3. An allowlist of known acceptable diffs is the right pattern for intentional differences.

Language Choice for Normalization Script (Open Question)

The language for normalize-dump.pl (or equivalent) is still open. The leading candidate is Perl, but the tradeoffs are:

Perl (leading candidate)

  • Pro: Ships with every Debian system (it's a dependency of dpkg itself); available in the pgxn/pgxn-tools CI image with zero extra install steps; paragraph mode (local $/ = "") is perfect for block-splitting; strong regex with /xsm modifiers
  • Con: Less readable to developers who aren't Perl users; idiomatic Perl can be cryptic

Python

  • Pro: More readable to cold readers; re.DOTALL + re.VERBOSE cover the same ground as Perl's regex modifiers
  • Con: Not guaranteed in the CI image (needs apt-get install -y python3); potential version and venv headaches on dev machines

Shell/awk

  • Pro: Already used in the Makefile; zero new dependencies; awk RS="" paragraph mode exists
  • Con: Set arithmetic and multi-file logic are awkward; limited for the full requirements of both scripts

Go

  • Pro: Fast and statically typed
  • Con: Requires an install and build step; RE2 engine (no lookahead/lookbehind); overkill for these scripts

Recommendation: Use Perl unless there is a strong team preference otherwise. If Perl is chosen, keep idioms straightforward — avoid write-only constructions, prefer named captures over positional, and include comments explaining any non-obvious regex.

Success Criteria

  • test/upgrade-drift/run-drift-test.sh runs locally (with PG available) and PASSes when fresh install and upgrade produce identical schema, FAILs with a readable diff when they differ
  • CI job defined in ci.yml and syntactically valid
  • PLAN.md is complete enough that a new developer understands the full design

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions