
feat(datastore): PostgreSQL compatibility layer#4

Closed

dnplkndll wants to merge 10 commits into main from feat/pg-compat-clean


dnplkndll commented Apr 17, 2026

Summary

Adds a PostgreSQL compatibility layer to Fleet's datastore, enabling Fleet to run against PostgreSQL in addition to MySQL. Non-breaking — MySQL remains the default and is unaffected.

Upstream context: fleetdm/fleet#34025 — PostgreSQL support requested alongside MySQL. fleetdm/fleet#30286 — customer request due to MySQL 8.4 issues.


⬆️ Extractable upstream contribution: tools/pgcompat

Note: tools/pgcompat/ and .github/workflows/validate-pg-compat.yml are planned as a standalone PR to fleetdm/fleet (tracked on branch feat/upstream-pgcompat-validators). Once accepted upstream, they can be dropped from this fork PR. For now they live here so the full chain can be reviewed together.

tools/pgcompat contains two small Go programs that prevent silent PG-compat regressions. They have no build-tag constraints and no PG runtime dependency — they statically analyse Go source and SQL schema files. Any Fleet operator building PG support (or the upstream project, if PG support is ever officially scoped) would benefit from them.

check_primary_keys

Scans non-test Go source for raw ON DUPLICATE KEY UPDATE SQL and verifies that every targeted table has an entry in knownPrimaryKeys (in server/platform/postgres/rebind_driver.go). The rebind driver uses that map to emit ON CONFLICT (<pk>) DO UPDATE SET …; a missing entry produces invalid PG SQL at runtime.

go run ./tools/pgcompat/check_primary_keys                       # runtime sites only
go run ./tools/pgcompat/check_primary_keys --include-migrations  # also scan migrations
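The core of check_primary_keys can be sketched as follows. This is a minimal illustration, not the tool's source. It assumes upsert sites can be found with one regular expression over the file text. The map entry, the regex, the example_counts table, and the missingPrimaryKeys helper are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// knownPrimaryKeys stands in for the map in rebind_driver.go.
// The entry below is illustrative, not copied from Fleet.
var knownPrimaryKeys = map[string][]string{
	"host_software": {"host_id", "software_id"},
}

// upsertRe captures the target table of an INSERT ... ON DUPLICATE KEY UPDATE.
// [^;]*? lets the match span newlines but not run past the end of a statement.
var upsertRe = regexp.MustCompile(
	"(?i)INSERT\\s+(?:IGNORE\\s+)?INTO\\s+`?(\\w+)`?[^;]*?ON\\s+DUPLICATE\\s+KEY\\s+UPDATE")

// missingPrimaryKeys returns the sorted, de-duplicated upsert targets that
// have no knownPrimaryKeys entry; a validator fails when this is non-empty.
func missingPrimaryKeys(src string) []string {
	seen := map[string]bool{}
	var missing []string
	for _, m := range upsertRe.FindAllStringSubmatch(src, -1) {
		table := m[1]
		if _, ok := knownPrimaryKeys[table]; ok || seen[table] {
			continue
		}
		seen[table] = true
		missing = append(missing, table)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	src := "INSERT INTO host_software (host_id, software_id) VALUES (?, ?)\n" +
		"  ON DUPLICATE KEY UPDATE software_id = software_id;\n" +
		"INSERT INTO `example_counts` (id, n) VALUES (?, ?)\n" +
		"  ON DUPLICATE KEY UPDATE n = VALUES(n)"
	fmt.Println(missingPrimaryKeys(src)) // [example_counts]
}
```

A real scanner would more likely walk Go string literals via go/ast than regex raw file text; the regex keeps the sketch short.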

check_schema_drift

Diffs the CREATE TABLE identifier sets between server/datastore/mysql/schema.sql (MySQL canonical) and server/datastore/mysql/pg_baseline_schema.sql (PG baseline dump). Intentional drift — PG-specific tables, MySQL-only legacy tables — is recorded in tools/pgcompat/known_schema_diff.txt; stale allowlist entries also fail so the file stays honest.

go run ./tools/pgcompat/check_schema_drift
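A minimal sketch of the drift check, including the stale-allowlist rule and the empty-allowlist behavior that validators_test.go pins. The CREATE TABLE regex, the set-of-names allowlist shape, and the drift helper are assumptions. The real known_schema_diff.txt format may record more detail, such as which side a table lives on.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// createTableRe captures table names from both dumps: MySQL uses backticks,
// pg_dump emits schema-qualified names such as public.hosts.
var createTableRe = regexp.MustCompile(
	"(?i)CREATE\\s+TABLE\\s+(?:IF\\s+NOT\\s+EXISTS\\s+)?(?:public\\.)?[`\"]?(\\w+)[`\"]?")

func tableSet(schema string) map[string]bool {
	set := map[string]bool{}
	for _, m := range createTableRe.FindAllStringSubmatch(schema, -1) {
		set[m[1]] = true
	}
	return set
}

// drift returns tables present on only one side and not allowlisted
// (unexplained), plus allowlist entries that no longer match any
// difference (stale). Either list being non-empty is a failure.
func drift(mysqlSchema, pgSchema string, allow map[string]bool) (unexplained, stale []string) {
	my, pg := tableSet(mysqlSchema), tableSet(pgSchema)
	diff := map[string]bool{}
	for t := range my {
		if !pg[t] {
			diff[t] = true
		}
	}
	for t := range pg {
		if !my[t] {
			diff[t] = true
		}
	}
	for t := range diff {
		if !allow[t] {
			unexplained = append(unexplained, t)
		}
	}
	for t := range allow {
		if !diff[t] {
			stale = append(stale, t)
		}
	}
	sort.Strings(unexplained)
	sort.Strings(stale)
	return unexplained, stale
}

func main() {
	mysqlSchema := "CREATE TABLE `hosts` (id int);\nCREATE TABLE `legacy_thing` (id int);"
	pgSchema := "CREATE TABLE public.hosts (id integer);\nCREATE TABLE public.pg_only_cache (id integer);"
	allow := map[string]bool{"legacy_thing": true, "removed_table": true}
	unexplained, stale := drift(mysqlSchema, pgSchema, allow)
	// A CLI would exit non-zero here because both lists are non-empty.
	fmt.Println("unexplained:", unexplained, "stale:", stale)
	// unexplained: [pg_only_cache] stale: [removed_table]
}
```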

CI gate

Both validators run in .github/workflows/validate-pg-compat.yml on every PR and on pushes to the main and aggregated branches. validators_test.go is a gate-of-the-gate: an empty schema-diff allowlist must produce a non-zero exit.

What changes for upstreaming

  • knownPrimaryKeys would move from rebind_driver.go (fork-only) to a standalone tools/pgcompat/primary_keys.go file, with an upstream-appropriate starting set.
  • pg_baseline_schema.sql reference in check_schema_drift would become an optional flag; the tool still runs (and passes trivially) when the file is absent, so it can ship to upstream without requiring the full PG layer.
  • validate-pg-compat.yml trigger list would be adjusted to main/patch-*/prepare-* (standard Fleet branch patterns).

⬆️ Extractable upstream contribution: tools/pg-compat-harness

A Playwright-API-mode regression matrix that exercises every URL filter Fleet's frontend can construct against a live server, asserting each response is not a Postgres-driver or Postgres-syntax failure (SQLSTATE, must appear in the GROUP BY, operator does not exist, cannot find encode plan, etc.).

cd tools/pg-compat-harness
yarn install
export FLEET_URL=https://your-fleet
export FLEET_TOKEN=$(awk '/token:/ {print $2}' ~/.fleet/config)
yarn test

Read-only (HTTP GET only) — safe against prod. Runs ~220 probes in ~15s with 8 workers. Coverage:

  • /hosts + /hosts/count: every status, low_disk_space, mdm_enrollment_status, os_settings/apple_settings/disk_encryption/bootstrap_package, populate_*, every order_key × direction, cursor pagination (after=), vulnerability filter, search.
  • /software/versions, /software/titles, /software (deprecated): vulnerable, exploit, cvss range, self_service, available_for_install, packages_only, team filtering, ordering.
  • /vulnerabilities: cvss range, exploit, ordering, search.
  • /host_summary: every platform, low_disk_space, team.
  • /labels/:id/hosts, /hosts/:id/* (software/policies/activities/encryption_key).
  • Sanity endpoints: /config, /version, /labels, /teams, /me, /queries, /policies, /activities.
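The harness itself is TypeScript. Its per-probe assertion reduces to a substring check over the response body. Below is a Go sketch of that check, using the failure signatures listed above; the pgFailure name, the list order, and the sample response bodies are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// pgFailureSignatures are substrings treated as a PG-compat failure when
// they appear in a response body.
var pgFailureSignatures = []string{
	"SQLSTATE",
	"must appear in the GROUP BY",
	"operator does not exist",
	"cannot find encode plan",
	"syntax error",
}

// pgFailure returns the first signature found in body, or "" if none match.
func pgFailure(body string) string {
	for _, sig := range pgFailureSignatures {
		if strings.Contains(body, sig) {
			return sig
		}
	}
	return ""
}

func main() {
	// Hypothetical probe results keyed by the URL that was fetched.
	probes := map[string]string{
		"/hosts?order_key=hostname":        `{"hosts":[]}`,
		"/software/versions?vulnerable=true": `{"message":"ERROR: column \"shc.hosts_count\" must appear in the GROUP BY clause (SQLSTATE 42803)"}`,
	}
	for url, body := range probes {
		if sig := pgFailure(body); sig != "" {
			fmt.Printf("FAIL %s: %s\n", url, sig)
		} else {
			fmt.Printf("ok   %s\n", url)
		}
	}
}
```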

This is the harness that found and gated the two driver bugs and the index-parity gap fixed in this PR — see the Driver rewrites table and the AddMissingPGIndexes migration below. It belongs upstream as an integration-test layer that complements tools/pgcompat/'s static analysis.

What changes for upstreaming

  • Read the bearer token from a fleetctl context and drop the FLEET_URL/FLEET_TOKEN env-var dance.
  • Wire into a GitHub Actions matrix step running against an ephemeral fleet+pg compose stack.

⬆️ Extractable upstream contribution: tools/pg-index-translate

A small Go program that parses server/datastore/mysql/schema.sql and emits CREATE INDEX IF NOT EXISTS statements for every MySQL KEY / UNIQUE KEY clause, suitable for embedding into a PG-only migration. Handles balanced parens (expression bodies), the USING BTREE MySQL hint, and DESC ordering. Skips FULLTEXT, SPATIAL, prefix-length, and expression indexes with a per-skip reason printed to stderr.

go run ./tools/pg-index-translate \
  -in  server/datastore/mysql/schema.sql \
  -out server/datastore/mysql/migrations/tables/{ts}_AddMissingPGIndexes.sql

Unit-tested in tools/pg-index-translate/main_test.go via inline schema fixtures — no file I/O required. The translator output drives the AddMissingPGIndexes migration described in Operator-visible additions below.
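A hedged sketch of how a single KEY clause might be translated. It covers the cases described above: UNIQUE, DESC, a trailing USING BTREE, balanced parens, and the FULLTEXT/SPATIAL/prefix-length/expression skips. translateKey, parenBody, quote, and the always-quote identifier policy are hypothetical and are not the tool's actual API. PG index names are unique per schema rather than per table, so a real translator also has to avoid name collisions; this sketch does not attempt that.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// headRe matches the start of a secondary-index clause in a CREATE TABLE body.
// Group 1: optional UNIQUE/FULLTEXT/SPATIAL prefix; group 2: index name.
// PRIMARY KEY clauses deliberately do not match.
var headRe = regexp.MustCompile("(?i)^\\s*(UNIQUE\\s+|FULLTEXT\\s+|SPATIAL\\s+)?KEY\\s+`(\\w+)`")

// prefixLenRe spots a MySQL prefix length such as `name`(191).
var prefixLenRe = regexp.MustCompile(`\(\s*\d+\s*\)`)

// parenBody returns the text between the first '(' in s and its matching ')'.
func parenBody(s string) (string, bool) {
	start := strings.IndexByte(s, '(')
	if start < 0 {
		return "", false
	}
	depth := 0
	for i := start; i < len(s); i++ {
		switch s[i] {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return s[start+1 : i], true
			}
		}
	}
	return "", false
}

func quote(name string) string { return `"` + name + `"` }

// translateKey converts one KEY clause of table into a CREATE INDEX
// statement, or returns a non-empty skip reason instead.
func translateKey(table, line string) (stmt, skip string) {
	m := headRe.FindStringSubmatch(line)
	if m == nil {
		return "", "not a secondary KEY clause"
	}
	kind := strings.ToUpper(strings.TrimSpace(m[1]))
	if kind == "FULLTEXT" || kind == "SPATIAL" {
		return "", kind + " index needs a PG-specific equivalent"
	}
	body, ok := parenBody(line[len(m[0]):])
	if !ok {
		return "", "unbalanced parentheses"
	}
	if strings.Contains(body, "(") {
		if prefixLenRe.MatchString(body) {
			return "", "prefix-length index"
		}
		return "", "expression index"
	}
	cols := strings.Split(body, ",")
	for i, c := range cols {
		// Drop MySQL backticks; keep an optional ASC/DESC after the name.
		f := strings.Fields(strings.ReplaceAll(c, "`", ""))
		if len(f) == 0 {
			return "", "empty column in key"
		}
		cols[i] = quote(f[0])
		if len(f) > 1 {
			cols[i] += " " + strings.ToUpper(f[1])
		}
	}
	unique := ""
	if kind == "UNIQUE" {
		unique = "UNIQUE "
	}
	// Anything after the closing paren (e.g. USING BTREE) is simply not copied.
	return fmt.Sprintf("CREATE %sINDEX IF NOT EXISTS %s ON %s (%s);",
		unique, quote(m[2]), quote(table), strings.Join(cols, ", ")), ""
}

func main() {
	for _, line := range []string{
		"  UNIQUE KEY `idx_uuid` (`uuid`),",
		"  KEY `idx_host_updated` (`host_id`,`updated_at` DESC) USING BTREE,",
		"  KEY `idx_name_prefix` (`name`(191)),",
		"  FULLTEXT KEY `ft_body` (`body`),",
	} {
		if stmt, skip := translateKey("hosts", line); skip != "" {
			fmt.Fprintln(os.Stderr, "skip:", strings.TrimSpace(line), "->", skip)
		} else {
			fmt.Println(stmt)
		}
	}
}
```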


Architecture

  • DialectHelper interface (server/datastore/mysql/dialect.go) — abstracts SQL dialect differences (upserts, aggregates, JSON, error classification, swap tables).
  • pgx-rebind driver (server/platform/postgres/rebind_driver.go) — transparently translates MySQL SQL to PostgreSQL at query execution time (50+ pattern rewrites, all static regexes hoisted to package-level vars compiled once at startup; per-table-name regexes cached in sync.Map to avoid per-call compilation).
  • Baseline PG schema (server/datastore/mysql/pg_baseline_schema.sql) — generated from production pg_dump, authoritative for fresh-deployment initialization.
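To make the dispatch concrete, here is a cut-down sketch of the shape such an interface can take. Only IsPostgres comes from the description above. UpsertSuffix and the two struct types are hypothetical stand-ins, not the methods Fleet's DialectHelper actually defines.

```go
package main

import (
	"fmt"
	"strings"
)

// DialectHelper is a hypothetical, reduced version of the interface.
type DialectHelper interface {
	IsPostgres() bool
	// UpsertSuffix returns the clause appended to an INSERT so that a row
	// conflicting on pk has updateCols overwritten with the new values.
	UpsertSuffix(pk, updateCols []string) string
}

type mysqlDialect struct{}

func (mysqlDialect) IsPostgres() bool { return false }

// MySQL infers the conflicting key, so pk is unused here.
func (mysqlDialect) UpsertSuffix(_ []string, updateCols []string) string {
	sets := make([]string, len(updateCols))
	for i, c := range updateCols {
		sets[i] = fmt.Sprintf("%s = VALUES(%s)", c, c)
	}
	return "ON DUPLICATE KEY UPDATE " + strings.Join(sets, ", ")
}

type postgresDialect struct{}

func (postgresDialect) IsPostgres() bool { return true }

// PG needs an explicit conflict target and reads new values from EXCLUDED.
func (postgresDialect) UpsertSuffix(pk, updateCols []string) string {
	sets := make([]string, len(updateCols))
	for i, c := range updateCols {
		sets[i] = fmt.Sprintf("%s = EXCLUDED.%s", c, c)
	}
	return fmt.Sprintf("ON CONFLICT (%s) DO UPDATE SET %s",
		strings.Join(pk, ", "), strings.Join(sets, ", "))
}

func main() {
	for _, d := range []DialectHelper{mysqlDialect{}, postgresDialect{}} {
		fmt.Println(d.IsPostgres(), d.UpsertSuffix([]string{"host_id"}, []string{"seen_time"}))
	}
}
```

Callers only ever hold a DialectHelper, which is what keeps concrete-type assertions from leaking out of the interface.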

Key design decisions

  1. Rebind driver over query rewriting: Rather than modifying every SQL string in the codebase, a driver-level interceptor translates MySQL idioms to PG at execution time. Minimizes diff size and maintenance burden.
  2. Dialect helper for code-level control: Upserts, aggregates, JSON ops, and atomic table swaps go through ds.dialect.* methods when dynamic SQL construction requires it. Dialect dispatch uses ds.dialect.IsPostgres() throughout; no type assertions against the concrete type leak past the interface.
  3. Schema from pg_dump: PG baseline is generated from a running production database, not hand-maintained. Bumping the baseline is a documented operator procedure.
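The ON DUPLICATE KEY UPDATE rewrite shows why the driver needs knownPrimaryKeys. MySQL fires on any unique-key collision, but PG's ON CONFLICT needs an explicit conflict target. Below is a minimal sketch of that one rewrite. It assumes a single-statement INSERT with no backtick quoting. PG also expects $1-style placeholders; converting ? is a separate step not shown. The map entry and the rewriteUpsert name are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// knownPrimaryKeys maps table -> conflict target. Illustrative entry only.
var knownPrimaryKeys = map[string][]string{
	"host_seen_times": {"host_id"},
}

var (
	insertTableRe = regexp.MustCompile(`(?i)^\s*INSERT\s+INTO\s+(\w+)`)
	onDupRe       = regexp.MustCompile(`(?i)ON\s+DUPLICATE\s+KEY\s+UPDATE`)
	valuesFnRe    = regexp.MustCompile(`(?i)VALUES\((\w+)\)`)
)

// rewriteUpsert turns "INSERT ... ON DUPLICATE KEY UPDATE c = VALUES(c)" into
// "INSERT ... ON CONFLICT (<pk>) DO UPDATE SET c = EXCLUDED.c". A table with
// no knownPrimaryKeys entry is an error: the runtime failure that
// check_primary_keys exists to catch in CI.
func rewriteUpsert(query string) (string, error) {
	loc := onDupRe.FindStringIndex(query)
	if loc == nil {
		return query, nil // not an upsert; pass through unchanged
	}
	m := insertTableRe.FindStringSubmatch(query)
	if m == nil {
		return "", fmt.Errorf("upsert without a recognizable INSERT INTO target")
	}
	pk, ok := knownPrimaryKeys[m[1]]
	if !ok {
		return "", fmt.Errorf("no knownPrimaryKeys entry for table %q", m[1])
	}
	// Only the SET list after the keyword is rewritten, so the VALUES (...)
	// row constructor earlier in the statement is left alone.
	sets := valuesFnRe.ReplaceAllString(query[loc[1]:], "EXCLUDED.$1")
	return query[:loc[0]] + "ON CONFLICT (" + strings.Join(pk, ", ") + ") DO UPDATE SET" + sets, nil
}

func main() {
	q := "INSERT INTO host_seen_times (host_id, seen_time) VALUES (?, ?) ON DUPLICATE KEY UPDATE seen_time = VALUES(seen_time)"
	out, err := rewriteUpsert(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```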

Operator-visible additions

  • Baseline drift detection. pg_baseline_schema.sql carries a pg-baseline-up-to-migration: <ts> marker. On startup Fleet seeds migration_status_tables from code on a fresh apply, and logs a loud warning whenever code carries migrations newer than the embedded baseline. A unit test (TestVersionsAbove_EmbeddedBaselineCoversAllCode) fails CI if the baseline is stale relative to code on the same branch — silent drift is no longer possible.
  • Object-ownership reassertion. pg_baseline_post.sql runs every startup and reassigns ownership of all public-schema tables, sequences, and views to current_user. Fixes the "must be owner of table" error from atomic table swaps when the baseline was loaded as postgres.
  • PG index parity migration (20260513210000_AddMissingPGIndexes). PG had ~11 indexes vs MySQL's ~354, because the original baseline dump captured PKs but missed the secondary KEY clauses. The migration adds 349 CREATE INDEX IF NOT EXISTS statements generated by tools/pg-index-translate and executes them only on PG (UpFnPG) — first migration to use the dialect-specific fields on goose.Migration; MySQL UpFn is a deliberate no-op since those indexes already exist there. 6 indexes are intentionally deferred (FULLTEXT × 2, expression-with-MySQL-funcs × 3, prefix-length × 1) and will move into their own targeted PG migrations.
  • First dialect-specific migration precedent. server/goose/migration.go has carried UpFnPG/UpFnMySQL fields since the original PG layer landed; no migration had used them yet. 20260513210000_AddMissingPGIndexes is the documented example for future PG-only or MySQL-only DDL.
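The baseline bookkeeping reduces to parsing the marker and splitting code's migration versions around it. Below is a hedged sketch. It assumes the marker sits in a SQL comment in the file header. baselineVersion and partition are hypothetical names that loosely mirror the versionsAtOrBelow / partitionMigrationVersions helpers named in the commit log.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// markerRe finds the baseline marker, e.g.
// "-- pg-baseline-up-to-migration: 20260513210000" (comment syntax assumed).
var markerRe = regexp.MustCompile(`pg-baseline-up-to-migration:\s*(\d+)`)

// baselineVersion extracts the migration timestamp the baseline covers.
func baselineVersion(schema string) (int64, error) {
	m := markerRe.FindStringSubmatch(schema)
	if m == nil {
		return 0, fmt.Errorf("pg-baseline-up-to-migration marker not found")
	}
	return strconv.ParseInt(m[1], 10, 64)
}

// partition splits code's migration versions into those already folded into
// the baseline (seeded as applied) and those still to run. Sorting first means
// seeded rows get auto-increment ids in version order.
func partition(baseline int64, versions []int64) (atOrBelow, above []int64) {
	sorted := append([]int64(nil), versions...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for _, v := range sorted {
		if v <= baseline {
			atOrBelow = append(atOrBelow, v)
		} else {
			above = append(above, v)
		}
	}
	return atOrBelow, above
}

func main() {
	schema := "-- pg-baseline-up-to-migration: 20260506171058\nCREATE TABLE hosts (id bigint);"
	base, err := baselineVersion(schema)
	if err != nil {
		panic(err)
	}
	seed, pending := partition(base, []int64{20260513210000, 20260422181702, 20260506171058})
	fmt.Println("seed:", seed, "pending:", pending)
	if len(pending) > 0 {
		// Startup logs a warning here; a CI test fails instead.
		fmt.Printf("WARNING: %d migration(s) newer than the embedded baseline\n", len(pending))
	}
}
```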

Connection configuration

Set FLEET_MYSQL_DRIVER=postgres (and the standard FLEET_MYSQL_ADDRESS, FLEET_MYSQL_USERNAME, etc. pointing at your PG cluster). See docs/Deploy/postgresql.md for the full operator guide.

Security-audit fixes

  1. HIGH — CI gate on aggregated image build. test-go-postgres.yaml triggers on the aggregated branch; companion build-ledo.yml (on ledoent) refuses to publish ghcr.io/ledoent/fleet:latest unless both test-go-postgres and validate-pg-compat succeeded on the build SHA.
  2. HIGH — knownPrimaryKeys + schema-drift validators under tools/pgcompat/ run on every PR:
    • check_primary_keys — every raw ON DUPLICATE KEY UPDATE site has a knownPrimaryKeys entry.
    • check_schema_drift: CREATE TABLE set diff between schema.sql and pg_baseline_schema.sql with an allowlist for intentional divergence.
    • validators_test.go — gate-of-the-gate: empty allowlist must produce non-zero exit.
  3. MEDIUM — Dockerfile base images pinned by digest on the ledoent branch.
  4. MEDIUM — git-aggregator pinned to 4.1 in weekly-aggregate.yml.
  5. MEDIUM — sync-upstream.yml hardened: paranoia check refuses to force-push main if there are commits not in upstream/main from anyone other than github-actions[bot].

Driver rewrites (selected)

| MySQL pattern | PostgreSQL output | Notes |
| --- | --- | --- |
| UUID_TO_BIN(UUID(), true) | gen_random_uuid() | |
| UNHEX(expr) | decode(expr, 'hex') | paren-balanced |
| HEX(expr) | upper(encode(expr::bytea, 'hex')) | paren-balanced; runs after UNHEX |
| IF(cond, t, f) | CASE WHEN cond THEN t ELSE f END | paren-balanced |
| FIND_IN_SET(val, col) > 0 | val = ANY(string_to_array(col, ',')) | raw SQL paths |
| COALESCE(token, '') / ds.token / hmae.token | COALESCE(..., ''::bytea) | token is bytea in PG |
| GROUP_CONCAT(expr SEPARATOR ',') | STRING_AGG(expr::text, ',') | paren-balanced |
| UPDATE t1 JOIN t2 ON ... SET ... | UPDATE t1 SET ... FROM t2 WHERE ... | paren-balanced |
| INTERVAL ? SECOND | (?::bigint) * INTERVAL '1 second' | type inference fix |
| CAST(NULL AS SIGNED) | CAST(NULL AS integer) | UNION type resolution |
| expired = ? | expired = (CASE WHEN ?::text = 'true' THEN 1 ELSE 0 END) | smallint bool columns |
| FOR UPDATE (with LEFT JOIN) | removed | PG forbids FOR UPDATE on the nullable side of an outer join |
| DELETE FROM t USING t INNER JOIN j ON c WHERE f | DELETE FROM t USING j WHERE c AND f | duplicate table removal |
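Several rows in the table above are marked paren-balanced because a regex alone cannot find where an argument such as IF(a, f(b, c), d) ends. Below is a sketch of the IF rewrite using an explicit depth counter. It assumes single-quoted SQL strings and no whitespace between IF and its parenthesis, and it does not special-case "IF(" appearing inside a string literal. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchingParen returns the index of the ')' that closes the '(' at open,
// ignoring parentheses inside single-quoted strings; -1 if unbalanced.
func matchingParen(s string, open int) int {
	depth, inQuote := 0, false
	for i := open; i < len(s); i++ {
		switch c := s[i]; {
		case c == '\'':
			inQuote = !inQuote
		case inQuote:
			// inside a string literal: parentheses are data
		case c == '(':
			depth++
		case c == ')':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// splitTopLevel splits s on commas at paren depth 0 outside string literals.
func splitTopLevel(s string) []string {
	var parts []string
	depth, inQuote, start := 0, false, 0
	for i := 0; i < len(s); i++ {
		switch c := s[i]; {
		case c == '\'':
			inQuote = !inQuote
		case inQuote:
		case c == '(':
			depth++
		case c == ')':
			depth--
		case c == ',' && depth == 0:
			parts = append(parts, strings.TrimSpace(s[start:i]))
			start = i + 1
		}
	}
	return append(parts, strings.TrimSpace(s[start:]))
}

func isIdentChar(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// isIfCall reports whether s[i:] starts an IF( call rather than, say, NULLIF(.
func isIfCall(s string, i int) bool {
	return i+3 <= len(s) && strings.EqualFold(s[i:i+3], "IF(") && (i == 0 || !isIdentChar(s[i-1]))
}

// rewriteIf replaces IF(cond, t, f) with CASE WHEN cond THEN t ELSE f END,
// recursing into arguments so nested IFs are rewritten as well.
func rewriteIf(sql string) string {
	var out strings.Builder
	for i := 0; i < len(sql); {
		if isIfCall(sql, i) {
			open := i + 2
			if end := matchingParen(sql, open); end >= 0 {
				if args := splitTopLevel(sql[open+1 : end]); len(args) == 3 {
					fmt.Fprintf(&out, "CASE WHEN %s THEN %s ELSE %s END",
						rewriteIf(args[0]), rewriteIf(args[1]), rewriteIf(args[2]))
					i = end + 1
					continue
				}
			}
		}
		out.WriteByte(sql[i])
		i++
	}
	return out.String()
}

func main() {
	fmt.Println(rewriteIf("SELECT IF(h.status = 'online', 1, IF(h.seen IS NULL, -1, 0)) AS s FROM hosts h"))
}
```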

Other PG-compat fixes in this PR (caught by tools/pg-compat-harness)

  • selectSoftwareSQL GROUP BY: the goqu-built fallback for /software/versions?vulnerable=true (and any non-fast-path filter) called GroupBy(...) after an earlier GroupByAppend(shc.hosts_count, shc.updated_at, ...). In goqu, GroupBy REPLACES the existing clause and dropped the appended columns. MySQL silently tolerates this under relaxed only_full_group_by; PG rejects with SQLSTATE 42803 ("must appear in the GROUP BY"). Fixed at server/datastore/mysql/software.go:2019 by switching to GroupByAppend. Sibling of PG-compat round 11, which patched the same function for hs.last_opened_at but missed shc.hosts_count.
  • Text-column cursor binding: AppendListOptionsWithParamsSecure parsed every numeric-looking cursor as int64 and bound it as such. For text columns (display_name, hostname, etc.) pgx then rejected the query with cannot find encode plan, because the int8 OID didn't match the varchar column. Fix: an optional variadic textOrderKeys ...string parameter marks which keys are text-typed, and the cursor stays a string for those. Non-breaking: existing callers see no behavior change. Wired into hostAllowedOrderKeys and batchScriptHostAllowedOrderKeys for the keys that needed it. Unit-tested in server/platform/mysql/list_options_test.go (TestAppendListOptionsWithParamsSecure_TextOrderKeyCursorBinding, 4 cases including the non-breaking contract).
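The binding rule behind the textOrderKeys fix, isolated into a hypothetical helper. cursorArg is not a Fleet function; in the PR the logic lives inside AppendListOptionsWithParamsSecure. Calling it with no textOrderKeys reproduces the old behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// cursorArg decides how the after= cursor value is bound for orderKey.
// Numeric-looking cursors become int64 unless orderKey is listed in
// textOrderKeys, in which case the raw string is kept so the driver
// encodes it as text.
func cursorArg(orderKey, cursor string, textOrderKeys ...string) any {
	for _, k := range textOrderKeys {
		if k == orderKey {
			return cursor
		}
	}
	if n, err := strconv.ParseInt(cursor, 10, 64); err == nil {
		return n
	}
	return cursor
}

func main() {
	fmt.Printf("%T\n", cursorArg("id", "42"))                   // int64
	fmt.Printf("%T\n", cursorArg("hostname", "42"))             // int64: the old, PG-breaking binding
	fmt.Printf("%T\n", cursorArg("hostname", "42", "hostname")) // string
}
```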

Tests

  • 39 PG smoke tests covering hosts, software, vulnerabilities, policies, host-counts.
  • TestPostgresHostSoftwareUpdate (B1 Tier 1) — 5 subtests exercising UpdateHostSoftware end-to-end, including the UPDATE...JOIN path that broke prod in A1.
  • TestCarves (B1 Tier 2) — runs on both MySQL and PostgreSQL via CreateDS(t). Surfaced bool → smallint gap on carve_metadata.expired; fixed via rewriteSmallintBoolColumns.
  • TestScripts (B1 Tier 3) — all 29 subtests converted to CreateDS(t), passing on both backends.
  • B3 baseline tests — parser, version-partition helpers, fresh-apply seeds, idempotent re-apply, drift warning paths.
  • Driver unit tests: TestRewriteUpdateJoin, TestRewriteDeleteUsing, TestRewriteGroupConcat, TestResolveOnConflictAmbiguity, TestRewriteSmallintBoolColumns, TestRewriteMaxBoolColumns, TestRewriteIntervalPlaceholder, TestRewriteCastNullAsSigned, TestRewriteFindInSet, TestRewriteCoalesceAliasedToken, TestStripNullBytes.
  • Dialect unit tests: TestPostgresDialectSQL and TestMysqlDialectSQL cover all methods including LAST_INSERT_ID stripping, no-op fallback, ReturningID, AtomicTableSwap, CreateTableLike.
  • List-options unit tests: TestAppendListOptionsWithParamsSecure_SkipsOrderByOnAggregate (existing) plus new TestAppendListOptionsWithParamsSecure_TextOrderKeyCursorBinding covering the int64-vs-string cursor binding contract.
  • Index translator unit tests: tools/pg-index-translate/main_test.go drives the parser with inline schema fixtures (no file I/O): plain/unique keys, DESC ordering, USING BTREE suffix, FULLTEXT/SPATIAL/prefix-length/expression skips, balanced parens, and the extractParenBody + quoteIdent helpers.
  • Live API regression harness: tools/pg-compat-harness/ Playwright matrix; see the Extractable upstream contribution section above for coverage details. Reports a single passing line per probe (221+) and per-probe Postgres-error context on failure.
  • Website security: htmlhint upgraded (0.11.0→1.9.2), yaml upgraded (1.10.2→1.10.3); npm audit step added to test-website.yml CI.
  • Benchmarks: BenchmarkUpdateHostSoftware, BenchmarkListSoftware, BenchmarkListHosts in server/datastore/mysql/benchmarks_test.go. For PG numbers, run POSTGRES_TEST=1 go test -bench=Benchmark -benchtime=5s -count=5 -run=^$ ./server/datastore/mysql/ > /tmp/pg.bench; for the MySQL baseline, swap the env var and write to a second file, then compare the two files with benchstat.

dnplkndll force-pushed the feat/pg-compat-clean branch 2 times, most recently from eab027d to 6f33aa0 (April 22, 2026 21:46)
dnplkndll force-pushed the feat/pg-compat-clean branch 2 times, most recently from c2a8b56 to 6116ebf (May 14, 2026 01:21)
dnplkndll added 10 commits May 14, 2026 08:55
Adds a Postgres backend to Fleet's datastore alongside the existing
MySQL. Non-breaking: MySQL remains the default and is unaffected.

Core pieces:
  - DialectHelper interface (server/datastore/mysql/dialect.go) abstracts
    SQL dialect differences for upserts, aggregates, JSON ops, error
    classification, and atomic swap-table DDL. mysqlDialect + postgresDialect
    implementations, dialect.IsPostgres() routes runtime branches.
  - pgx-rebind driver (server/platform/postgres/rebind_driver.go)
    transparently translates MySQL SQL to Postgres at query execution time
    via 50+ regex-based rewrites compiled once at startup. Per-table-name
    regexes cached in sync.Map. knownPrimaryKeys map drives ON DUPLICATE
    KEY → ON CONFLICT (<pk>) DO UPDATE rewriting.
  - Embedded PG baseline (server/datastore/mysql/pg_baseline_schema.sql,
    pg_baseline_post.sql) seeded from production pg_dump. Carries a
    pg-baseline-up-to-migration: <ts> marker; fresh-apply seeds
    migration_status_tables from code and logs a loud warning whenever
    code carries migrations newer than the baseline. Object-ownership is
    reasserted on every startup so atomic table swaps work even when the
    baseline was loaded as the postgres superuser.
  - server/goose/migration.go gains UpFnPG / DownFnPG / UpFnMySQL /
    DownFnMySQL fields so individual migrations can target one dialect.
    First user: 20260513210000_AddMissingPGIndexes (this commit).
  - 349 missing PG indexes added via the AddMissingPGIndexes migration
    (UpFnPG-only), bringing PG to index parity with MySQL on hot paths
    like host_software_installed_paths (host_id, software_id).

Wiring:
  - FLEET_MYSQL_DRIVER=postgres selects the new driver; standard
    FLEET_MYSQL_ADDRESS / USERNAME / PASSWORD / DATABASE env vars route to
    the PG cluster unchanged.
  - server/config/config.go validates the new driver value.
  - cmd/fleet/prepare.go threads dialect into the migration apply path.
  - docker-compose.yml gains a postgres service for local dev.

Tests:
  - 39 PG smoke tests (hosts, software, vulnerabilities, policies,
    host-counts) and B1/B2/B3 tiers running on both backends via the new
    CreateDS(t) helper.
  - Driver-rewrite unit tests cover every regex (UPDATE...JOIN,
    DELETE USING, GROUP_CONCAT, ON CONFLICT ambiguity resolution,
    smallint-bool encoding, MAX(bool), INTERVAL placeholder, CAST NULL
    AS SIGNED, FIND_IN_SET, COALESCE token, null-byte stripping, ...).
  - Dialect unit tests for both dialects (LAST_INSERT_ID stripping,
    ReturningID, AtomicTableSwap, CreateTableLike).
  - List-options helper has new coverage for single-aggregate ORDER BY
    skip and text-column cursor binding.
  - Benchmarks for UpdateHostSoftware / ListSoftware / ListHosts in
    server/datastore/mysql/benchmarks_test.go.

Squashed from 70+ incremental commits on feat/pg-compat-clean; full
provenance preserved on feat/pg-compat-clean-backup-2026-05-13.

…p on dep-review

CI infrastructure that gates the PG backend:
  - test-go-postgres.yaml: spins up Postgres in a service container, runs
    the full datastore + service test suites against the PG driver. Mirrors
    the existing MySQL test workflow.
  - validate-pg-compat.yml: invokes the tools/pgcompat validators on every
    PR/push — check_primary_keys, check_schema_drift, check_column_drift.
    Empty-allowlist gate-of-the-gate test ensures the validators themselves
    can never become a no-op.
  - build-ledo.yml: ledoent-specific image build that refuses to publish to
    ghcr.io unless both test-go-postgres and validate-pg-compat succeeded
    on the build SHA.
  - sync-upstream.yml: paranoia check that refuses to force-push ledoent/main
    if any non-bot commits exist outside upstream/main.
  - weekly-aggregate.yml: gitaggregate cron + workflow_dispatch, pinned to
    git-aggregator==4.1.
  - dependency-review.yml: skip on private repos (the action requires
    GitHub Advanced Security which isn't available on free private mirrors).
    Upstream public fleetdm/fleet still runs it.
  - test-website.yml: npm audit step added so frontend dep regressions
    block PRs.
  - tools/ci/apiparamcheck: custom golangci-lint plugin that flags REST
    handler params not registered in the request struct, catching the
    'missing query param decode' class of bug.

…rift

Three small static-analysis tools that prevent silent PG-compat regressions.
None require a running Postgres; they read Go source and SQL schema files.

  - check_primary_keys: scans non-test Go for raw 'ON DUPLICATE KEY UPDATE'
    SQL and verifies every targeted table has an entry in knownPrimaryKeys
    (the map in server/platform/postgres/rebind_driver.go that drives the
    ON CONFLICT (<pk>) DO UPDATE rewrite). Missing entries produce invalid
    PG SQL at runtime.
  - check_schema_drift: diffs CREATE TABLE identifier sets between
    server/datastore/mysql/schema.sql (MySQL canonical) and
    pg_baseline_schema.sql (PG baseline). known_schema_diff.txt records
    intentional divergence and is itself validated — stale entries fail.
  - check_column_drift: diffs column lists per shared table. Optional
    allowlist via known_column_drift.txt.
  - gen_identity_cols / gen_bool_cols: code generators that produce the
    Postgres dialect's static knowledge of IDENTITY columns and bool
    columns so the rebind driver can rewrite INSERTs correctly.
  - validators_test.go is a gate-of-the-gate: an empty schema-diff
    allowlist must produce a non-zero exit.

Designed to be extractable as a standalone PR to fleetdm/fleet — they're
useful to any Fleet operator building PG support, with or without the
larger driver/baseline layer.

Playwright API-mode test matrix that exercises every URL filter Fleet's
frontend can construct against a live server, asserting each response is
not a Postgres-driver or Postgres-syntax failure (SQLSTATE, 'must appear
in the GROUP BY', 'operator does not exist', 'cannot find encode plan',
'syntax error', etc.).

Read-only (HTTP GET only). ~220 probes in ~15s with 8 workers.

Coverage:
  - /hosts + /hosts/count: status, low_disk_space, mdm_enrollment_status,
    os_settings/apple_settings/disk_encryption/bootstrap_package, populate_*,
    every ORDER BY allowlist key × direction, cursor pagination (after=),
    vulnerability filter, search.
  - /software/versions, /software/titles, /software (deprecated):
    vulnerable, exploit, cvss range, self_service, available_for_install,
    packages_only, team filtering, ordering.
  - /vulnerabilities, /host_summary, /labels/:id/hosts, /hosts/:id/*,
    sanity endpoints (/config, /version, /me, /labels, /teams, ...).

Run:
  cd tools/pg-compat-harness
  yarn install
  export FLEET_URL=https://your-fleet
  export FLEET_TOKEN=$(awk '/token:/ {print $2}' ~/.fleet/config)
  yarn test

This harness found and gated the GROUP BY and cursor-encoding regressions
fixed elsewhere in this branch (selectSoftwareSQL GroupByAppend,
AppendListOptionsWithParamsSecure textOrderKeys hint).

Small Go program that parses server/datastore/mysql/schema.sql and emits
one CREATE INDEX IF NOT EXISTS statement per MySQL KEY / UNIQUE KEY clause,
suitable for embedding into a PG-only migration.

Handles:
  - balanced parens in column lists (expression bodies)
  - USING BTREE / USING HASH suffix (MySQL hint, PG ignores)
  - DESC column ordering (PG supports natively)
  - identifier quoting where required
  - stable per-table grouping for reviewable diffs

Deliberately skips with explicit reasons:
  - PRIMARY KEY (the CREATE TABLE handles it)
  - FULLTEXT KEY, SPATIAL KEY (need pg_trgm / GiST equivalents)
  - prefix-length indexes col(N) (need PG expression indexes)
  - expression indexes using MySQL-specific functions (ifnull, cast as
    signed) that need PG translation (COALESCE, CAST AS integer)

main_test.go drives translate() from inline schema fixtures — no file I/O
required. Covers plain/unique keys, DESC, USING BTREE, every skip reason,
balanced-paren edge cases, multi-table, PRIMARY ignored, plus unit tests
for extractParenBody and quoteIdent helpers.

Usage:
  go run ./tools/pg-index-translate \
    -in  server/datastore/mysql/schema.sql \
    -out server/datastore/mysql/migrations/tables/{ts}_AddMissingPGIndexes.sql

  - docs/Deploy/postgresql.md: end-to-end guide for running Fleet against
    Postgres — connection env vars, baseline schema apply, migration
    apply, ownership reassertion, troubleshooting (drift warning, must
    be owner of table, schema/column drift validator output).
  - docs/Deploy/README.md: links the new guide from the deployment index
    alongside the MySQL guide.

GetDBVersion returned a too-old current version on production PG because
the baseline-seed path (and goose's own run-and-record loop for newly
introduced migrations) inserted rows into migration_status_tables out of
version_id order. Concretely, id 523 carried version 20260422181702 while
id 521 carried 20260506171058. Plain 'ORDER BY id DESC' picked the
older version, so 'fleet prepare db' tried to re-run every migration
from 20260423161823 onward and failed on json_merge_patch — a MySQL-only
function that PG never had, with the migration body long since folded
into the embedded baseline.

Switching to 'ORDER BY version_id DESC, id DESC' makes the query
immune to insertion order while preserving up/down semantics: the
tie-break by id DESC keeps the most recent applied/rolled-back state
for the same version. MySQL is unaffected — its migration runner
always applies in monotonic version order so id and version_id stay
aligned. We do not change the MySQL dialect to keep blast radius
minimal; that path has years of behavior to preserve.

Test pins the exact ORDER BY clause via sqlmock so any future change
back to the buggy form fails CI loudly.
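A worked example of why the ordering matters, using the two rows quoted above. It assumes the usual goose-style scan: the first row seen for a version decides whether it counts as applied, and a rolled-back row masks older rows for the same version. The types and helpers are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// statusRow holds the migration_status_tables columns the lookup depends on.
type statusRow struct {
	ID        int64
	VersionID int64
	IsApplied bool
}

// currentVersion scans rows in query order: the first row seen for a version
// decides; a rolled-back row (IsApplied=false) masks older rows of that version.
func currentVersion(ordered []statusRow) int64 {
	masked := map[int64]bool{}
	for _, r := range ordered {
		if masked[r.VersionID] {
			continue
		}
		if r.IsApplied {
			return r.VersionID
		}
		masked[r.VersionID] = true
	}
	return 0
}

// orderByID mimics ORDER BY id DESC (the old query).
func orderByID(rows []statusRow) []statusRow {
	s := append([]statusRow(nil), rows...)
	sort.Slice(s, func(i, j int) bool { return s[i].ID > s[j].ID })
	return s
}

// orderByVersionThenID mimics ORDER BY version_id DESC, id DESC (the fix).
func orderByVersionThenID(rows []statusRow) []statusRow {
	s := append([]statusRow(nil), rows...)
	sort.Slice(s, func(i, j int) bool {
		if s[i].VersionID != s[j].VersionID {
			return s[i].VersionID > s[j].VersionID
		}
		return s[i].ID > s[j].ID
	})
	return s
}

func main() {
	// The two rows from the production incident: ids out of version order.
	rows := []statusRow{
		{ID: 521, VersionID: 20260506171058, IsApplied: true},
		{ID: 523, VersionID: 20260422181702, IsApplied: true},
	}
	fmt.Println(currentVersion(orderByID(rows)))            // 20260422181702 (too old)
	fmt.Println(currentVersion(orderByVersionThenID(rows))) // 20260506171058
}
```

The id DESC tie-break only matters within one version, where it keeps a later rollback row ahead of the earlier apply row.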

…ces/views

pg_baseline_post.sql already loops over public tables, sequences, and
views and reasserts ownership to current_user, but it skipped functions.
On baselines that were loaded by the postgres superuser (typical on
self-hosted PG), CREATE OR REPLACE FUNCTION later in the same file
errored with 'must be owner of function fleet_set_updated_at' — the
application user can't replace something it doesn't own.

Add a fourth loop using pg_proc / pg_namespace to enumerate public
functions whose owner is not current_user, and ALTER FUNCTION ... OWNER
TO current_user with the standard insufficient_privilege fallback.
pg_get_function_identity_arguments() disambiguates overloaded
signatures.

Hit in production tonight on the AddMissingPGIndexes deploy. With this
fix every future fleet prepare db on a postgres-superuser-loaded
baseline succeeds without manual ALTER FUNCTION.

The existing implementation already sorts the seeded versions ascending
(via versionsAtOrBelow → partitionMigrationVersions → slices.Sort), so
PG assigns auto-increment ids in the same order as version_id. That
property is load-bearing for any downstream consumer that infers
'current version' from MAX(id), even with the dialect query now
correctly ordered by version_id DESC.

No functional change — just document the invariant so a future refactor
doesn't quietly drop the sort.

Required by TestVersionsAbove_EmbeddedBaselineCoversAllCode now that
AddMissingPGIndexes (20260513210000) ships in code. Dump source is
fleet.hz.ledoweb.com fleet-db-1, which has all 532 indexes applied
(11 from the original baseline + 521 added by AddMissingPGIndexes
either via the SQL we ran manually tonight or via the migration on
future fresh applies). check-pg-compat validators pass:

  schema-drift:   202 MySQL tables / 205 PG tables in sync (after allowlist)
  primary-keys:   every ON DUPLICATE KEY UPDATE site covered
  column-drift:   no drift between schema.sql and pg_baseline_schema.sql

Generated via the documented procedure in the file's header:

  kubectl exec -n fleet --context hetzner-ledo fleet-db-1 -- \
    pg_dump -U postgres -d fleet --schema-only --no-owner --no-privileges

Stripped the pg_dump-17 \restrict/\unrestrict meta-commands and the
SET search_path='' line per the same header comment. Header preserved
with the regen recipe and verification commands.
dnplkndll force-pushed the feat/pg-compat-clean branch from dbcec59 to 6062962 (May 14, 2026 12:57)
dnplkndll (author)

Superseded by #6, which carries the full PG-compat layer (773 files vs 171 here, current scope). Closing so #6 can retarget to main.

dnplkndll closed this May 14, 2026