feat(coldfront): control-plane support for ColdFront (single-node) #421
Draft
dpage wants to merge 1 commit into
Add the control-plane side of ColdFront transparent data tiering: deploy and bootstrap the Lakekeeper Iceberg catalog per database, load the extension config, and schedule the tiering jobs. Consumes the `lakekeeper` service the saas control plane sends. This is the single-node scope; it is one of several per-repo PRs for the feature.

Included:

- Register the `lakekeeper` service type (image, launch on port 8181, config resource, validator, Goa enum), following the MCP recipe.
- External Lakekeeper catalog Postgres via a configurable connection URL (Cloud supplies a managed instance; the control plane does not provision it), with a `migrate`-before-`serve` dependency and fail-loud if the URL is absent.
- Post-deploy bootstrap: idempotent Lakekeeper REST warehouse creation (bootstrap -> warehouse -> namespace) with the correct S3 storage-profile (`flavor`/`path-style-access`/`key-prefix`), and `coldfront.set_storage_secret` / `_azure` on the database with the object-store credential bound as query arguments (never interpolated or logged).
- Schedule the archiver/partitioner/compactor via the existing gocron/etcd scheduler, running each single-pass in the primary node's Postgres container and capturing the exit code (recorded as `task.TypeTiering`); the archiver's "no tables configured" exit is treated as benign.
- Reject enabling ColdFront on a multi-node database (fail-loud), pending the deferred mesh `snowflake.node` reconciliation.

Deferred to follow-ups (see PR description): the per-node mesh GUCs for multi-node ColdFront (needs a CP + ColdFront-author decision on `snowflake.node` ownership); expansion of the saas lakekeeper contract (`catalog_db_url`, `pg_encryption_key`, `provider`/`bucket`/`region`/`endpoint`); the ColdFront-enabled Postgres image; and confirmation of the pinned Lakekeeper image tag.
Not up to standards ⛔

🔴 Issues

| Category | Results |
|---|---|
| Security | 1 critical (1 false positive) |
| Complexity | 13 medium |

🟢 Metrics: 184 complexity · 36 duplication
Overview
The control-plane side of ColdFront (transparent Postgres→Iceberg data tiering), consuming the `lakekeeper` service the saas control plane sends. This is the single-node scope and one of several per-repo PRs for the feature; it is not end-to-end usable alone (see Dependencies & deferred).

What's included
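- `lakekeeper` service type: image (`quay.io/lakekeeper/catalog`), launch recipe (`serve`, port 8181), config resource, validator, Goa enum; follows the MCP recipe in `docs/development/supported-services.md`.
- External catalog Postgres: supplied via a configurable connection URL (Cloud supplies a managed instance; the control plane does not provision it). `migrate`→`serve` is enforced as a resource dependency; missing catalog config fails loudly.
- Post-deploy bootstrap: idempotent Lakekeeper REST warehouse creation (`bootstrap`→`warehouse`→`namespace`) with the correct S3 storage-profile (`flavor`/`path-style-access`/`key-prefix`, verified against the ColdFront docs), and `coldfront.set_storage_secret`/`_azure` on the database. The object-store credential is bound as query arguments (never interpolated into SQL, never logged); signatures match `coldfront--1.0.sql`.
- Scheduled tiering jobs: the archiver/partitioner/compactor run via the existing gocron/etcd scheduler, each single-pass in the primary node's Postgres container, with the exit code captured (recorded as `task.TypeTiering`, following the pgBackRest schedule precedent). The tables to tier are resolved by the binaries from the DB registry (`coldfront.partition_config`, customer-driven), so no table list is passed. The archiver's "no tables configured" exit is treated as benign.

Ordering & safety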
migrate → serve (health-gated) → REST bootstrap (blocking, after healthy serve) → `set_storage_secret` (after the `coldfront` extension exists) → scheduled jobs. All enforced by real resource dependencies. Credentials live in etcd resource state / job args exactly as existing services (RAG keys, `ServiceSpec.Config`) do, so there is no new plaintext-at-rest or plaintext-in-logs exposure. Everything is runtime-gated by the (unpublished) ColdFront Postgres image, so no unsafe partial state is reachable today.

Dependencies & deferred (follow-ups, not in this PR)
- Expansion of the saas lakekeeper contract: the lakekeeper `ServiceSpec.Config` carries `warehouse`/`path_prefix`/`credential`. For this to function it must also supply `catalog_db_url`, `pg_encryption_key`, and the store coordinates `provider`/`bucket`/`region`/`endpoint` (all resolvable from the `coldfront_store` record). The control plane fails loudly where these are absent.
- Multi-node mesh (`snowflake.node`) reconciliation: ColdFront's bakery requires `snowflake.node = hashtext(spock_node_name) & 1023`, which conflicts with the control plane's ordinal-based `snowflake.node` (Spock/lolor). Solvable in principle (CP's `snowflake.node` value is consumed only by the snowflake/lolor extensions), but the clean fix needs a CP + ColdFront-author decision (likely a small ColdFront upstream change) plus a node-name hash-collision check. Hence single-node-first here.
- Pinned Lakekeeper image tag: confirmation of `v0.9.0` (currently a plausible placeholder).
- ColdFront-enabled Postgres image: the `coldfront`/localhost DSN used by the tiering binaries is an implicit contract with the image; and a ColdFront upstream benign-exit-code would let us stop keying the archiver's empty-run detection on log text.

Testing & review
Built task-by-task with TDD; unit-tested throughout (exit-code capture, REST bootstrap ordering/idempotency, per-provider `set_storage_secret`, fail-loud config, multi-node rejection). Contract details (SQL signatures, S3 warehouse profile) were verified against the pgEdge/coldfront source. `go build ./...` is clean; the Goa regen is minimal (canonicalised via the pinned goa v3.23.4 + yamlfmt v0.21.0 under go1.25.8). Real end-to-end awaits the ColdFront image.
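For reviewers who want a concrete picture of the exit-code capture mentioned above, the following is a minimal sketch under stated assumptions, not the PR's implementation. The `outcome` type, the `classify` function, the job names as plain strings, and the exact log fragment in `emptyRunMarker` are all hypothetical. It models the behaviour the description states: a non-zero archiver exit whose output indicates no configured tables is benign, while any other non-zero exit is a failure.

```go
package main

import (
	"fmt"
	"strings"
)

// outcome is a hypothetical classification of a single-pass tiering run.
type outcome string

const (
	succeeded   outcome = "succeeded"
	benignEmpty outcome = "benign-empty"
	failed      outcome = "failed"
)

// emptyRunMarker is an assumed log fragment; the real text emitted by the
// archiver is not given in this PR description.
const emptyRunMarker = "no tables configured"

// classify maps a job's exit code and captured output to an outcome.
// Only the archiver's empty run is treated as benign; the same text from
// any other job still counts as a failure.
func classify(job string, exitCode int, output string) outcome {
	switch {
	case exitCode == 0:
		return succeeded
	case job == "archiver" && strings.Contains(strings.ToLower(output), emptyRunMarker):
		return benignEmpty
	default:
		return failed
	}
}

func main() {
	fmt.Println(classify("archiver", 1, "error: no tables configured"))
	fmt.Println(classify("compactor", 1, "error: no tables configured"))
	fmt.Println(classify("partitioner", 0, ""))
}
```

Matching on log text is brittle, which is why the description lists an upstream benign exit code as a follow-up; with one, the `strings.Contains` branch would become a plain exit-code comparison.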