DNM: complete cluster autoscaling branch, for CI and reference#36738
Draft
aljoscha wants to merge 17 commits into
…ss tracker
Companion to the cluster autoscaling design doc: a staged 7-PR plan and live progress tracker, with an operating protocol for implementation sessions, the controller boundary and gating model, codebase anchors, and per-PR scope and checklists.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the additive, behaviourally-inert durable state the cluster controller will need, with one catalog migration (v85->v86) defaulting it for existing clusters. No new field is read, so this is dark.
On the managed cluster config: `auto_scaling_strategy` (user policy), `reconfiguration`, and `burst` (in-flight runtime records), all `Option`, `None` by default.
On the managed replica location: collapse the single `availability_zone` user-pin into an `availability_zones: Vec<String>` recording the zones the replica was provisioned under -- a managed cluster's AVAILABILITY ZONES pool, or an unmanaged replica's pin as a zero-/one-element list. The controller needs this durable to tell realized- from target-shape replicas by config shape (including an AVAILABILITY ZONES divergence). A single-AZ field and a separate provisioned-list field would be mutually exclusive and both collapse to a list at the orchestrator, so one list is the honest shape.
The migration backfills managed-cluster replicas from their cluster's `availability_zones` and unmanaged-cluster replicas from their pin.
Concretization stays inert: it re-derives the managed pool from the cluster's current config and reads the durable list as the pin only for unmanaged clusters, so the new managed-replica list is written but not yet read. The in-memory `ManagedReplicaAvailabilityZones` enum is unchanged here -- it is the discriminator the persistence path uses, and can only be simplified once this durable field stores the list unconditionally, which it now does.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
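The backfill rule described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `OldReplica`, `NewReplica`, and `backfill` are hypothetical names, not the catalog migration's real types, and the cluster's managed-ness is modelled as an `Option` over its zone pool.

```rust
/// Hypothetical stand-in for the pre-migration replica record (not the real catalog type).
#[derive(Debug, Clone, PartialEq)]
pub struct OldReplica {
    /// The single user pin carried by the old record.
    pub availability_zone: Option<String>,
}

/// Hypothetical stand-in for the post-migration replica record.
#[derive(Debug, Clone, PartialEq)]
pub struct NewReplica {
    /// Zones the replica was provisioned under; empty means no constraint.
    pub availability_zones: Vec<String>,
}

/// Assumption: `cluster_zones` is `Some(pool)` for a managed cluster and
/// `None` for an unmanaged one.
pub fn backfill(old: &OldReplica, cluster_zones: Option<&[String]>) -> NewReplica {
    let availability_zones = match cluster_zones {
        // Managed cluster: the replica inherits the cluster's AVAILABILITY ZONES pool.
        Some(pool) => pool.to_vec(),
        // Unmanaged cluster: the pin becomes a zero- or one-element list.
        None => old.availability_zone.iter().cloned().collect(),
    };
    NewReplica { availability_zones }
}
```

An unpinned replica on an unmanaged cluster ends up with an empty list, which matches the "empty = no constraint" reading used later in the branch.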
`ManagedReplicaAvailabilityZones` distinguished a managed cluster's `AVAILABILITY ZONES` pool (`FromCluster`) from an unmanaged cluster's single user pin (`FromReplica`), but both collapse to a list of acceptable zones at the orchestrator, the distinction is recoverable from whether the owning cluster is managed, and the durable replica record now stores a single `availability_zones` list. Replace the enum on `ManagedReplicaLocation` with a bare `Vec<String>` (empty = no constraint). Now that the durable field stores the list unconditionally, the in-memory->durable `From` is a passthrough and concretization fills the list directly -- re-deriving the managed pool from the cluster config, reading the durable list as the pin for unmanaged clusters -- so this stays behaviour-preserving. The orchestrator maps an empty list to "no constraint"; the convert-to-managed check iterates the pin(s); and the `mz_cluster_replicas.availability_zone` column is unchanged -- it still surfaces only an unmanaged replica's single pin, now derived from the list plus the cluster's managed-ness, with a comment recording what a future plural `text list` column would entail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
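As a rough illustration of the two derivations this commit describes, the sketch below maps the bare list to an orchestrator constraint and to the single-pin `availability_zone` column value. Both function names are hypothetical, and the real code operates on catalog types rather than plain slices.

```rust
/// Hypothetical helper: an empty list means "no constraint" at the orchestrator.
pub fn orchestrator_constraint(zones: &[String]) -> Option<&[String]> {
    if zones.is_empty() {
        None
    } else {
        Some(zones)
    }
}

/// Hypothetical helper: the `availability_zone` column surfaces only an
/// unmanaged replica's single pin, derived from the list plus the cluster's
/// managed-ness. A managed cluster's pool is never shown in this column.
pub fn availability_zone_column(cluster_is_managed: bool, zones: &[String]) -> Option<&str> {
    if cluster_is_managed {
        None
    } else {
        zones.first().map(String::as_str)
    }
}
```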
…iring
Stand up the cluster controller end to end with only the implicit baseline strategy, so the reconcile loop runs but is a no-op for steady-state clusters. Establish the task boundary, the input/decision types, the compare-and-append apply path, and the master gate -- all dark.
New pure crate `mz-cluster-controller` (no adapter/catalog dep, no new third-party license):
* `ClusterControllerCtx` -- the strategy-agnostic pull/apply seam. Reads are batched and pulled on demand (`managed_cluster_ids`, `cluster_states` with a latched `now`, and a `collections_hydrated_on_replicas` method that exists for the strategies that follow but goes unused here); the single `apply` transacts a tick's batch under a compare-and-append guard. The controller depends on exactly this trait, which is what makes it testable against a fake impl and extractable later without touching controller code.
* The pure `Strategy` trait (`update_state` / `desired_replicas`) and the implicit `BaselineStrategy`, which desires `replication_factor` replicas at the realized cluster shape. Baseline-only means desired equals realized, so a steady managed cluster reconciles to no decisions.
* The reconcile kernel: phase 1 unions every strategy's `update_state` per cluster and applies under the guard (a rejected cluster is skipped this tick); phase 2 re-reads, unions `desired_replicas` (multiset union is max-per-shape, not sum, so a replica survives iff some strategy desires its shape), matches by `ReplicaShape` against the actual replicas, and emits the creates and drops that close the gap, with per-create strategy attribution. Phase 2 reuses the phase-1 read when no state was written, keeping the barrier only for the writing strategies that follow.
Every decision -- the `UpdateClusterState` state writes and the create/drop batch alike -- carries the `ExpectedClusterState` it was diffed against. The apply path re-reads each target cluster's durable config and records and rejects the whole batch on any mismatch, so a stale create or drop can never reshape the replica set against a config a concurrent `ALTER` has since established; the controller recomputes on the next tick.
Adapter driver `coord/cluster_controller.rs` is the other half of the seam. It runs the controller as a separate task whose `CoordCtx` marshals each batched pull/apply to the coordinator loop over `internal_cmd_tx` plus a oneshot (`Message::ClusterControllerRequest`), because the catalog and live signals are reachable only from that loop. On a held guard it builds `Op::UpdateClusterConfig` (cut-over / record write), `Op::CreateClusterReplica` (reusing replica-location concretization), and `Op::DropObjects`, transacted together. The interim create/drop audit reason is `Manual`; the attribution-carrying controller reason lands with the graceful strategy.
Add dyncfgs `enable_cluster_controller` (default false) and `cluster_controller_tick_interval` (default 5s), both re-read each tick: a runtime flip of the gate needs no restart, and the cadence is a live operational knob. With the gate off, reads report no clusters and applies reject, keeping the controller fully inert and every legacy path unchanged.
The frontier and read-timestamp reads the controller will also need are left to their first consumer (the graceful and on-refresh strategies): their signatures are dictated by that consumer, and declaring them speculatively would fix the wrong shape and pull an unused frontier dependency into the pure crate. They land the same pull-on-demand way as the hydration read.
Tested by boundary tests against a fake `ClusterControllerCtx` (steady no-op, under/over-provision, wrong-shape drop+create, union max-not-sum, distinct-shape attribution, and compare-and-append reject-and-recover against a state a concurrent `ALTER` changed, for both the state-write and create/drop guards) and an slt (`cluster_controller.slt`) that forces the gate on, drives the tick interval to 5ms, waits across hundreds of ticks, and asserts a steady managed cluster's replica ids and names are unchanged -- an assertion gate-off behaviour, which never runs the loop, cannot satisfy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
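The phase-2 step of the reconcile kernel (max-per-shape union, then a diff against the actual replicas) can be sketched roughly as follows. This is a simplified illustration, not the crate's code: `ReplicaShape` here has only two fields, `Desired`, `Plan`, `union_max`, and `diff` are hypothetical names, and strategy attribution and the compare-and-append guard are left out.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the crate's `ReplicaShape` (the real one also covers logging).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ReplicaShape {
    pub size: String,
    pub availability_zones: Vec<String>,
}

/// A multiset of desired replicas: shape -> count (hypothetical alias).
pub type Desired = BTreeMap<ReplicaShape, usize>;

/// Union the strategies' desires: max per shape, not sum.
pub fn union_max(strategies: &[Desired]) -> Desired {
    let mut out = Desired::new();
    for desired in strategies {
        for (shape, count) in desired {
            let slot = out.entry(shape.clone()).or_insert(0);
            *slot = (*slot).max(*count);
        }
    }
    out
}

/// The creates and drops that close the gap (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Plan {
    pub creates: Vec<ReplicaShape>,
    /// Names of the replicas to drop.
    pub drops: Vec<String>,
}

/// Match actual replicas against the desired multiset by shape.
pub fn diff(desired: &Desired, actual: &[(String, ReplicaShape)]) -> Plan {
    let mut remaining = desired.clone();
    let mut drops = Vec::new();
    for (name, shape) in actual {
        match remaining.get_mut(shape) {
            // This replica satisfies one unit of desire for its shape.
            Some(n) if *n > 0 => *n -= 1,
            // No strategy desires this shape (any more): drop it.
            _ => drops.push(name.clone()),
        }
    }
    let mut creates = Vec::new();
    for (shape, n) in remaining {
        for _ in 0..n {
            creates.push(shape.clone());
        }
    }
    Plan { creates, drops }
}
```

With baseline-only desire equal to the realized set, `diff` returns an empty plan, which is the steady no-op case the commit message describes.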
… + wait-shim
Move graceful (zero-downtime) cluster reconfiguration into the cluster
controller as a pure strategy, driven by the durable `reconfiguration` record,
with hydration-aware cut-over and a durable, honored timeout. Everything lands
dark behind the `enable_cluster_controller` master gate; the legacy 3-stage
machine still runs when the gate is off.
Strategy (mz-cluster-controller). New pure `GracefulReconfigurationStrategy`,
engaged whenever the `reconfiguration` record is present. `desired_replicas`
contributes `target.replication_factor` replicas at the target shape (size,
logging, AZ list) on top of the baseline's realized set -- the hydrate-overlap.
`update_state` cuts the realized config over to the target and clears the
record once those replicas are all present and hydrated; success takes
precedence over the deadline. Past the deadline with the target not fully
hydrated it applies the record's `on_timeout`: `ROLLBACK` (the default) drops
the target set and keeps the record as a tombstone that parks the strategy;
`COMMIT` cuts over to the still-unhydrated target and clears it.
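A minimal sketch of the `update_state` decision described in this paragraph, with success taking precedence over the deadline. The names `Step` and `graceful_step` are hypothetical, timestamps are plain milliseconds, and "past the deadline" is assumed to mean strictly later than the deadline.

```rust
/// Mirrors the record's `on_timeout` action (simplified, hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnTimeout {
    Rollback,
    Commit,
}

/// What the strategy does on this tick (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    /// Keep desiring the target replicas and wait for hydration.
    Wait,
    /// Cut the realized config over to the target and clear the record.
    CutOver { hydrated: bool },
    /// Stop desiring the target set; keep the record as a tombstone.
    Park,
}

/// `target_hydrated` means every target replica is present and hydrated.
pub fn graceful_step(
    target_hydrated: bool,
    now_ms: u64,
    deadline_ms: u64,
    on_timeout: OnTimeout,
) -> Step {
    // Success takes precedence over the deadline.
    if target_hydrated {
        return Step::CutOver { hydrated: true };
    }
    // Assumption: the deadline has passed only when `now` is strictly later.
    if now_ms <= deadline_ms {
        return Step::Wait;
    }
    match on_timeout {
        OnTimeout::Rollback => Step::Park,
        OnTimeout::Commit => Step::CutOver { hydrated: false },
    }
}
```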
Hydration seam. New `ClusterControllerCtx::hydrated_replicas(cluster, replicas)
-> BTreeSet` ("which of these replicas have all current collections hydrated"),
the shape its only consumer needs and that the underlying controller APIs can
express. The controller pulls it on demand -- only while a reconfiguration is
in flight -- into the live-signal field `ClusterState::hydrated_replicas`
(excluded from the compare-and-append witness). The adapter driver backs it
per-replica against the compute and storage controllers, which collapse a
replica list to a single "hydrated on any" bool.
ALTER reshape (gated). With the master gate on, a managed-cluster `ALTER` that
changes a replica's config shape (SIZE, logging, AVAILABILITY ZONES) -- or any
`ALTER` while a record is already in flight -- writes/folds the
`reconfiguration` record onto the realized config and leaves the realized shape
in place; the controller converges and cuts over. A fold overlays the `ALTER`
onto the in-flight target per dimension: a dimension the `ALTER` sets
re-targets, one left `Unchanged` keeps the in-flight target's value (seeding
`Unchanged` dimensions from the realized config would silently revert the
in-flight transition, since the realized config only advances at cut-over),
while the deadline and `on_timeout` are replaced wholesale by the latest
`ALTER`'s. Non-shape changes with no record in flight keep updating the
realized config directly. The deadline is `now + timeout` and `on_timeout` is
threaded from the existing `WITH (WAIT ...)` clause: `WAIT UNTIL READY (TIMEOUT,
ON TIMEOUT ...)` verbatim, `WAIT FOR` desugars to `ON TIMEOUT COMMIT`, and
omitting `WAIT` falls back to the `default_cluster_reconfiguration_timeout`
dyncfg and the default action. The planner's implicit `OnTimeoutAction::default()`
flips `COMMIT`->`ROLLBACK` globally -- the safe default that never silently
induces downtime by cutting over to an un-hydrated target -- and the legacy
foreground path reads the same `default()`.
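The per-dimension fold can be illustrated with a small sketch. It is not the branch's `fold_reconfiguration_target`: the types `Dim`, `Record`, and `Alter` and the function `fold_sketch` are hypothetical, and only two shape dimensions (size and zone list) are modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnTimeout {
    Rollback,
    Commit,
}

/// One dimension of an ALTER: either set explicitly or left unchanged.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Dim<T> {
    Unchanged,
    Set(T),
}

impl<T: Clone> Dim<T> {
    /// A set dimension re-targets; an unchanged one keeps the in-flight
    /// target's value (not the realized config's).
    fn overlay(&self, in_flight: &T) -> T {
        match self {
            Dim::Set(v) => v.clone(),
            Dim::Unchanged => in_flight.clone(),
        }
    }
}

/// Simplified in-flight reconfiguration record (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub target_size: String,
    pub target_zones: Vec<String>,
    pub deadline_ms: u64,
    pub on_timeout: OnTimeout,
}

/// Simplified ALTER request (hypothetical).
pub struct Alter {
    pub size: Dim<String>,
    pub zones: Dim<Vec<String>>,
    pub deadline_ms: u64,
    pub on_timeout: OnTimeout,
}

pub fn fold_sketch(in_flight: &Record, alter: &Alter) -> Record {
    Record {
        target_size: alter.size.overlay(&in_flight.target_size),
        target_zones: alter.zones.overlay(&in_flight.target_zones),
        // Deadline and on-timeout action are replaced wholesale by the latest ALTER.
        deadline_ms: alter.deadline_ms,
        on_timeout: alter.on_timeout,
    }
}
```

Seeding `Unchanged` from the in-flight target is what keeps a zones-only `ALTER` from silently reverting an in-flight size change.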
Wait-shim. New `ClusterStage::AwaitReconfiguration` polls the durable record
until it clears (done) or its deadline passes (timeout); since the strategy
still cuts over past the deadline once hydrated, the shim grants one grace
re-poll before surfacing `AlterClusterTimeout`. With the new
`enable_background_alter_cluster` dyncfg on, `ALTER` returns immediately
instead. Session disconnect no longer aborts a reconfiguration.
Audit. New `ReplicaCreateDropReason::GracefulReconfiguration` ->
`CreateOrDropClusterReplicaReasonV1::Reconfiguration`, carried on the
controller's graceful-desired replica creates; the audit proto enum is added in
place in the unshipped v86 snapshot.
Design/plan. Settle the per-`ALTER` timeout surface in the design doc and
tracker: keep `WITH (WAIT ...)` as the permanent spelling (it already carries
both the deadline and the on-timeout action) and record `on_timeout` as a
durable, controller-honored knob defaulting to `ROLLBACK`.
Tests: graceful kernel/flow cases in mz-cluster-controller (in-flight desire,
cut-over, partial hydration, timeout-vs-hydrated precedence, timeout park,
`COMMIT`- vs `ROLLBACK`-at-timeout, AZ-only shape change, full overlap then
cut-over, ALTER-back, and the `fold_reconfiguration_target` overlay), FakeCtx
seam tests that drive reconcile end-to-end past a forced deadline, and an
extended `cluster_controller.slt` asserting a background `ALTER` cuts the
realized size over and that the omitted/`COMMIT`/`ROLLBACK` spellings each drive
a record under the gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…udit lifecycle
Surface in-flight cluster reconfigurations in SQL so a background ALTER CLUSTER is observable, and record the reconfiguration lifecycle in the audit log. All dark: the durable reconfiguration/burst records only ever move under the `enable_cluster_controller` master gate, so for an ordinary cluster the new introspection reports current == target with nothing in flight.
Introspection view. New builtin materialized view `mz_internal.mz_cluster_reconfigurations` -- one row per managed cluster, computed in `mz_catalog_server` from the raw catalog (`mz_catalog_raw`): the realized managed config and the durable `reconfiguration`/`burst` records yield current vs. target size / replication factor / availability-zone list, an in-flight flag, the active deadline, and a placeholder `burst_size` column for the burst strategy. Indexed on `cluster_id` in `mz_catalog_server`. Deriving it from the catalog rather than imperatively packing builtin-table rows keeps the relation a pure function of durable state; a new builtin needs no migration.
SHOW CLUSTERS. `mz_show_clusters` LEFT JOINs the new relation to add `current_size`, `target_size`, and `reconfiguration_in_flight`, so a single SHOW CLUSTERS answers "what's there now", "what is it moving to", and "is something in flight". The view is indexed in `mz_catalog_server`, so it stays non-temporal; the timed-out-vs-in-progress split is read from `reconfiguration_deadline` rather than computed with `mz_now()`.
Audit lifecycle. New `EventDetails::AlterClusterReconfigurationV1` records a started / finalized / timed-out / cancelled transition with the target shape and, where it applies, the active deadline. It is emitted from the single `Op::UpdateClusterConfig` durable write site, classified purely from the before/after `reconfiguration` record and the write timestamp vs. the deadline: a record write or re-target is `started`, a hydrated clear (or any clear under `ROLLBACK`) is `finalized`, a `COMMIT`-on-timeout clear past the deadline is `timed-out`, and an ALTER-back whose new target equals the realized shape is `cancelled` -- all without adding vocabulary to the controller seam. A clear is `timed-out` only when the record's `on_timeout` is `COMMIT`, so a hydrated-but-late success is never mislabeled. The proto variant is added in place to the unshipped v86 snapshot.
Tests: a `classify_reconfiguration_transition` unit test in mz-adapter; an observability section in `cluster_controller.slt` and gate-off column assertions in `show_clusters.slt`; and the catalog-snapshot slt/testdrive expectations and the `mz_internal` system-catalog doc updated for the new relation, its index, and the SHOW CLUSTERS columns.
Deferred: the ROLLBACK-timeout audit event -- it performs no config write, so it would need a controller-seam signal or a durable tombstone stamp; the rollback is already visible via the tombstoned record and the none-desired replica drops.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the existing `ON REFRESH` cluster scheduling into the cluster controller as a pure `OnRefreshStrategy`, so the controller framework owns all three existing behaviours (baseline, graceful reconfiguration, on-refresh). Everything lands dark behind the master gate: the legacy `cluster_scheduling.rs` policy still runs when `enable_cluster_controller` is off and the strategy is exercised only with the gate forced on in tests. The strategy contributes one replica at the cluster's realized shape while the cluster is inside a refresh window and nothing otherwise. The window decision (an MV still needs a refresh, or is estimated to still need Persist compaction) is ported verbatim from `check_refresh_policy`. A scheduled cluster's `replication_factor` is the controller's domain, so `update_state` normalizes it to 0 at runtime — self-healing, no migration — and the implicit baseline contributes nothing on a scheduled cluster, leaving the on-refresh strategy as the sole contributor there. The refresh-window signals (read timestamp, compaction estimate, bound REFRESH MV write frontiers and schedules) are pulled through a new `refresh_window_inputs` ctx method, on demand and only for scheduled clusters, the same pay-for-what-you-use way as hydration. The MV write frontier is modeled as the antichain's single upper bound, which keeps the pure crate free of a direct timely dependency while preserving the exact frontier semantics. The cluster schedule is added to the compare-and-append witness so a concurrent `SET (SCHEDULE = ...)` rejects an in-flight on-refresh decision. The legacy `check_scheduling_policies` / `handle_scheduling_decisions` no-op when the controller is enabled, so the two never both write a scheduled cluster's replica set; the legacy path stays for gate-off and is removed in the final cleanup PR. 
Tests: ten on-refresh kernel and seam tests in `mz-cluster-controller` (window in/out decision, the read-ts boundary, the hydration estimate and compaction windows, replication-factor normalization, and end-to-end create/drop through a fake ctx), and an ON REFRESH section in `cluster_controller.slt` with the gate forced on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
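The division of labour between the baseline and on-refresh strategies on a scheduled cluster might be sketched as below. This is a deliberately coarse illustration: the refresh-window signals are reduced to two hypothetical booleans per MV, and none of these function or type names are taken from the branch.

```rust
/// Hypothetical per-MV signals, already reduced to booleans for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct MvSignal {
    pub needs_refresh: bool,
    pub needs_compaction: bool,
}

/// The cluster is inside a refresh window if some MV still needs a refresh
/// or is estimated to still need compaction.
pub fn in_refresh_window(mvs: &[MvSignal]) -> bool {
    mvs.iter().any(|mv| mv.needs_refresh || mv.needs_compaction)
}

/// On-refresh contributes one replica inside the window and nothing otherwise.
pub fn on_refresh_desired(in_window: bool) -> usize {
    if in_window {
        1
    } else {
        0
    }
}

/// The implicit baseline contributes nothing on a scheduled cluster.
pub fn baseline_desired(scheduled: bool, replication_factor: u32) -> usize {
    if scheduled {
        0
    } else {
        replication_factor as usize
    }
}

/// Runtime normalization: returns the value to write when a scheduled
/// cluster's replication factor is not already 0, and `None` when no
/// write is needed.
pub fn normalize_replication_factor(scheduled: bool, replication_factor: u32) -> Option<u32> {
    if scheduled && replication_factor != 0 {
        Some(0)
    } else {
        None
    }
}
```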
…, break-glass
PR 6 of the cluster-autoscaling plan: the hydration-burst capability, the last feature PR, landing dark behind its gates.
SQL surface. New `AUTO SCALING STRATEGY = (ON HYDRATION (HYDRATION SIZE = '...', LINGER DURATION = '...'))` cluster option at CREATE CLUSTER and ALTER CLUSTER SET/RESET, with a dedicated `ClusterAutoScalingStrategyOptionValue` AST node, the `AUTO`/`SCALING`/`LINGER`/`DURATION` keywords, and the parser/planner threading it into the durable `ClusterVariantManaged.auto_scaling_strategy` field. SHOW CREATE CLUSTER renders it. Acceptance is gated by the item-parsing `enable_auto_scaling_strategy` feature flag (not a dyncfg) so a stored statement re-parses at catalog rehydration. Validations reject HYDRATION SIZE equal to the cluster SIZE and AUTO SCALING STRATEGY combined with a non-MANUAL SCHEDULE.
Strategy. New pure `HydrationBurstStrategy`: while the cluster carries an ON HYDRATION policy, the break-glass flag is on, the cluster is On, and no steady-state replica is hydrated, it writes a durable `burst` record and desires one extra replica at the hydration size; it stamps the steady-hydration time, tears the record down a linger after, re-arms if the steady set un-hydrates, and clears a stale record whenever a burst is no longer warranted. The seam gains the `AutoScalingPolicy` mirror on the cluster state (and the compare-and-append witness) plus the `enable_hydration_burst` break-glass and `default_hydration_burst_linger` config signals; the controller now probes hydration when a burst is in flight or warranted.
Observability and audit. Burst create/drop carry a new `HydrationBurst` reason; a new `ClusterHydrationBurstV1` started/finished lifecycle event is classified at the cluster-config write site. Both proto variants are added in place to the unshipped v86 snapshot. The PR-4 `mz_cluster_reconfigurations.burst_size` column already surfaces the record.
All of this is dark: the gates default to the no-burst state and existing tests pass unchanged.
Tests: kernel arm/linger/re-arm/teardown/break-glass cases and a fake-ctx seam test in mz-cluster-controller; parser testdata for the new option; a burst section in cluster_controller.slt.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
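The arm / stamp / linger / re-arm / teardown cycle of the burst strategy can be pictured as a small state transition over the durable record. The sketch below is an interpretation, not the branch's `HydrationBurstStrategy`: `Burst`, `Inputs`, and `burst_step` are hypothetical names, and re-arming is assumed to mean clearing the steady-hydration stamp.

```rust
/// Simplified durable burst record (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Burst {
    /// When the steady-state set was first seen hydrated, if it has been.
    pub steady_hydrated_at_ms: Option<u64>,
}

/// Signals the strategy sees on one tick (hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct Inputs {
    pub has_policy: bool,
    pub break_glass_on: bool,
    pub cluster_on: bool,
    pub steady_hydrated: bool,
    pub now_ms: u64,
    pub linger_ms: u64,
}

/// Returns the record to keep after this tick. In this sketch an extra
/// replica at the hydration size is desired exactly while the result is `Some`.
pub fn burst_step(record: Option<Burst>, inp: &Inputs) -> Option<Burst> {
    // A stale record is cleared whenever a burst is no longer warranted.
    if !(inp.has_policy && inp.break_glass_on && inp.cluster_on) {
        return None;
    }
    match record {
        // Arm only while no steady-state replica is hydrated.
        None if inp.steady_hydrated => None,
        None => Some(Burst { steady_hydrated_at_ms: None }),
        Some(b) => match (inp.steady_hydrated, b.steady_hydrated_at_ms) {
            // Steady set un-hydrated: re-arm (assumed to reset the stamp).
            (false, _) => Some(Burst { steady_hydrated_at_ms: None }),
            // First tick with a hydrated steady set: stamp the time.
            (true, None) => Some(Burst { steady_hydrated_at_ms: Some(inp.now_ms) }),
            // Tear down one linger after the stamp.
            (true, Some(at)) => {
                if inp.now_ms >= at.saturating_add(inp.linger_ms) {
                    None
                } else {
                    Some(b)
                }
            }
        },
    }
}
```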
Flip enable_cluster_controller and enable_background_alter_cluster on for the test harnesses only -- mzcompose get_minimal_system_parameters and the sqllogictest binary's system-parameter defaults -- so CI exercises the controller owning the managed-cluster replica set fleet-wide. The production dyncfg defaults stay false: the feature still lands dark, and the real default flip is a separate rollout commit after a prod bake.
With the gate on in CI, the legacy tests that assert behavior the controller does not reproduce are migrated to controller behavior:
- managed_cluster.slt: a SIZE change reshapes in the background and advances the realized config only at cut-over, so the surviving replica churns its name across reshapes (r1 -> r2 -> r3).
- materialized_views.slt: the controller owns a scheduled cluster's replica set, holding the realized replication_factor at 0 and toggling a single replica in and out of the refresh window, so the scheduling shows up as ordinary manual-reason replica create/drops without the legacy per-policy scheduling-decision audit detail.
- test/cluster: graceful reconfiguration writes a durable record that resumes and completes across an environmentd restart, and the controller never creates a legacy "-pending" replica.
The async-awareness these and other tests also need (driving the controller tick down and waiting for replica-set reconciles, which are correct with the gate on or off) rides with the respective feature commits; this commit carries only the changes that are incompatible with the legacy behavior, alongside the CI flag flip that makes them apply.
A product-facing writeup of the new cluster-controller capabilities: background graceful reconfiguration, hydration-burst autoscaling, and the new introspection surfaces, plus the user-facing side-effect changes and rollout gates. Every SQL/output pair was captured verbatim from a local environmentd built from this branch; pm-showcase-replay.sh regenerates all transcripts end-to-end (including the mid-reconfiguration restart) against a fresh environment. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…burst arming) Records the three rough edges found while live-testing for the PM showcase and the workshopped fixes as a new tracked stage: (8a) controller drops audited with a new uniform 'retired' reason and on-refresh creates restored to 'schedule'; (8b) a durable rolled_back_at stamp on the reconfiguration record, an event-vocabulary re-carve (finalized = cut over, timed-out = parked), and a 'state' column on mz_cluster_reconfigurations — closing PR 4's deferred ROLLBACK-timeout item; (8c) burst arming gated on the existence of an un-hydrated object, so empty clusters never burst. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Spec'd, workshopped, and implemented 2026-06-10; the tracker records the settled decisions, the implementation checklist, and the dissolution of PR 8a + PR 9 into fixup! commits against their mainline PR targets. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A ROLLBACK at the deadline previously performed no durable write: the
graceful strategy just stopped desiring the target replicas, the record
stayed behind as an unmodified tombstone, and the audit trail ended at
'started'. The strategy now clears the reconfiguration record on a tick
past the deadline with the target un-hydrated (success precedence
unchanged: a hydrated target still cuts over first), leaving the
realized config untouched. No tombstone is retained: a rolled-back
cluster reads settled in mz_cluster_reconfigurations and SHOW CLUSTERS,
and the timeout's papertrail is the audit event.
The audit vocabulary is re-carved around the clear: only the controller
clears a record, and a clear either advanced the realized config to the
cleared record's target ('finalized' — a hydrated success under either
action, or a forced COMMIT past the deadline) or left it short
('timed-out' — the rollback). Both carry the record's deadline, so a
late/forced cut-over is distinguishable from an in-time one. This
closes the deferred ROLLBACK-timeout audit event and removes the
documented COMMIT-late-success imprecision, with no catalog schema
change.
A cleared record is no longer conclusively success, so the foreground
wait-shim now carries the ALTER's target and, when the record clears,
reports success iff the realized config reached it, AlterClusterTimeout
otherwise.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
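The re-carved vocabulary reduces to one comparison at the clear, and the wait-shim's outcome to the same comparison against the ALTER's own target. A minimal sketch under assumptions: `Shape` is a simplified stand-in for the realized config, and `classify_clear` and `wait_shim_outcome` are hypothetical names (only the `AlterClusterTimeout` error name appears in the commit message).

```rust
/// Simplified stand-in for a cluster's config shape (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Shape {
    pub size: String,
    pub replication_factor: u32,
}

/// Audit event for a cleared reconfiguration record.
#[derive(Debug, PartialEq, Eq)]
pub enum ClearEvent {
    /// The realized config advanced to the cleared record's target.
    Finalized,
    /// The clear left the realized config short of the target (the rollback).
    TimedOut,
}

pub fn classify_clear(realized_after: &Shape, cleared_target: &Shape) -> ClearEvent {
    if realized_after == cleared_target {
        ClearEvent::Finalized
    } else {
        ClearEvent::TimedOut
    }
}

/// Stand-in for the error the foreground wait-shim surfaces.
#[derive(Debug, PartialEq, Eq)]
pub struct AlterClusterTimeout;

/// Once the record has cleared, the shim reports success iff the realized
/// config reached the target the ALTER asked for.
pub fn wait_shim_outcome(realized: &Shape, alter_target: &Shape) -> Result<(), AlterClusterTimeout> {
    if realized == alter_target {
        Ok(())
    } else {
        Err(AlterClusterTimeout)
    }
}
```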
doc: consolidate rollback/audit semantics in autoscaling design
The tombstone -> clear-and-audit model change had its semantics
re-explained across several sections. Keep the full mechanism in its two
canonical homes — the ROLLBACK case under Failure handling and the audit
event list under Observability — and trim the echoes:
- Graceful reconfiguration strategy: drop the audit-papertrail aside; the
strategy bullet only needs the mechanism.
- Introspection view: keep "reads settled", drop the restated audit
division.
- Notable user-facing changes: cut the internal mechanism (record clear,
durable-outcome read), keep the user-observable fact.
- Align the audit-log prose to the `timed-out` event name (was the stale
"timeout-fired").
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…drate The burst strategy armed whenever no steady replica reported fully-hydrated, which is also true when the replica simply does not exist yet or has not registered with the compute controller — so a brand-new cluster with an AUTO SCALING STRATEGY burst at creation, before any object existed, wasting boot time plus the linger duration of an extra replica. Arming is now existential: a burst is warranted iff there exists an object on the cluster that no steady-state replica has hydrated. With zero objects the condition is vacuously unsatisfied, so the empty-cluster case needs no special-casing. The controller pulls a has_hydratable_objects signal through the ctx on demand, answered from the cluster's bound objects filtered to dataflow-backed items (webhook sources excluded) — a catalog-level approximation of what the per-replica hydration check counts, whose mismatches self-heal through the linger path. The signal is a live input excluded from the compare-and-append witness, and doubles as a probe gate: an object-less cluster skips the per-replica hydration probe entirely. A crashed or restarting steady replica on a cluster with objects still bursts, which is intended. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
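The existential arming condition reads naturally as an `any` over objects nested around an `all` over steady-state replicas. A minimal sketch, assuming objects and replicas are identified by plain strings and that per-replica hydration is available as a set of hydrated object ids; `burst_warranted` is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A burst is warranted iff there exists an object on the cluster that no
/// steady-state replica has hydrated.
///
/// `steady_hydrated` maps each steady-state replica to the objects it has
/// hydrated (a hypothetical representation for this sketch).
pub fn burst_warranted(
    objects: &BTreeSet<String>,
    steady_hydrated: &BTreeMap<String, BTreeSet<String>>,
) -> bool {
    objects.iter().any(|object| {
        steady_hydrated
            .values()
            .all(|hydrated| !hydrated.contains(object))
    })
}
```

With zero objects the outer `any` is false, so an empty cluster never arms. With objects but no steady replica registered, the inner `all` is vacuously true and the burst is still warranted, which matches the crashed-replica case called out above.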
…ified Re-ran the full PM-showcase replay on a fresh environment with the PR 8 build: controller drops audit as 'retired', a rollback-at-deadline reads started -> timed-out -> cancelled -> finalized with the deadline on every transition and the new 'state' column live, and the burst audit shows a single started/finished pair (no at-creation burst). Updates the showcase report's rough-edges section to record the fixes and what a current build prints where its transcripts predate them; marks PR 8 complete in the plan tracker. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Re-run the full showcase replay now that the rough edges the original run surfaced are fixed, and fold the new behavior into the document:
- The timeout-rollback section shows the controller rolling back on its own at the deadline -- settled SHOW CLUSTERS, no parked record to clear, and the started/timed-out audit pair carrying the deadline and the abandoned target.
- Scenario 2 demonstrates that an empty cluster does not burst (burst arming is existential), and the burst audit trail is a single started/finished pair.
- The "rough edges found during this run" section is gone; transcripts now show retired drops and the rest of the fixed behavior directly.
- All timestamps and timings refreshed from the new run.
The replay script follows: the timeout capture queries the timed-out audit event, the obsolete clear-parked-record step is removed, and the at-creation-burst settle poll is replaced by a no-burst capture.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lumn
Rework the in-flight reconfiguration introspection surface into two sparse,
JSON-forward base relations, and collapse the SHOW CLUSTERS additions into a
single column.
mz_cluster_reconfigurations keeps its name, OID, and mz_catalog_server
index but is reshaped: a row only while a graceful reconfiguration is in flight,
with typed deadline and on_timeout columns and the full target shape as a jsonb
column. The realized config already lives in mz_clusters, so the relation
carries only the in-flight delta.
mz_cluster_auto_scaling_strategies is new (own OID + index): one row while
a cluster has an AUTO SCALING STRATEGY configured or an autoscaling action
running, with the configured policy as a jsonb strategy column and the in-flight
runtime as a jsonb state column keyed by strategy ({"burst": ...}, NULL when
idle). JSON keeps the schema stable as strategies grow.
SHOW CLUSTERS drops the four added columns for a single nullable activity
column, built by LEFT JOINing both relations: a short summary of any in-flight
reconfiguration and/or burst, NULL when steady. It needs no mz_now(), so the
indexed mz_show_clusters view stays non-temporal.
Snapshots (oid.slt, mz_catalog_server_index_accounting.slt,
autogenerated/mz_internal.slt) regenerated from a live engine; the new MV takes
s506, shifting later catalog IDs. Tests updated: cluster_controller.slt and
show_clusters.slt for the new shapes, plus the catalog-inventory snapshots,
canary-clusters.td, and the test_managed_cluster.py rollback assertion.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Complete branch for SQL-316. The intention is to open PRs for individual chunks of work from this branch, but I am keeping this PR as a reference for reviewers and to run CI on the full feature.
Contributes to SQL-316