Draft
20 commits
88c07e0
plan/progress: add cluster autoscaling implementation plan and progre…
aljoscha Jun 1, 2026
a3757fb
catalog: add durable cluster-autoscaling state (migration v85->v86)
aljoscha Jun 1, 2026
c0b899d
controller: collapse managed-replica AZ enum to a list
aljoscha Jun 16, 2026
abece06
cluster-controller: scaffold the controller, baseline strategy, and w…
aljoscha Jun 2, 2026
6ad40ec
cluster-controller: graceful reconfiguration strategy + ALTER reshape…
aljoscha Jun 2, 2026
758dd9b
cluster-controller: reconfiguration observability + SHOW CLUSTERS + a…
aljoscha Jun 3, 2026
69b0988
cluster-controller: ON REFRESH scheduling as a strategy
aljoscha Jun 3, 2026
37b4d62
cluster-controller: hydration burst — AUTO SCALING STRATEGY, strategy…
aljoscha Jun 3, 2026
43deb75
cluster-controller: enable controller in CI and migrate legacy tests
aljoscha Jun 9, 2026
ac3788a
doc: add cluster-controller PM showcase report with live transcripts
aljoscha Jun 10, 2026
7ceda26
plan: add PR 8 — showcase follow-ups (retired drops, rollback stamp, …
aljoscha Jun 10, 2026
bd92fa8
plan: add PR 9 — restore on-refresh scheduling_policies audit detail
aljoscha Jun 10, 2026
861681a
cloudtest: assert rollback-at-deadline settles and audits timed-out (…
aljoscha Jun 16, 2026
0b83f1c
plan/design: rollback-at-deadline clear-and-audit narrative (PR 8b)
aljoscha Jun 16, 2026
f836100
plan/design: burst existential-arming narrative (PR 8c)
aljoscha Jun 16, 2026
7d53416
doc: record PR 8 verification — showcase rough edges fixed and re-ver…
aljoscha Jun 10, 2026
035f28c
doc: re-capture PM showcase transcripts on the fixed build
aljoscha Jun 11, 2026
f9ebc3f
cloudtest: update rollback assertion for the two-relation introspecti…
aljoscha Jun 16, 2026
3e6087c
plan/design: two-relation introspection redesign narrative (PR 4 rede…
aljoscha Jun 16, 2026
5f28818
doc: retarget plan-doc test references to cluster-controller.td
aljoscha Jun 16, 2026
14 changes: 14 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -24,6 +24,7 @@ members = [
"src/cloud-resources",
"src/cluster",
"src/cluster-client",
"src/cluster-controller",
"src/clusterd",
"src/compute",
"src/compute-client",
@@ -153,6 +154,7 @@ default-members = [
"src/cloud-resources",
"src/cluster",
"src/cluster-client",
"src/cluster-controller",
"src/clusterd",
"src/compute",
"src/compute-client",
30 changes: 15 additions & 15 deletions doc/developer/design/20260522_cluster_autoscaling.md

Large diffs are not rendered by default.

1,680 changes: 1,680 additions & 0 deletions doc/developer/design/20260522_cluster_autoscaling_plan.md

Large diffs are not rendered by default.

28 changes: 28 additions & 0 deletions doc/user/content/reference/system-catalog/mz_internal.md
@@ -165,6 +165,34 @@ The `mz_cluster_schedules` table shows the `SCHEDULE` option specified for each
| `type` | [`text`] | `on-refresh`, or `manual`. Default: `manual` |
| `refresh_hydration_time_estimate` | [`interval`] | The interval given in the `HYDRATION TIME ESTIMATE` option. |

## `mz_cluster_reconfigurations`

The `mz_cluster_reconfigurations` table shows the in-flight graceful reconfiguration of each managed
cluster that has one. A row is present only while a background `ALTER CLUSTER` is converging on a
new configuration; the realized (current) shape is in
[`mz_clusters`](../mz_catalog/#mz_clusters).

<!-- RELATION_SPEC mz_internal.mz_cluster_reconfigurations -->
| Field | Type | Meaning |
|----------------|------------------|----------------------------------------------------------------|
| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). |
| `deadline` | [`mz_timestamp`] | The deadline by which the reconfiguration must complete; after it passes the `on_timeout` action applies. Compare against `mz_now()` to distinguish an in-progress reconfiguration from one past its deadline. |
| `on_timeout` | [`text`] | The action applied if `deadline` passes before the target hydrates: `commit` (cut over to the not-yet-hydrated target) or `rollback` (revert to the pre-reconfiguration shape). |
| `target` | [`jsonb`] | The config shape the cluster is reconfiguring to, as JSON: `size`, `replication_factor`, `availability_zones`, and `logging`. The realized (current) shape is in `mz_clusters`. |
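
As an illustration of the `deadline` comparison described in the table above (not part of this diff), a client could classify a row as follows. This is a minimal sketch: the `Reconfiguration` class and `describe` function are hypothetical names, `deadline` is assumed to be read as epoch milliseconds, and treating a timestamp equal to the deadline as still in progress is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reconfiguration:
    """One row of mz_cluster_reconfigurations (hypothetical client-side type)."""

    cluster_id: str
    deadline_ms: int  # assumption: mz_timestamp read as epoch milliseconds
    on_timeout: str  # "commit" or "rollback", per the table above


def describe(row: Reconfiguration, now_ms: int) -> str:
    """Classify a row by comparing its deadline with the current timestamp."""
    if row.on_timeout not in ("commit", "rollback"):
        raise ValueError(f"unexpected on_timeout: {row.on_timeout!r}")
    # Assumption: a timestamp exactly at the deadline still counts as in progress.
    if now_ms <= row.deadline_ms:
        return "in progress"
    return f"past deadline, {row.on_timeout} applies"
```

Here `now_ms` stands in for the value of `mz_now()` at query time; in practice the comparison would more likely be done in SQL against the relation itself.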

## `mz_cluster_auto_scaling_strategies`

The `mz_cluster_auto_scaling_strategies` table shows the configured `AUTO SCALING STRATEGY` of each
managed cluster that has one, together with any in-flight autoscaling state. A row is present while
a strategy is configured or an autoscaling action is running.

<!-- RELATION_SPEC mz_internal.mz_cluster_auto_scaling_strategies -->
| Field | Type | Meaning |
|----------------|-----------|----------------------------------------------------------------|
| `cluster_id` | [`text`] | The ID of the cluster. Corresponds to [`mz_clusters.id`](../mz_catalog/#mz_clusters). |
| `strategy` | [`jsonb`] | **Unstable** The configured autoscaling policy, as JSON. Currently an `on_hydration` sub-policy carrying its `hydration_size` and optional `linger_duration`. |
| `state` | [`jsonb`] | **Unstable** The in-flight autoscaling runtime state, as JSON keyed by strategy, or `NULL` when nothing is running. Currently a `burst` key carrying the active hydration burst: its `burst_size`, `linger_duration`, and `steady_hydrated_at`. |
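
To make the two `jsonb` columns above more concrete, here is a hedged sketch (not part of this diff) of how a client might read them. The key names come from the table; the exact nesting, the `summarize` helper, and the sample values in the usage note are assumptions, and both columns are marked unstable, so the shape may change.

```python
import json
from typing import Optional


def summarize(strategy_json: str, state_json: Optional[str]) -> str:
    """Summarize one row of mz_cluster_auto_scaling_strategies.

    Assumes `strategy` nests its fields under "on_hydration" and `state`
    nests the active burst under "burst", as the table describes.
    """
    strategy = json.loads(strategy_json)
    on_hydration = strategy.get("on_hydration")
    if on_hydration is None:
        return "no on_hydration policy"

    summary = f"burst to {on_hydration['hydration_size']} while hydrating"
    linger = on_hydration.get("linger_duration")  # documented as optional
    if linger is not None:
        summary += f", linger {linger}"

    # `state` is NULL (None here) when nothing is running.
    state = json.loads(state_json) if state_json is not None else None
    burst = state.get("burst") if state else None
    if burst is None:
        return summary + "; no burst running"
    return summary + f"; burst active at {burst['burst_size']}"
```

For example, with made-up values, `summarize('{"on_hydration": {"hydration_size": "800cc"}}', None)` reports a configured policy with no burst running.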

## `mz_cluster_replica_metrics`

The `mz_cluster_replica_metrics` view gives the last known CPU and RAM utilization statistics
14 changes: 14 additions & 0 deletions misc/python/materialize/mzcompose/__init__.py
@@ -96,6 +96,16 @@ def get_minimal_system_parameters(
"enable_refresh_every_mvs": "true",
"enable_replacement_materialized_views": "true",
"enable_cluster_schedule_refresh": "true",
# The cluster controller and background ALTER CLUSTER land dark in
# production (the dyncfg defaults stay false); force them on for the test
# harness so CI exercises the controller owning the managed-cluster
# replica set. The real production default flip is a separate rollout.
"enable_cluster_controller": (
"true" if version >= MzVersion.parse_mz("v26.29.0-dev") else "false"
),
"enable_background_alter_cluster": (
"true" if version >= MzVersion.parse_mz("v26.29.0-dev") else "false"
),
"enable_s3_tables_region_check": "false",
"enable_statement_lifecycle_logging": "true",
"enable_storage_introspection_logs": "true",
@@ -673,6 +683,10 @@ def get_default_system_parameters(
"enable_mcp_developer_query_tool",
"mcp_max_response_size",
"user_id_pool_batch_size",
"cluster_controller_tick_interval",
"default_cluster_reconfiguration_timeout",
"enable_hydration_burst",
"default_hydration_burst_linger",
]
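
The version gate in `get_minimal_system_parameters` above turns the two flags on only for builds at or after `v26.29.0-dev`. A self-contained sketch of that gating logic (not part of this diff) is shown below. `parse_version` is a hypothetical stand-in for `MzVersion.parse_mz`, whose real parsing and ordering rules may differ, and `controller_flags` is likewise an illustrative name.

```python
import re


def parse_version(text: str) -> tuple[int, int, int, bool]:
    """Parse 'vMAJOR.MINOR.PATCH[-dev]' into a comparable tuple.

    Hypothetical stand-in for MzVersion.parse_mz. The last element is False
    for a '-dev' build so that it sorts before the matching final release.
    """
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)(-dev)?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    return (major, minor, patch, match.group(4) is None)


def controller_flags(version: str) -> dict[str, str]:
    """Return the two gated flags as the string values the harness passes."""
    enabled = parse_version(version) >= parse_version("v26.29.0-dev")
    value = "true" if enabled else "false"
    return {
        "enable_cluster_controller": value,
        "enable_background_alter_cluster": value,
    }
```

Under these assumptions, `controller_flags("v26.28.0")` leaves both flags at `"false"`, while `v26.29.0-dev` and later builds get `"true"`.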


6 changes: 6 additions & 0 deletions misc/python/materialize/parallel_workload/action.py
@@ -1864,6 +1864,12 @@ def __init__(
"oidc_group_role_sync_strict",
"console_oidc_client_id",
"console_oidc_scopes",
"enable_cluster_controller",
"cluster_controller_tick_interval",
"enable_background_alter_cluster",
"default_cluster_reconfiguration_timeout",
"enable_hydration_burst",
"default_hydration_burst_linger",
]

def run(self, exe: Executor) -> bool: