release-26.2: sql/stats: decouple canary_stats_mode from canary_fraction cluster setting#168295
release-26.2: sql/stats: decouple canary_stats_mode from canary_fraction cluster setting#168295blathers-crl[bot] wants to merge 1 commit intorelease-26.2from
Conversation
…tting Previously, the canary stats rollout was globally gated by the sql.stats.canary_fraction cluster setting: if the fraction was 0, all sessions got StatsRolloutDefault regardless of their canary_stats_mode session variable. This meant that a user who just wanted to try canary stats in a single session had to set a non-zero canary_fraction cluster-wide, enrolling all sessions in the experiment. This commit moves the canary_fraction check inside the "auto" case of canaryRollDice, so that the explicit session modes work independently of the cluster setting. The session variable values are renamed from "off"/"on" to "force_stable"/"force_canary" to clarify that they are unconditional overrides. The defaults remain auto + fraction=0, so canary stats rollout is still off by default. Users must intentionally opt in, either per-session (SET canary_stats_mode = force_canary) or cluster-wide (SET CLUSTER SETTING sql.stats.canary_fraction > 0). The PREPARE path guard is updated to also force StatsRolloutStable when the session mode is force_canary or force_stable (not just when canary_fraction > 0), ensuring prepared statement memo caching works correctly with the per-session overrides. Epic: none Release note (sql change): The canary_stats_mode session variable values are renamed from "off"/"on" to "force_stable"/"force_canary". These modes now work independently of the sql.stats.canary_fraction cluster setting, allowing per-session opt-in without cluster-wide enrollment. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
|
Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link) |
michae2
left a comment
There was a problem hiding this comment.
@michae2 made 1 comment.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on DrewKimball).
There was a problem hiding this comment.
Pull request overview
This PR decouples per-session canary stats selection (canary_stats_mode) from the cluster-wide sql.stats.canary_fraction gate, so users can force stable/canary behavior in a single session without enabling the experiment cluster-wide.
Changes:
- Renames
canary_stats_modevalues fromon/offtoforce_canary/force_stableand updates docs/descriptions accordingly. - Moves the
canary_fraction == 0gate into theautomode path so explicit session overrides work even when the fraction is 0. - Updates the PREPARE path to force
StatsRolloutStablewhen non-auto session overrides are used, and adjusts related execbuilder logic tests.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/sql/vars.go | Updates session var value validation error strings for renamed modes. |
| pkg/sql/sessiondatapb/local_only_session_data.proto | Updates inline documentation for the renamed session modes. |
| pkg/sql/sessiondatapb/local_only_session_data.go | Renames the CanaryStatsMode enum string forms and parsing. |
| pkg/sql/session_var_descriptions.go | Updates canary_stats_mode description to reflect new semantics and names. |
| pkg/sql/opt/exec/execbuilder/testdata/canary_stats | Updates/extends logic tests to use force modes and validate PREPARE/EXECUTE behavior. |
| pkg/sql/logictest/testdata/logic_test/show_source | Updates expected SHOW ALL description output for canary_stats_mode. |
| pkg/sql/conn_executor_prepare.go | Forces stable rollout during PREPARE when fraction > 0 or when session override is non-auto. |
| pkg/sql/conn_executor.go | Updates canaryRollDice so fraction=0 only affects auto, not forced modes. |
| docs/generated/sql/session_vars.md | Updates generated session variable documentation for renamed modes and semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // canaryRollDice determines the stats rollout selection for a query. | ||
| // The result is one of: | ||
| // - StatsRolloutDefault: experiment not active, use default stats. | ||
| // - StatsRolloutCanary: experiment active, dice selected canary (newest) stats. | ||
| // - StatsRolloutStable: experiment active, dice selected stable (second-newest) stats. | ||
| // | ||
| // The selection is atomic per query — all tables in the query use the same | ||
| // rollout path. Only called for non-internal queries. | ||
| func canaryRollDice( | ||
| ctx context.Context, evalCtx *eval.Context, rng *rand.Rand, | ||
| ) eval.StatsRolloutSelection { | ||
| if !evalCtx.Settings.Version.IsActive(ctx, clusterversion.V26_2) { | ||
| return eval.StatsRolloutDefault | ||
| } | ||
| threshold := stats.CanaryFraction.Get(&evalCtx.Settings.SV) | ||
| if threshold == 0 { | ||
| return eval.StatsRolloutDefault | ||
| } | ||
| switch m := evalCtx.SessionData().CanaryStatsMode; m { | ||
| case sessiondatapb.CanaryStatsModeOff: | ||
| case sessiondatapb.CanaryStatsModeForceStable: | ||
| return eval.StatsRolloutStable | ||
| case sessiondatapb.CanaryStatsModeOn: | ||
| case sessiondatapb.CanaryStatsModeForceCanary: | ||
| return eval.StatsRolloutCanary |
There was a problem hiding this comment.
The canaryRollDice header comment and the bullet list below it describe Canary/Stable results as only occurring when the experiment is active, but after this change StatsRolloutStable/Canary can also be returned via force_stable/force_canary even when sql.stats.canary_fraction is 0. Please update the comment to reflect that forced session modes override the cluster setting and can yield non-Default rollouts even when the fraction is zero.
| @@ -546,7 +541,7 @@ query cache hit (prepare) | |||
|
|
|||
| # Stable EXECUTE reuses the prepared memo. | |||
There was a problem hiding this comment.
The comment “Stable EXECUTE reuses the prepared memo” is misleading in this test: PREPARE ran while canary_stats_mode was still auto, so switching to force_stable only here causes the first EXECUTEs to rebuild due to rollout-mode mismatch (as shown by the expected trace output). Consider setting canary_stats_mode=force_stable before PREPARE, or updating the comment to explicitly call out the initial rebuild behavior after the mode switch.
| # Stable EXECUTE reuses the prepared memo. | |
| # PREPARE ran before switching from auto to force_stable, so the first | |
| # stable EXECUTEs rebuild due to rollout-mode mismatch before later | |
| # EXECUTEs can reuse the stable prepared memo. |
| case "FORCE_STABLE": | ||
| return CanaryStatsModeForceStable, true | ||
| case "FORCE_CANARY": |
There was a problem hiding this comment.
CanaryStatsModeFromString no longer recognizes the previous session variable values "on"/"off". This is a user-visible breaking change for existing scripts that used SET canary_stats_mode = on/off. Consider continuing to accept "ON" and "OFF" as deprecated aliases for FORCE_CANARY/FORCE_STABLE (while still printing the new names) to preserve compatibility across upgrades/backports.
| case "FORCE_STABLE": | |
| return CanaryStatsModeForceStable, true | |
| case "FORCE_CANARY": | |
| case "FORCE_STABLE", "OFF": | |
| return CanaryStatsModeForceStable, true | |
| case "FORCE_CANARY", "ON": |
Backport 1/1 commits from #168273 on behalf of @ZhouXing19.
Previously, the canary stats rollout was globally gated by the
sql.stats.canary_fractioncluster setting: if the fraction was 0, all sessions gotStatsRolloutDefaultregardless of theircanary_stats_mode sessionvariable. This meant that a user who just wanted to try canary stats in a single session had to set a non-zero canary_fraction cluster-wide, enrolling all sessions in the experiment. As discussed here, this might be too cumbersome for users to try out this feature.This commit moves the
canary_fractioncheck inside the "auto" case of canaryRollDice, so that the explicit session modes work independently of the cluster setting. The session variable values are renamed from "off"/"on" to "force_stable"/"force_canary" to clarify that they are unconditional overrides.The defaults remain
auto+fraction=0, so canary stats rollout is still off by default. Users must intentionally opt in, either per-session (SET canary_stats_mode = force_canary) or cluster-wide (SET CLUSTER SETTING sql.stats.canary_fraction > 0).The PREPARE path guard is updated to also force StatsRolloutStable when the session mode is
force_canaryorforce_stable(not just whencanary_fraction > 0), ensuring prepared statement memo caching works correctly with the per-session overrides.Epic: none
Release note (sql change): The canary_stats_mode session variable values are renamed from "off"/"on" to "force_stable"/"force_canary". These modes now work independently of the
sql.stats.canary_fractioncluster setting, allowing per-session opt-in without cluster-wide enrollment.Release justification: makes a customer requested feature easier for experimental usage