From 86ed70bfb408494734b3752cadc056a10b6b3526 Mon Sep 17 00:00:00 2001 From: Pavol Loffay Date: Thu, 7 May 2026 15:53:18 +0200 Subject: [PATCH] OLM channels explanation Signed-off-by: Pavol Loffay --- olm-channels.md | 277 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 277 insertions(+) create mode 100644 olm-channels.md diff --git a/olm-channels.md b/olm-channels.md new file mode 100644 index 0000000000..6b2aaa4b05 --- /dev/null +++ b/olm-channels.md @@ -0,0 +1,277 @@ +# OLM Channel Strategy and Upgrade Semantics + +This document describes how OLM channels, upgrade graphs, and OCP version gating work, +and outlines a two-channel release strategy for shipping an operator across multiple OpenShift versions. + +## OLM Channels Overview + +A **channel** is a named upgrade stream within an operator package. Each channel has a **channel head** — +the latest CSV (ClusterServiceVersion) that no other CSV replaces. When a user creates a `Subscription` +pointing to a channel, OLM installs the channel head (or a `startingCSV` if specified), then follows the +upgrade graph within that channel for future upgrades. + +### Upgrade Graph Mechanisms + +The upgrade graph within a channel is defined by three mechanisms on each CSV: + +- **`replaces`**: points to the single CSV this one directly replaces. OLM walks the `replaces` + chain and **upgrades one version at a time** until reaching the channel head. For example, + if v0.1.3 replaces v0.1.2 which replaces v0.1.1, OLM installs v0.1.2 first, then v0.1.3. +- **`skips`**: list of specific CSV names that can upgrade directly to this one. Used to skip + known-bad releases (e.g., a version with a critical vulnerability). +- **`skipRange`** (annotation `olm.skipRange`): a semver range. If the **channel head** has a + `skipRange` that includes the currently installed version, OLM **jumps directly to the + channel head**, bypassing all intermediate versions. This is a direct upgrade, not step-by-step. + +**Important**: `skipRange` only applies to the channel head. Intermediate versions with `skipRange` +do not enable skipping — OLM always evaluates whether to jump directly to head first. + +When OLM evaluates whether an upgrade is available, it checks (in order of precedence): + +1. Channel head in the subscribed catalog source (if `skipRange` on head covers the current version) — **direct jump**. +2. Next CSV that `replaces` the current one in the subscribed source — **step-by-step**. +3. Channel head in another visible catalog source (if `skipRange` covers the current version) — **direct jump**. +4. Next CSV that `replaces` the current one in any visible source — **step-by-step**. + +### OCP Version Compatibility + +There are **two different mechanisms** for declaring OpenShift version compatibility, serving +different purposes: + +#### 1. `com.redhat.openshift.versions` (Build-Time Catalog Filtering) + +Defined in `metadata/annotations.yaml`: + +```yaml +annotations: + com.redhat.openshift.versions: "v4.14-v4.16" +``` + +This annotation is used by **Red Hat's build infrastructure** when generating version-specific +catalog indexes. It controls which bundles are included in each versioned index image +(e.g., `registry.redhat.io/redhat/redhat-operator-index:v4.14` vs `v4.16`). + +- **Used by**: Red Hat pipelines, IIB (Index Image Builder), catalog build tooling +- **When**: At catalog build time +- **Effect**: Bundle is excluded from catalog indexes outside the specified range +- **OLM involvement**: None — OLM never sees bundles filtered out at build time + +#### 2. `olm.maxOpenShiftVersion` / `olm.minOpenShiftVersion` (Runtime OCP Upgrade Gating) + +Defined in `metadata/properties.yaml`: + +```yaml +properties: + - type: olm.maxOpenShiftVersion + value: "4.17" + - type: olm.minOpenShiftVersion + value: "4.14" +``` + +These properties are used by **OLM at runtime** for two purposes: + +1. **Catalog filtering**: OLM filters bundles based on the cluster's OCP version. A bundle with + `olm.minOpenShiftVersion: "4.16"` won't be visible on a 4.14 cluster. + +2. **OCP upgrade gating**: OLM checks installed operators against the **next** OCP minor version. + If an installed operator's `olm.maxOpenShiftVersion` is less than the next minor, OLM blocks + the cluster upgrade by setting `Upgradeable=False` on its ClusterOperator. + +- **Used by**: OLM resolver, OLM ClusterOperator controller +- **When**: At runtime (install, upgrade, OCP upgrade checks) +- **Effect**: Blocks operator visibility and/or OCP cluster upgrades +- **Flow**: bundle → catalog (indexed) → CSV annotation (`operatorframework.io/properties`) + +#### Comparison + +| Aspect | `com.redhat.openshift.versions` | `olm.maxOpenShiftVersion` | +|---|---|---| +| Location | `metadata/annotations.yaml` | `metadata/properties.yaml` | +| Used by | Red Hat build pipelines | OLM at runtime | +| When | Catalog index build time | Operator install/upgrade, OCP upgrade | +| Effect | Bundle excluded from index | Bundle hidden + OCP upgrade blocked | +| Format | Range string (`v4.14-v4.16`) | Single version (`4.17`) | + +**Recommendation**: Use both. `com.redhat.openshift.versions` ensures your bundle only appears +in appropriate catalog indexes. `olm.maxOpenShiftVersion` provides runtime safety by blocking +OCP upgrades when an incompatible operator is installed. + +## Two-Channel Strategy + +### Channel 1: `fast` + +Ships every new version to all supported OCP versions. Users on this channel always receive +the newest operator release. + +``` +fast channel: + v0.1.0 → v0.2.0 → v0.3.0 → v1.0.0 → v1.1.0 → v1.2.0 → v2.0.0 + ↑ head +``` + +Use `skipRange` liberally (e.g., `olm.skipRange: ">=0.1.0 <2.0.0"`) so users can jump from +any older version directly to the latest without stepping through every intermediate release. + +### Channel 2: `stable` + +A single channel for all OCP EUS versions. OCP version properties on each bundle control which +versions are visible on which cluster. Only patch/z-stream releases are added within each +OCP version range. + +``` +stable channel: + v1.0.0 (OCP 4.14) → v1.0.1 (OCP 4.14) → v1.0.2 (OCP 4.14) + + v1.1.0 (OCP 4.14-4.16, skipRange: ">=1.0.0 <1.1.0") → v1.1.1 (OCP 4.16) + + v2.0.0 (OCP 4.16-4.18, skipRange: ">=1.1.0 <2.0.0") → v2.0.1 (OCP 4.18) +``` + +**Note**: OCP EUS (Extended Update Support) versions are even-numbered minor releases: 4.14, 4.16, +4.18, etc. EUS versions receive longer support (up to 24+ months). Odd-numbered versions (4.15, +4.17) are non-EUS with shorter support windows. + +**EUS-to-EUS upgrades**: The control plane must still upgrade sequentially (4.14 → 4.15 → 4.16) — +you cannot skip minor versions. However, EUS-to-EUS allows you to **pause worker node machine +config pools** during the upgrade, so worker nodes only reboot once (from 4.14 directly to 4.16), +minimizing disruption. The operator must support all versions in the upgrade path (hence the +bridge version with `maxOCP: 4.17` covering 4.14, 4.15, and 4.16). + +Each bundle declares its OCP compatibility: + +```yaml +# v1.0.0 — OCP 4.14 only (blocks upgrade to 4.15 until operator is upgraded) +olm.properties: + - type: olm.maxOpenShiftVersion + value: "4.14" + - type: olm.minOpenShiftVersion + value: "4.14" +``` + +```yaml +# v1.1.0 — bridge version for EUS-to-EUS upgrade (supports 4.14, 4.15, AND 4.16) +# Must support 4.15 because control plane upgrades sequentially: 4.14 → 4.15 → 4.16 +olm.properties: + - type: olm.maxOpenShiftVersion + value: "4.17" + - type: olm.minOpenShiftVersion + value: "4.14" +olm.skipRange: ">=1.0.0 <1.1.0" +``` + +```yaml +# v1.1.1 — OCP 4.16+ patch (for clusters that completed the EUS upgrade) +olm.properties: + - type: olm.maxOpenShiftVersion + value: "4.17" + - type: olm.minOpenShiftVersion + value: "4.16" +``` + +### Upgrade Scenarios + +| Scenario | Behavior | +|---|---| +| OCP 4.14 cluster, fresh install | OLM filters the `stable` channel, only sees v1.0.x bundles, installs the latest patch (channel head for that OCP range) | +| OCP 4.14 cluster, patch released | New v1.0.x appears, OLM upgrades automatically | +| OCP 4.14 cluster, bridge version released | v1.1.0 becomes visible (has `minOpenShiftVersion: "4.14"`). OLM upgrades v1.0.0 → v1.1.0 via `skipRange`. This unblocks OCP upgrade to 4.15/4.16 | +| User upgrades OCP 4.14 → 4.16 | See [OCP Upgrade Flow](#ocp-upgrade-flow-and-the-bridge-version-requirement) below | +| OCP 4.16 cluster, fresh install | Only sees v1.1.x, installs the latest patch | +| User on `stable` wants to switch to `fast` | User edits their Subscription to change channel. OLM resolves the new channel head and upgrades if a valid `replaces`/`skipRange` path exists | + +## OCP Upgrade Flow and the Bridge Version Requirement + +### How OLM Gates OCP Upgrades + +OLM continuously checks all installed CSVs against the **next** OCP minor version. If any +operator's `olm.maxOpenShiftVersion` is less than the next minor version, OLM sets: + +``` +ClusterOperator "operator-lifecycle-manager" + Condition: Upgradeable=False + Reason: IncompatibleOperatorsInstalled +``` + +The Cluster Version Operator (CVO) reads this condition and **blocks the OCP cluster upgrade** +until all operators are compatible. + +The logic (implemented in `pkg/controller/operators/openshift/clusteroperator_controller.go`): + +1. OLM reads the current OCP version (e.g., 4.14). +2. Computes the next minor version (4.15). +3. For each installed CSV, checks if `olm.maxOpenShiftVersion >= 4.15`. +4. If any CSV fails the check (i.e., `maxOpenShiftVersion < nextMinor`), sets `Upgradeable=False`. + +**Example**: On OCP 4.14, an operator with `maxOpenShiftVersion: "4.14"` blocks upgrade to 4.15. +An operator with `maxOpenShiftVersion: "4.15"` allows upgrade to 4.15 (but would block 4.16 later). + +### The Deadlock Problem + +If the operator version for the next OCP version requires that OCP version to install, +a deadlock occurs: + +- v1.0.0 has `maxOpenShiftVersion: "4.15"` → allows 4.15, but blocks upgrade to 4.16. +- v1.1.0 has `minOpenShiftVersion: "4.16"` → not visible on 4.14 or 4.15. +- Result: once on 4.15, can't upgrade OCP to 4.16 without upgrading the operator, but + can't upgrade the operator without being on 4.16 first. + +### Solution: Bridge Versions + +Every OCP version transition requires a **bridge version** of the operator that is compatible +with both the current and next OCP version. For EUS-to-EUS upgrades (e.g., 4.14 → 4.16), the +bridge must support the entire range: + +``` +v1.0.0 → minOCP: 4.14, maxOCP: 4.14 (4.14 only — blocks upgrade to 4.15) +v1.1.0 → minOCP: 4.14, maxOCP: 4.17 (4.14 through 4.16 — bridge version) +v1.1.1 → minOCP: 4.16, maxOCP: 4.17 (4.16 only, patch) +``` + +The upgrade flow with a bridge version: + +1. User is on OCP 4.14 with operator v1.0.0 installed (`maxOCP: 4.14` — blocks upgrade to 4.15). +2. v1.1.0 (bridge) becomes visible on 4.14 because `minOpenShiftVersion: "4.14"`. +3. OLM upgrades the operator: v1.0.0 → v1.1.0 (via `skipRange`). +4. v1.1.0 has `maxOpenShiftVersion: "4.17"`: + - On 4.14: `4.17 >= 4.15` → upgrade to 4.15 allowed + - On 4.15: `4.17 >= 4.16` → upgrade to 4.16 allowed +5. User upgrades OCP to 4.16 (control plane goes 4.14 → 4.15 → 4.16 sequentially). +6. On 4.16, subsequent patches (v1.1.1, v1.1.2) continue as normal. + +**Automatic vs. Manual Approval**: + +With `installPlanApproval: Automatic` (default), step 3 happens automatically as soon as the +bridge version appears in the catalog. The user doesn't need to take any action — OLM upgrades +the operator, which unblocks the OCP upgrade. This is the seamless experience. + +With `installPlanApproval: Manual`, the user must approve the operator upgrade (v1.0.0 → v1.1.0) +before the OCP upgrade becomes unblocked. If they attempt to upgrade OCP first, the CVO will +block it until they approve the pending operator InstallPlan. + +### Timeline Diagram + +``` +OCP 4.14 OCP 4.16 +───────────────────────────────────────────────────────────── +operator v1.0.0 ──► v1.1.0 (bridge) ──► [OCP upgrade] ──► v1.1.1 (patch) + supports 4.14-4.16 4.16 only +``` + +## Single `stable` Channel vs. Per-EUS Channels + +| Aspect | Single `stable` channel | Per-EUS channels (`stable-4.14`, `stable-4.16`) | +|---|---|---| +| User experience | Simpler — one subscription, never needs editing | User must manually switch channel when upgrading OCP | +| OCP upgrade | Automatic — operator upgrades when bridge version becomes visible | Manual — user must change channel in Subscription | +| Catalog complexity | All bundles in one channel, filtered by OCP version properties | Separate channels, each self-contained | +| Risk | Relies on correct `minOpenShiftVersion`/`maxOpenShiftVersion` — a misconfigured property could expose incompatible versions to clusters | Channel provides hard isolation — even with wrong properties, users only see versions in their subscribed channel | + +## Key Implementation References + +- Channel and upgrade graph resolution: `pkg/controller/registry/resolver/resolver.go` +- OCP upgrade gating logic: `pkg/controller/operators/openshift/clusteroperator_controller.go` +- `maxOpenShiftVersion` parsing: `pkg/controller/operators/openshift/helpers.go` +- Upgrade predicate matching (`replaces`, `skips`, `skipRange`): `pkg/controller/registry/resolver/cache/predicates.go` +- Operator upgrade conditions: `pkg/controller/operators/olm/operatorconditions.go` +- Properties annotation processing: `pkg/controller/registry/resolver/projection/properties.go` +- Upgrade strategy documentation: `doc/design/how-to-update-operators.md`