Use roxie in long-running release cluster setup#98

Draft
porridge wants to merge 21 commits into main from use-roxie-in-start-acs

porridge (Contributor) commented Jun 30, 2026

Summary

Replaces the legacy deploy-scripts-based flow (central.sh, sensor.sh, deploy.sh,
kubectl patches) in both release/start-acs and release/start-secured-cluster with
roxie — a single roxie deploy invocation per
component.

start-acs (fake-load cluster: Central + SecuredCluster)

  • roxie deploy --tag ... --config roxie-config.yaml --envrc ... --early-readiness
    replaces central.sh + sensor.sh + kubectl patches
  • Monitoring stack still deployed via stackrox Helm chart after roxie finishes
  • Installs roxctl from RHACS mirror (required by roxie)
  • Exports CA cert (base64) as a new output for cross-cluster secured-cluster deploy

start-secured-cluster (real-load cluster: SecuredCluster only)

  • roxie deploy secured-cluster --tag ... --config roxie-config.yaml --set centralEndpoint=...
    replaces sensor.sh + kubectl patches
  • Uses API_ENDPOINT and ROX_CA_CERT_FILE env vars for CRS generation
    (filed roxie#230, "Multi-cluster secured-cluster deploy fails with cryptic error when following docs", as a UX improvement)
  • Monitoring deployed via Helm, same as before
  • Collector-config ConfigMap for external IPs kept (operator doesn't manage it)
  • FACT container env vars via CRD overlay (operator would reconcile kubectl patches)
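To illustrate the CA cert hand-off between the two actions, here is a minimal sketch, assuming start-acs writes the cert as a single-line step output and start-secured-cluster decodes it back into the file that ROX_CA_CERT_FILE points at. The helper names (encode_ca_cert, decode_ca_cert, api_endpoint_for) and the default port 443 are hypothetical, not taken from the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# start-acs side (sketch): print a PEM file as one base64 line. GNU base64
# wraps at 76 columns, so strip newlines to keep the step output single-line.
encode_ca_cert() {
    base64 < "$1" | tr -d '\n'
}

# start-secured-cluster side (sketch): write the PEM back to a file that
# ROX_CA_CERT_FILE can point at.
decode_ca_cert() {
    printf '%s' "$1" | base64 -d > "$2"
}

# roxctl is given host:port here; in the PR, an empty API_ENDPOINT surfaced as
# "missing port in address". Defaulting to port 443 is an assumption.
api_endpoint_for() {
    printf '%s:%s\n' "$1" "${2:-443}"
}
```

For example, `echo "ca-cert-b64=$(encode_ca_cert ca.pem)" >> "$GITHUB_OUTPUT"` on the start-acs side, and `decode_ca_cert "$CA_CERT_B64" ca.pem` followed by exporting ROX_CA_CERT_FILE and API_ENDPOINT on the secured-cluster side; the output and variable names there are illustrative too.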

roxie-config.yaml (both actions)

  • Resource requests, limits, and HPA scaling tuned to match the previous deploy-scripts
    flow — verified with 100% alignment against a legacy baseline deployment (every
    deployment container's CPU/memory requests and limits, plus HPA min/max, match exactly)
  • Scanner v2 explicitly enabled (operator's default profile disables it)
  • May be adjusted in the future to better resemble a typical customer deployment

Deleted files

  • release/start-acs/patch-central.json — settings now in roxie-config.yaml
  • release/start-acs/patch-sensor.json — settings now in roxie-config.yaml

Test scaffolding (temporary, for direct dispatch)

  • Added workflow_dispatch trigger to create-demo-clusters.yml with testing
    instructions in a comment
  • Added !cancelled() && !failure() conditions on jobs that depend on properties
    and wait-for-images (skipped in direct dispatch since those require the calling
    repo's context)
  • Added if: needs.properties.outputs.slack-channel != '' on Slack steps
  • Added roxie/install-cli step before both start-acs and start-secured-cluster
    in the workflow

Resource alignment verification

Deployed both legacy (deploy-scripts) and roxie side-by-side on identical GKE clusters
(5x e2-standard-8) with ACS 4.11.0. Compared every deployment container's CPU/memory
requests and limits, plus HPA min/max replicas:

  • Fake-load cluster: 14/14 containers match (YES on all)
  • Real-load cluster: 10/10 containers match (YES on all)
  • HPAs: All match (an HPA with min=1/max=1, i.e. no autoscaling, treated as equivalent to Disabled)

Run: https://github.com/stackrox/stackrox/actions/runs/28577578264
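A comparison like the one above could be scripted roughly as follows. This is a sketch under stated assumptions: the one-line-per-container input format and the function names are hypothetical, and it is not the tooling actually used for the linked run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Normalise an HPA description so that min=1/max=1 (no autoscaling) compares
# equal to "Disabled"; anything else is rendered as "min/max".
normalize_hpa() {
    local min="$1" max="${2:-}"
    if [[ "$min" == "Disabled" || ( "$min" == 1 && "$max" == 1 ) ]]; then
        echo "Disabled"
    else
        echo "${min}/${max}"
    fi
}

# Compare two files holding one "container cpu_req cpu_lim mem_req mem_lim" line
# per container (hypothetical format). For every container in the second file,
# print "<container> YES" if the first file has an identical line, else "NO".
compare_resources() {
    awk 'NR == FNR { legacy[$1] = $0; next }
         { print $1, ((($1 in legacy) && legacy[$1] == $0) ? "YES" : "NO") }' "$1" "$2"
}
```

Exact line matching keeps the check strict: any difference in a request, limit, or unit spelling (e.g. 3Gi vs 3072Mi) shows up as NO.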

Test plan

  • Run create-demo-clusters workflow end-to-end (all jobs pass)
  • Manual health check: all pods Running, Central API responding, both secured clusters registered
  • Resource comparison: 100% alignment with legacy deployment

🤖 Generated with Claude Code

coderabbitai Bot commented Jun 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new roxie/install-cli composite action with installer and docs. Migrates release/start-acs to roxie deploy with a new config file, removes the old patch JSON files, and updates create-demo-clusters.yml for manual dispatch, ref selection, and roxie CLI installation.

Changes

New roxie/install-cli Action

  • Action, installer script, and docs (roxie/install-cli/action.yml, roxie/install-cli/install-roxie.sh, roxie/install-cli/README.md, README.md): composite action definition, Bash installer (OS/arch validation, version resolution, SHA256 verification, PATH setup), documentation, and root README registration.
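The installer checks described above (OS/arch validation and SHA256 verification) typically look something like this in Bash. This is an illustrative sketch, not the contents of install-roxie.sh: the function names and the <os>-<arch> asset naming are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map `uname -s` / `uname -m` values to an "<os>-<arch>" suffix (naming assumed).
platform_for() {
    local os arch
    case "$1" in
        Linux) os=linux ;;
        Darwin) os=darwin ;;
        *) echo "::error::unsupported OS: $1" >&2; return 1 ;;
    esac
    case "$2" in
        x86_64|amd64) arch=amd64 ;;
        aarch64|arm64) arch=arm64 ;;
        *) echo "::error::unsupported architecture: $2" >&2; return 1 ;;
    esac
    printf '%s-%s\n' "$os" "$arch"
}

# Print the SHA256 of a file using whichever tool the runner provides.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d' ' -f1
    else
        shasum -a 256 "$1" | cut -d' ' -f1
    fi
}

# Fail (non-zero) if the file's digest differs from the expected one.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256_of "$file")
    if [[ "$actual" != "$expected" ]]; then
        echo "::error::checksum mismatch for ${file}: got ${actual}, want ${expected}" >&2
        return 1
    fi
}
```

Usage would be along the lines of `platform_for "$(uname -s)" "$(uname -m)"` to pick the download, then `verify_sha256` against the published checksum before moving the binary onto PATH.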

Migrate start-acs to roxie deploy

  • ROXIE config and removed patches (release/start-acs/roxie-config.yaml, release/start-acs/patch-central.json, release/start-acs/patch-sensor.json): the new ROXIE config YAML defines central and securedCluster specs; the prior patch JSON manifests are deleted.
  • start-acs script and action wiring (release/start-acs/start-acs.sh, release/start-acs/action.yml): start-acs.sh adds a guard for an existing central deployment, replaces the manual deploy flow with roxie deploy, sources the generated envrc, derives CENTRAL_IP, and conditionally runs Helm for monitoring. action.yml narrows the env block and updates the stackrox-dir description.
  • Workflow dispatch and checkout refs (.github/workflows/create-demo-clusters.yml): adds workflow_dispatch inputs, skips properties on manual runs, updates kube-burner ref parsing, defaults local action checkouts from workflow-ref, and inserts the roxie CLI install step in start-acs.
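If the central-loadbalancer address were not populated yet when start-acs.sh derives CENTRAL_IP, the script's jq expression (`.ip // .hostname` on an empty ingress list) would print the string null. A retry wrapper is one way to harden that step. The sketch below assumes the PR's kubectl/jq one-liner is wrapped in a function; wait_for_address and get_central_lb_address are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Re-run "$@" until it prints a non-empty, non-"null" value, at most <attempts>
# times, sleeping <delay> seconds between tries. Prints the value on success.
wait_for_address() {
    local attempts="$1" delay="$2"
    shift 2
    local addr i
    for ((i = 1; i <= attempts; i++)); do
        addr=$("$@" 2>/dev/null || true)
        if [[ -n "$addr" && "$addr" != "null" ]]; then
            printf '%s\n' "$addr"
            return 0
        fi
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "::error::no address after ${attempts} attempts: $*" >&2
    return 1
}

# Hypothetical wrapper around the one-liner from start-acs.sh.
get_central_lb_address() {
    kubectl -n stackrox get svc/central-loadbalancer -o json |
        jq -r '.status.loadBalancer.ingress[0] | .ip // .hostname'
}
```

In the script this would read `CENTRAL_IP=$(wait_for_address 60 5 get_central_lb_address)`; since roxie already performs LB detection, the wrapper is only a safety net.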

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. The description matches the changeset, covering the roxie-based deploy flow, new config file, workflow dispatch support, and CLI installation.
  • Title check: ✅ Passed. The title matches the main change: replacing the release cluster setup flow with roxie-based deployment and installation.

Comment @coderabbitai help to get the list of available commands.

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
release/start-acs/action.yml (1)

43-57: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Install or guard roxie in release/start-acs
start-acs.sh calls roxie deploy directly, but this composite action never installs or checks for roxie. Add the install/check here so the action doesn’t depend on every caller setting up the CLI first.
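A minimal guard, assuming it runs at the top of start-acs.sh or in a shell step of the composite action before the script is invoked. require_cli is a hypothetical helper name, and the CLI list (roxie, plus roxctl, which the summary notes roxie requires) is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report every missing CLI as a GitHub Actions error annotation and return
# non-zero if any of them is not on PATH.
require_cli() {
    local cli missing=0
    for cli in "$@"; do
        if ! command -v "$cli" >/dev/null 2>&1; then
            echo "::error::required CLI '${cli}' not found on PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_cli roxie roxctl kubectl jq || exit 1`. Checking (rather than installing) keeps the action's behavior unchanged for callers that already run roxie/install-cli; installing inside the action is the other option the comment mentions.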

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@release/start-acs/action.yml` around lines 43 - 57, The start-acs composite
action invokes start-acs.sh, which expects roxie to be available but does not
install or verify it first. Update the launch-central step in action.yml to
ensure roxie is installed or explicitly checked before running the script, so
callers do not need to preconfigure the CLI. Use the existing start-acs.sh
entrypoint and add the dependency handling in the composite action itself.
🧹 Nitpick comments (1)
roxie/install-cli/install-roxie.sh (1)

21-25: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Unauthenticated GitHub API call is rate-limited and can flake on shared CI runners.

api.github.com allows only 60 requests/hour per source IP for unauthenticated calls; GitHub-hosted runners share egress IPs, so latest-version resolution can intermittently fail. Pass a token and guard an empty/null result so a bad response surfaces clearly instead of producing a malformed base_url.

♻️ Suggested hardening
 if [[ -z "${ROXIE_VERSION:-}" ]]; then
-    ROXIE_VERSION=$(curl -fsSL --retry 5 --retry-all-errors \
-        https://api.github.com/repos/stackrox/roxie/releases/latest | jq -r '.tag_name')
+    ROXIE_VERSION=$(curl -fsSL --retry 5 --retry-all-errors \
+        ${GITHUB_TOKEN:+-H "Authorization: Bearer ${GITHUB_TOKEN}"} \
+        https://api.github.com/repos/stackrox/roxie/releases/latest | jq -r '.tag_name')
+    if [[ -z "${ROXIE_VERSION}" || "${ROXIE_VERSION}" == "null" ]]; then
+        echo "::error::Failed to resolve latest roxie version"
+        exit 1
+    fi
     echo "::notice::Resolved latest roxie version: ${ROXIE_VERSION}"
 fi

Wire GITHUB_TOKEN through the action's env block if you adopt this.
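As an alternative to passing a token (not what the review suggests), the latest tag can be resolved from the github.com web redirect for /releases/latest, which sidesteps the unauthenticated api.github.com quota. The sketch assumes GitHub keeps redirecting that URL to /releases/tag/<tag>; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract "<tag>" from https://github.com/<owner>/<repo>/releases/tag/<tag>,
# failing clearly on anything else (e.g. a repo with no releases).
tag_from_release_url() {
    local url="$1"
    local tag="${url##*/}"
    if [[ "$url" != */releases/tag/* || -z "$tag" ]]; then
        echo "::error::unexpected release URL: ${url}" >&2
        return 1
    fi
    printf '%s\n' "$tag"
}

# Follow the /releases/latest redirect and report the final URL's tag.
resolve_latest_roxie_version() {
    local url
    url=$(curl -fsSL --retry 5 -o /dev/null -w '%{url_effective}' \
        https://github.com/stackrox/roxie/releases/latest) || return 1
    tag_from_release_url "$url"
}
```

Either way, validating the result before building base_url is the important part: an empty or unexpected value fails loudly instead of producing a malformed download URL.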

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@roxie/install-cli/install-roxie.sh` around lines 21 - 25, The latest-version
lookup in the ROXIE_VERSION initialization block is using an unauthenticated
GitHub API request, which can be rate-limited on shared runners. Update the
release query to use a token from GITHUB_TOKEN (and wire it through the action
env if needed), then validate the jq result before assigning ROXIE_VERSION so
empty or null responses fail clearly instead of flowing into a malformed
base_url.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@release/start-acs/roxie-config.yaml`:
- Around line 32-33: The static ROX_TELEMETRY_STORAGE_KEY_V1 entry in the roxie
config is now checked into the repo, so remove it from the committed YAML and
move it to a runtime-provided secret or rendered config path. Update the config
generation flow around the ROX_TELEMETRY_STORAGE_KEY_V1 setting so it is
injected from GitHub secrets or another secure source, or explicitly annotate
the config if this value is intended to be public/test-only.

In `@release/start-acs/start-acs.sh`:
- Around line 36-43: The temporary envrc created in the start-acs flow is left
behind if the script exits after sourcing it, which can leak ROX_ADMIN_PASSWORD.
Update the start-acs.sh flow around roxie_envrc and the roxie deploy/source
steps to register a cleanup handler that removes the generated temp file on
every exit path, including failures under set -e, and keep the cleanup tied to
the script’s normal teardown.

---

Outside diff comments:
In `@release/start-acs/action.yml`:
- Around line 43-57: The start-acs composite action invokes start-acs.sh, which
expects roxie to be available but does not install or verify it first. Update
the launch-central step in action.yml to ensure roxie is installed or explicitly
checked before running the script, so callers do not need to preconfigure the
CLI. Use the existing start-acs.sh entrypoint and add the dependency handling in
the composite action itself.

---

Nitpick comments:
In `@roxie/install-cli/install-roxie.sh`:
- Around line 21-25: The latest-version lookup in the ROXIE_VERSION
initialization block is using an unauthenticated GitHub API request, which can
be rate-limited on shared runners. Update the release query to use a token from
GITHUB_TOKEN (and wire it through the action env if needed), then validate the
jq result before assigning ROXIE_VERSION so empty or null responses fail clearly
instead of flowing into a malformed base_url.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9e12348d-d933-4f45-b737-11a36ef89ea6

📥 Commits

Reviewing files that changed from the base of the PR and between a98a889 and d71aa46.

📒 Files selected for processing (10)
  • .github/workflows/create-demo-clusters.yml
  • README.md
  • release/start-acs/action.yml
  • release/start-acs/patch-central.json
  • release/start-acs/patch-sensor.json
  • release/start-acs/roxie-config.yaml
  • release/start-acs/start-acs.sh
  • roxie/install-cli/README.md
  • roxie/install-cli/action.yml
  • roxie/install-cli/install-roxie.sh
💤 Files with no reviewable changes (2)
  • release/start-acs/patch-sensor.json
  • release/start-acs/patch-central.json

Comment on lines +32 to +33
- name: ROX_TELEMETRY_STORAGE_KEY_V1
  value: "R5fMyO9n0gibSGzOXtlP2qCFWCGb8uoW"

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Don’t commit the telemetry storage key in static config.

ROX_TELEMETRY_STORAGE_KEY_V1 is secret-like material now persisted in the repo. Move it to a GitHub secret/runtime-rendered config, or document why this value is intentionally public/test-only.
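One way to keep the value out of the committed YAML is sketched below, assuming roxie-config.yaml carries a literal ${ROX_TELEMETRY_STORAGE_KEY_V1} placeholder and the workflow exports the real value from a GitHub secret. The PR already uses envsubst for the PagerDuty integration; this sketch uses plain Bash substitution instead, so only that one placeholder is touched. render_roxie_config is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy <template> to <out>, replacing the literal ${ROX_TELEMETRY_STORAGE_KEY_V1}
# placeholder with the value from the environment. Aborts if the value is unset.
render_roxie_config() {
    local template="$1" out="$2"
    : "${ROX_TELEMETRY_STORAGE_KEY_V1:?must be provided, e.g. from a GitHub secret}"
    local placeholder='${ROX_TELEMETRY_STORAGE_KEY_V1}'
    local content
    content=$(<"$template")
    # Quoting pattern and replacement makes both literal (no globbing, and no
    # "&" expansion under bash 5.2's patsub_replacement).
    printf '%s\n' "${content//"$placeholder"/"$ROX_TELEMETRY_STORAGE_KEY_V1"}" > "$out"
}
```

The rendered file should then be treated like the envrc: written to a temp path and removed on exit.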

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@release/start-acs/roxie-config.yaml` around lines 32 - 33, The static
ROX_TELEMETRY_STORAGE_KEY_V1 entry in the roxie config is now checked into the
repo, so remove it from the committed YAML and move it to a runtime-provided
secret or rendered config path. Update the config generation flow around the
ROX_TELEMETRY_STORAGE_KEY_V1 setting so it is injected from GitHub secrets or
another secure source, or explicitly annotate the config if this value is
intended to be public/test-only.

Comment on lines +36 to +43
roxie_envrc="$(mktemp)"
roxie deploy \
    --tag "$MAIN_IMAGE_TAG" \
    --config "${SCRIPT_DIR}/roxie-config.yaml" \
    --envrc "$roxie_envrc"

gh_log notice "Patching central deployment..."
kubectl -n stackrox patch deploy/central --patch-file="${SCRIPT_DIR}/patch-central.json"

CENTRAL_IP=$(kubectl -n stackrox get svc/central-loadbalancer -o json | jq -r '.status.loadBalancer.ingress[0] | .ip // .hostname')
# shellcheck source=/dev/null
source "$roxie_envrc"

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Clean up the generated envrc on every exit.

The sourced envrc contains ROX_ADMIN_PASSWORD; with set -e, any later failure leaves that temp file on disk.

Proposed fix
 roxie_envrc="$(mktemp)"
+cleanup_roxie_envrc() {
+    rm -f "$roxie_envrc"
+}
+trap cleanup_roxie_envrc EXIT
 roxie deploy \
     --tag "$MAIN_IMAGE_TAG" \
     --config "${SCRIPT_DIR}/roxie-config.yaml" \
     --envrc "$roxie_envrc"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-roxie_envrc="$(mktemp)"
-roxie deploy \
-    --tag "$MAIN_IMAGE_TAG" \
-    --config "${SCRIPT_DIR}/roxie-config.yaml" \
-    --envrc "$roxie_envrc"
-gh_log notice "Patching central deployment..."
-kubectl -n stackrox patch deploy/central --patch-file="${SCRIPT_DIR}/patch-central.json"
-CENTRAL_IP=$(kubectl -n stackrox get svc/central-loadbalancer -o json | jq -r '.status.loadBalancer.ingress[0] | .ip // .hostname')
-# shellcheck source=/dev/null
-source "$roxie_envrc"
+roxie_envrc="$(mktemp)"
+cleanup_roxie_envrc() {
+    rm -f "$roxie_envrc"
+}
+trap cleanup_roxie_envrc EXIT
+roxie deploy \
+    --tag "$MAIN_IMAGE_TAG" \
+    --config "${SCRIPT_DIR}/roxie-config.yaml" \
+    --envrc "$roxie_envrc"
+# shellcheck source=/dev/null
+source "$roxie_envrc"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@release/start-acs/start-acs.sh` around lines 36 - 43, The temporary envrc
created in the start-acs flow is left behind if the script exits after sourcing
it, which can leak ROX_ADMIN_PASSWORD. Update the start-acs.sh flow around
roxie_envrc and the roxie deploy/source steps to register a cleanup handler that
removes the generated temp file on every exit path, including failures under set
-e, and keep the cleanup tied to the script’s normal teardown.

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/test-roxie-start-acs.yml:
- Around line 20-21: The workflow inputs in the test-roxie-start-acs dispatch
currently use fixed cluster names, so every run reuses the same clusters and can
interfere with other validations. Update the values passed to the reusable
workflow for cluster creation so they are unique per dispatch, using a per-run
identifier or other generated suffix in the test-roxie-start-acs workflow’s
cluster name fields to avoid collisions across reruns and parallel runs.
- Line 13: The workflow currently uses secrets: inherit, which exposes all
dispatcher secrets instead of only the ones this job needs. Update the workflow
trigger/job configuration to pass only the explicitly required secrets for this
workflow, using the existing secret names referenced by the job such as
INFRA_TOKEN, GCP_RELEASE_AUTOMATION_SA, QUAY_RHACS_ENG_BEARER_TOKEN,
QUAY_RHACS_ENG_RO_USERNAME, QUAY_RHACS_ENG_RO_PASSWORD,
RELEASE_MANAGEMENT_PAGERDUTY_INTEGRATION_KEY, SLACK_BOT_TOKEN,
K6_ELASTICSEARCH_USER, K6_ELASTICSEARCH_PASSWORD, and K6_ELASTICSEARCH_URL.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6b814200-7675-411b-a0f4-a6e2f4c4af9b

📥 Commits

Reviewing files that changed from the base of the PR and between d71aa46 and 3a8a2d8.

📒 Files selected for processing (1)
  • .github/workflows/test-roxie-start-acs.yml

Comment thread .github/workflows/test-roxie-start-acs.yml Outdated
Comment on lines +20 to +21
cluster-with-fake-load-name: test-roxie-fl
cluster-with-real-load-name: test-roxie-rl

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the test cluster names unique per dispatch.

These fixed names are passed straight into the reusable workflow’s cluster-creation jobs, so every manual run targets the same long-running clusters. That makes reruns and parallel validation interfere with each other and can hide whether a pass came from fresh deployment or leftover state.

Suggested change
-      cluster-with-fake-load-name: test-roxie-fl
-      cluster-with-real-load-name: test-roxie-rl
+      cluster-with-fake-load-name: test-roxie-fl-${{ github.run_id }}
+      cluster-with-real-load-name: test-roxie-rl-${{ github.run_id }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      cluster-with-fake-load-name: test-roxie-fl
-      cluster-with-real-load-name: test-roxie-rl
+      cluster-with-fake-load-name: test-roxie-fl-${{ github.run_id }}
+      cluster-with-real-load-name: test-roxie-rl-${{ github.run_id }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-roxie-start-acs.yml around lines 20 - 21, The
workflow inputs in the test-roxie-start-acs dispatch currently use fixed cluster
names, so every run reuses the same clusters and can interfere with other
validations. Update the values passed to the reusable workflow for cluster
creation so they are unique per dispatch, using a per-run identifier or other
generated suffix in the test-roxie-start-acs workflow’s cluster name fields to
avoid collisions across reruns and parallel runs.

coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/create-demo-clusters.yml (1)

373-373: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Pin the ROXIE CLI version here.
roxie/install-cli defaults to the latest release when version is omitted, so reruns can pick up a different binary and break deploy behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/create-demo-clusters.yml at line 373, The ROXIE CLI
install step is unpinned and will default to the latest release, which can make
workflow reruns non-deterministic. Update the `roxie/install-cli` usage in
`create-demo-clusters.yml` to pass an explicit version so the binary is stable
across runs. Keep the change localized to the `install-cli` action invocation
and make sure the selected version is consistent with the rest of the deployment
workflow.
🧹 Nitpick comments (1)
.github/workflows/create-demo-clusters.yml (1)

142-147: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Disable credential persistence on the local-action checkouts.

These checkouts are only used to read .actions, so keeping the GitHub token in each cloned repo’s .git/config is unnecessary exposure. Setting persist-credentials: false here addresses the artipacked warning without changing behavior.

Suggested hardening
      - name: Checkout local action
        uses: actions/checkout@v4
        with:
          repository: stackrox/actions
          path: .actions
          ref: ${{ inputs.workflow-ref || github.ref_name }}
+         persist-credentials: false

Also applies to: 163-168, 225-230, 290-295, 315-320, 356-361, 449-454, 510-515, 558-563

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/create-demo-clusters.yml around lines 142 - 147, Disable
credential persistence on the local action checkout steps used to populate
.actions. Update each actions/checkout@v4 invocation in the workflow to set
persist-credentials: false alongside the existing repository, path, and ref
inputs so the GitHub token is not stored in .git/config while preserving
behavior.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/create-demo-clusters.yml:
- Around line 24-29: The new kube-burner config ref is only applied in one
place, so the central kube-burner checkout still uses the default version ref
while the secured-cluster path uses burner-ref. Update the central job’s
checkout logic in the create-demo-clusters workflow to consume the same parsed
kube-burner config ref output as the other job, using the existing
parse-refs/burner-ref wiring consistently. Make sure both jobs resolve the repo
and ref from the same inputs so manual overrides behave identically.

---

Outside diff comments:
In @.github/workflows/create-demo-clusters.yml:
- Line 373: The ROXIE CLI install step is unpinned and will default to the
latest release, which can make workflow reruns non-deterministic. Update the
`roxie/install-cli` usage in `create-demo-clusters.yml` to pass an explicit
version so the binary is stable across runs. Keep the change localized to the
`install-cli` action invocation and make sure the selected version is consistent
with the rest of the deployment workflow.

---

Nitpick comments:
In @.github/workflows/create-demo-clusters.yml:
- Around line 142-147: Disable credential persistence on the local action
checkout steps used to populate .actions. Update each actions/checkout@v4
invocation in the workflow to set persist-credentials: false alongside the
existing repository, path, and ref inputs so the GitHub token is not stored in
.git/config while preserving behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7e3528d8-8098-4183-a08c-65b514395b7b

📥 Commits

Reviewing files that changed from the base of the PR and between 3a8a2d8 and a52a0d5.

📒 Files selected for processing (1)
  • .github/workflows/create-demo-clusters.yml

Comment thread .github/workflows/create-demo-clusters.yml

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/create-demo-clusters.yml:
- Line 82: The Slack posting path in the `start-acs` job still runs even when
`properties` is skipped, so the manual `workflow_dispatch` flow can later read
an empty `needs.properties.outputs.slack-channel` value. Update the Slack step
in `create-demo-clusters.yml` to use the same skip condition as the `properties`
job, or add a check that `needs.properties.outputs.slack-channel` is non-empty
before posting, so the `start-acs` job does not attempt the Slack notification
when no channel is available.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 180b6672-13c9-4868-ae05-ab78097c4240

📥 Commits

Reviewing files that changed from the base of the PR and between a52a0d5 and 893f19c.

📒 Files selected for processing (1)
  • .github/workflows/create-demo-clusters.yml

Comment thread .github/workflows/create-demo-clusters.yml
porridge and others added 20 commits July 2, 2026 11:54
Replace the manual central.sh/sensor.sh/patch sequence with a single
`roxie deploy` invocation that handles operator installation, Central +
SecuredCluster deployment, readiness waiting, and LB detection.

The roxie config YAML captures what was previously spread across
action.yml env vars and patch-{central,sensor}.json files, including
PVC storage, resource limits, env vars, and scanner V4 settings.
Process baseline auto-locking uses the native CRD field.

Monitoring remains deployed via the stackrox Helm chart after ACS
is up, using envsubst for PagerDuty integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dispatch wrapper for testing the create-demo-clusters reusable workflow
from this branch. Remove after validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add workflow_dispatch to create-demo-clusters.yml so it can be
dispatched directly from feature branches. Remove standalone test
workflow. All inputs fall back to sensible defaults for dispatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These jobs require the calling repo's context (properties file,
Quay bearer token). Skip them for direct dispatch testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add !cancelled() && !failure() conditions so downstream jobs run
when their dependencies are skipped (dispatch path) but still block
when they actually fail (workflow_call path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Roxie requires roxctl. Download it from the RHACS mirror
matching the deployed version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return sooner without waiting for full scanner initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The faster storage class doesn't exist on GKE clusters.
premium-rwo is the GKE equivalent (SSD-backed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace sensor.sh/pull-secrets/kubectl-patches with roxie deploy
secured-cluster. FACT container env vars use CRD overlays instead
of kubectl set env (which the operator would reconcile away).
Collector externalIps ConfigMap kept as-is (operator doesn't manage it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: roxie uses API_ENDPOINT env var (not the CRD config's
centralEndpoint) for roxctl CRS generation. Without it, roxctl gets
an empty endpoint causing "missing port in address".

Filed stackrox/roxie#230 for the UX issue.

Fixes:
- Add CA cert as base64-encoded output from start-acs
- Decode CA cert in start-secured-cluster and set ROX_CA_CERT_FILE
- Set API_ENDPOINT env var for roxie's roxctl CRS generation
- Add --verbose to all roxie deploy calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Roxie auto-detects GKE infra clusters and applies medium resources,
which combined with kube-burner workloads exceeds the 5-node
e2-standard-8 cluster capacity. Use small profile to fit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove resourceProfile: small (let operator use built-in defaults)
- Add explicit central resources matching old patch-central.json
  (2 CPU / 3Gi requests, 4 CPU / 12Gi limits)
- Add explicit sensor resources matching old patch-sensor.json
- Enable scanner v2 (scannerComponent: Enabled) on both configs
  — small profile was disabling it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Legacy roxctl-generate produces min=1/max=1 HPAs for scanner-v4
components (effectively no autoscaling) and min=1/max=5 for scanner.
The operator defaults are min=2/max=5, causing extra replicas that
exhaust cluster CPU.

Set scaling to match legacy observations. Enable scanner v2
(scannerComponent: Enabled) which was missing. Keep central/sensor
resource overrides matching the old patch JSON files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SecuredCluster CRD only accepts "AutoSense" or "Disabled" for
scanner.scannerComponent (unlike the Central CRD which accepts "Enabled").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fake-load cluster:
- scanner-v4-db CPU req: 500m (was operator default 1)
- scanner-v4-indexer CPU req: 400m (was operator default 1500m)
- scanner-v4-matcher CPU req: 400m (was operator default 500m)

Real-load cluster:
- sensor: 500m/4 CPU, 500Mi/8Gi (was 2/4, 3Gi/12Gi from
  patch-sensor.json which only applies to fake-load)
- admission-control mem limit: 500Mi (was operator default 1Gi)
- scanner HPA min: 2 (was 1)
- scanner-v4-indexer HPA: Enabled min=2/max=5 (was Disabled)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add testing instructions comment near workflow_dispatch trigger
- Reorder dispatch inputs to match workflow_call order
- Consolidate roxie config comments to a single header explaining
  that resource settings match the previous deploy-scripts-based
  flow and may be tuned in the future

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
porridge force-pushed the use-roxie-in-start-acs branch from 04591ad to d86a4e4 July 2, 2026 09:59
porridge changed the title from "Replace stackrox deploy scripts with roxie in start-acs" to "Replace deploy scripts with roxie in start-acs and start-secured-cluster" Jul 2, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
porridge (Contributor, Author) commented Jul 2, 2026

@CodeRabbit-ai review

porridge changed the title from "Replace deploy scripts with roxie in start-acs and start-secured-cluster" to "Use roxie in long-running release cluster setup" Jul 2, 2026