Skip to content

Ecolter/fix deploy scripts aws#571

Merged
ecolternv merged 5 commits intomainfrom
ecolter/fix-deploy-scripts-aws
Mar 11, 2026
Merged

Ecolter/fix deploy scripts aws#571
ecolternv merged 5 commits intomainfrom
ecolter/fix-deploy-scripts-aws

Conversation

@ecolternv
Copy link
Copy Markdown
Contributor

@ecolternv ecolternv commented Feb 27, 2026

Description

Fix minor bugs in aws deployment script

  • Upgrade database for RDS to postgres 15.12 (15.4 is not supported anymore)
  • Fix CLI parameters not being correctly propogated to terraform
  • Add support for NGC_API_KEY to pull non-public helm charts and images
  • Fix bug where dry-run would not stop after terraform and would throw errors

Issue - None

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • NGC registry auth via API key: secure image pulls, helm repo auth, and automatic image-pull secret creation/cleanup; CLI flag --ngc-api-key available
    • Redis password configurable for AWS deployments
    • Provider-specific config regeneration from flags/env and configurable kubectl execution
    • Dry-run exits early after infrastructure apply to skip post-deploy steps
  • Documentation

    • Added NGC Registry Credentials guidance and NGC_API_KEY reference
  • Chores

    • Updated RDS engine version to 15.12

@ecolternv ecolternv requested a review from a team as a code owner February 27, 2026 22:41
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 2, 2026

📖 Docs preview: https://d3in15bfzp49i0.cloudfront.net/571/index.html

@ecolternv ecolternv force-pushed the ecolter/fix-deploy-scripts-aws branch from 3d68146 to 058d908 Compare March 10, 2026 18:41
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds optional NGC API key support across deployment scripts: new --ngc-api-key flag and NGC_API_KEY env var, creation/cleanup of nvcr image-pull secrets in multiple namespaces, Helm repo auth using the API key, wiring imagePullSecret into Helm values, AWS RDS engine bump to 15.12, and minimal deploy flow refinements (tfvars regen and dry-run early exit).

Changes

Cohort / File(s) Summary
Docs & Deployment CLI
deployments/scripts/README.md
Documents --ngc-api-key, NGC_API_KEY env var, NGC registry/Helm auth behavior, and adds NGC_API_KEY to Environment Variables table.
AWS Terraform generator
deployments/scripts/aws/terraform.sh
Updated generated terraform.tfvars content: RDS engine version changed from 15.4 to 15.12.
Kubernetes deploy logic
deployments/scripts/deploy-k8s.sh
Adds NGC_API_KEY & NGC_SECRET_NAME, --ngc-api-key CLI arg, new create_image_pull_secrets() creating docker-registry secrets in three namespaces (respects DRY_RUN), Helm repo add using $oauthtoken + API key for non-private clusters, injects imagePullSecret into Helm values, uses RUN_KUBECTL/RUN_KUBECTL_APPLY_STDIN instead of some direct kubectl calls, integrates secret creation into deploy flow, and deletes the secret during cleanup.
Minimal deploy flow
deployments/scripts/deploy-osmo-minimal.sh
Adds NGC_API_KEY and --ngc-api-key; extends provider-scoped TF var parsing and regenerates terraform.tfvars when flags/env supply required values; creates image pull secrets during deployment; introduces dry-run early exit after Terraform apply; replaces some kubectl calls with RUN_KUBECTL.

Sequence Diagram

sequenceDiagram
    participant User
    participant Script as "deploy-k8s.sh"
    participant Helm as "Helm / Chart Repo"
    participant K8s as "Kubernetes API"

    User->>Script: run with --ngc-api-key <KEY>
    Script->>Script: parse args, set NGC_API_KEY
    alt NGC_API_KEY present
        Script->>K8s: create `docker-registry` secret `nvcr-secret` in namespaces (3)
        K8s-->>Script: secret created/updated
        Script->>Helm: helm repo add <repo> user="$oauthtoken" password=<NGC_API_KEY>
        Helm-->>Script: repo authenticated
        Script->>Script: inject imagePullSecret into Helm values
    end
    Script->>Helm: helm upgrade --install (with values referencing imagePullSecret)
    Helm->>K8s: apply manifests
    K8s-->>Script: deployment complete
    Note over Script,K8s: On cleanup -> delete `nvcr-secret` from namespaces
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I found a shiny NGC key tonight,
I stash it safe where secrets hide from sight.
I helm and hop, I whisper to each pod,
A secret here, a pull there — oh my mod!
The cluster hums, I nibble a carrot, applause! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Ecolter/fix deploy scripts aws' is vague and generic, using non-descriptive branch naming conventions instead of summarizing the actual changes made. Revise the title to clearly describe the main changes, such as 'Add NGC API key support and fix AWS deployment script issues' or 'Upgrade RDS version and add NGC image pull secret support'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ecolter/fix-deploy-scripts-aws

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
deployments/scripts/deploy-k8s.sh (1)

336-341: Consider using helm repo update after add to refresh credentials.

If the helm repo was previously added without credentials, helm repo add with || true will silently skip updating the authentication. The helm repo update on line 342 fetches new charts but doesn't update repo credentials.

💡 Alternative: force repo update with credentials
         if [[ -n "$NGC_API_KEY" ]]; then
-            helm repo add osmo https://helm.ngc.nvidia.com/nvidia/osmo \
-                --username='$oauthtoken' --password="$NGC_API_KEY" || true
+            helm repo add osmo https://helm.ngc.nvidia.com/nvidia/osmo \
+                --username='$oauthtoken' --password="$NGC_API_KEY" --force-update || true
         else
             helm repo add osmo https://helm.ngc.nvidia.com/nvidia/osmo || true
         fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/deploy-k8s.sh` around lines 336 - 341, The helm repo add
call for "osmo" can leave old unauthenticated credentials in place due to the
"|| true" and a later generic "helm repo update"; update the deploy-k8s.sh logic
so that when NGC_API_KEY is set you either (a) call helm repo add osmo ...
--username='$oauthtoken' --password="$NGC_API_KEY" --force-update to immediately
replace credentials, or (b) run a targeted helm repo update immediately after
the add to refresh auth for the "osmo" repo; adjust the branch that checks
NGC_API_KEY and the subsequent helm repo update invocation (references:
NGC_API_KEY, helm repo add osmo, helm repo update) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployments/scripts/deploy-k8s.sh`:
- Around line 309-314: Replace the raw kubectl invocation in the secret-creation
block with the provider wrapper RUN_KUBECTL to ensure provider-specific
execution (reference: NG C_SECRET_NAME variable and the kubectl create secret
docker-registry ... --dry-run=client -o yaml | kubectl apply -f - pipeline). Use
RUN_KUBECTL for both the create (--dry-run=client -o yaml) and the apply steps;
if the provider wrapper does not support piping, write the create output to a
temporary YAML file and then call RUN_KUBECTL apply -f <tempfile>, cleaning up
the temp file afterward so the behavior matches other uses of RUN_KUBECTL in the
script.

In `@deployments/scripts/deploy-osmo-minimal.sh`:
- Around line 386-397: The branch that runs when has_all_required is true still
calls azure_configure_interactively/aws_configure_interactively which can prompt
interactively; change the logic so when has_all_required is true you skip the
interactive calls and only invoke the provider-specific generate functions
(azure_generate_tfvars or aws_generate_tfvars) based on PROVIDER, ensuring flags
win and no interactive prompts are triggered.

---

Nitpick comments:
In `@deployments/scripts/deploy-k8s.sh`:
- Around line 336-341: The helm repo add call for "osmo" can leave old
unauthenticated credentials in place due to the "|| true" and a later generic
"helm repo update"; update the deploy-k8s.sh logic so that when NGC_API_KEY is
set you either (a) call helm repo add osmo ... --username='$oauthtoken'
--password="$NGC_API_KEY" --force-update to immediately replace credentials, or
(b) run a targeted helm repo update immediately after the add to refresh auth
for the "osmo" repo; adjust the branch that checks NGC_API_KEY and the
subsequent helm repo update invocation (references: NGC_API_KEY, helm repo add
osmo, helm repo update) accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb071e63-382a-47b4-bd29-10c40a3f92f0

📥 Commits

Reviewing files that changed from the base of the PR and between 0448f6b and 058d908.

📒 Files selected for processing (4)
  • deployments/scripts/README.md
  • deployments/scripts/aws/terraform.sh
  • deployments/scripts/deploy-k8s.sh
  • deployments/scripts/deploy-osmo-minimal.sh

Comment thread deployments/scripts/deploy-k8s.sh Outdated
Comment thread deployments/scripts/deploy-osmo-minimal.sh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployments/scripts/deploy-osmo-minimal.sh`:
- Around line 369-384: The script errors under set -u because
TF_POSTGRES_PASSWORD and TF_REDIS_PASSWORD may be unset when flags aren’t
provided; initialize them to empty defaults near the other globals (before they
are referenced) so checks in the has_all_required logic and the case on PROVIDER
won’t trigger unbound variable errors—add default assignments for
TF_POSTGRES_PASSWORD and TF_REDIS_PASSWORD alongside the other global vars (so
later references in the has_all_required logic, case "$PROVIDER", and any usage
in functions are safe).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 760ac743-7b32-456f-b356-4ad0554e81a0

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca29d8 and 88d94bc.

📒 Files selected for processing (2)
  • deployments/scripts/deploy-k8s.sh
  • deployments/scripts/deploy-osmo-minimal.sh

Comment thread deployments/scripts/deploy-osmo-minimal.sh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deployments/scripts/deploy-osmo-minimal.sh (1)

371-423: ⚠️ Potential issue | 🟠 Major

Flag overrides are still dropped when terraform.tfvars already exists.

This branch only regenerates terraform.tfvars when the secrets were passed on the current invocation. If the passwords live in the existing file and the user only changes --aws-region, --cluster-name, --environment, etc., the code falls into Using existing terraform.tfvars and the new CLI values never reach Terraform. Backfill the missing secrets from the existing file before evaluating has_all_required, then regenerate whenever any TF_* override is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/deploy-osmo-minimal.sh` around lines 371 - 423, The
current logic only checks has_all_required based on environment/flag vars and
skips regenerating terraform.tfvars when an existing tfvars file is present,
dropping other flag overrides; fix by first loading/backfilling
TF_POSTGRES_PASSWORD and TF_REDIS_PASSWORD from the existing tfvars_file (using
the same grep logic already used) before computing has_all_required, then change
the regeneration condition to regenerate (call azure_generate_tfvars or
aws_generate_tfvars) whenever any TF_* override is present (e.g.,
TF_POSTGRES_PASSWORD or TF_REDIS_PASSWORD or other TF_* flags) or when
has_all_required is true so that CLI flag overrides (region, cluster name,
environment) are never ignored for the PROVIDER-specific generators.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deployments/scripts/deploy-osmo-minimal.sh`:
- Around line 206-207: The built-in usage text in show_help does not mention the
new --redis-password flag, so update the help/usage output (the show_help
function or the help string printed by the script) to include the
--redis-password option and note that it sets TF_REDIS_PASSWORD; ensure the flag
description mirrors the existing --postgres-password entry format and includes
any expected value format or security note so users can discover and set
TF_REDIS_PASSWORD from the script's usage text.
- Around line 559-563: The DRY_RUN early-exit unconditionally prevents later
Kubernetes dry-run logic (including deploy_osmo and the call to deploy-k8s.sh)
from running; update the if [[ "$DRY_RUN" == true ]] block so it only exits when
Terraform was actually executed (e.g., when --skip-terraform is false or a
Terraform plan/run path was taken), or move this exit into the
Terraform-specific block so that DRY_RUN still allows the subsequent
deploy_osmo/deploy-k8s.sh dry-run path to run. Ensure you reference and check
the same flag/condition used when invoking Terraform (the --skip-terraform path)
before calling exit 0 so Kubernetes dry-run logic is reached when appropriate.

---

Outside diff comments:
In `@deployments/scripts/deploy-osmo-minimal.sh`:
- Around line 371-423: The current logic only checks has_all_required based on
environment/flag vars and skips regenerating terraform.tfvars when an existing
tfvars file is present, dropping other flag overrides; fix by first
loading/backfilling TF_POSTGRES_PASSWORD and TF_REDIS_PASSWORD from the existing
tfvars_file (using the same grep logic already used) before computing
has_all_required, then change the regeneration condition to regenerate (call
azure_generate_tfvars or aws_generate_tfvars) whenever any TF_* override is
present (e.g., TF_POSTGRES_PASSWORD or TF_REDIS_PASSWORD or other TF_* flags) or
when has_all_required is true so that CLI flag overrides (region, cluster name,
environment) are never ignored for the PROVIDER-specific generators.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3f6cd965-9d3c-491a-93ce-69c81937b543

📥 Commits

Reviewing files that changed from the base of the PR and between 88d94bc and 4cc7afb.

📒 Files selected for processing (1)
  • deployments/scripts/deploy-osmo-minimal.sh

Comment thread deployments/scripts/deploy-osmo-minimal.sh
Comment thread deployments/scripts/deploy-osmo-minimal.sh
@ecolternv ecolternv merged commit b53f053 into main Mar 11, 2026
9 of 10 checks passed
@ecolternv ecolternv deleted the ecolter/fix-deploy-scripts-aws branch March 11, 2026 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants