26 changes: 23 additions & 3 deletions CLAUDE.md
@@ -13,8 +13,9 @@ Two-Node Toolbox (TNF) is a comprehensive deployment automation framework for Op
# From the deploy/ directory:

# Deploy AWS hypervisor and cluster in one command
make deploy arbiter-ipi      # Deploy arbiter topology cluster
make deploy fencing-ipi # Deploy fencing topology cluster
make deploy fencing-assisted # Deploy hub + spoke TNF via assisted installer

# Instance lifecycle management
make create # Create new EC2 instance
@@ -70,6 +71,15 @@ ansible-playbook kcli-install.yml -i inventory.ini -e "test_cluster_name=my-clus
ansible-playbook kcli-install.yml -i inventory.ini -e "force_cleanup=true"
```

#### Assisted Installer Method (Spoke TNF via ACM)
```bash
# Copy and customize the configuration template
cp vars/assisted.yml.template vars/assisted.yml

# Deploy hub + spoke TNF cluster via assisted installer
make deploy fencing-assisted
```

### Linting and Validation
```bash
# Shell script linting (from repository root)
@@ -88,14 +98,17 @@ make shellcheck
- Automatic inventory management for Ansible integration

2. **OpenShift Cluster Deployment** (`deploy/openshift-clusters/`)
- Three deployment methods: dev-scripts (traditional), kcli (modern), and assisted installer (spoke via ACM)
- Ansible roles for complete cluster automation
- Support for both arbiter and fencing topologies
- Assisted installer deploys spoke TNF clusters on an existing hub via ACM/MCE
- Proxy configuration for external cluster access

3. **Ansible Roles Architecture**:
- `dev-scripts/install-dev`: Traditional deployment using openshift-metal3/dev-scripts
- `kcli/kcli-install`: Modern deployment using kcli virtualization management
- `assisted/acm-install`: Install ACM/MCE + assisted service + enable TNF on hub
- `assisted/assisted-spoke`: Deploy spoke TNF cluster via assisted installer + BMH
- `proxy-setup`: Squid proxy for cluster external access
- `redfish`: Automated stonith configuration for fencing topology
- `config`: SSH key and git configuration
@@ -119,16 +132,23 @@ make shellcheck
- `roles/kcli/kcli-install/files/pull-secret.json`: OpenShift pull secret
- SSH key automatically read from `~/.ssh/id_ed25519.pub` on ansible controller

#### Assisted Installer Method
- `vars/assisted.yml`: Variable override file (copy from `vars/assisted.yml.template`)
- Hub cluster must be deployed first via dev-scripts (`make deploy fencing-ipi`)
- Spoke credentials output to `~/<spoke_cluster_name>/auth/` on hypervisor
- Hub proxy preserved as `hub-proxy.env`
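
The variables echoed by the playbook's configuration summary suggest `vars/assisted.yml` looks roughly like the sketch below. This is illustrative only — the values are placeholders, and `vars/assisted.yml.template` remains the authoritative reference:

```yaml
# Illustrative sketch of vars/assisted.yml; variable names come from the
# playbook's "Display assisted installer configuration" task, values are
# placeholders.
hub_operator: acm                # "acm" or "mce"
acm_channel: "auto"              # auto-detect from packagemanifest
spoke_cluster_name: spoke
spoke_base_domain: example.com
spoke_release_image: auto        # "auto" reuses the hub release image
spoke_api_vip: 192.168.111.5
spoke_ingress_vip: 192.168.111.6
assisted_storage_method: hostpath
force_cleanup: false
```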

#### Generated Files
- `proxy.env`: Generated proxy configuration (source this to access cluster)
- `hub-proxy.env`: Hub proxy config (preserved when spoke proxy is configured)
- `kubeconfig`: OpenShift cluster kubeconfig
- `kubeadmin-password`: Default admin password

### Development Workflow

1. **Environment Setup**: Use `deploy/aws-hypervisor/` tools or bring your own RHEL 9 host
2. **Configuration**: Edit inventory and config files based on chosen deployment method
3. **Deployment**: Run appropriate Ansible playbook (setup.yml, kcli-install.yml, or assisted-install.yml)
4. **Access**: Source `proxy.env` and use `oc` commands or WebUI through proxy
5. **Cleanup**: Use cleanup make targets or Ansible playbooks

5 changes: 5 additions & 0 deletions deploy/Makefile
@@ -76,6 +76,10 @@ arbiter-kcli:
fencing-kcli:
@./openshift-clusters/scripts/deploy-cluster.sh --topology fencing --method kcli

fencing-assisted:
@$(MAKE) fencing-ipi
@./openshift-clusters/scripts/deploy-fencing-assisted.sh

patch-nodes:
@./openshift-clusters/scripts/patch-nodes.sh
get-tnf-logs:
@@ -109,6 +113,7 @@ help:
@echo " fencing-kcli - Deploy fencing cluster using kcli (non-interactive)"
@echo ""
@echo "OpenShift Cluster Management:"
@echo " fencing-assisted - Deploy hub + spoke TNF cluster via assisted installer"
@echo " redeploy-cluster - Redeploy OpenShift cluster using dev-scripts make redeploy"
@echo " shutdown-cluster - Shutdown OpenShift cluster VMs in orderly fashion"
@echo " startup-cluster - Start up OpenShift cluster VMs and proxy container"
2 changes: 1 addition & 1 deletion deploy/aws-hypervisor/scripts/create.sh
@@ -42,7 +42,7 @@ echo -e "AMI ID: $RHEL_HOST_AMI"
echo -e "Machine Type: $EC2_INSTANCE_TYPE"

ec2Type="VirtualMachine"
if [[ "$EC2_INSTANCE_TYPE" =~ c[0-9]+[a-z]*\.metal ]]; then
ec2Type="MetalMachine"
fi

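The widened pattern now matches any c-family bare-metal instance type (e.g. `c5n.metal`, `c6a.metal`), not just the `g`/`n` variants. A quick sketch of the check as a standalone function (hypothetical helper; the dot is escaped and the pattern anchored for strictness, which the script's unanchored test does not require):

```shell
# Classify an EC2 instance type the way create.sh does, assuming only
# c-family .metal types count as bare metal.
ec2_machine_kind() {
  if printf '%s\n' "$1" | grep -Eq '^c[0-9]+[a-z]*\.metal$'; then
    echo "MetalMachine"
  else
    echo "VirtualMachine"
  fi
}

ec2_machine_kind c5n.metal   # MetalMachine
ec2_machine_kind c6a.metal   # MetalMachine (matched by the widened [a-z]* pattern)
ec2_machine_kind c5.xlarge   # VirtualMachine
```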
108 changes: 108 additions & 0 deletions deploy/openshift-clusters/assisted-install.yml
@@ -0,0 +1,108 @@
---
# Deploy a spoke TNF cluster via ACM/assisted installer on an existing hub cluster.
#
# Prerequisites:
# - vars/assisted.yml exists (copy from vars/assisted.yml.template)
#
# Usage:
# make deploy fencing-assisted

- hosts: metal_machine
gather_facts: yes

vars:
topology: fencing
interactive_mode: false

vars_files:
- vars/assisted.yml

pre_tasks:
- name: Check that proxy.env exists (hub must be deployed first)
stat:
path: "{{ playbook_dir }}/proxy.env"
delegate_to: localhost
register: proxy_env_check

- name: Fail if proxy.env is missing
fail:
msg: >-
proxy.env not found. The hub cluster must be deployed first
using 'make deploy fencing-ipi'. proxy.env is required for
cluster access.
when: not proxy_env_check.stat.exists

- name: Check that hub kubeconfig exists
stat:
path: "{{ ansible_user_dir }}/auth/kubeconfig"
register: hub_kubeconfig_check

- name: Fail if hub kubeconfig is missing
fail:
msg: >-
Hub kubeconfig not found at ~/auth/kubeconfig.
The hub cluster must be deployed first.
when: not hub_kubeconfig_check.stat.exists

- name: Set hub KUBECONFIG path
set_fact:
hub_kubeconfig: "{{ ansible_user_dir }}/auth/kubeconfig"

- name: Preserve hub proxy.env as hub-proxy.env
copy:
src: "{{ playbook_dir }}/proxy.env"
dest: "{{ playbook_dir }}/hub-proxy.env"
remote_src: no
backup: no
delegate_to: localhost

- name: Display assisted installer configuration
debug:
msg: |
Assisted Installer Configuration:
Hub operator: {{ hub_operator }}
ACM/MCE channel: {{ acm_channel if hub_operator == 'acm' else mce_channel }}
Spoke cluster: {{ spoke_cluster_name }}.{{ spoke_base_domain }}
Spoke release image: {{ spoke_release_image }}
Spoke VMs: {{ spoke_ctlplanes }}x ({{ spoke_vm_vcpus }} vCPUs, {{ spoke_vm_memory }}MB RAM, {{ spoke_vm_disk_size }}GB disk)
Spoke network: {{ spoke_network_cidr }}
API VIP: {{ spoke_api_vip }}
Ingress VIP: {{ spoke_ingress_vip }}
Storage method: {{ assisted_storage_method }}
Force cleanup: {{ force_cleanup }}

roles:
- role: assisted/acm-install
- role: assisted/assisted-spoke

post_tasks:
- name: Setup proxy access for spoke cluster
include_role:
name: proxy-setup
vars:
kubeconfig_path: "{{ spoke_kubeconfig_path }}"
kubeadmin_password_path: "{{ spoke_kubeadmin_password_path }}"

- name: Update cluster inventory with spoke VMs
include_role:
name: common
tasks_from: update-cluster-inventory
vars:
test_cluster_name: "{{ spoke_cluster_name }}"

- name: Display deployment summary
debug:
msg: |
Spoke TNF cluster deployed successfully!

Spoke credentials:
Kubeconfig: {{ spoke_kubeconfig_path }}
Admin password: {{ spoke_kubeadmin_password_path }}

Access spoke cluster:
source proxy.env
KUBECONFIG={{ spoke_kubeconfig_path }} oc get nodes

Access hub cluster:
source hub-proxy.env
KUBECONFIG=~/auth/kubeconfig oc get nodes
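
The pre_tasks above reduce to a simple guard: both the hub's `proxy.env` and kubeconfig must exist, and the hub proxy settings are copied aside before the `proxy-setup` role regenerates `proxy.env` for the spoke. A rough shell equivalent (function name and argument layout are illustrative; the playbook itself uses `stat`, `fail`, and `copy`):

```shell
# Hypothetical shell sketch of the playbook's pre_tasks guards.
preflight() {
  playbook_dir="$1"; hub_home="$2"
  if [ ! -f "$playbook_dir/proxy.env" ]; then
    echo "proxy.env not found: deploy the hub first with 'make deploy fencing-ipi'" >&2
    return 1
  fi
  if [ ! -f "$hub_home/auth/kubeconfig" ]; then
    echo "hub kubeconfig not found at $hub_home/auth/kubeconfig" >&2
    return 1
  fi
  # Preserve hub proxy settings before proxy-setup overwrites proxy.env
  cp "$playbook_dir/proxy.env" "$playbook_dir/hub-proxy.env"
}
```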
2 changes: 2 additions & 0 deletions deploy/openshift-clusters/collections/requirements.yml
@@ -13,3 +13,5 @@ collections:
version: ">=2.0"
- name: community.general
version: ">=5.0.0"
- name: ansible.utils
version: ">=2.0.0"
@@ -0,0 +1,28 @@
---
# Default variables for acm-install role

# Hub kubeconfig path (set by playbook pre_tasks, fallback to ansible_user_dir)
hub_kubeconfig: "{{ ansible_user_dir }}/auth/kubeconfig"

# Hub operator to install: "acm" or "mce"
hub_operator: acm

# ACM/MCE channel: "auto" detects from packagemanifest
acm_channel: "auto"
mce_channel: "auto"

# Storage method for assisted service: "hostpath"
assisted_storage_method: "hostpath"

# hostPath directories on hub nodes
assisted_images_path: /var/lib/assisted-images
assisted_db_path: /var/lib/assisted-db
assisted_images_size: 50Gi
assisted_db_size: 10Gi
assisted_storage_class: assisted-service

# Timeouts (seconds)
acm_csv_timeout: 900
multiclusterhub_timeout: 1800
assisted_service_timeout: 600
metal3_stabilize_timeout: 300
@@ -0,0 +1,118 @@
---
# Create AgentServiceConfig with RHCOS ISO auto-extracted from release image

- name: Get hub release image
shell: |
oc get clusterversion version -o jsonpath='{.status.desired.image}'
register: hub_release_image
changed_when: false

- name: Get hub OCP version
shell: |
oc get clusterversion version -o jsonpath='{.status.desired.version}' \
| cut -d. -f1-2
register: hub_ocp_version
changed_when: false

- name: Determine spoke release image
set_fact:
effective_release_image: >-
{{ hub_release_image.stdout if spoke_release_image == 'auto'
else spoke_release_image }}

- name: Extract RHCOS ISO URL from release image
shell: |
# Get the machine-os-images reference from the release image
RHCOS_REF=$(oc adm release info "{{ effective_release_image }}" \
--registry-config="{{ pull_secret_path }}" \
--image-for=machine-os-images 2>/dev/null)
if [ -z "$RHCOS_REF" ]; then
echo "FAILED: Could not extract machine-os-images from release image"
exit 1
fi
# Extract the RHCOS ISO URL from the image labels/annotations
oc image info "$RHCOS_REF" --registry-config="{{ pull_secret_path }}" \
-o json 2>/dev/null \
| python3 -c "
import json, sys
data = json.load(sys.stdin)
labels = data.get('config', {}).get('config', {}).get('Labels', {})
stream = labels.get('coreos.stream', '')
version = labels.get('version', '')
if stream and version:
url = f'https://rhcos.mirror.openshift.com/art/storage/prod/streams/{stream}/builds/{version}/x86_64/rhcos-{version}-live-iso.x86_64.iso'
print(url)
else:
print('NEEDS_FALLBACK')
"
register: rhcos_iso_extraction
changed_when: false
failed_when: "'FAILED' in rhcos_iso_extraction.stdout"

- name: Try fallback RHCOS ISO extraction via coreos print-stream-json
shell: |
rm -rf /tmp/oc-extract && mkdir -p /tmp/oc-extract
RHCOS_URL=$(oc adm release extract "{{ effective_release_image }}" \
--registry-config="{{ pull_secret_path }}" \
--command=openshift-install --to=/tmp/oc-extract 2>/dev/null && \
/tmp/oc-extract/openshift-install coreos print-stream-json 2>/dev/null \
| python3 -c "
import json, sys
data = json.load(sys.stdin)
iso = data['architectures']['x86_64']['artifacts']['metal']['formats']['iso']['disk']
print(iso['location'])
" 2>/dev/null) || true
rm -rf /tmp/oc-extract
if [ -n "$RHCOS_URL" ]; then
echo "$RHCOS_URL"
else
echo "FAILED"
fi
register: rhcos_iso_fallback
changed_when: false
when: "'NEEDS_FALLBACK' in rhcos_iso_extraction.stdout"

- name: Set RHCOS ISO URL fact
set_fact:
rhcos_iso_url: >-
{{ rhcos_iso_fallback.stdout | default(rhcos_iso_extraction.stdout) | trim }}
failed_when: rhcos_iso_url == 'FAILED' or rhcos_iso_url == 'NEEDS_FALLBACK'

- name: Display RHCOS ISO URL
debug:
msg: "RHCOS ISO URL: {{ rhcos_iso_url }}"

- name: Get RHCOS version from ISO URL
set_fact:
rhcos_version: "{{ rhcos_iso_url | regex_search('rhcos-([\\d.]+-\\d+)-live', '\\1') | first }}"

- name: Create AgentServiceConfig
template:
src: agentserviceconfig.yml.j2
dest: /tmp/agentserviceconfig.yml
mode: '0644'

- name: Apply AgentServiceConfig
shell: |
oc apply -f /tmp/agentserviceconfig.yml
register: asc_result
changed_when: "'created' in asc_result.stdout"

- name: Wait for assisted-service pod to be Running (2/2)
shell: |
oc get pods -n {{ assisted_service_namespace }} -l app=assisted-service \
--no-headers 2>/dev/null | grep -q '2/2.*Running'
register: assisted_pod
until: assisted_pod.rc == 0
retries: "{{ (assisted_service_timeout / 15) | int }}"
delay: 15

- name: Display assisted-service pod status
shell: |
oc get pods -n {{ assisted_service_namespace }} -l app=assisted-service
register: pod_status
changed_when: false

- name: Show assisted-service pod
debug:
msg: "{{ pod_status.stdout }}"
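
The two pure string steps above — composing the mirror URL from the `machine-os-images` labels, and later pulling the build version back out of it — can be sketched outside Ansible like this (function names and the sample version string are illustrative):

```shell
# Compose the mirror URL from the coreos.stream and version image labels,
# as the inline Python does; empty labels signal the print-stream-json
# fallback path.
rhcos_iso_url() {
  stream="$1"; version="$2"
  if [ -n "$stream" ] && [ -n "$version" ]; then
    printf 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/%s/builds/%s/x86_64/rhcos-%s-live-iso.x86_64.iso\n' \
      "$stream" "$version" "$version"
  else
    echo "NEEDS_FALLBACK"
  fi
}

# Counterpart of regex_search('rhcos-([\d.]+-\d+)-live', '\1') in the
# version-extraction task.
rhcos_version_from_url() {
  printf '%s\n' "$1" | sed -n 's/.*rhcos-\([0-9.]*-[0-9]*\)-live.*/\1/p'
}

url=$(rhcos_iso_url rhel-9.4 418.94.202410090804-0)
rhcos_version_from_url "$url"   # 418.94.202410090804-0
```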