From 6138b842b18bd3fae040fefa978848e2243d42c9 Mon Sep 17 00:00:00 2001
From: Jake Smith <99511422+JAC0BSMITH@users.noreply.github.com>
Date: Tue, 27 Jan 2026 14:07:57 -0600
Subject: [PATCH 1/3] Dropping old release note

---
 ...publishing.redirection.operator-nexus.json | 4 +-
 operator-nexus/TOC.yml | 8 +--
 operator-nexus/release-notes-2404-2.md | 64 -------------------
 3 files changed, 3 insertions(+), 73 deletions(-)
 delete mode 100644 operator-nexus/release-notes-2404-2.md

diff --git a/operator-nexus/.openpublishing.redirection.operator-nexus.json b/operator-nexus/.openpublishing.redirection.operator-nexus.json
index 57ff39a7d8..302a981821 100644
--- a/operator-nexus/.openpublishing.redirection.operator-nexus.json
+++ b/operator-nexus/.openpublishing.redirection.operator-nexus.json
@@ -102,7 +102,7 @@
 },
 {
 "source_path": "release-note-2404.2.md",
- "redirect_url": "/azure/operator-nexus/release-notes-2404-2",
+ "redirect_url": "/azure/operator-nexus",
 "redirect_document_id": false
 },
 {
@@ -114,7 +114,7 @@
 "source_path": "concepts-ab-staged-commit-configuration-update-commit-workflow-v3.md",
 "redirect_url": "concepts-ab-staged-commit-configuration-update-commit-workflow",
 "redirect_document_id": false
- },
+ },
 {
 "source_path": "howto-use-ab-staged-commit-configuration-update-commit-workflow-v3.md",
 "redirect_url": "howto-use-ab-staged-commit-configuration-update-commit-workflow",
diff --git a/operator-nexus/TOC.yml b/operator-nexus/TOC.yml
index 2d519151be..242e6e8362 100644
--- a/operator-nexus/TOC.yml
+++ b/operator-nexus/TOC.yml
@@ -542,10 +542,4 @@
 href: reference-operator-nexus-skus.md
 - name: Password By Key Vault Reference
 href: reference-key-vault-credential.md
-- name: Release Notes
- items:
- - name: 2024
- expanded: false
- items:
- - name: 2404.2
- href: release-notes-2404-2.md
+
diff --git a/operator-nexus/release-notes-2404-2.md b/operator-nexus/release-notes-2404-2.md
deleted file mode 100644
index de937c7978..0000000000
--- a/operator-nexus/release-notes-2404-2.md
+++ /dev/null
@@ -1,64 +0,0 @@
----
-title: Azure Operator Nexus Release Notes 2404.2
-description: Release notes for Operator Nexus 2404.2 release.
-ms.topic: article
-ms.date: 09/15/2025
-author: scottsteinbrueck
-ms.author: ssteinbrueck
-ms.service: azure-operator-nexus
----
-
-# Operator Nexus Release Version 2404.2
-
-Release date: April 29, 2024
-
-## Release summary
-
-Operator Nexus 2404.2 includes NC3.10 Management updates and an NC3.8.7
-runtime patch.
-
-## Release highlights
-
-### Resiliency enhancements
-
-* Bare Metal Machine (BMM)/BMC KeySets - Enhanced handling of Entra disconnected state.
-
-* Prevents simultaneous disruptive BMM actions against Kubernetes Control Plane nodes.
-
-* Prevents user from adding or deleting a hybrid-compute machine extension on the Cluster MRG.
-
-* Prevents user from creating and deleting arc-connected clusters and the arc-connected machine from Nexus Kubernetes Service (NKS) MRG.
-
-### Security enhancements
-
-* Credential rotation status information on the Bare-metal Machine (BMC or Console User) and Storage Appliance (Storage Admin) resources.
-
-* Harden Network Fabric Controller (NFC) Infrastructure Proxy to allow outbound connections to known services.
-
-* HTTP/2 enhancements.
-
-* Remove Key Vault from Cluster Manager MRG.
-
-### Observability enhancements
-
-* (Preview) Appropriate status reflected for Rack Pause scenarios.
-
-* More metrics support: Calico data-plane failures, disk latency, etcd, hypervisor memory usage, pageswap, pod restart, NTP.
-
-* Enable users to create alert rules that track disconnection metrics for connectivity of clusters to the Cluster Manager.
-
-* Enable storage appliance logs.
-
-### Other updates
-
-* Enable high-availability for NFS Storage.
-
-* Support Purity 6.5.4.
-
-* PURE hardware upgrade from R3 to R4.
-
-* Updated OS image as a 3.8.7 runtime patch release to remediate new Common Vulnerabilities Exposures (CVEs).
-
-## Next steps
-
-* Learn more about [supported software versions](./reference-supported-software-versions.md).
\ No newline at end of file

From 377bfc0db94d0b1f7b3121bf372d49c791f49d70 Mon Sep 17 00:00:00 2001
From: Jake Smith <99511422+JAC0BSMITH@users.noreply.github.com>
Date: Tue, 27 Jan 2026 14:51:43 -0600
Subject: [PATCH 2/3] Improve bare-metal documentation cohesiveness

- Add cross-references between related troubleshooting articles
- Link hardware validation failure docs to concepts overview
- Add prerequisites section to warning troubleshooting article
- Add quick decision guide to BMM functions how-to
- Link troubleshooting articles to run-read and BMC access guides
- Enhance Related Operations section in HWV concepts doc

These changes improve navigation between related documents and help users find the right information more quickly.
---
 .../concepts-hardware-validation-overview.md | 12 +++++++++++
 operator-nexus/howto-baremetal-functions.md | 20 ++++++++++++++++++-
 ...leshoot-bare-metal-machine-provisioning.md | 5 +++++
 ...troubleshoot-bare-metal-machine-warning.md | 9 ++++++++-
 ...roubleshoot-hardware-validation-failure.md | 2 ++
 5 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/operator-nexus/concepts-hardware-validation-overview.md b/operator-nexus/concepts-hardware-validation-overview.md
index 9733e34ddd..7267845821 100644
--- a/operator-nexus/concepts-hardware-validation-overview.md
+++ b/operator-nexus/concepts-hardware-validation-overview.md
@@ -56,3 +56,15 @@ Up to date Azure Operator Nexus firmware specs, and N-1 and N-2 versions can be
 ## BIOS boot configuration update
 
 HWV verifies that the BIOS boot configuration meets the requirements for successful bootstrapping. If any settings are incorrect, HWV automatically updates them to match the required specifications.
+
+## Troubleshooting hardware validation failures
+
+If hardware validation fails during cluster deployment or a BMM Replace action, see [Troubleshoot hardware validation failure](./troubleshoot-hardware-validation-failure.md) for detailed troubleshooting procedures organized by validation category.
+
+## Related operations
+
+Hardware validation is automatically triggered during:
+- Initial cluster deployment
+- [BMM Replace actions](./howto-baremetal-functions.md#replace-a-bare-metal-machine)
+
+After hardware repairs are completed, you must run a Replace action to re-validate and provision the BMM.
diff --git a/operator-nexus/howto-baremetal-functions.md b/operator-nexus/howto-baremetal-functions.md
index eeda6f19e7..bcb4cdcde7 100644
--- a/operator-nexus/howto-baremetal-functions.md
+++ b/operator-nexus/howto-baremetal-functions.md
@@ -62,6 +62,22 @@ Use the following guidance to determine which action best fits your situation:
 | BMC credentials need manual rotation | Replace |
 | Firmware reconciliation needed | Replace |
 
+### Quick decision guide
+
+**Start here if you're unsure which action to use:**
+
+1. **Did you physically replace or repair hardware?** → Use [Replace](#replace-a-bare-metal-machine)
+2. **Is the machine unresponsive but hardware is healthy?** → Use [Restart](#restart-a-bare-metal-machine)
+3. **Do you need a clean OS installation without hardware changes?** → Use [Reimage](#reimage-a-bare-metal-machine)
+4. **Do you need to prevent new workloads temporarily?** → Use [Cordon](#make-a-bare-metal-machine-unschedulable-cordon)
+5. **Is the machine powered off and needs to come online?** → Use [Start](#start-a-bare-metal-machine)
+
+**For troubleshooting specific status conditions, see:**
+- [Warning status messages](./troubleshoot-bare-metal-machine-warning.md)
+- [Degraded status](./troubleshoot-bare-metal-machine-degraded.md)
+- [Provisioning failures](./troubleshoot-bare-metal-machine-provisioning.md)
+- [Hardware validation failures](./troubleshoot-hardware-validation-failure.md)
+
 ## Control plane node considerations
 
 Control plane nodes require extra caution when performing lifecycle actions. The platform implements special handling for control plane nodes to maintain cluster quorum and availability:
@@ -343,4 +359,6 @@ Code: None
 Message: Networking test(s) failed: [NIC.Slot.6-1-1_LinkStatus] expected: up; observed: Down; [Additional logs: Link failure detected on NIC.Slot.6-1-1; Unable to perform cabling check on PCI Slot 6]
 ```
 
-For more information about troubleshooting hardware validation failures, see [Troubleshoot Hardware Validation Failure](./troubleshoot-hardware-validation-failure.md).
+For complete hardware validation troubleshooting procedures organized by failure category (System Info, Drive Info, Network Info, Health Info, Boot Info), see [Troubleshoot Hardware Validation Failure](./troubleshoot-hardware-validation-failure.md).
+
+To understand what hardware validation checks and when it runs, see [Hardware Validation Overview](./concepts-hardware-validation-overview.md).
diff --git a/operator-nexus/troubleshoot-bare-metal-machine-provisioning.md b/operator-nexus/troubleshoot-bare-metal-machine-provisioning.md
index 4ddd31a2f7..0193344bc6 100644
--- a/operator-nexus/troubleshoot-bare-metal-machine-provisioning.md
+++ b/operator-nexus/troubleshoot-bare-metal-machine-provisioning.md
@@ -18,6 +18,9 @@ Provisioning uses the Preboot eXecution Environment (PXE) interface to load the
 
 [!INCLUDE [prerequisites-azure-cli-bare-metal-machine-actions](./includes/baremetal-machines/prerequisites-azure-cli-bare-metal-machine-actions.md)]
 
+1. For BMC diagnostic access: See [Manage emergency access to a Bare Metal Machine using the `az networkcloud cluster bmckeyset`](./howto-baremetal-bmc-ssh.md)
+1. For running diagnostic commands on control plane nodes: See [Troubleshoot Bare-Metal Machines by Using the run-read Command](./howto-baremetal-run-read.md)
+
 ## Bare Metal Machine roles
 
 For a specific version, roles are required to manage and operate the underlying Kubernetes cluster.
@@ -170,6 +173,8 @@ racadm --nocertwarn -r $IP -u $BMC_USR -p $BMC_PWD getsysinfo | grep "MAC Addres
 racadm --nocertwarn -r $IP -u $BMC_USR -p $BMC_PWD getsysinfo | grep "NIC.Embedded.1-1-1" #Boot MAC
 ```
 
+For detailed BMC access procedures and additional diagnostic commands, see [Manage emergency access to a Bare Metal Machine using the `az networkcloud cluster bmckeyset`](./howto-baremetal-bmc-ssh.md).
+
 If the MAC address supplied to the cluster is incorrect, use the Bare Metal Machine `replace` action at [Bare Metal Machine actions](howto-baremetal-functions.md) to correct the addresses.
 
 ### Ping test BMC connectivity
diff --git a/operator-nexus/troubleshoot-bare-metal-machine-warning.md b/operator-nexus/troubleshoot-bare-metal-machine-warning.md
index f96f10583b..95d8c96138 100644
--- a/operator-nexus/troubleshoot-bare-metal-machine-warning.md
+++ b/operator-nexus/troubleshoot-bare-metal-machine-warning.md
@@ -14,6 +14,12 @@ ms.reviewer: ekarandjeff
 
 This document provides basic troubleshooting information for Bare Metal Machine (BMM) resources that are reporting a _Warning_ message in the BMM detailed status message.
 
+## Prerequisites
+
+- Access to the Azure portal or Azure CLI
+- Permissions to view and manage Bare Metal Machine resources
+- For diagnostic commands: SSH access via BareMetalMachineKeySet (see [Manage emergency access to a Bare Metal Machine](./howto-baremetal-bmm-ssh.md))
+
 ## Symptoms
 
 The Detailed status message of the Bare Metal Machine (Operator Nexus) resource includes one or more of the following.
@@ -55,7 +61,8 @@ az networkcloud baremetalmachine run-read-command \
 - Replace `` with the name of the resource group containing the BMM resources.
 - Replace `rack1control01` with the name of a BMM resource for a healthy Kubernetes control plane node, from which to execute the `kubectl get` command.
 - Replace `rack1compute01` with the name of the affected BMM.
-- For more information about the `run-read-command` feature, see [BareMetal Run-Read Execution](./howto-baremetal-run-read.md).
+
+For more information about the `run-read-command` feature and available diagnostic commands, see [Troubleshoot Bare-Metal Machines by Using the run-read Command](./howto-baremetal-run-read.md).
 
 Review the `lastTransitionTime` and `message` fields for more information about the corresponding error condition, as shown in the following example output.
 
diff --git a/operator-nexus/troubleshoot-hardware-validation-failure.md b/operator-nexus/troubleshoot-hardware-validation-failure.md
index 30d1b64bd9..ce1a0721aa 100644
--- a/operator-nexus/troubleshoot-hardware-validation-failure.md
+++ b/operator-nexus/troubleshoot-hardware-validation-failure.md
@@ -16,6 +16,8 @@ HWV is run as part of a cluster deploy action and a bare metal `replace` action.
 
 HWV validates a Bare Metal Machine (BMM) by executing test cases against the baseboard management controller (BMC). The Azure Operator Nexus platform is deployed on Dell servers. Dell servers use the integrated Dell remote access controller (iDRAC), which is the equivalent of a BMC.
 
+For background information about hardware validation, when it runs, and what it checks, see [Azure Operator Nexus hardware validation overview](./concepts-hardware-validation-overview.md).
+
 [!INCLUDE [prerequisites-azure-cli-bare-metal-machine-actions](./includes/baremetal-machines/prerequisites-azure-cli-bare-metal-machine-actions.md)]
 
 1. Request access to the cluster's Log Analytics workspace (LAW).
From 8ebf6b7b6604ad21de9f4882361e6b957ff61e81 Mon Sep 17 00:00:00 2001
From: Ronmia Bess <61058899+ronmiab@users.noreply.github.com>
Date: Tue, 27 Jan 2026 16:39:02 -0600
Subject: [PATCH 3/3] Quick edit: Update unregister-register-machine.md

---
 azure-local/manage/unregister-register-machine.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/azure-local/manage/unregister-register-machine.md b/azure-local/manage/unregister-register-machine.md
index 542b51ece1..3785937543 100644
--- a/azure-local/manage/unregister-register-machine.md
+++ b/azure-local/manage/unregister-register-machine.md
@@ -4,7 +4,7 @@ description: Learn how to unregister and re-register Azure Local machines withou
 ms.topic: how-to
 ms.author: alkohli
 author: alkohli
-ms.date: 11/20/2025
+ms.date: 01/27/2026
 ms.subservice: hyperconverged
 ---
 
@@ -15,6 +15,8 @@ ms.subservice: hyperconverged
 
 This article provides guidance on how to unregister and re-register Azure Local machines without having to install the operating system (OS) again. This method uses PowerShell cmdlets and applies to registration with and without Azure Arc gateway.
 
+> [!IMPORTANT]
+> This guidance applies only to devices that haven't been deployed yet.
 ## About reregistration
 
 Azure Local machines