Update service memory mitigation: remove stale system health checks #266
alexbass112 wants to merge 1 commit into Azure:main from
Pull request overview
Updates an Azure Local TSG mitigation script intended to reduce Update Service memory pressure by cleaning up health check artifacts and clearing the Update Service cache when cleanup occurs.
Changes:
- Moves the $removedHC initialization outside the installed-updates loop and keeps the cache clear gated on whether anything was removed.
- Adds logic to remove stale entries under the HealthCheck\System directory, keeping the 10 most recently written entries.
- Updates the cache-clear endpoint discovery to use the explicit Update Service cluster group name instead of the *update* wildcard.
$oldSystemChecks = Get-ChildItem $systemHealthCheckPath | sort LastWriteTime | select -SkipLast 10
if ($oldSystemChecks)
{
Write-Host "Some health checks were removed, so clearing update service cache."
$clientCert = ls Cert:\LocalMachine\My\ | ? Subject -match "URP" | sort NotAfter | select -last 1
$updateEndpoint = "https://$(Get-ClusterGroup *update* | % OwnerNode | % Name).$($env:USERDNSDOMAIN):4900"
$clearCacheResult = Invoke-WebRequest -Certificate $clientCert -UseBasicParsing -Uri "$updateEndpoint/caches/Update" -Method "Delete"
if ([int]::TryParse($clearCacheResult.Content, [ref]$null))
{
Write-Host "Removed $($clearCacheResult.Content) updates from the cache."
}
else
{
Write-Warning "Unexpected result from clearing the cache: $($clearCacheResult | Out-String)"
}
Write-Host "Removing $(oldSystemChecks.Count) old system health check results."
$removedHC = $true
$oldSystemChecks | Remove-Item -Force -Verbose
In the system health cleanup block, the script references $systemHealthCheckPath, which is undefined, instead of $systemHealthPath. The log message also uses $(oldSystemChecks.Count) without the leading $, so PowerShell tries to run oldSystemChecks.Count as a command and the subexpression throws. Additionally, the line depends on Select-Object -SkipLast, which is not available in older Windows PowerShell releases; since Windows PowerShell 5.1 is the typical default on nodes, a portable approach is safer than relying on PowerShell 7+. Update the path variable reference, fix the subexpression to use $oldSystemChecks, and keep the newest 10 items with an approach that works on Windows PowerShell 5.1 (e.g., sort by LastWriteTime descending and skip 10).
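A minimal sketch of the sort-descending-and-skip approach the comment suggests. Get-StaleSystemHealthCheck is a hypothetical helper name, and the demo feeds it in-memory objects with a LastWriteTime property instead of real files so it runs anywhere without touching disk. It also shows the corrected $($oldSystemChecks.Count) subexpression.

```powershell
# Hypothetical helper: return every item except the $Keep most recently written ones.
# Sorting newest-first and then using -Skip avoids depending on Select-Object -SkipLast.
function Get-StaleSystemHealthCheck {
    param(
        [object[]] $Items = @(),
        [int] $Keep = 10
    )
    $Items | Sort-Object -Property LastWriteTime -Descending | Select-Object -Skip $Keep
}

# In-memory stand-ins for files so the sketch runs without touching disk.
# check1 is the newest (1 minute old); check12 is the oldest (12 minutes old).
$now = Get-Date
$fakeChecks = 1..12 | ForEach-Object {
    [pscustomobject]@{ Name = "check$_"; LastWriteTime = $now.AddMinutes(-$_) }
}

# @() guarantees an array, so .Count is reliable even for zero or one result.
$oldSystemChecks = @(Get-StaleSystemHealthCheck -Items $fakeChecks -Keep 10)
if ($oldSystemChecks.Count -gt 0) {
    # Note the leading $ on the variable inside the subexpression.
    Write-Host "Removing $($oldSystemChecks.Count) old system health check results."
    # On a node (assumption): $oldSystemChecks | Remove-Item -Force -Verbose
}
```

On a node, the input would presumably come from @(Get-ChildItem $systemHealthPath), with the Remove-Item line enabled; with 12 entries the two oldest (check11 and check12) are selected for removal.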
{
Write-Host "Some health checks were removed, so clearing update service cache."
$clientCert = ls Cert:\LocalMachine\My\ | ? Subject -match "URP" | sort NotAfter | select -last 1
$updateEndpoint = "https://$(Get-ClusterGroup "Azure Stack HCI Update Service Cluster Group" | % OwnerNode | % Name).$($env:USERDNSDOMAIN):4900"
This assignment to $updateEndpoint nests a double-quoted string inside the $() subexpression of another double-quoted string (Get-ClusterGroup "Azure Stack HCI Update Service Cluster Group"). PowerShell does parse nested double quotes inside a subexpression, but the nesting is easy to misread and easy to break when the line is edited. Use single quotes for the cluster group name (or build the URL in multiple steps) so the outer string stays unambiguous.
Suggested change:
- $updateEndpoint = "https://$(Get-ClusterGroup "Azure Stack HCI Update Service Cluster Group" | % OwnerNode | % Name).$($env:USERDNSDOMAIN):4900"
+ $updateEndpoint = "https://$(Get-ClusterGroup 'Azure Stack HCI Update Service Cluster Group' | % OwnerNode | % Name).$($env:USERDNSDOMAIN):4900"
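The "build the URL in multiple steps" alternative from the comment could look roughly like this. New-UpdateEndpointUri is a hypothetical helper, and 'node01' and 'contoso.local' are illustrative values; the cluster lookup is left as comments because it assumes the FailoverClusters module on a cluster node.

```powershell
# Hypothetical helper: assemble the Update Service endpoint from separate parts,
# so no quoted argument has to sit inside an expandable string.
function New-UpdateEndpointUri {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [Parameter(Mandatory)] [string] $DnsDomain,
        [int] $Port = 4900
    )
    # Braces delimit each variable; without them, "$DnsDomain:$Port" would be
    # read as a drive-qualified variable reference and fail to parse.
    "https://${NodeName}.${DnsDomain}:${Port}"
}

# On a cluster node (assumption: FailoverClusters module loaded), resolve the owner first:
#   $groupName = 'Azure Stack HCI Update Service Cluster Group'
#   $ownerNode = (Get-ClusterGroup -Name $groupName).OwnerNode.Name
#   $updateEndpoint = New-UpdateEndpointUri -NodeName $ownerNode -DnsDomain $env:USERDNSDOMAIN

# Illustrative values only:
$updateEndpoint = New-UpdateEndpointUri -NodeName 'node01' -DnsDomain 'contoso.local'
```

Splitting the lookup from the string assembly keeps each quoting context flat and lets the owner-node lookup be checked (for example, for $null) before the URL is built.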
No description provided.