Commit f6c93fd
Refactor: Improve Proxy Handling and Secure Boot in GPU Install Script
This commit significantly enhances the robustness and configurability of the GPU driver installation script, particularly for environments with HTTP/HTTPS proxies and those using Secure Boot.
**Key Changes:**
* **Enhanced Proxy Configuration (`set_proxy`):**
    * Added support for `https-proxy` and `proxy-uri` metadata, providing more flexibility in proxy setups.
    * Improved `NO_PROXY` handling with sensible defaults (including Google APIs) and user-configurable additions.
    * Integrated support for custom proxy CA certificates via `http-proxy-pem-uri`, including installation into the system, Java, and Conda trust stores.
    * Connections to the proxy now use HTTPS when a custom CA is provided.
    * Added proxy connection and reachability tests to fail fast on misconfiguration.
    * `curl`, `apt`, `dnf`, `gpg`, and Java now all respect the proxy settings.
* **Robust GPG Key Import (`import_gpg_keys`):**
    * Introduced a new function to reliably import GPG keys from URLs or keyservers, fully respecting the configured proxy and custom CA settings.
    * This replaces direct `curl | gpg --import` calls, making key fetching more resilient in restricted network environments.
* **Secure Boot Signing Refinements:**
    * The `configure_dkms_certs` function now always fetches keys from Secret Manager if `private_secret_name` is set, ensuring `modulus_md5sum` is available for GCS cache paths.
    * Kernel module signing is now more clearly integrated into the build process.
    * Improved checks to ensure modules are actually signed and loadable after installation when Secure Boot is active.
* **Resilient Driver Installation:**
    * The script now checks whether the `nvidia` module can be loaded at the beginning of `install_nvidia_gpu_driver` and re-attempts installation if loading fails.
    * `curl` calls for downloading drivers and other artifacts now use retry flags and honor proxy settings.
* **Conda Environment for PyTorch:**
    * Adjusted the package list for the Conda environment, removing TensorFlow to streamline the environment.
    * Added Debian 10 workarounds: use `conda` instead of `mamba` and disable SSL verification.
* **Documentation Updates (`gpu/README.md`):**
    * Added details on the new proxy metadata: `https-proxy`, `proxy-uri`, `no-proxy`.
    * Created a new section "Enhanced Proxy Support" explaining the features.
    * Updated the `http-proxy-pem-uri` description.
    * Added proxy considerations to the "Troubleshooting" section.
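As an illustration of the `NO_PROXY` handling described under `set_proxy`, the following is a minimal sketch of merging a default exclusion list with user-supplied additions. It is not the code from this commit: the function name `build_no_proxy`, the variable `DEFAULT_NO_PROXY`, and the specific default entries are assumptions chosen for the example, and the real script may build the list differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the script's actual set_proxy implementation.
# Merges a default NO_PROXY list with user-supplied entries, dropping
# empty items and duplicates while preserving first-seen order.

# Assumed defaults, for illustration only; the real list lives in the script.
DEFAULT_NO_PROXY="localhost,127.0.0.1,metadata.google.internal,.googleapis.com"

build_no_proxy() {
  local user_entries="${1:-}"   # e.g. the value of the no-proxy metadata key
  local combined="${DEFAULT_NO_PROXY},${user_entries}"
  local result="" entry
  local -a entries

  # Split on commas only, so whitespace around entries is handled below.
  IFS=',' read -r -a entries <<< "${combined}"

  for entry in "${entries[@]}"; do
    # Trim leading and trailing whitespace.
    entry="${entry#"${entry%%[![:space:]]*}"}"
    entry="${entry%"${entry##*[![:space:]]}"}"
    [[ -z "${entry}" ]] && continue

    # Skip entries that are already in the result.
    case ",${result}," in
      *",${entry},"*) continue ;;
    esac

    result="${result:+${result},}${entry}"
  done

  printf '%s\n' "${result}"
}

# Usage: export both spellings, since tools differ in which one they read.
NO_PROXY="$(build_no_proxy "internal.example.com, localhost")"
export NO_PROXY no_proxy="${NO_PROXY}"
echo "${NO_PROXY}"
# prints: localhost,127.0.0.1,metadata.google.internal,.googleapis.com,internal.example.com
```

Keeping the defaults first means a user-supplied value can only extend the exclusion list, never remove the entries the script relies on.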
These changes aim to make the GPU initialization action more reliable across a wider range of network environments and improve the Secure Boot workflow.
1 parent 9c2983d commit f6c93fd
2 files changed
Lines changed: 633 additions & 266 deletions