Conversation
| Let's create `ami.yml` (see the [DLAMI release notes](https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html) to get the AMI ARN (`ParentImage` in the config below)):
| ```
| Build:
|   SecurityGroupIds: [<insert your SG - it requires outbound traffic>]
It's automated with the pcluster CLI; we could add a CloudFormation template that sets up the VPC, subnet, and SG and runs the pcluster CLI via a bash runner (or, even better, triggers a Lambda to run it). I thought about creating a short bash script with templating for the config YAML file, but I don't think it's worth the effort since it doesn't abstract anything and adds boilerplate.
There's a 100% chance someone won't know how to find a security group, or even what one is. Provide the steps to create one and retrieve its ID, or provide a step to retrieve the security group ID.
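For example, the create-and-retrieve step could be sketched like this (a sketch, assuming the AWS CLI v2 is configured and a default VPC exists; the group name is a placeholder):

```shell
# Look up the default VPC, then create a security group in it.
# New security groups allow all outbound traffic by default, which is
# what pcluster build-image needs. The group name is a placeholder.
SG_NAME=pcluster-ami-build
VPC_ID=$(aws ec2 describe-vpcs --filters Name=isDefault,Values=true \
  --query 'Vpcs[0].VpcId' --output text)
SG_ID=$(aws ec2 create-security-group --group-name "$SG_NAME" \
  --description "pcluster build-image" --vpc-id "$VPC_ID" \
  --query 'GroupId' --output text)
# Paste the printed ID into the SecurityGroupIds field of ami.yml.
echo "SecurityGroupIds: [${SG_ID}]"
```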
| First let's fetch the assets required to build the image:
| ```bash
| wget https://ml.hpcworkshops.com/scripts/packer/packer.tar.gz
Explain the content of the archive before proposing to download it. For the sake of clarity, I suggest you make the reader download the 3 files separately. At least they can review the files on GitHub beforehand. That also gives a reviewer an opportunity to look at the content of the files that are part of this workshop.
@sean-smith you added this and I just moved it; any specific reason for this?
We don't need this anymore. This can just be:
git clone git@github.com:aws-samples/parallelcluster-efa-gpu-preflight-ami.git
| You can install Packer using [Brew](https://brew.sh/) on macOS or Linux as follows:
| ```bash
| brew install packer
Standardize on Cloud9 or CloudShell. If I have Windows, how do I do this?
Provide a specific version to prevent regressions in the future.
We'll standardize on Cloud9; CloudShell storage space is too limited. IMHO most ML DevOps engineers don't need instructions on how to use the CLI. This is different from HPC.
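On pinning a specific version: one way is to download the release zip directly rather than relying on Brew's latest. A sketch (the version number below is an example placeholder, not a recommendation):

```shell
# Pin Packer to an exact release by fetching the zip from HashiCorp's
# release server (version is a placeholder; pick the one you validated).
PACKER_VERSION=1.9.4
URL="https://releases.hashicorp.com/packer/${PACKER_VERSION}/packer_${PACKER_VERSION}_linux_amd64.zip"
# wget "$URL" && unzip packer_${PACKER_VERSION}_linux_amd64.zip
echo "$URL"
```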
| brew install packer
| ```
| Alternatively, you can download the Packer binary from the [tool website](https://www.packer.io/). Ensure your `PATH` is set to use the binary, or use its absolute path. Once Packer is installed, proceed to the next stage.
Co-authored-by: mhuguesaws <71357145+mhuguesaws@users.noreply.github.com>
| Now run the [pcluster command](https://docs.aws.amazon.com/parallelcluster/latest/ug/pcluster.build-image-v3.html) that will add all the pcluster dependencies to your DLAMI of choice:
| ```
| pcluster build-image -c ami.yml -i NEW_AMI_ID -r REGION
Use shell variables here:
| pcluster build-image -c ami.yml -i NEW_AMI_ID -r REGION | |
| pcluster build-image -c ami.yml -i $NEW_AMI_ID -r $AWS_REGION |
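Those variables can be set once per session before running the command (values below are placeholders):

```shell
# Set once per shell session, then reuse in every pcluster call.
export NEW_AMI_ID=dlami-pcluster-gpu   # placeholder image name
export AWS_REGION=us-east-1            # placeholder region
# pcluster build-image -c ami.yml -i $NEW_AMI_ID -r $AWS_REGION
echo "building ${NEW_AMI_ID} in ${AWS_REGION}"
```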
| @@ -0,0 +1,10 @@
| ---
| title: "b. Download, compile and run the NCCL tests"
| title: "b. Download, compile and run the NCCL tests" | |
| title: "b. Run the NCCL tests" |
| ```bash
| cd ~
| cat > compile_nccl.sh << EOF
Use an absolute path:
| cat > compile_nccl.sh << EOF | |
| cat > ~/compile_nccl.sh << EOF |
| Create your job submission script for the *NCCL tests* and use **sbatch** to submit your job:
| ```bash
| cat > nccl_test.sbatch << \EOF
| cat > nccl_test.sbatch << \EOF | |
| cat > ~/nccl_test.sbatch << EOF |
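One thing to be aware of when picking between the two: quoting the heredoc delimiter (`\EOF`) suppresses variable expansion inside the document, while an unquoted `EOF` expands variables as the file is written. A minimal sketch:

```shell
# Unquoted delimiter: ${GREETING} is expanded while writing the file.
GREETING=hello
cat > /tmp/expanded.txt << EOF
${GREETING}
EOF

# Quoted delimiter: ${GREETING} is written literally, to be expanded
# later when the generated script runs.
cat > /tmp/literal.txt << \EOF
${GREETING}
EOF

cat /tmp/expanded.txt   # hello
cat /tmp/literal.txt    # ${GREETING}
```

So switching `\EOF` to `EOF` changes when the sbatch script's variables are resolved, not just style.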
| NCCL_TEST_PATH=${HOME}/nccl-tests/build
| MPI_PATH=/opt/amazon/openmpi
| export LD_LIBRARY_PATH=${HOME}/nccl/build/lib:${HOME}/aws-ofi-nccl/install/lib
| #SBATCH --output=nccl.out
| NCCL_TEST_PATH=${HOME}/nccl-tests/build
| MPI_PATH=/opt/amazon/openmpi
| git clone -b v2.17.1-1 https://github.com/NVIDIA/nccl.git
| cd nccl
| make -j src.build CUDA_HOME=/usr/local/cuda NVCC_GENCODE='-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80'
I hope the CUDA version doesn't change...
It's the default "system" CUDA; new AMIs will have a new CUDA at the same path. I can add a note that if they have a custom CUDA they should change this path. I assumed that if someone is advanced enough to add a specific CUDA version, they will be familiar with these parameters. But good point, I'll add the note.
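The note could sketch the override like this (the versioned path is a hypothetical example of a custom toolkit install):

```shell
# Default system CUDA lives at /usr/local/cuda on the DLAMI; override
# CUDA_HOME only if you installed a specific toolkit version yourself
# (example path below, adjust to your install).
CUDA_HOME=/usr/local/cuda-12.1
echo "make -j src.build CUDA_HOME=${CUDA_HOME} NVCC_GENCODE=..."
```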
| git clone -b v2.13.6 https://github.com/NVIDIA/nccl-tests.git
| cd nccl-tests
| make MPI=1 CUDA_HOME=/usr/local/cuda MPI_HOME=/opt/amazon/openmpi NCCL_HOME=${HOME}/nccl/build
It's using the Open MPI from the pcluster AMI; no need for Intel MPI to get correct performance.
`module load openmpi`. That's what I said.
| export NCCL_DEBUG=INFO
| export FI_LOG_LEVEL=1
| ${MPI_PATH}/bin/mpirun --map-by ppr:8:node --rank-by slot \
Once openmpi is loaded, there is no need for a path like this.
| ${MPI_PATH}/bin/mpirun --map-by ppr:8:node --rank-by slot \ | |
| ${MPI_PATH}/bin/mpirun --map-by ppr:4:socket \ |
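Once Open MPI is on `PATH` (e.g. after `module load openmpi` on the pcluster AMI), the absolute prefix can indeed be dropped. A sketch of the equivalent PATH setup:

```shell
# Prepend the pcluster AMI's Open MPI to PATH so plain `mpirun` resolves
# without the ${MPI_PATH}/bin/ prefix.
MPI_PATH=/opt/amazon/openmpi
export PATH="${MPI_PATH}/bin:${PATH}"
# mpirun --map-by ppr:4:socket ...   (no absolute prefix needed now)
echo "$PATH" | cut -d: -f1
```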
| EbsSettings:
|   VolumeType: gp3
|   Size: 200
|   Throughput: 300
| tags : ["Huggingface", "data", "ML", "srun", "slurm"]
| ---
| In this section, you will learn how to run a script from the Hugging Face examples with PyTorch FSDP and DDP.
What am I running exactly? A script? What does it do?
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.