diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md deleted file mode 100644 index b2c662e0ce..0000000000 --- a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md +++ /dev/null @@ -1,310 +0,0 @@ ---- -title: Build the Topo Template from scratch -weight: 4 - -### FIXED, DO NOT MODIFY -layout: learningpathall ---- - -## Start from the application pieces - -The `topo-imx93-npu-deployment` repository is a Compose project with Topo metadata at the root. The Topo-specific part is not a replacement for Compose. The services still describe container builds, dependencies, ports, volumes, and runtime settings. The `x-topo` block adds the metadata Topo uses to identify the Template, check target requirements, and prompt for configuration. - -The project has three implementation areas: - -- `executorch-runner/`: builds the ExecuTorch `.pte` program and the Cortex-M33 firmware ELF. -- `webapp/`: builds the Flask application that stages memory and sends `RUN` commands over `RPMsg`. -- `compose.yaml`: connects the build artifacts, runtime services, Remoteproc Runtime settings, and Topo metadata. - -When bootstrapping this Template from scratch, first make the project work as a normal Compose build. Then add the `x-topo` metadata that lets Topo deploy it consistently to an Arm64 target. - -## Install the Topo Template authoring skills - -The [Topo Template Format](https://github.com/arm/Topo-Template-Format) repository includes public authoring skills for agents that support skill installation: - -- `topo-template-context`: provides Topo and Topo Template reference context for `x-topo` metadata, schema, docs, and CLI Template behavior. -- `topo-template-bootstrap`: converts a Compose repository into a Topo Template by adding or improving `compose.yaml` and `x-topo` metadata. 
-- `topo-template-lint`: reviews a Topo Template for schema correctness, metadata consistency, deployment success messages, and build argument wiring. - -Install the skills with `npx skills`: - -```bash -npx skills add arm/topo-template-format -``` - -If your agent does not use `npx skills`, clone the Template Format repository and manually copy or symlink the directories under `skills/` into your agent's skills directory: - -```bash -git clone https://github.com/arm/Topo-Template-Format.git -``` - -Restart your agent after installing or updating the skills. - -You can then use the skills as part of the Template authoring flow. From the root of any Compose project, ask your agent to use `topo-template-bootstrap`: - -```output -Use topo-template-bootstrap on this repository. -Treat the root compose.yaml as the Template root. -Preserve plain docker compose behavior. -Add x-topo metadata only where it reflects the actual services, hardware requirements, and build arguments. -``` - -After bootstrap, ask the agent to use `topo-template-lint`: - -```output -Use topo-template-lint on topo-imx93-npu-deployment. -Validate compose.yaml against the Topo Template Format schema. -Check README alignment, deployment_success_message, Remoteproc Runtime metadata, and x-topo.args wiring. -``` - -The lint pass should confirm that the Template has a root-level `x-topo.name`, that non-remoteproc services use `platform: linux/arm64`, that `cm33-runner` uses the Remoteproc Runtime annotation, and that every `x-topo.args` entry is carried into Compose or Docker build arguments where appropriate. - -## Create the runner build pipeline - -The `executorch-runner/Dockerfile` is a multi-stage Dockerfile. It builds two artifacts from one build context: - -- `mv2_ethosu65_256.pte`: the MobileNetV2 ExecuTorch program lowered for `ethos-u65-256`. -- `executorch_runner_cm33.elf`: the Cortex-M33 firmware image loaded by Linux `remoteproc`. 
- -The first half of the Dockerfile builds the model artifact: - -```Dockerfile -FROM build-base AS executorch-base -... -FROM executorch-base AS pte-builder -... -RUN source /workspace/executorch/examples/arm/arm-scratch/setup_path.sh && \ - python /usr/local/bin/export_mv2_imx93.py - -FROM busybox:1.36 AS pte-artifacts -COPY --from=pte-builder /workspace/build/mv2-imx93/mv2_ethosu65_256.pte /artifacts/mv2_ethosu65_256.pte -``` - -The second half builds and packages the firmware: - -```Dockerfile -FROM build-base AS runner-base -ARG MCUXSDK_MANIFEST_URL=https://github.com/nxp-mcuxpresso/mcuxsdk-manifests.git -ARG MCUXSDK_MANIFEST_REV=v25.09.00 -... -FROM runner-base AS runner-builder -RUN /usr/local/bin/build-runner.sh /artifacts - -FROM scratch AS runner-runtime -COPY --from=runner-builder /artifacts/executorch_runner_cm33.elf /executorch_runner_cm33.elf -ENTRYPOINT ["/executorch_runner_cm33.elf"] -``` - -The `runner-runtime` stage is intentionally a `scratch` image. The only payload is the ELF file. When the service starts with `runtime: io.containerd.remoteproc.v1`, containerd uses Remoteproc Runtime instead of a normal Linux process runtime. Remoteproc Runtime passes the ELF entrypoint to the Linux `remoteproc` driver, and the `imx-rproc` driver loads and releases the Cortex-M33. - -The project also applies patches before building the runner. One patch changes the MCUX SDK RAM linker and startup behavior so initialized data is loaded in-place by `remoteproc` rather than copied from a flash-style load address. The runner patches add RPMsg stability fixes and trace output used by the web application. 
- -## Add artifact-only Compose services - -At the root of the Template, create normal Compose services for the build outputs: - -```yaml -services: - pte-artifacts: - platform: linux/arm64 - scale: 0 - build: - context: executorch-runner - dockerfile: Dockerfile - target: pte-artifacts - - runner-artifacts: - platform: linux/arm64 - scale: 0 - build: - context: executorch-runner - dockerfile: Dockerfile - target: runner-artifacts -``` - -These services are not runtime application containers. `scale: 0` keeps them out of the running deployment while still making their build targets available to the rest of the Compose project. - -The web application imports the PTE artifact as a BuildKit additional context: - -```yaml -services: - webapp: - platform: linux/arm64 - build: - context: . - dockerfile: Dockerfile - additional_contexts: - pte_artifacts: service:pte-artifacts -``` - -The webapp Dockerfile then copies from that context: - -```Dockerfile -COPY --from=pte_artifacts /artifacts/mv2_ethosu65_256.pte /opt/mv2-imx93/mv2_ethosu65_256.pte -``` - -This keeps the model export pipeline separate from the Flask app while still producing one deployable webapp image. - -## Add the remote processor service - -The Cortex-M33 firmware is represented as another Compose service: - -```yaml -services: - cm33-runner: - platform: linux/arm64 - build: - context: executorch-runner - dockerfile: Dockerfile - target: runner-runtime - runtime: io.containerd.remoteproc.v1 - annotations: - remoteproc.name: imx-rproc -``` - -This is the key heterogeneous deployment hook. The service is still built by Docker, but it is not launched as a Linux userspace process. The `runtime` selects the containerd Remoteproc Runtime shim, and `remoteproc.name: imx-rproc` selects the i.MX 93 remote processor driver. - -After this service starts, Linux exposes the RPMsg device used by the Cortex-A web app. 
The Flask code waits for `/dev/ttyRPMSG*`, writes the `.pte` file to `0xC0000000`, writes the input tensor to `0xC036D000`, sends `RUN\n` over RPMsg, and parses the `CM33:` response lines into top-1 and top-5 ImageNet results. - -## Add the web application service - -The web application service extends `webapp/compose.yaml` from the root Compose file: - -```yaml -services: - webapp: - platform: linux/arm64 - extends: - file: webapp/compose.yaml - service: webapp - depends_on: - - cm33-runner -``` - -The extended service is privileged and mounts `/sys` and `/dev`: - -```yaml -services: - webapp: - privileged: true - ports: - - "${WEBAPP_PORT:-3001}:3000" - volumes: - - /sys:/sys - - /dev:/dev -``` - -Those mounts are required because the app checks `/proc/device-tree`, reads remoteproc state through `/sys/class/remoteproc`, talks to `/dev/ttyRPMSG*`, writes model and tensor data through `/dev/mem`, and checks for `/dev/ethosu0`. - -## Add Topo metadata - -After the Compose services are in place, add the root-level `x-topo` block: - -```yaml -x-topo: - name: "i.MX93 ExecuTorch runner" - description: "Runs a Cortex-A web application that sends image inference commands to a resident CM33 ExecuTorch runner over RPMsg." - features: - - "remoteproc-runtime" -``` - -Keep `x-topo` at the root of `compose.yaml`, not under `services`. The `features` entry is what tells Topo this Template needs a target with Remoteproc Runtime support. That is why `topo health` checks for: - -```output -Remoteproc Runtime: ✅ (remoteproc-runtime) -Remoteproc Shim: ✅ (containerd-shim-remoteproc-v1) -Subsystem Driver (remoteproc): ✅ (imx-rproc) -``` - -You can also add a deployment success message so users know exactly what to do after deployment: - -```yaml -x-topo: - deployment_success_message: | - The i.MX93 ExecuTorch runner is deployed. - Open http://:3001 and classify an ImageNet image. -``` - -## Expose project configuration - -Topo arguments are metadata for project parameters. 
Compose still carries the values into the build. - -The current Template exposes optional cache image parameters: - -```yaml -x-topo: - args: - EXECUTORCH_BASE_CACHE_IMAGE: - description: Optional GHCR image used as a BuildKit cache source for the ExecuTorch PTE build. - required: false - default: ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04 - IMX93_RUNNER_BUILD_CACHE_IMAGE: - description: Optional GHCR image used as a BuildKit cache source for the CM33 runner build. - required: false - default: ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04 -``` - -Those values are used by Compose interpolation in `build.cache_from`: - -```yaml -cache_from: - - ${EXECUTORCH_BASE_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04} -``` - -For build-time configuration, wire Topo arguments into standard Compose `build.args`. The runner Dockerfile already declares project-specific arguments for the MCUX SDK manifest: - -```Dockerfile -ARG MCUXSDK_MANIFEST_URL=https://github.com/nxp-mcuxpresso/mcuxsdk-manifests.git -ARG MCUXSDK_MANIFEST_REV=v25.09.00 -``` - -To expose the SDK revision through Topo, add matching Compose build args to the services that build `runner-base` descendants: - -```yaml -services: - runner-artifacts: - build: - args: - MCUXSDK_MANIFEST_REV: ${MCUXSDK_MANIFEST_REV:-v25.09.00} - - cm33-runner: - build: - args: - MCUXSDK_MANIFEST_REV: ${MCUXSDK_MANIFEST_REV:-v25.09.00} - -x-topo: - args: - MCUXSDK_MANIFEST_REV: - description: MCUX SDK manifest revision used to build the Cortex-M33 runner. - required: false - default: v25.09.00 -``` - -With that wiring, Topo can prompt for the value when the Template is cloned or extended, Compose passes the value into Docker BuildKit, and the Dockerfile consumes it through `ARG MCUXSDK_MANIFEST_REV`. - -Use this only for configuration that should be chosen at Template setup time. 
Runtime-only settings, such as `WEBAPP_PORT`, should remain normal Compose environment interpolation unless you intentionally want Topo to collect them as build-time parameters. - -## Lint the Template - -Before publishing the Template, validate the root Compose file: - -```bash -check-jsonschema \ - --schemafile ../topo-template-format/schema/topo-template-format.json \ - compose.yaml -``` - -Then review the Template the same way Topo Template linting does: - -- The Template root contains `compose.yaml`. -- `compose.yaml` contains a root-level `x-topo.name`. -- Non-remoteproc services set `platform: linux/arm64`. -- The `cm33-runner` service uses `runtime: io.containerd.remoteproc.v1` and `remoteproc.name: imx-rproc`. -- `x-topo.description` matches the README and the actual Cortex-A to Cortex-M33 RPMsg flow. -- `x-topo.features` includes `remoteproc-runtime`. -- `x-topo.args` entries are either consumed through Compose interpolation, such as the cache image values, or wired into `services..build.args` and declared as Dockerfile `ARG` instructions. -- `deployment_success_message` tells the user to open the web app on the configured target port. - -## What you've accomplished - -You now understand how the `topo-imx93-npu-deployment` Template is built from ordinary Compose services plus Topo metadata: artifact-only build stages produce the model and firmware, Remoteproc Runtime starts the Cortex-M33 ELF, RPMsg connects the processors at runtime, and `x-topo.args` provides a path for setup-time configuration without replacing Docker or Compose. 
diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md deleted file mode 100644 index 1baa3282ca..0000000000 --- a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md +++ /dev/null @@ -1,74 +0,0 @@ ---- -title: Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration using Topo -weight: 2 - -### FIXED, DO NOT MODIFY -layout: learningpathall ---- - -## Get started - -Before getting started, complete the Learning Path [Deploy containerized workloads to Arm-based Linux targets with Topo](/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/) to learn how to install Topo, run host and target health checks, inspect a target, list compatible Templates, and deploy a containerized workload. - -For more background on the underlying NPU example, read [Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration](/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/). You do not need to complete that Learning Path before using this one, but it helps explain the model, firmware, and [Ethos-U65](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u65) execution flow. - -## What is Topo? - -[Topo](https://github.com/arm/topo) is an open-source command-line tool developed by Arm used to deploy projects to an Arm-based Linux target over SSH. Topo builds container images on the host, transfers them to the target, and starts the services on the target. Topo can also build and deploy directly on the target. - -## What you'll learn - -In this Learning Path, you will deploy the [topo-imx93-npu-deployment](https://github.com/Arm-Examples/topo-imx93-npu-deployment) Topo Template to an NXP FRDM i.MX 93 board. - -The Template builds and deploys a browser-based MobileNetV2 image classifier. The user interface runs on the Cortex-A Linux side of the SoC. 
The inference runner is packaged as Cortex-M33 firmware and is started by [remoteproc-runtime](https://github.com/arm/remoteproc-runtime). The model is exported to an [ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) `.pte` [file](https://docs.pytorch.org/executorch/stable/pte-file-format.html) for Ethos-U65 NPU acceleration. - -### What does deploying the topo-imx93-npu-deployment Template do? - -Deploying the Template starts two runtime services on the target: - -- `webapp`: Web application running on the Cortex-A Linux host. It receives an image to run a classification on. -- `cm33-runner`: Cortex-M33 firmware, receives the image to classify from the web application and runs the classification Machine Learning model on it. - -When you select an image in the browser and click **Classify**, the web application: - -1. Resizes and normalizes the image to classify into an input tensor compatible with the [MobileNetV2](https://arxiv.org/abs/1801.04381) model. -2. Writes the ExecuTorch program and input tensor into reserved physical memory. -3. Sends a `RUN` command to the Cortex-M33 runner over `RPMsg`. -4. Waits for the Cortex-M33 firmware to run inference using Ethos-U65 acceleration. -5. Displays the top-1 and top-5 ImageNet classification results in the browser. - -## System Architecture - -The deployed application spans three processing domains on the i.MX 93: - -- **Cortex-A Linux host**: runs Docker, Topo-deployed containers, the Flask web app, and the Linux `remoteproc` and `RPMsg` interfaces. -- **Cortex-M33 firmware domain**: runs the ExecuTorch runner firmware loaded by `remoteproc-runtime`. -- **Ethos-U65 NPU**: accelerates delegated neural network operators from the ExecuTorch MobileNetV2 program. 
- -The high-level data flow is: - -```output -Browser - | - v -Flask web application on Cortex-A Linux - | - | writes .pte and input tensor to reserved memory - | sends RUN over RPMsg - v -Cortex-M33 ExecuTorch runner firmware - | - | delegates supported operators - v -Ethos-U65 NPU - | - v -Cortex-M33 returns classification results over RPMsg - | - v -Browser displays ImageNet top-1 and top-5 results -``` - -## What you've accomplished and what's next - -You now understand what the Topo Template deploys and how the Cortex-A, Cortex-M33, and Ethos-U65 parts work together. Next, you will prepare the i.MX 93 target and deploy the Template with Topo. diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.webp b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.webp deleted file mode 100644 index 2393184074..0000000000 Binary files a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.webp and /dev/null differ diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/_index.md similarity index 59% rename from content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md rename to content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/_index.md index d60d1c3340..f92d5bfdb6 100644 --- a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/_index.md @@ -1,30 +1,32 @@ --- -title: Deploy a machine learning model to an NPU-capable system with Topo +title: Deploy an ML application to the Ethos-U65 NPU on NXP FRDM i.MX 93 with Topo draft: true cascade: draft: true -description: Use Topo to deploy a web application on Cortex-A that triggers a MobileNetV2 image 
classifier running as Cortex-M firmware with Ethos-U65 NPU acceleration. +description: Use Topo to deploy a Cortex-A web application that sends MobileNetV2 image classification requests to Cortex-M33 firmware accelerated by the Ethos-U65 NPU. minutes_to_complete: 60 -who_is_this_for: This is an introductory topic for embedded, edge, and cloud software developers who want to deploy machine learning workloads to heterogeneous Arm-based Linux targets using Topo. +who_is_this_for: This is an introductory topic for embedded/edge software developers who want to deploy machine learning workloads to heterogeneous Arm-based Linux targets using Topo, including leveraging Arm Ethos-U NPUs. learning_objectives: - Explain how Topo deploys an application that spans Cortex-A, Cortex-M, and Ethos-U - - Prepare an NXP FRDM i.MX 93 board for remoteproc-runtime and shared-memory inference - - Clone and deploy the topo-imx93-npu-deployment template - - Describe how the Template is bootstrapped from Compose services, Remoteproc Runtime metadata, and Topo arguments - - Run image classification from a browser and verify that inference is executed by the Cortex-M33 firmware + - Deploy the topo-imx93-npu-deployment Template, which operates across Cortex-A, Cortex-M, and Ethos-U, to perform image classification using an ExecuTorch MobileNetV2 model + - Describe how the Template is bootstrapped from Compose services, Remoteproc Runtime metadata, and Topo arguments, and follow this process yourself + - Understand how to take similar projects and create Topo Templates, including using Agent Skills prerequisites: - A host machine (x86 or Arm) with Linux, macOS, or Windows - - An NXP FRDM i.MX 93 target board accessible over SSH with root access - - Docker installed on the host and target. For installation steps, see [Install Docker](/install-guides/docker/). + - An NXP FRDM i.MX 93 target board with Linux set up, accessible over SSH with root access. 
To do this, see [Use Linux on the NXP FRDM i.MX 93 board](https://learn.arm.com/learning-paths/embedded-and-microcontrollers/linux-nxp-board/). + - Docker installed on the host and target. For installation steps, see [Install Docker](https://learn.arm.com/install-guides/docker/). + - At least 25 GB of free disk space on the host if you are building without cache images. + - The Device Tree Compiler (`dtc`) installed on the host. - lscpu installed on the target (pre-installed on most Linux distributions) - - Topo installed on the host. For installation steps, see [Deploy containerized workloads to Arm-based Linux targets with Topo](/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/). + - Topo installed on the host. For installation steps, see [Deploy containerized workloads to Arm-based Linux targets with Topo](https://learn.arm.com/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/). - Basic familiarity with containers, SSH, and CLI tools + - (Optional) Access to an Agent, such as Codex or Claude Code author: Tomas Agustin Gonzalez Orlando @@ -50,13 +52,6 @@ operatingsystems: - macOS - Windows -### Cross-platform metadata only -shared_path: true -shared_between: - - servers-and-cloud-computing - - laptops-and-desktops - - embedded-and-microcontrollers - further_reading: - resource: title: Topo repository diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_next-steps.md b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/_next-steps.md similarity index 100% rename from content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_next-steps.md rename to content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/_next-steps.md diff --git a/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/build-the-template.md 
b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/build-the-template.md new file mode 100644 index 0000000000..537d1c8c46 --- /dev/null +++ b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/build-the-template.md @@ -0,0 +1,410 @@ +--- +title: Build the Topo Template from scratch +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## What you will build + +In this section, you will build the `topo-imx93-npu-deployment` Template starting from two non-Topo, non-Compose projects: + +- a Cortex-A web application that prepares images, writes model and tensor data into shared memory, and sends inference commands over `RPMsg` +- a Cortex-M33 ExecuTorch runner firmware project for the FRDM i.MX 93 + +You will combine those sources into one repository, then make the repository a normal Compose project, and only then add the Topo metadata and Remoteproc Runtime services. + +## Create the repository from the base projects + +We will copy the original base projects from the Topo Template. 
Clone the Topo Template Format repository for the validation schema, clone the original Topo Template for the source files, and start a new empty repository: + +```bash +git clone https://github.com/arm/topo-template-format.git +git clone https://github.com/Arm-Examples/topo-imx93-npu-deployment.git +mkdir new-topo-npu-template +cd new-topo-npu-template +``` + +Create the project layout: + +```bash +mkdir -p webapp executorch-runner licenses +``` + +Copy over the relevant `webapp` files: + +```bash +cp -R ../topo-imx93-npu-deployment/webapp/src webapp +``` + +Copy the Cortex-M33 runner build inputs from the firmware project: + +```bash +cp ../topo-imx93-npu-deployment/executorch-runner/build-runner.sh executorch-runner/build-runner.sh +cp ../topo-imx93-npu-deployment/executorch-runner/export_mv2_imx93.py executorch-runner/export_mv2_imx93.py +cp ../topo-imx93-npu-deployment/executorch-runner/docker-entrypoint.sh executorch-runner/docker-entrypoint.sh +cp -R ../topo-imx93-npu-deployment/executorch-runner/patches executorch-runner +``` + +Add the licenses and ignore rules used by the source projects: + +```bash +cp ../topo-imx93-npu-deployment/LICENSE.md . +cp -R ../topo-imx93-npu-deployment/licenses . +cp ../topo-imx93-npu-deployment/.gitignore . +``` + +We have now obtained a typical starting point. We have two sets of source code, combined into one repository. It is not a Compose project and it is not a Topo Template. We will now create a Compose project and Topo Template around the source code. + +The Compose project provides the container build and runtime structure. A Dockerfile describes how to build one image. A Compose file describes the services that use those images, their build contexts, ports, volumes, dependencies, and runtime settings. In this Template: + +- `webapp/Dockerfile` builds the Flask image. +- `webapp/compose.yaml` keeps the web app's build context and Linux runtime settings close to the web app source. 
+- `executorch-runner/Dockerfile` builds the ExecuTorch `.pte` model and Cortex-M33 runner ELF through multi-stage Docker builds. +- the root `compose.yaml` is the Template entry point. It combines the web app, artifact build services, the Remoteproc Runtime service, and the root-level `x-topo` metadata. + +For a general introduction to Compose projects, services, and the `compose.yaml` file, see Docker's [How Compose works](https://docs.docker.com/compose/intro/compose-application-model/) documentation. + +When a step below says to create a file, paste the complete file contents shown. When a step says to add or update part of an existing Compose file, merge the YAML into the existing top-level key shown by the snippet. For example, if a snippet starts with `services:`, add the named service under the existing top-level `services:` map. Do not create a second `services:` block in the same file. + +## Turn the sources into a Compose project + +Before adding Topo metadata, make the project work as ordinary Compose. Start by containerizing the Cortex-A web application. + +Create `webapp/Dockerfile` with the following complete contents: + +```Dockerfile +FROM python:3.12-slim + +WORKDIR /app + +ENV PYTHONUNBUFFERED=1 + +RUN python -m pip install --no-cache-dir flask==3.0.3 + +COPY src/data/imagenet_classes.txt /opt/mv2-imx93/imagenet_classes.txt +COPY src/app.py . +COPY src/templates/ templates/ +COPY src/static/ static/ + +EXPOSE 3000 + +CMD ["python", "app.py"] +``` + +Create `webapp/compose.yaml` with the following complete contents: + +```yaml +services: + webapp: + platform: linux/arm64 + build: + context: . 
+ dockerfile: Dockerfile + privileged: true + ports: + - "${WEBAPP_PORT:-3001}:3000" + volumes: + - /sys:/sys + - /dev:/dev + restart: unless-stopped +``` + +Create the root `compose.yaml` with the following complete contents: + +```yaml +services: + webapp: + platform: linux/arm64 + extends: + file: webapp/compose.yaml + service: webapp +``` + +Check that Compose can read the project: + +```bash +docker compose config +``` + +You should see output that includes the resolved `webapp` service: + +```output +services: + webapp: + build: + context: /path/to/new-topo-npu-template/webapp + dockerfile: Dockerfile + ports: + - mode: ingress + target: 3000 + published: "3001" +``` + +At this point, Compose can build and run the Cortex-A web application as a normal Linux container. The image runs `webapp/src/app.py`, packages the Jinja templates from `webapp/src/templates/`, the static assets from `webapp/src/static/`, and the ImageNet labels from `webapp/src/data/imagenet_classes.txt`. The container listens on port `3000`, and Compose publishes it on host port `3001` unless you set `WEBAPP_PORT` to another value. + +## Add the ExecuTorch artifact pipeline + +The web application needs an ExecuTorch `.pte` model, and the target needs a Cortex-M33 ELF image. Both artifacts are built by `executorch-runner/Dockerfile`. + +Copy the Dockerfile into the runner build context: + +```bash +cp ../topo-imx93-npu-deployment/executorch-runner/Dockerfile executorch-runner/ +``` + +For this multi-stage Dockerfile: + +- `build-base`: installs the common Ubuntu build tools. +- `executorch-base`: clones ExecuTorch, installs the Arm backend dependencies, and copies `export_mv2_imx93.py` and `docker-entrypoint.sh`. +- `pte-builder`: exports `mv2_ethosu65_256.pte`. +- `pte-artifacts`: packages the `.pte` file as a BuildKit artifact context. +- `runner-base`: installs the Arm GNU toolchain, MCUX SDK, RPMsg-Lite dependencies, runner sources, and local patches. 
+- `runner-builder`: builds `executorch_runner_cm33.elf`. +- `runner-artifacts`: packages the ELF for inspection or reuse. +- `runner-runtime`: produces a `scratch` image whose entrypoint is the ELF file. + +The important artifact stages look like this: + +```Dockerfile +FROM busybox:1.36 AS pte-artifacts +COPY --from=pte-builder /workspace/build/mv2-imx93/mv2_ethosu65_256.pte /artifacts/mv2_ethosu65_256.pte + +FROM busybox:1.36 AS runner-artifacts +COPY --from=runner-builder /artifacts/executorch_runner_cm33.elf /artifacts/executorch_runner_cm33.elf + +FROM scratch AS runner-runtime +COPY --from=runner-builder /artifacts/executorch_runner_cm33.elf /executorch_runner_cm33.elf +ENTRYPOINT ["/executorch_runner_cm33.elf"] +``` + +## Connect the artifact services + +Add `pte-artifacts` and `runner-artifacts` as siblings of the existing `webapp` service in the root `compose.yaml`: + +```yaml +services: + pte-artifacts: + platform: linux/arm64 + scale: 0 + build: + context: executorch-runner + dockerfile: Dockerfile + target: pte-artifacts + cache_from: + - ${EXECUTORCH_BASE_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04} + + runner-artifacts: + platform: linux/arm64 + scale: 0 + build: + context: executorch-runner + dockerfile: Dockerfile + target: runner-artifacts + cache_from: + - ${IMX93_RUNNER_BUILD_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04} +``` + +Do not replace the existing root `webapp` service with the snippet above. The root file should now have three service names under the same top-level `services:` map: `webapp`, `pte-artifacts`, and `runner-artifacts`. + +These services are used only to build artifacts. They do not run as part of the deployed application. `scale: 0` tells Compose not to start containers for them, while still allowing other services to copy files from their build outputs. 
+ +Replace `webapp/compose.yaml` with the following version so the Flask image imports the `.pte` artifact: + +```yaml +services: + webapp: + platform: linux/arm64 + build: + context: . + dockerfile: Dockerfile + additional_contexts: + pte_artifacts: service:pte-artifacts + privileged: true + ports: + - "${WEBAPP_PORT:-3001}:3000" + volumes: + - /sys:/sys + - /dev:/dev + restart: unless-stopped +``` + +Then add the `.pte` copy line to `webapp/Dockerfile` with the other `COPY` commands: + +```Dockerfile +COPY --from=pte_artifacts /artifacts/mv2_ethosu65_256.pte /opt/mv2-imx93/mv2_ethosu65_256.pte +``` + +The `/opt/mv2-imx93/` path is the location the Flask application expects for its MobileNetV2 support files. At run time, the app reads the `.pte` file from this path before copying it into reserved memory for the Cortex-M33 runner. + +## Add the Remoteproc Runtime service + +Add the Cortex-M33 runner as another sibling under the top-level `services:` map in the root `compose.yaml`: + +```yaml +services: + cm33-runner: + platform: linux/arm64 + build: + context: executorch-runner + dockerfile: Dockerfile + target: runner-runtime + cache_from: + - ${IMX93_RUNNER_BUILD_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04} + runtime: io.containerd.remoteproc.v1 + annotations: + remoteproc.name: imx-rproc +``` + +Keep the existing `webapp`, `pte-artifacts`, and `runner-artifacts` services in the same file. This step adds one more service; it does not replace any of the previous services. + +This is the heterogeneous deployment hook. Docker still builds an image, but the service is not started as a Linux userspace process. The runtime `io.containerd.remoteproc.v1` selects Remoteproc Runtime, and the `remoteproc.name` annotation tells the shim to use the i.MX remote processor driver. + +Update the existing root `webapp` service so it depends on the CM33 runner and passes the cache image values into the build. 
Keep the existing `extends` block, then add `depends_on` and `build.args` as shown: + +```yaml +services: + webapp: + platform: linux/arm64 + extends: + file: webapp/compose.yaml + service: webapp + depends_on: + - cm33-runner + build: + args: + EXECUTORCH_BASE_CACHE_IMAGE: ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04 + IMX93_RUNNER_BUILD_CACHE_IMAGE: ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04 +``` + +The web app is privileged and mounts `/sys` and `/dev` because it checks the device tree, reads remoteproc state through `/sys/class/remoteproc`, talks to `/dev/ttyRPMSG*`, writes shared memory through `/dev/mem`, and checks for `/dev/ethosu0`. + +Keep the web app build context in `webapp/compose.yaml`. The root `webapp.build.args` block above only supplies Topo-collected build arguments; it should not replace the extended build context and Dockerfile from `webapp/compose.yaml`. + +## Add Topo metadata and arguments + +After the Compose services are complete, add the root-level `x-topo` block. +Keep it at the root of `compose.yaml`, as a sibling of `services`, not under `services`. + +If you want to use an agent skill to perform this step, skip to the optional step below. + +```yaml +x-topo: + name: "i.MX93 ExecuTorch runner" + description: "Runs a Cortex-A web application that sends image inference commands to a resident CM33 ExecuTorch runner over RPMsg." + features: + - "remoteproc-runtime" + args: + EXECUTORCH_BASE_CACHE_IMAGE: + description: Optional GHCR image used as a BuildKit cache source for the ExecuTorch PTE build. + required: false + default: ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04 + IMX93_RUNNER_BUILD_CACHE_IMAGE: + description: Optional GHCR image used as a BuildKit cache source for the CM33 runner build. 
+ required: false + default: ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04 +``` + +The `features` value tells Topo that this Template requires `remoteproc-runtime` support on the target. This is useful when checking for project compatibility with the `topo templates --target ` command. + +The `args` entries describe configurable build inputs. Compose consumes those values through the `cache_from` interpolation you added earlier: + +```output +cache_from: + - ${EXECUTORCH_BASE_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04} +``` + +The root `webapp.build.args` block also makes the Topo-provided values visible in the Compose build model while preserving the `webapp/` build context inherited through `extends`. + +Keep runtime settings such as `WEBAPP_PORT` as normal Compose interpolation unless you intentionally want Topo to collect them as Template setup arguments. + +## (Optional) Use an Agent Skill to add the Topo metadata + +The [Topo Template Format](https://github.com/arm/topo-template-format) repository includes public authoring skills for agents that support skill installation: + +- `topo-template-context`: provides Topo and Topo Template reference context for `x-topo` metadata, schema, docs, and CLI Template behavior. +- `topo-template-bootstrap`: converts a Compose repository into a Topo Template by adding or improving `compose.yaml` and `x-topo` metadata. +- `topo-template-lint`: reviews a Topo Template for schema correctness, metadata consistency, deployment success messages, and build argument wiring. + +Install the skills with `npx skills`: + +```bash +npx skills add arm/topo-template-format +``` + +If your agent does not use `npx skills`, manually copy or symlink the directories under `../topo-template-format/skills/` into your agent's skills directory. + +Restart your agent after installing or updating the skills. 
+ +From the root of the Compose project, ask your agent to use `topo-template-bootstrap`: + +``` +Use topo-template-bootstrap on this repository. +Treat the root compose.yaml as the Template root. +Preserve plain docker compose behavior. +Add x-topo metadata only where it reflects the actual services, hardware requirements, and build arguments. +``` + +After bootstrap, ask the agent to use `topo-template-lint`: + +``` +Use topo-template-lint on this repository. +Validate compose.yaml against the Topo Template Format schema. +Check README alignment, Remoteproc Runtime metadata, and x-topo.args wiring. +``` + +## Validate the final Template + +Check the Compose model and check that the Topo metadata is present: + +```bash +docker compose config +``` + +In the `docker compose config` output, check that the resolved `webapp` service has: + +- `build.context` ending in `/webapp` +- `build.dockerfile` set to `Dockerfile` +- `build.additional_contexts.pte_artifacts` set to `service:pte-artifacts` + +Install `check-jsonschema` if it is not already available: + +{{< tabpane code=true >}} + {{< tab header="macOS" language="shell" >}} +brew install check-jsonschema + {{< /tab >}} + {{< tab header="Linux / WSL" language="shell" >}} +sudo apt update +sudo apt install -y pipx +pipx ensurepath +pipx install check-jsonschema +export PATH="$HOME/.local/bin:$PATH" + {{< /tab >}} +{{< /tabpane >}} + +Validate the root Compose file with the schema in the Topo Template Format: + +```bash +check-jsonschema \ + --schemafile ../topo-template-format/schema/topo-template-format.json \ + compose.yaml +``` + +Review these points: + +- `compose.yaml` contains root-level `x-topo` metadata. +- `x-topo.features` includes `remoteproc-runtime`. +- non-remoteproc services set `platform: linux/arm64`. +- `pte-artifacts` and `runner-artifacts` use `scale: 0`. +- `cm33-runner` uses `runtime: io.containerd.remoteproc.v1`. +- `cm33-runner` has `remoteproc.name: imx-rproc`. 
+- `webapp` depends on `cm33-runner`. +- `webapp` imports the `.pte` file through `additional_contexts`. +- every `x-topo.args` entry is consumed by Compose interpolation. + +## What you've accomplished and what's next + +You started with two non-Topo, non-Compose projects, made them a standard Compose project, and then converted that Compose project into a Topo Template. You created the web app image, added artifact builds for the ExecuTorch `.pte` model and Cortex-M33 ELF, packaged the firmware as a Remoteproc Runtime service, and exposed the build cache inputs as Topo arguments. + +Next, you will prepare the FRDM i.MX 93 target, deploy the Template with Topo, and run the image classification application. diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/deploy.md similarity index 66% rename from content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md rename to content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/deploy.md index 659e2aa797..e9470d222d 100644 --- a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md +++ b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/deploy.md @@ -1,6 +1,6 @@ --- -title: Deploy the project -weight: 3 +title: Clone and deploy the application with Topo +weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall @@ -38,6 +38,7 @@ Hardware Info: ✅ (lscpu) Subsystem Driver (remoteproc): ✅ (imx-rproc) ``` +{{% notice Note %}} If `remoteproc-runtime` is missing, install it with Topo: ```bash @@ -49,14 +50,15 @@ Run the health check again: ```bash topo health --target @ ``` +{{% /notice %}} ## Reserve memory in the device tree -The web application and Cortex-M33 firmware exchange data through reserved physical memory. 
The target device tree must reserve memory for the model/input buffer and for Ethos-U65. You are now going to modify the device tree and reboot the target so that these modifications take effect. +The web application and Cortex-M33 firmware exchange data through reserved physical memory. The target device tree must reserve memory for the model/input buffer and for the Ethos-U65. This prevents Linux from allocating memory that the Cortex-M33 firmware and Ethos-U65 need to access by physical address. You are now going to modify the device tree and reboot the target so that these modifications take effect. {{% notice Warning %}} Back up the board's original device tree before modifying it. The exact boot partition can differ between Linux images, so check the paths on your board before copying files. -{{< /notice >}} +{{% /notice %}} On your host, create a working directory and dump the live device tree from the target: @@ -66,7 +68,7 @@ ssh @ 'cat /sys/firmware/fdt' > devicetree/live.dtb dtc -I dtb -O dts -o devicetree/live.dts devicetree/live.dtb ``` -Open `devicetree/live.dts` in an editor. +Open `devicetree/live.dts` in a text editor of your choice. Under `remoteproc-cm33`, add the CM33 power domain if it is not already present: @@ -100,7 +102,7 @@ Add `iomem=relaxed` to `chosen.bootargs`. 
For example: bootargs = "clk-imx93.mcore_booted console=ttyLP0,115200 earlycon root=/dev/mmcblk1p2 rootwait rw iomem=relaxed"; ``` -Build the patched device tree: +Return to your host machine terminal and build the patched device tree: ```bash dtc -I dts -O dtb -o devicetree/patched.dtb devicetree/live.dts @@ -124,36 +126,37 @@ sync reboot ``` -After the board reboots, run the Topo health check again from the host: +After the board reboots, run the Topo health check again from the host and verify everything is still correct: ```bash topo health --target @ ``` -## Clone the Template +## Deploy to the board -Clone the Template onto your host: +You can choose to deploy from the original Template, or from the Template you built from scratch. If you have not already cloned the original Template, clone it now: ```bash topo clone https://github.com/Arm-Examples/topo-imx93-npu-deployment.git ``` -Topo prompts for optional build cache image arguments: +Topo prompts for optional build cache image arguments. Accept the defaults unless you have your own cache images. -```output -EXECUTORCH_BASE_CACHE_IMAGE -IMX93_RUNNER_BUILD_CACHE_IMAGE -``` +Then `cd` into the correct directory: -Accept the defaults unless you have your own cache images. +```bash +cd topo-imx93-npu-deployment +``` -Enter the project directory: +Or: ```bash -cd topo-imx93-npu-deployment +cd new-topo-npu-template ``` -## Deploy to the board +{{% notice Note %}} +If not pulling from the cache, the first build can take a long time and requires about 25 GB of free disk space. It downloads and builds ExecuTorch, the Arm GNU toolchain, MCUX SDK components, RPMsg-Lite, and the Cortex-M33 runner sources. Later builds are faster when Docker can reuse local cache layers or import the configured GHCR cache layers. 
+{{% /notice %}} Deploy the project to your target: @@ -161,8 +164,6 @@ Deploy the project to your target: topo deploy --target @ ``` -If not pulling from the cache, the first build can take a long time and requires about 25 GB of free disk space. It downloads and builds ExecuTorch, the Arm GNU toolchain, MCUX SDK components, RPMsg-Lite, and the Cortex-M33 runner sources. Later builds are faster when Docker can reuse local cache layers or import the configured GHCR cache layers. - During deployment, Topo builds the required images, transfers them to the target, starts the Cortex-M33 firmware through `remoteproc-runtime`, and starts the web application. When deployment succeeds, the output includes a successful service startup. You can also check the deployed services: @@ -171,24 +172,23 @@ When deployment succeeds, the output includes a successful service startup. You topo ps --target @ ``` -## Open the web application - -Open the web application in a browser: +The output should show one running service in each processing domain, the Cortex-M33 (`imx-rproc`) and the Linux host, similar to the following: ```output -http://:3001 +Image Status Processing Domain Address +topo-imx93-npu-deployment-cm33-runner Up 50 minutes imx-rproc +topo-imx93-npu-deployment-webapp Up 50 minutes Linux Host imx93-scorpio.cambridge.arm.com:3001, [::]:3001% ``` -The application shows: +## Open the web application - -- an image selector -- a **Classify** button -- board prerequisite checks -- classification results -- an expandable analysis section with runtime details +Open the web application in a browser: -Select an image from an ImageNet-supported class, then click **Classify**. A successful run returns top-1 and top-5 ImageNet classifications. 
+``` +http://:3001 +``` +{{% notice Note %}} If you need to use a different target port, set `WEBAPP_PORT` when deploying: ```bash @@ -200,13 +200,32 @@ Then open: ```output http://:3002 ``` +{{% /notice %}} + +The application shows: + +- an image selector +- a **Classify** button +- board prerequisite checks +- classification results +- an expandable analysis section with runtime details You should see something similar to: +![Screenshot of the web interface running on an Arm-based target, showing an image and the model response. This confirms successful deployment and provides a visual reference for the expected result.#center](topo_npu_classifier.png "Image Classification Web App showing correctly classified German Shepherd") + +When you select an image in the browser and click **Classify**, the web application: -![Screenshot of the web interface running on an Arm-based target, showing an image and the model response. This confirms successful deployment and provides a visual reference for the expected result.#center](topo_npu_classifier.webp "Image classification as seen in the web app") +1. Resizes and normalizes the image to classify into an input tensor compatible with the [MobileNetV2](https://arxiv.org/abs/1801.04381) model. +2. Writes the ExecuTorch `.pte` program and input tensor into reserved physical memory. +3. Sends a `RUN` command to the Cortex-M33 runner over `RPMsg`. +4. Waits for the Cortex-M33 firmware to run inference using Ethos-U65 acceleration. +5. Displays the top-1 and top-5 ImageNet classification results in the browser. + +Try this out with an image from an ImageNet-supported class. ## What you've accomplished -You have prepared an FRDM i.MX 93 board for shared-memory NPU inference, deployed the `topo-imx93-npu-deployment` Template with Topo, started Cortex-M33 firmware through `remoteproc-runtime`, and used a browser-based application to run MobileNetV2 classification with Ethos-U65 acceleration. 
-Next, you will review how this project is structured as a Topo Template. +You have prepared an FRDM i.MX 93 board for shared-memory NPU inference, deployed the `topo-imx93-npu-deployment` Template with Topo, started Cortex-M33 firmware through `remoteproc-runtime`, and used a browser-based application to stage the ExecuTorch `.pte` program and input tensor for MobileNetV2 classification with Ethos-U65 acceleration. + +You can now use the deployed application as a reference for your own heterogeneous Arm applications, or adapt the model, firmware runner, web interface, or Topo metadata for another target. diff --git a/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/overview.md b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/overview.md new file mode 100644 index 0000000000..dd27918320 --- /dev/null +++ b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/overview.md @@ -0,0 +1,73 @@ +--- +title: Overview - deploying an image classification app on i.MX 93 with Topo +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## What you'll learn + +In this Learning Path, you will deploy the [topo-imx93-npu-deployment](https://github.com/Arm-Examples/topo-imx93-npu-deployment) Topo Template to an NXP FRDM i.MX 93 board and learn how this Topo Template was created. + +As a refresher, [Topo](https://github.com/arm/topo) is an open-source command-line tool developed by Arm that deploys projects to an Arm-based Linux target over SSH. Topo builds container images on the host, transfers them to the target, and starts the services on the target. A Topo Template is the standardized project format that Topo deploys. + +The Topo Template builds and deploys a browser-based MobileNetV2 image classifier. The user interface runs on the Cortex-A (Linux) side of the SoC. 
The inference runner is packaged as Cortex-M33 firmware and is started by [remoteproc-runtime](https://github.com/arm/remoteproc-runtime). The model is exported to an [ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) `.pte` [file](https://docs.pytorch.org/executorch/stable/pte-file-format.html) for Ethos-U65 NPU acceleration. + +## Prerequisites + +Before getting started, ensure that your i.MX 93 board is set up with Linux and accessible over SSH. You can use this Learning Path as a guide: [Use Linux on the NXP FRDM i.MX 93 board](https://learn.arm.com/learning-paths/embedded-and-microcontrollers/linux-nxp-board/). + +You should also be familiar with Topo and have it installed on your host development machine. You can complete the Learning Path [Deploy containerized workloads to Arm-based Linux targets with Topo](https://learn.arm.com/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/) to learn how to install Topo, run host and target health checks, inspect a target, list compatible Templates, and deploy a containerized workload. + +## (Optional) Background reading + +To understand more about Topo Templates, and how to create a basic Topo Template for a web application, you can complete the introductory [Create and deploy a custom Topo Template](https://learn.arm.com/learning-paths/cross-platform/create-your-own-topo-templates/) Learning Path. However, this is not required for this guide. + +For more background on the underlying NPU example, use the [Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration](https://learn.arm.com/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/) Learning Path. This can help explain the model, firmware, and [Ethos-U65](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u65) execution flow. + +## What does the Template do? 
+ +Deploying the Template starts two runtime services on the target: + +- `webapp`: Web application running on the Cortex-A Linux host. It receives an image input from the user and outputs the results of the ML image classification. +- `cm33-runner`: Cortex-M33 firmware that receives the image tensor from the web application, runs the compiled MobileNetV2 ExecuTorch `.pte` program, delegates supported operators to the Ethos-U65 NPU, and runs non-delegated operators on the Cortex-M33 CPU. + +## System Architecture + +The deployed application spans three processing domains on the i.MX 93: + +- **Cortex-A Linux host**: runs Docker, Topo-deployed containers, the Flask web app, and the Linux `remoteproc` and `RPMsg` interfaces. +- **Cortex-M33 firmware domain**: runs the ExecuTorch runner firmware loaded by `remoteproc-runtime`. +- **Ethos-U65 NPU**: accelerates delegated neural network operators from the ExecuTorch MobileNetV2 program. + +The high-level data flow is: + +```output +Browser + | + v +Flask web application on Cortex-A Linux + | + | writes .pte file and input tensor to reserved memory + | sends RUN over RPMsg + v +Cortex-M33 ExecuTorch runner firmware + | + | loads the .pte program from reserved memory + | delegates supported operators + v +Ethos-U65 NPU + | + v +Cortex-M33 returns classification results over RPMsg + | + v +Browser displays ImageNet top-1 and top-5 results +``` + +## What you've accomplished and what's next + +You now understand that the Topo Template deploys a Cortex-A web application, a Cortex-M33 ExecuTorch runner, and Ethos-U65 NPU acceleration as one heterogeneous application. You have also seen how inference uses reserved memory for the `.pte` program and input tensor, with `RPMsg` carrying commands and results between Cortex-A and Cortex-M33. + +Next, you will review the toolchains and runtime interfaces used by the Template. 
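The last step of the data flow above, turning the runner's output into top-1 and top-5 results, can be sketched in Python. This is a minimal illustration and not the Template's actual code: the function name `top_k` and the example scores and labels are hypothetical, and the sketch assumes the runner returns one score per class.

```python
from typing import List, Sequence, Tuple


def top_k(scores: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort class indices by descending score; equal scores keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]
```

With this helper, the top-1 result is the first element of `top_k(scores, labels, k=1)`, and the top-5 list is `top_k(scores, labels)` with the default `k`.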
diff --git a/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png new file mode 100644 index 0000000000..f744e12c64 Binary files /dev/null and b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png differ diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md similarity index 71% rename from content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md rename to content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md index 0ffefc32f7..d5b9d40f0e 100644 --- a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md +++ b/content/learning-paths/embedded-and-microcontrollers/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md @@ -1,6 +1,6 @@ --- title: Understand the toolchains -weight: 5 +weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall @@ -12,12 +12,14 @@ The `topo-imx93-npu-deployment` Template combines several toolchains. Topo hides ## ExecuTorch -[ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) is the PyTorch Edge runtime for deploying PyTorch models to edge devices, using any acceleration hardware that is available on the target device. In this Template, ExecuTorch is used in two places: +[ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) is PyTorch's runtime for deploying PyTorch models to edge devices. By using different backends within ExecuTorch, you can target specific hardware. For example, you can target Ethos-U65 by using the Ethos-U backend. 
To learn more about how the MobileNetV2 model was exported from PyTorch to ExecuTorch, and delegated to the Ethos-U, see [Build ExecuTorch models for Ethos-U65](https://learn.arm.com/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/7-build-executorch-pte/). + +In this Template, ExecuTorch is used in two places: - At build time, the Template exports a MobileNetV2 model to an ExecuTorch `.pte` program. - At run time, the Cortex-M33 firmware loads and executes that `.pte` program. -The export pipeline uses the ExecuTorch Arm backend and targets `ethos-u65-256`. The model is quantized and lowered so supported neural network operators can be delegated to the Ethos-U65 NPU. The generated file is: +The export pipeline targets `ethos-u65-256`, which selects the Ethos-U65 configuration with 256 multiply-accumulate (MAC) units. The model is quantized and lowered so supported neural network operators can be delegated to the Ethos-U65 NPU. The generated file is: ```output mv2_ethosu65_256.pte @@ -77,10 +79,12 @@ The web application checks these ranges at startup through `/proc/device-tree`. ## Web application -The `webapp` service is a Python Flask application. It serves the browser UI, preprocesses selected images, stages memory for the images sent to the Cortex-M33 runner, sends inference commands over `RPMsg`, and renders the ImageNet top-1 and top-5 results. +The `webapp` service is a Python Flask application. It serves the browser UI, preprocesses selected images, stages the `.pte` program and input tensor in reserved memory, sends inference commands over `RPMsg`, and renders the ImageNet top-1 and top-5 results. + +By default, the service publishes port `3001` on the target and forwards it to container port `3000`. -By default, the service maps target port `3001` to container port `3000`. 
+## What you've accomplished and what's next -## What you've accomplished +You now understand the major toolchains and runtime interfaces used by the Template: ExecuTorch, the Cortex-M33 firmware runner, remoteproc-runtime, RPMsg, reserved memory, and the Flask web application. You have also seen how the web application stages the `.pte` program and input data in reserved memory before sending inference commands to the Cortex-M33 firmware. -You now understand the major toolchains and runtime interfaces used by the Template: ExecuTorch, the Cortex-M33 firmware runner, remoteproc-runtime, RPMsg, reserved memory, and the Flask web application. +Next, you will build the Template from the base projects by adding the Compose services, build artifacts, Remoteproc Runtime metadata, and Topo arguments.
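The default port mapping for the web application comes from Compose variable interpolation: the `webapp` service declares its port as `${WEBAPP_PORT:-3001}:3000`. The following Python sketch mimics that one rule for illustration only. The function name `interpolate` is hypothetical, and the sketch handles only the `${VAR}` and `${VAR:-default}` forms, not the full Compose syntax.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled here.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: Mapping[str, str]) -> str:
    """Substitute ${NAME} and ${NAME:-default} using values from env."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # ':-' falls back to the default when the variable is unset or empty.
        if value == "" and default is not None:
            return default
        return value

    return _PATTERN.sub(replace, text)
```

Under these assumptions, an empty environment resolves the mapping to `3001:3000`, and setting `WEBAPP_PORT` to `3002` resolves it to `3002:3000`. The same rule applies to the `cache_from` entries that fall back to the default GHCR cache images.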