---
title: Understand the Adler-32 algorithm and optimization approach
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## The optimization task

In this Learning Path, you'll take a simple scalar implementation of the Adler-32 checksum algorithm written in C and incrementally optimize it to use Arm Scalable Vector Extension (SVE) intrinsics. The final SVE version runs significantly faster than the original scalar code on Neoverse processors.

This Learning Path is different from a typical optimization tutorial. Rather than starting with a finished SVE implementation, you'll use an AI coding assistant connected to the Arm MCP server to guide you through each step. You'll ask questions, look up intrinsics, understand the algorithm's constraints, and build the solution piece by piece.

AI coding assistants are not yet able to generate optimized code on their own, but they can guide your learning and the implementation details. Working this way, you can maintain and explain the code and arrive at optimized solutions, mirroring what you'd do on your own projects.

## The Adler-32 algorithm

Adler-32 is a checksum algorithm used to verify data integrity. It's used in the zlib compression format. The algorithm is fast, simple, and a good candidate for vectorization because its inner loop processes one byte at a time.

The algorithm maintains two 16-bit accumulators, `A` and `B`:

- `A` starts at 1 and accumulates the sum of all input bytes
- `B` accumulates the running sum of all `A` values

Both are taken modulo 65521, the largest prime number smaller than 2^16. The final checksum is `(B << 16) | A`.

The scalar implementation is as follows:

```c
#define MOD_ADLER 65521

uint32_t adler32(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }

    return (b << 16) | a;
}
```

This loop has two characteristics that make it interesting to vectorize:

- The `a` accumulator is a sum that parallelizes well
- The `b` accumulator depends on the running value of `a` after each byte, which makes it harder to vectorize

You'll learn how SVE intrinsics solve both of these challenges.

## The role of the Arm MCP server

The Arm MCP server gives your AI coding assistant access to Arm-specific knowledge, including the full SVE intrinsics reference. When you ask about specific intrinsics such as `svdot` or `svwhilelt`, the assistant queries the MCP server and returns the exact function signature, pseudocode, and required compiler flags.

This means you don't need to keep referring to the intrinsics reference material. You can ask questions in plain language and get precise, actionable answers grounded in Arm documentation.

## What you've learned and what's next

You now understand the Adler-32 algorithm and how you can use the Arm MCP server to optimize it with SVE intrinsics.

Next, you'll set up the project and establish a performance baseline for the scalar implementation.

## Before you begin

To get started, you need an Arm Linux system with SVE support. Suitable cloud instances include AWS Graviton3 or Graviton4, Microsoft Cobalt 100, and Google Axion. The examples in this Learning Path were tested on Ubuntu 26.04.

You also need an AI coding assistant with the Arm MCP server configured. Supported assistants include [GitHub Copilot](/install-guides/github-copilot/), [Kiro CLI](/install-guides/kiro-cli/), [Claude Code](/install-guides/claude-code/), [Gemini CLI](/install-guides/gemini/), and [Codex CLI](/install-guides/codex-cli/). For setup instructions, see the [Arm MCP server Learning Path](/learning-paths/servers-and-cloud-computing/arm-mcp-server/).

{{< notice Note >}}
The AI responses shown are samples. Your AI assistant's responses will vary in wording, detail, and structure depending on the tool and model you use. Focus on the key concepts rather than the exact wording.
{{< /notice >}}

Start by installing the required software and checking that your system includes SVE.

Install GCC and GNU Make:
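One way to do this on Ubuntu, and then check the CPU feature flags for SVE, is (the package-manager commands assume Ubuntu or Debian; adjust for your distribution):

```bash
# Install the compiler and build tool (Ubuntu/Debian)
sudo apt update && sudo apt install -y gcc make

# Check for the sve feature flag (prints "sve" if present)
grep -ow sve /proc/cpuinfo | head -n1
```

If the last command prints `sve`, the processor supports SVE.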

If `sve` does not appear, the system does not support SVE and the final implementation will not run.

## Create the project files

On your Arm Neoverse system, create a working directory and add the source files:

```bash
mkdir adler32-sve && cd adler32-sve
```

The `-mcpu=native` flag tells GCC to optimize for the exact CPU you're running on, which enables SVE code generation on Neoverse processors that have SVE.

### Ask AI about compiler flags

Before running anything, ask your AI assistant to confirm that your build setup is correct for SVE.

Your prompt can be similar to:

```text
My Makefile uses `-O3 -mcpu=native`. Does this enable SVE code generation on a Neoverse processor? Do I need any special flags for SVE intrinsics?
```
The sample response ends with a reference to further reading:

```text
For more on SVE programming, Arm has a good learning path: Port Code to Arm SVE
(https://learn.arm.com/learning-paths/servers-and-cloud-computing/sve/).
```

The response explains that `-mcpu=native` enables SVE and provides useful information about running on other systems. It confirms that special flags, such as `-march=armv8-a+sve`, are not needed and also tells you to include `<arm_sve.h>`. All of this is information you'll need when you create the SVE source later.

The SVE Learning Path reference at the end of the response confirms that the assistant used the Arm MCP server to answer your question.

## Build and run the baseline

Build and run the benchmark. The output reports the throughput for each buffer size:

```output
Performance:
10 MB 10485760 bytes 10 iters 262.388 ms 381.1 MB/s checksum=0x285FF1B1
```

Your numbers will differ depending on your specific Neoverse processor and memory configuration. Note the MB/s values for the 1 MB and 10 MB cases, as these are your baseline numbers to compare against after each optimization.

### Ask AI about auto-vectorization

Ask your AI assistant about auto-vectorization. Your prompt can be similar to:

```text
Can GCC auto-vectorize my adler32 function with SVE if I just use `-mcpu=native`? What would prevent auto-vectorization?
```

The sample response is similar to:

```text
No, GCC cannot auto-vectorize your adler32 function. It tried every vector mode
- Break the dependency — use vector lanes to accumulate a and b contributions independently, then reduce at the end.
```

The response explains that the modulo operation in every iteration (`% MOD_ADLER`) is the main blocker. The compiler can't easily prove that the intermediate values won't overflow in a way that changes the result when operations are reordered. The loop-carried dependency between iterations also makes it difficult to auto-vectorize.

Because auto-vectorization won't work, you need to restructure the algorithm before you can apply SVE effectively. You'll learn more about the restructuring in the next two sections.

## What you've accomplished and what's next

You've now created the scalar Adler-32 implementation and benchmark harness, and recorded your baseline performance numbers. Using the Arm MCP server and your AI assistant of choice, you've learned that auto-vectorization won't work.

In the next section, you'll use your AI assistant and the Arm MCP server to learn core SVE concepts before writing intrinsics code.