Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
---
title: Deploy Alluxio on Azure Cobalt 100 Arm64 virtual machines for data orchestration and caching

draft: true
cascade:
    draft: true

description: Learn how to install and configure Alluxio on an Azure Cobalt 100 Arm64 virtual machine, integrate it with Apache Spark, enable data caching, and benchmark performance improvements for analytics workloads.


minutes_to_complete: 90

who_is_this_for: This is an introductory topic for developers, data engineers, and platform engineers who want to build high-performance data pipelines and analytics systems using Alluxio on Arm-based cloud environments.

learning_objectives:
- Install and configure Alluxio on Azure Cobalt 100 Arm64 virtual machines
- Configure data caching using Alluxio memory storage
- Integrate Alluxio with Apache Spark for analytics workloads
- Benchmark data access performance and understand caching benefits

prerequisites:
- A [Microsoft Azure account](https://azure.microsoft.com/) with access to Cobalt 100 based instances (Dpsv6)
- Basic knowledge of Linux command-line operations
- Familiarity with SSH and remote server access
- Basic understanding of data processing, storage systems, and caching concepts

author: Pareena Verma

### Tags
skilllevels: Introductory
subjects: Containers and Virtualization
cloud_service_providers:
- Microsoft Azure

armips:
- Neoverse

tools_software_languages:
- Alluxio
- Apache Spark
- Java

operatingsystems:
- Linux

further_reading:
    - resource:
        title: Alluxio Official Website
        link: https://www.alluxio.io/
        type: website
    - resource:
        title: Alluxio Documentation
        link: https://docs.alluxio.io/
        type: documentation
    - resource:
        title: Apache Spark Documentation
        link: https://spark.apache.org/docs/latest/
        type: documentation
    - resource:
        title: Azure Cobalt 100 processors
        link: https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353
        type: documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1
layout: "learningpathall"
learning_path_main_page: "yes"
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
@@ -0,0 +1,53 @@
---
title: Understand Alluxio on Azure Cobalt 100

weight: 2

layout: "learningpathall"
---

## Why run Alluxio on Azure Cobalt 100

Alluxio on Arm-based Azure Cobalt 100 processors delivers high-performance data access for analytics and AI workloads. Cobalt 100's dedicated physical cores per vCPU provide consistent and predictable performance, which complements Alluxio’s in-memory caching and data orchestration capabilities.

By combining Alluxio’s memory-centric architecture with the efficiency of Arm-based infrastructure, you can significantly reduce data access latency, accelerate compute frameworks like Apache Spark, and optimize overall data pipeline performance.

## Azure Cobalt 100 Arm-based processor

Azure’s Cobalt 100 is Microsoft’s first-generation, in-house Arm-based processor. Built on Arm Neoverse N2, Cobalt 100 is a 64-bit CPU that delivers strong performance and energy efficiency for cloud-native, scale-out Linux workloads such as web and application servers, data analytics, open-source databases, and caching systems. Running at 3.4 GHz, Cobalt 100 allocates a dedicated physical core for each vCPU, which helps ensure consistent and predictable performance.

To learn more, see the Microsoft blog [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).

## Alluxio

Alluxio is an open-source data orchestration platform that enables fast and reliable access to data across distributed storage systems. It acts as a unified layer between compute frameworks and storage systems, improving performance for data-intensive applications.

Alluxio is widely used in modern data platforms to accelerate analytics workloads by caching frequently accessed data in memory, reducing latency and minimizing repeated reads from slower storage systems such as local disks or cloud storage.

Alluxio integrates seamlessly with popular analytics frameworks like Apache Spark, Presto, and Hadoop, making it ideal for building high-performance data pipelines and AI/ML workloads.

To learn more, see the official [Alluxio documentation](https://docs.alluxio.io/).

Alluxio provides key capabilities for data orchestration and performance optimization:

- **Data Caching:** Frequently accessed data is stored in memory, significantly reducing access time compared to disk-based reads.

- **Unified Namespace:** Alluxio presents a single logical view of data across multiple storage systems, simplifying data access.

- **Tiered Storage:** Supports multiple storage layers (memory, SSD, HDD), enabling efficient data management based on access patterns.

- **Compute Integration:** Works with analytics engines like Apache Spark to accelerate data processing without modifying application logic.

Alluxio is commonly used in:

- Big data analytics and processing
- AI and machine learning pipelines
- Data lake acceleration
- ETL and batch processing workflows
- High-performance data access layers

In this Learning Path, you'll deploy Alluxio on an Azure Cobalt 100 Arm64 virtual machine and build a data orchestration and caching layer for analytics workloads. You will integrate Alluxio with Apache Spark and benchmark performance to understand how caching improves data access speed.

## What you've learned and what's next

You now have the context for why Azure Cobalt 100 and Alluxio are a strong combination for high-performance data orchestration and analytics workloads. Next, you'll create the virtual machine that will run Alluxio throughout this Learning Path.
@@ -0,0 +1,213 @@
---
title: Deploy Alluxio on Azure Cobalt 100
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Deploy Alluxio on Azure Cobalt 100 (Arm)

This section guides you through installing Alluxio on an Azure Cobalt 100 Arm-based virtual machine and configuring it with local storage.

You will set up a unified data orchestration layer that sits between compute frameworks and storage systems.

### Why Alluxio?

- Speeds up data access using memory caching
- Reduces repeated disk I/O
- Improves performance for analytics workloads

## Update your system

```bash
sudo apt update && sudo apt upgrade -y
```

## Install required dependencies
These tools are required for downloading and extracting software:

```bash
sudo apt install -y wget curl tar rsync nano
```

## Install Java 11 (Required)

Alluxio supports **Java 8 and Java 11**.
Running Alluxio on Java 17 can cause runtime errors, so install Java 11 for this Learning Path.

```bash
wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | \
sudo gpg --dearmor -o /usr/share/keyrings/adoptium.gpg

echo "deb [signed-by=/usr/share/keyrings/adoptium.gpg] https://packages.adoptium.net/artifactory/deb noble main" | \
sudo tee /etc/apt/sources.list.d/adoptium.list

sudo apt update
sudo apt install -y temurin-11-jdk
```

**Set Java 11 as the default if more than one Java version is installed:**

```bash
sudo update-alternatives --config java
```

- When prompted, enter the selection number for Java 11

**Verify:**

```bash
java -version
```

The output is similar to:

```output
openjdk version "11.0.30" 2026-01-20
OpenJDK Runtime Environment Temurin-11.0.30+7 (build 11.0.30+7)
OpenJDK 64-Bit Server VM Temurin-11.0.30+7 (build 11.0.30+7, mixed mode)
```
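
If you want to script the Java check instead of reading the version by eye, the following is a minimal sketch. The helper name `java_major` is hypothetical and not part of Alluxio or the JDK; it only parses the first line of the `java -version` output shown above and compares the major version against the Java 8 and Java 11 requirement stated in this section.

```bash
# Hypothetical helper: extract the Java major version from `java -version` output.
java_major() {
  local ver
  # Pull the quoted version string, for example 11.0.30 or 1.8.0_392.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;  # legacy scheme: 1.8.0_392 -> 8
    *)   printf '%s\n' "${ver%%.*}" ;;                  # modern scheme: 11.0.30 -> 11
  esac
}

# Run the check only if java is installed (java prints its version to stderr).
if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(java -version 2>&1)")
  case "$major" in
    8|11) echo "Java $major detected: supported by Alluxio" ;;
    *)    echo "Java $major detected: switch to Java 11" ;;
  esac
fi
```

If the script reports a version other than 8 or 11, rerun `sudo update-alternatives --config java` and select Java 11.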

## Download and install Alluxio

```bash
cd /opt
sudo wget https://downloads.alluxio.io/downloads/files/2.9.4/alluxio-2.9.4-bin.tar.gz
sudo tar -xvzf alluxio-2.9.4-bin.tar.gz
sudo mv alluxio-2.9.4 alluxio
sudo chown -R $USER:$USER /opt/alluxio
```

## Configure environment variables
Add the Alluxio `bin` directory to your `PATH` so that you can run Alluxio commands from any directory:

```bash
echo 'export ALLUXIO_HOME=/opt/alluxio' >> ~/.bashrc
echo 'export PATH=$PATH:$ALLUXIO_HOME/bin' >> ~/.bashrc
source ~/.bashrc
```

## Configure Alluxio
Navigate to the configuration directory and create the configuration files from the provided templates:

```bash
cd /opt/alluxio/conf
cp alluxio-env.sh.template alluxio-env.sh
cp alluxio-site.properties.template alluxio-site.properties
```

## Configure RAM-based storage
Alluxio uses memory for fast data access.

**Edit:**

```bash
nano alluxio-env.sh
```

**Add:**

```bash
export ALLUXIO_RAM_FOLDER=/dev/shm
```

`/dev/shm` is a RAM-backed filesystem on Linux, so data that Alluxio stores there stays in memory.

## Configure core properties

```bash
nano alluxio-site.properties
```

Add the following properties:

```properties
alluxio.master.hostname=localhost
alluxio.worker.memory.size=6GB
alluxio.master.mount.table.root.ufs=/mnt/data
```

**Explanation:**

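- `alluxio.master.hostname`: the host where the Alluxio master runs
- `alluxio.worker.memory.size`: the amount of RAM the worker allocates for caching
- `alluxio.master.mount.table.root.ufs`: the under file system (UFS) mounted at the Alluxio root, here a directory on the local disk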
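
Because the worker cache lives in `/dev/shm`, it can help to confirm that the configured `alluxio.worker.memory.size` fits in the available shared memory before you start Alluxio. The following is a minimal sketch, not an Alluxio tool: `size_to_bytes` is a hypothetical helper, and it assumes that size suffixes are binary multiples (1 GB = 1024^3 bytes).

```bash
# Hypothetical helper: convert a size such as 6GB or 512MB to bytes.
# Assumption: units are binary multiples (1 GB = 1024^3 bytes).
size_to_bytes() {
  local num unit
  num=$(printf '%s' "$1" | sed 's/[^0-9].*$//')
  unit=$(printf '%s' "$1" | sed 's/^[0-9]*//' | tr '[:lower:]' '[:upper:]')
  [ -n "$num" ] || { echo "missing number in: $1" >&2; return 1; }
  case "$unit" in
    ""|B) echo "$num" ;;
    KB|K) echo $(( num * 1024 )) ;;
    MB|M) echo $(( num * 1024 * 1024 )) ;;
    GB|G) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)    echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

WORKER_SIZE="6GB"   # the alluxio.worker.memory.size value used in this section
need=$(size_to_bytes "$WORKER_SIZE")

# Compare against the free space in /dev/shm (GNU df, sizes in bytes).
if [ -d /dev/shm ]; then
  avail=$(df --output=avail -B1 /dev/shm | tail -n 1 | tr -d ' ')
  if [ "${avail:-0}" -ge "$need" ]; then
    echo "/dev/shm has room for $WORKER_SIZE"
  else
    echo "/dev/shm has ${avail:-0} bytes free; lower alluxio.worker.memory.size"
  fi
fi
```

If the check reports too little free space, reduce the value in `alluxio-site.properties` or choose a VM size with more memory.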

## Set up the storage directory
Create the directory that serves as the under file system (UFS), the storage that Alluxio reads from and caches:

```bash
sudo mkdir -p /mnt/data
sudo chmod -R 777 /mnt/data
```

## Start Alluxio
Format metadata (first time only):

```bash
alluxio format
```

**Start Alluxio in local mode:**

```bash
alluxio-start.sh local NoMount
```

The output is similar to:

```output
Starting to monitor all local services.
-----------------------------------------
--- [ OK ] The master service @ alluxio-arm64.xaxcsurvhrzefjc5ihdpsf2vbc.rx.internal.cloudapp.net is in a healthy state.
--- [ OK ] The job_master service @ alluxio-arm64.xaxcsurvhrzefjc5ihdpsf2vbc.rx.internal.cloudapp.net is in a healthy state.
--- [ OK ] The worker service @ alluxio-arm64.xaxcsurvhrzefjc5ihdpsf2vbc.rx.internal.cloudapp.net is in a healthy state.
--- [ OK ] The job_worker service @ alluxio-arm64.xaxcsurvhrzefjc5ihdpsf2vbc.rx.internal.cloudapp.net is in a healthy state.
--- [ OK ] The proxy service @ alluxio-arm64.xaxcsurvhrzefjc5ihdpsf2vbc.rx.internal.cloudapp.net is in a healthy state.
```

## Verify Alluxio services

```bash
jps
```

The output lists the following processes, each prefixed with its process ID (omitted here):

```output
AlluxioJobWorker
AlluxioJobMaster
Jps
AlluxioMaster
AlluxioProxy
AlluxioWorker
```
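
To check the process list without comparing names by hand, you can pipe the `jps` output through a small function. This is a minimal sketch; `check_alluxio_processes` is a hypothetical helper name, and the five process names are taken from the expected output above.

```bash
# Hypothetical helper: read jps output on stdin and report each expected process.
check_alluxio_processes() {
  local jps_output name missing=0
  jps_output=$(cat)
  for name in AlluxioMaster AlluxioJobMaster AlluxioWorker AlluxioJobWorker AlluxioProxy; do
    if printf '%s\n' "$jps_output" | grep -qw "$name"; then
      echo "OK       $name"
    else
      echo "MISSING  $name"
      missing=1
    fi
  done
  # Return non-zero if any expected process is absent.
  return "$missing"
}

# Run the check only if jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
  jps | check_alluxio_processes || echo "One or more Alluxio services are not running"
fi
```

If a process is reported as missing, review the files in `/opt/alluxio/logs` before restarting the services.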

Open the Alluxio Web UI in your browser, replacing `<VM-IP>` with the public IP address of your virtual machine:

```text
http://<VM-IP>:19999
```

![Alluxio dashboard showing cluster summary and worker status on Azure Cobalt 100 VM#center](images/alluxio-ui.png "Alluxio Web UI with cluster summary and worker details")

## Alluxio UI overview

The Web UI shows:

- Master status (Leader node)
- Worker memory usage
- Storage capacity
- Cached data blocks
- Cluster health

## What you've learned and what's next

You have successfully:

- Installed Alluxio on an Arm-based VM
- Configured compute and storage layers
- Enabled memory-based data caching
- Verified cluster health via UI

You are now ready to integrate Alluxio with analytics frameworks.
@@ -0,0 +1,51 @@
---
title: Create a firewall rule on Azure
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Configure Azure firewall for Alluxio Web UI

To allow external traffic on port **19999** for Alluxio running on an Azure virtual machine, open the port in the Network Security Group (NSG) attached to the virtual machine's network interface or subnet.

{{% notice Note %}}For more information about Azure setup, see [Getting started with Microsoft Azure Platform](/learning-paths/servers-and-cloud-computing/csp/azure/).{{% /notice %}}

## Create a firewall rule in Azure

To expose the TCP port **19999**, create a firewall rule.

Navigate to the [Azure Portal](https://portal.azure.com), go to **Virtual Machines**, and select your virtual machine.

![Azure Portal showing Virtual Machines list#center](images/virtual_machine.png "Virtual Machines")

In the left menu, select **Networking**, then select **Network settings** for the virtual machine's network interface.

![Azure Portal showing Network settings with security group configuration#center](images/networking.png "Network settings")

Navigate to **Create port rule**, and select **Inbound port rule**.

![Azure Portal showing Create port rule dropdown menu#center](images/port_rule.png "Create rule")

Configure the inbound security rule with the following settings:

- **Source:** Any
- **Source port ranges:** *
- **Destination:** Any
- **Destination port ranges:** **19999**
- **Protocol:** TCP
- **Action:** Allow
- **Name:** allow-alluxio-port

After filling in the details, select **Add** to save the rule.

![Azure Portal showing inbound security rule form with port 19999 configuration#center](images/inbound_rule.png "Inbound security rule")

The network firewall rule is now created, allowing Alluxio Web UI to be accessed over port **19999**.
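
If you prefer the command line, the same rule can be created with the Azure CLI. The following is a sketch under assumptions: the resource group name, the NSG name, and the priority value `1010` are placeholders that do not come from this Learning Path, so replace them with values from your environment. The script prints the command for review and runs it only when `APPLY=1` is set.

```bash
# Placeholder values (assumptions): replace with your own resource group and NSG.
RESOURCE_GROUP="my-resource-group"
NSG_NAME="my-vm-nsg"

# Build the az command that mirrors the portal settings listed above.
cmd=(az network nsg rule create
  --resource-group "$RESOURCE_GROUP"
  --nsg-name "$NSG_NAME"
  --name allow-alluxio-port
  --priority 1010
  --direction Inbound
  --access Allow
  --protocol Tcp
  --source-address-prefixes '*'
  --source-port-ranges '*'
  --destination-address-prefixes '*'
  --destination-port-ranges 19999)

# Print the command for review; set APPLY=1 to run it.
printf '%q ' "${cmd[@]}"; echo
if [ "${APPLY:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```

Running the command requires the Azure CLI to be installed and signed in with `az login`.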

## What you've learned and what's next

You've configured the Azure Network Security Group to allow incoming traffic on port 19999. This firewall rule enables external access to the Alluxio Web UI for monitoring cluster status and storage usage.

Next, you'll install and configure Alluxio on the virtual machine.