Original Article: "MLIR: A Global Optimization and Dataflow Analysis" by Jeremy Kun
Windows Adaptation: Focus on MLIR's dataflow analysis framework with practical examples using CMake build integration.
This tutorial uses emojis to help you navigate:
- 📖 Reading sections - Conceptual explanations and background
- 🔬 Examples - Code samples and detailed examination
- 🔍 Deep dives - Feature exploration and sage advice
- 👉 Action sections - Commands to run and tasks to complete
This tutorial covers:
- Understanding dataflow analysis fundamentals
- Using MLIR's dataflow analysis framework
- Implementing lattice structures for analysis values
- Writing transfer functions for operations
- Creating custom analyses (noise propagation example)
- Building optimization passes using analysis results
- Integrating IntegerRangeAnalysis for range tracking
- CMake integration for analysis passes
Here's a fundamental compiler problem: you want to optimize code, but you can't run it.
You need to answer questions like:
- "Is this variable always constant?"
- "Is this expression already computed somewhere?"
- "Could this pointer alias that one?"
But you can't just execute the program and observe: the compiler runs at compile time, before the program has inputs.
Dataflow analysis solves this by propagating abstract information through the program's control flow graph. Instead of executing with concrete values, you reason with abstract properties that hold across all possible executions.
Dataflow analysis combines two essential components:
1. Transfer Functions - How operations transform information
If x = y + z, and we know y ∈ [1,10] and z ∈ [5,15],
then x ∈ [6,25]
2. Join Operations - How to merge information from different paths
If one branch sets x = 5 and another sets x = 10,
then after the merge, x ∈ [5,10]
Together, these let you propagate facts through arbitrary control flow (loops, branches, function calls), arriving at safe approximations of program behavior.
Jeremy Kun's tutorial uses a concrete example: tracking "noise" in homomorphic encryption operations. The question: "At this program point, how many bits of noise could this ciphertext contain?"
This isn't constant propagation (concrete values) or type checking (static types). It's abstract interpretation, tracking a property (noise bounds) that affects correctness but isn't directly represented in the IR.
Why this example matters:
- Shows analysis for a domain-specific property
- Demonstrates global optimization (where to insert noise reduction ops)
- Illustrates analysis-to-transformation pipeline
- Uses MLIR's extensibility (custom transfer functions via interfaces)
For context, here are traditional analyses:
Reaching definitions: "Which assignments might reach this use?"
- Application: Dead code elimination, register allocation
Live variable: "Is this variable's value used later?"
- Application: Dead store elimination
Constant propagation: "Is this variable always constant?"
- Application: Compile-time evaluation (SCCP from Tutorial 07)
Available expressions: "Has this expression been computed already?"
- Application: Common subexpression elimination
MLIR's framework generalizes these patterns, letting you implement domain-specific analyses without reinventing the propagation machinery.
Every dataflow analysis needs three things:
1. Lattice (Value Domain)
A partially-ordered set with:
- Bottom (⊥): Uninitialized/Unknown
- Top (⊤): Could be anything/Overdefined
- Join operation (∨): Combines values from multiple paths
- Partial order (≤): Represents information refinement
Example: Constant propagation lattice
⊤ (overdefined - not constant)
/ | \
/ | \
-1 0 1 2 3 ... (concrete values)
\ | /
\ | /
⊥ (uninitialized)
2. Transfer Function
Describes how an operation transforms input values to outputs.
transfer: (Operation × InputLatticeValues) → OutputLatticeValues
Example: Addition transfer function
transfer(add, [Constant(2), Constant(3)]) = Constant(5)
transfer(add, [Constant(2), ⊥]) = ⊥
transfer(add, [Constant(2), ⊤]) = ⊤
3. Join Operation
Combines values when control flow merges.
join: (LatticeValue × LatticeValue) → LatticeValue
Properties (must be satisfied):
- Associative: (a ∨ b) ∨ c = a ∨ (b ∨ c)
- Commutative: a ∨ b = b ∨ a
- Idempotent: a ∨ a = a
Example: Constant propagation join
Constant(5) ∨ Constant(5) = Constant(5)
Constant(5) ∨ Constant(7) = ⊤ (not constant!)
Constant(5) ∨ ⊥ = Constant(5)
⊤ ∨ anything = ⊤
Analysis runs iteratively:
1. Initialize all values to ⊥ (bottom)
2. Repeat:
a. For each operation:
- Apply transfer function
- Join with existing output values
b. Propagate to successor operations
3. Until no values change (fixed point reached)
Termination guarantee: the lattice has finite height and values only move upward (toward ⊤), so the iteration must reach a fixed point.
MLIR provides infrastructure so you only implement domain-specific logic.
┌─────────────────────────────────┐
│ DataflowSolver │ ← Orchestrates analysis
│ - Fixed-point iteration │
│ - Dependency tracking │
│ - Lattice management │
└─────────────────────────────────┘
↓
┌─────────────────────────────────┐
│ AbstractSparseLattice │ ← Base class for lattices
│ - join() operation │
│ - Value domain │
└─────────────────────────────────┘
↓
┌─────────────────────────────────┐
│ Your Custom Lattice │ ← You implement this
│ - Specific value type │
│ - Domain-specific join │
└─────────────────────────────────┘
↓
┌─────────────────────────────────┐
│ Transfer Functions │ ← You implement these
│ - Per operation logic │
│ - InferIntRangeInterface etc. │
└─────────────────────────────────┘
DataflowSolver:
- Manages fixed-point iteration
- Tracks analysis state
- Provides query interface
AbstractSparseLattice / Lattice<T>:
- Base classes for lattice values attached to SSA values
- Lattice<T> is templated on the value type T
- Provides the join operation
IntegerValueRange:
- Built-in lattice for integer ranges
- Tracks minimum and maximum values
- Used by IntegerRangeAnalysis
Let's build a custom analysis for a "noisy arithmetic" dialect where operations accumulate bounded random noise.
Noisy operations:
- noisy.add %x, %y - Adds values plus ±1 bit of noise
- noisy.mul %x, %y - Multiplies values, doubles noise
- noisy.reduce_noise %x - Reduces noise by half (expensive!)
Goal: Track noise bounds through the program and optimize placement of reduce_noise operations.
File: lib/Analysis/NoiseAnalysis.h
#ifndef NOISE_ANALYSIS_H
#define NOISE_ANALYSIS_H
#include "mlir/Analysis/DataFlowFramework.h"
#include "mlir/IR/Value.h"
namespace mlir {
namespace tutorial {
//===----------------------------------------------------------------------===//
// NoiseBound - Represents noise level bounds
//===----------------------------------------------------------------------===//
class NoiseBound {
public:
NoiseBound() : min(0), max(0), isUninitialized(true) {}
NoiseBound(int minNoise, int maxNoise)
: min(minNoise), max(maxNoise), isUninitialized(false) {}
// Check if value is bottom (uninitialized)
bool isBottom() const { return isUninitialized; }
// Get noise bounds
int getMin() const { return min; }
int getMax() const { return max; }
// Join operation - combines two noise bounds
static NoiseBound join(const NoiseBound &lhs, const NoiseBound &rhs) {
// Bottom ∨ x = x
if (lhs.isBottom())
return rhs;
if (rhs.isBottom())
return lhs;
// Take conservative estimate (maximum range)
int newMin = std::min(lhs.min, rhs.min);
int newMax = std::max(lhs.max, rhs.max);
return NoiseBound(newMin, newMax);
}
bool operator==(const NoiseBound &other) const {
if (isUninitialized != other.isUninitialized)
return false;
if (isUninitialized)
return true;
return min == other.min && max == other.max;
}
bool operator!=(const NoiseBound &other) const {
return !(*this == other);
}
private:
int min;
int max;
bool isUninitialized;
};
//===----------------------------------------------------------------------===//
// NoiseLattice - Lattice wrapper for dataflow framework
//===----------------------------------------------------------------------===//
class NoiseLattice : public dataflow::AbstractSparseLattice {
public:
using AbstractSparseLattice::AbstractSparseLattice;
// Get current noise bound
const NoiseBound &getValue() const { return value; }
// Join operation for the lattice
ChangeResult join(const NoiseLattice &other) {
NoiseBound newValue = NoiseBound::join(value, other.value);
if (newValue != value) {
value = newValue;
return ChangeResult::Change;
}
return ChangeResult::NoChange;
}
// Print for debugging
void print(raw_ostream &os) const override {
if (value.isBottom()) {
os << "⊥";
} else {
os << "[" << value.getMin() << ", " << value.getMax() << "]";
}
}
private:
NoiseBound value;
};
} // namespace tutorial
} // namespace mlir
#endif // NOISE_ANALYSIS_H

File: lib/Analysis/NoiseAnalysis.cpp
#include "NoiseAnalysis.h"
#include "Dialect/Noisy/NoisyOps.h"
#include "mlir/Analysis/DataFlowFramework.h"
using namespace mlir;
using namespace mlir::tutorial;
using namespace mlir::tutorial::noisy;
//===----------------------------------------------------------------------===//
// NoiseAnalysis - Main analysis implementation
//===----------------------------------------------------------------------===//
class NoiseAnalysis : public dataflow::SparseForwardDataFlowAnalysis<NoiseLattice> {
public:
using SparseForwardDataFlowAnalysis::SparseForwardDataFlowAnalysis;
// Visit an operation and compute output noise bounds
void visitOperation(Operation *op,
ArrayRef<const NoiseLattice *> operands,
ArrayRef<NoiseLattice *> results) override {
// Handle noisy.add: adds ±1 bit of noise
if (auto addOp = dyn_cast<AddOp>(op)) {
if (operands[0]->getValue().isBottom() ||
operands[1]->getValue().isBottom()) {
// Uninitialized input
return;
}
NoiseBound lhs = operands[0]->getValue();
NoiseBound rhs = operands[1]->getValue();
// Noise combines additively, plus ±1 from operation
int newMin = lhs.getMin() + rhs.getMin() - 1;
int newMax = lhs.getMax() + rhs.getMax() + 1;
propagateIfChanged(results[0],
results[0]->join(NoiseLattice(NoiseBound(newMin, newMax))));
return;
}
// Handle noisy.mul: doubles noise
if (auto mulOp = dyn_cast<MulOp>(op)) {
if (operands[0]->getValue().isBottom() ||
operands[1]->getValue().isBottom()) {
return;
}
NoiseBound lhs = operands[0]->getValue();
NoiseBound rhs = operands[1]->getValue();
// Noise doubles during multiplication
int newMin = (lhs.getMin() + rhs.getMin()) * 2;
int newMax = (lhs.getMax() + rhs.getMax()) * 2;
propagateIfChanged(results[0],
results[0]->join(NoiseLattice(NoiseBound(newMin, newMax))));
return;
}
// Handle noisy.reduce_noise: halves noise
if (auto reduceOp = dyn_cast<ReduceNoiseOp>(op)) {
if (operands[0]->getValue().isBottom()) {
return;
}
NoiseBound input = operands[0]->getValue();
// Noise reduced by half
int newMin = input.getMin() / 2;
int newMax = input.getMax() / 2;
propagateIfChanged(results[0],
results[0]->join(NoiseLattice(NoiseBound(newMin, newMax))));
return;
}
// Handle noisy.constant: zero noise
if (auto constOp = dyn_cast<ConstantOp>(op)) {
propagateIfChanged(results[0],
results[0]->join(NoiseLattice(NoiseBound(0, 0))));
return;
}
// Default: propagate input noise unchanged
setAllToEntryStates(results);
}
// Set initial state for block arguments
void setToEntryState(NoiseLattice *lattice) override {
// Function arguments start with zero noise
lattice->join(NoiseLattice(NoiseBound(0, 0)));
}
};

File: lib/Analysis/NoiseAnalysisPass.cpp
#include "NoiseAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/Pass.h"
using namespace mlir;
using namespace mlir::tutorial;
namespace {
struct NoiseAnalysisPass
: public PassWrapper<NoiseAnalysisPass, OperationPass<func::FuncOp>> {
MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(NoiseAnalysisPass)
StringRef getArgument() const final { return "noise-analysis"; }
StringRef getDescription() const final {
return "Analyze noise propagation in noisy arithmetic";
}
void runOnOperation() override {
Operation *op = getOperation();
// Create dataflow solver
DataFlowSolver solver;
// Sparse analyses require dead code analysis to mark blocks as live
solver.load<dataflow::DeadCodeAnalysis>();
// Load the noise analysis
solver.load<NoiseAnalysis>();
// Initialize with entry points
if (failed(solver.initializeAndRun(op))) {
signalPassFailure();
return;
}
// Query and print results
op->walk([&](Operation *innerOp) {
for (Value result : innerOp->getResults()) {
const NoiseLattice *lattice = solver.lookupState<NoiseLattice>(result);
if (!lattice) {
continue;
}
llvm::outs() << "Value: ";
result.print(llvm::outs());
llvm::outs() << " -> Noise: ";
lattice->print(llvm::outs());
llvm::outs() << "\n";
}
});
}
};
} // namespace
void registerNoiseAnalysisPass() {
PassRegistration<NoiseAnalysisPass>();
}

File: tests/noise_analysis.mlir
// RUN: tutorial-opt %s --noise-analysis | FileCheck %s
func.func @example() {
// CHECK: Value: %{{.*}} -> Noise: [0, 0]
%c1 = noisy.constant 5 : i32
// CHECK: Value: %{{.*}} -> Noise: [0, 0]
%c2 = noisy.constant 10 : i32
// CHECK: Value: %{{.*}} -> Noise: [-1, 1]
%sum = noisy.add %c1, %c2 : i32
// CHECK: Value: %{{.*}} -> Noise: [-4, 4]
%product = noisy.mul %sum, %sum : i32
// CHECK: Value: %{{.*}} -> Noise: [-2, 2]
%reduced = noisy.reduce_noise %product : i32
return
}

MLIR provides a built-in integer range analysis that tracks min/max values for integer operations.
Tracks possible value ranges for integers:
%c5 = arith.constant 5 : i32 // Range: [5, 5]
%c10 = arith.constant 10 : i32 // Range: [10, 10]
%sum = arith.addi %c5, %c10 : i32 // Range: [15, 15]
// After control flow merge:
%x = some_unknown_op : i32 // Range: [-∞, +∞]
%bounded = arith.andi %x, 0xFF : i32 // Range: [0, 255]

To enable range analysis for your operations:
File: lib/Dialect/Noisy/NoisyOps.td
def Noisy_AddOp : Noisy_Op<"add", [Pure, Commutative]> {
let summary = "Noisy addition with random error";
let arguments = (ins I32:$lhs, I32:$rhs);
let results = (outs I32:$result);
// Enable integer range inference
let extraClassDeclaration = [{
void inferResultRanges(ArrayRef<ConstantIntRanges> argRanges,
SetIntRangeFn setResultRange);
}];
}
File: lib/Dialect/Noisy/NoisyOps.cpp
#include "mlir/Interfaces/InferIntRangeInterface.h"
void AddOp::inferResultRanges(ArrayRef<ConstantIntRanges> argRanges,
SetIntRangeFn setResultRange) {
// Get input ranges
const ConstantIntRanges &lhs = argRanges[0];
const ConstantIntRanges &rhs = argRanges[1];
// Compute output range: lhs + rhs ± noise
// For simplicity, assume ±1 noise
APInt minValue = lhs.smin() + rhs.smin() - 1; // Worst case: -1 noise
APInt maxValue = lhs.smax() + rhs.smax() + 1; // Worst case: +1 noise
// Set result range
setResultRange(getResult(), ConstantIntRanges::range(minValue, maxValue));
}

File: tests/range_analysis.mlir
// RUN: tutorial-opt %s --int-range-analysis | FileCheck %s
func.func @ranges(%arg: i32) -> i32 {
// Known ranges
%c5 = arith.constant 5 : i32
%c10 = arith.constant 10 : i32
// Range: [15, 15]
%sum = arith.addi %c5, %c10 : i32
// Multiply by bounded value
// Range: [0, 15 * max(%arg)]
%result = arith.muli %sum, %arg : i32
return %result : i32
}

Analysis results enable powerful optimizations.
Use analysis results to insert reduce_noise operations optimally.
File: lib/Transform/OptimizeNoise.cpp
#include "Analysis/NoiseAnalysis.h"
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
using namespace mlir;
using namespace mlir::tutorial;
namespace {
// Pattern: Insert reduce_noise when noise exceeds threshold
struct InsertNoiseReduction : public OpRewritePattern<noisy::AddOp> {
InsertNoiseReduction(MLIRContext *context, DataFlowSolver &solver, int threshold)
: OpRewritePattern<noisy::AddOp>(context), solver(solver), threshold(threshold) {}
LogicalResult matchAndRewrite(noisy::AddOp op,
PatternRewriter &rewriter) const override {
// Query noise analysis
const NoiseLattice *lattice = solver.lookupState<NoiseLattice>(op.getResult());
if (!lattice || lattice->getValue().isBottom()) {
return failure();
}
NoiseBound noise = lattice->getValue();
// Check if noise exceeds threshold
if (noise.getMax() <= threshold) {
return failure(); // Within acceptable bounds
}
// Insert reduce_noise operation after this add
rewriter.setInsertionPointAfter(op);
Value reduced = rewriter.create<noisy::ReduceNoiseOp>(
op.getLoc(), op.getResult().getType(), op.getResult());
// Replace uses (except the reduce_noise op itself)
rewriter.replaceAllUsesExcept(op.getResult(), reduced,
reduced.getDefiningOp());
return success();
}
private:
DataFlowSolver &solver;
int threshold;
};
//===----------------------------------------------------------------------===//
// Pass implementation
//===----------------------------------------------------------------------===//
struct OptimizeNoisePass
: public PassWrapper<OptimizeNoisePass, OperationPass<func::FuncOp>> {
MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(OptimizeNoisePass)
StringRef getArgument() const final { return "optimize-noise"; }
StringRef getDescription() const final {
return "Insert noise reduction operations optimally";
}
Option<int> threshold{*this, "threshold",
llvm::cl::desc("Maximum acceptable noise level"),
llvm::cl::init(4)};
void runOnOperation() override {
func::FuncOp func = getOperation();
// Run analysis
DataFlowSolver solver;
solver.load<dataflow::DeadCodeAnalysis>();
solver.load<NoiseAnalysis>();
if (failed(solver.initializeAndRun(func))) {
signalPassFailure();
return;
}
// Apply optimization patterns
RewritePatternSet patterns(&getContext());
patterns.add<InsertNoiseReduction>(&getContext(), solver, threshold);
if (failed(applyPatternsAndFoldGreedily(func, std::move(patterns)))) {
signalPassFailure();
}
}
};
} // namespace
void registerOptimizeNoisePass() {
PassRegistration<OptimizeNoisePass>();
}

File: tests/optimize_noise.mlir
// RUN: tutorial-opt %s --optimize-noise --threshold=2 | FileCheck %s
func.func @excessive_noise() -> i32 {
%c1 = noisy.constant 5 : i32
%c2 = noisy.constant 10 : i32
// Noise: [-1, 1] - within threshold, no reduction inserted
%sum1 = noisy.add %c1, %c2 : i32
// Noise: [-4, 4] - but the pattern only matches noisy.add
// CHECK: noisy.mul
%product = noisy.mul %sum1, %sum1 : i32
// Noise: [-5, 5] - exceeds threshold!
// CHECK: noisy.add
// CHECK-NEXT: noisy.reduce_noise
%sum2 = noisy.add %product, %c1 : i32
return %sum2 : i32
}

Dataflow analysis can also work across function boundaries. The sketch below is conceptual: MLIR has no CallGraphAnalysis base class, so a real implementation would build on the framework's call handling instead.
class InterproceduralNoiseAnalysis : public CallGraphAnalysis<NoiseLattice> {
public:
using CallGraphAnalysis::CallGraphAnalysis;
// Handle function calls
void visitCall(CallOpInterface call,
ArrayRef<const NoiseLattice *> arguments,
ArrayRef<NoiseLattice *> results) override {
// Propagate noise through function call
Operation *callee = call.resolveCallable();
if (!callee) {
// Unknown callee - conservative estimate
setAllToEntryStates(results);
return;
}
// Analyze callee with given argument lattices
// ... implementation details ...
}
};

add_mlir_library(MLIRNoiseAnalysis
NoiseAnalysis.cpp
NoiseAnalysisPass.cpp
DEPENDS
MLIRNoisyOpsIncGen
LINK_LIBS PUBLIC
MLIRIR
MLIRPass
MLIRAnalysis
MLIRDataFlowFramework
MLIRInferIntRangeInterface
MLIRNoisyDialect
)

add_mlir_library(MLIRNoiseOptimization
OptimizeNoise.cpp
DEPENDS
MLIRNoisyOpsIncGen
LINK_LIBS PUBLIC
MLIRIR
MLIRPass
MLIRTransforms
MLIRNoiseAnalysis
MLIRNoisyDialect
)

cd D:\repos\mlir-tutorial\build
ninja MLIRNoiseAnalysis MLIRNoiseOptimization
# Run analysis
.\bin\tutorial-opt.exe ..\tests\noise_analysis.mlir --noise-analysis
# Run optimization
.\bin\tutorial-opt.exe ..\tests\optimize_noise.mlir --optimize-noise

# Show dataflow iterations
.\build\bin\tutorial-opt.exe test.mlir --noise-analysis `
--debug-only=dataflow
# Show lattice values at each step
.\build\bin\tutorial-opt.exe test.mlir --noise-analysis `
--mlir-print-ir-after-all

void NoiseAnalysis::visitOperation(Operation *op, ...) {
llvm::errs() << "Visiting: ";
op->print(llvm::errs());
llvm::errs() << "\n";
// ... analysis logic ...
llvm::errs() << "Result lattice: ";
results[0]->print(llvm::errs());
llvm::errs() << "\n";
}

void NoiseAnalysisPass::runOnOperation() {
// ... run analysis ...
// Verify all values have valid lattices
getOperation()->walk([&](Operation *op) {
for (Value result : op->getResults()) {
const NoiseLattice *lattice = solver.lookupState<NoiseLattice>(result);
assert(lattice && "Missing lattice for value");
assert(!lattice->getValue().isBottom() && "Uninitialized lattice");
}
});
}

Time complexity: O(N × H)
- N = number of operations
- H = height of lattice (⊥ to ⊤)
For finite lattices:
- Guaranteed termination
- Polynomial time complexity
For infinite lattices:
- May not terminate
- Need widening operators
1. Use sparse analysis:
// Only analyze relevant operations
if (!isa<AddOp, MulOp, ReduceNoiseOp>(op)) {
return; // Skip analysis
}

2. Cache results:
// Store computed lattices
DenseMap<Value, NoiseLattice> cache;

3. Limit lattice height:
// Cap maximum noise level
if (noise.getMax() > MAX_NOISE) {
return NoiseBound::top(); // Over-approximation
}

Location: mlir/lib/Analysis/DataFlow/IntegerRangeAnalysis.cpp
Purpose: Track integer value ranges
Usage:
#include "mlir/Analysis/DataFlow/IntegerRangeAnalysis.h"
DataFlowSolver solver;
solver.load<dataflow::DeadCodeAnalysis>();
solver.load<dataflow::IntegerRangeAnalysis>();
solver.initializeAndRun(op);
// Query ranges (IntegerValueRangeLattice wraps IntegerValueRange)
const dataflow::IntegerValueRangeLattice *range =
solver.lookupState<dataflow::IntegerValueRangeLattice>(value);

Location: mlir/lib/Analysis/DataFlow/DeadCodeAnalysis.cpp
Purpose: Identify unreachable code
Usage:
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
DataFlowSolver solver;
solver.load<DeadCodeAnalysis>();Location: mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp
Purpose: Propagate constant values
Conceptual:
✅ Lattice theory is a practical tool, not just math: A semilattice, values with a "join" operation that merges information, provides the theoretical guarantee that your analysis will terminate. The join operation naturally expresses "what we know when we merge two control flow paths."
✅ Separation of analysis from transformation: The dataflow framework computes facts about your program without modifying it. Optimization passes then consume these facts to perform safe transformations. This separation means you can reuse analyses across multiple optimization contexts.
✅ Abstract interpretation enables domain reasoning: Instead of tracking concrete values (which explodes exponentially), track abstract properties, ranges, constants, types, nullability. The lattice structure ensures your approximations remain sound even when information flows through complex control flow.
✅ Fixed-point iteration has guarantees: Kildall's method repeatedly propagates information until nothing changes. The lattice's finite height guarantees this process terminates. Your only responsibility is defining a sound join operation and transfer functions.
✅ The framework is the real deliverable: MLIR's dataflow infrastructure handles worklists, state management, and interprocedural coordination. You focus on domain logic: "What properties matter?" and "How do operations transform those properties?" This is profound reuse, write 50 lines instead of 500.
Practical:
✅ IntegerRangeAnalysis tracks value ranges as intervals with join-as-union semantics, demonstrating the pattern for custom analyses
✅ Custom analyses extend base classes (DataFlowAnalysis, AbstractDenseLattice) and implement domain-specific join/transfer logic
✅ Interprocedural analysis crosses function boundaries when call graph information is available
✅ CMake integration follows standard pass patterns with dataflow framework linkage
- Tutorial 13: PDLL Patterns - Advanced pattern description language
- Implement custom analysis for your dialect
- Study MLIR analyses in mlir/lib/Analysis/DataFlow/
- Build optimization passes using analysis results
- Experiment with lattice designs for different domains
- Profile analysis performance and optimize hot paths
- Dataflow Framework: mlir.llvm.org/docs/DataFlowAnalysis/
- IntegerRangeAnalysis: Check mlir/include/mlir/Analysis/DataFlow/IntegerRangeAnalysis.h
- Classic Dataflow: "Compilers: Principles, Techniques, and Tools" (Dragon Book)
- Kildall's Algorithm: "A Unified Approach to Global Program Optimization" (1973)
- Original Article: jeremykun.com
The dataflow analysis framework in MLIR operates within the control flow paradigm, the dominant model inherited from the von Neumann architecture. But understanding the distinction between control flow and dataflow computation reveals why next-generation compilers, including the Fidelity Framework's Firefly compiler, need to manage both paradigms deftly.
Traditional CPUs (x86, ARM, RISC-V) execute programs as sequences of instructions:
1. Fetch instruction from memory
2. Decode instruction
3. Execute operation
4. Store result
5. Update program counter
6. Repeat
Characteristics:
- Sequential execution - one instruction at a time (or small instruction windows)
- Implicit data dependencies - instructions reference memory locations
- Program counter driven - "what to do next" determined by control flow
- Memory-centric - data moves between registers and memory hierarchy
MLIR's role: The dataflow analysis framework computes abstract properties by analyzing control flow graphs (CFGs). As we've seen in this tutorial, information propagates through basic blocks, joins at merge points, and iterates to fixed points. This matches the control flow execution model perfectly.
// Control flow representation
func.func @example(%cond: i1) -> i32 {
cf.cond_br %cond, ^bb1, ^bb2
^bb1:
%c5 = arith.constant 5 : i32
cf.br ^bb3(%c5 : i32)
^bb2:
%c10 = arith.constant 10 : i32
cf.br ^bb3(%c10 : i32)
^bb3(%result: i32):
return %result : i32
}

Analysis perspective: At ^bb3, dataflow analysis must join the lattice values from both paths. If one path provides [5,5] and the other [10,10], the result is [5,10]. This is fundamentally control-flow reasoning.
Emerging architectures (Groq, Tenstorrent, NextSilicon, SambaNova) execute programs as graphs of operations where data availability triggers computation:
When all inputs to an operation are ready → Execute operation → Outputs become available
Characteristics:
- Spatial parallelism - thousands of operations execute simultaneously
- Explicit data dependencies - operations connected by dataflow edges
- Data-driven execution - "what can execute now" determined by data availability
- Communication-centric - data flows directly between compute units
MLIR's representation: Operations naturally express dataflow through SSA (Static Single Assignment):
// The same computation as dataflow
func.func @example_dataflow(%cond: i1) -> i32 {
%c5 = arith.constant 5 : i32
%c10 = arith.constant 10 : i32
// This select expresses dataflow: result depends on data, not control flow
%result = arith.select %cond, %c5, %c10 : i32
return %result : i32
}

Execution perspective: On a dataflow processor, %c5 and %c10 compute in parallel immediately. When both values and %cond are ready, the select operation fires. No program counter, no sequential ordering, pure data availability.
Real-world systems exist on a spectrum:
| Architecture Type | Control Flow Emphasis | Dataflow Emphasis | Examples |
|---|---|---|---|
| Traditional CPU | ████████████░░░░ 80% | ░░░░░░░░░░░░░░░░ 20% | x86, ARM, RISC-V |
| Hybrid CPU+GPU | ████████░░░░░░░░ 60% | ░░░░░░░░████████ 40% | CUDA, ROCm, oneAPI |
| Dataflow CGRA | ████░░░░░░░░░░░░ 30% | ░░░░████████████ 70% | NextSilicon, SambaNova |
| Pure Dataflow | ░░░░░░░░░░░░░░░░ 5% | ████████████████ 95% | Groq, Tenstorrent |
The semantic gap: Traditional compilers, including LLVM, force all programs into control flow representations. This creates a fundamental mismatch when targeting dataflow hardware.
Multi-way relationships lost: Consider a simple computation:
// High-level semantic relationships
let result =
if condition then
compute_expensive_A x y z
else
    compute_expensive_B x y z

What the compiler sees (natural hypergraph):
- Three inputs (x, y, z) participate in both branches
- Multi-way relationship: {condition, x, y, z} → {compute_A OR compute_B} → result
- This is naturally a hyperedge (3+ participants)
What LLVM forces (binary graph):
condition → branch_decision
branch_decision → phi_node_x
x → phi_node_x
branch_decision → phi_node_y
y → phi_node_y
branch_decision → phi_node_z
z → phi_node_z
phi_node_x → compute_result
phi_node_y → compute_result
phi_node_z → compute_result
The natural 3-way relationship becomes 9 binary edges. Semantic richness is destroyed.
The Fidelity Framework's Firefly compiler uses a Program Hypergraph (PHG) as its central intermediate representation. Unlike traditional compiler IRs that represent programs as either control flow graphs or dataflow graphs, PHG preserves both paradigms simultaneously.
Traditional graph edge (binary):
type Edge = Node * Node // Can only connect two participants

Hyperedge (N-ary):
type Hyperedge = {
Participants: Set<Node> // 3, 4, 5... any number
Relationship: RelationType // What connects them
Temporal: TemporalProperties // When/how they interact
}

Example - dataflow relationship:
// Multi-way dataflow operation
let hyperedge = {
Participants = { input_x, input_y, input_z, operation, output }
Relationship = DataflowComputation
Temporal = Simultaneous // All inputs needed at once
}

This single hyperedge expresses what LLVM needs 5+ binary edges to represent.
The key insight: The same hypergraph can be projected into different execution models depending on target architecture.
For traditional CPUs (control flow):
// PHG → LLVM IR control flow
let compileToControlFlow (hypergraph: ProgramHypergraph) =
hypergraph.Hyperedges
|> projectToControlFlow
|> generateBasicBlocks
|> insertBranches
|> lowerToLLVM

For dataflow processors:
// PHG → Spatial dataflow
let compileToDataflow (hypergraph: ProgramHypergraph) =
hypergraph.Hyperedges
|> preserveDataflowSemantics
|> mapToComputeUnits
|> routeDataPaths
|> generateSpatialConfig

For hybrid systems (CPU + GPU):
// PHG → Heterogeneous partitioning
let compileToHybrid (hypergraph: ProgramHypergraph) =
hypergraph.Hyperedges
|> partitionByCharacteristics
|> Map.ofList [
ControlFlowDominant, compileToCPU
DataflowDominant, compileToGPU
Mixed, compileToAccelerator
]
|> coordinateExecution

The PHG isn't just a static representation; it's a learning system that improves with each compilation:
type TemporalProgramHypergraph = {
Current: ProgramHypergraph
History: TemporalProjection list
LearnedPatterns: CompilationKnowledge
RecursionSchemes: SchemeLibrary
}
and TemporalProjection = {
Timestamp: DateTime
GraphSnapshot: ProgramHypergraph
CompilationDecisions: Decision list
PerformanceMetrics: Metrics
ArchitectureTarget: HardwareType
}

Learning parallelization through graph coloring:
Traditional compilers use static heuristics for parallelization. The PHG learns optimal patterns:
// First compilation: Conservative
let firstCompile graph =
graph.ColorNodes BasicStrategy // Safe, limited parallelism
|> measurePerformance
|> storeInHistory
// Tenth compilation: Learned patterns
let laterCompile graph history =
let patterns = learnFromHistory history
graph.ColorNodes (OptimizedStrategy patterns) // Aggressive, proven safe
|> measurePerformance
|> updateKnowledge

Example - loop parallelization:
// MLIR input (control flow)
scf.for %i = %c0 to %c1000 step %c1 {
%val = compute %i
store %val, %array[%i]
}After temporal learning:
- First compilation: Sequential execution (safe, unknown dependencies)
- Fifth compilation: 4-way parallelization (observed no dependencies)
- Tenth compilation: Full vectorization + GPU offload (learned access pattern)
The hypergraph accumulates knowledge:
{
PatternID = "loop_array_compute_store"
SeenCount = 10
SafeParallelism = VectorWidth 16
PreferredTarget = GPU
Confidence = 0.95
}

How does this relate to Tutorial 12?
MLIR's dataflow analysis framework operates within the control flow paradigm - it propagates abstract values through CFGs, joins at merge points, and reasons about program states at control flow boundaries.
The PHG extends this in two critical ways:
1. Unified analysis across paradigms:
MLIR analysis:
// Traditional: propagate through control flow
void visitOperation(Operation *op,
ArrayRef<const NoiseLattice *> operands,
ArrayRef<NoiseLattice *> results) {
// Transfer function for this operation
// Join at control flow merges
}

PHG analysis:
// Extended: propagate through hypergraph
let analyzeHyperedge edge lattices =
    match edge.Relationship with
    | ControlFlowMerge ->
        joinLattices lattices edge.ControlFlowSemantics
    | DataflowComputation ->
        transferThroughDataflow lattices edge.DataflowSemantics
    | HybridPattern ->
        analyzeWithBothSemantics lattices edge

2. Architecture-aware optimization:
MLIR (target-agnostic):
// Optimize based on analysis results
if (noise.getMax() <= threshold) {
  // Insert reduce_noise operation
}

PHG (target-adaptive):
// Optimize based on analysis AND target architecture
let optimize analysis targetArch =
    match targetArch with
    | VonNeumann cpu ->
        // Control flow optimization
        insertNoiseReductionSequentially analysis cpu
    | Dataflow accel ->
        // Spatial optimization
        fuseNoiseReductionPipeline analysis accel
    | Hybrid (cpu, gpu) ->
        // Partition optimization
        partitionNoiseComputation analysis cpu gpu

Why this matters now:
Modern heterogeneous systems already mix paradigms:
- On-die integration: CPU cores + GPU units + AI accelerators on same chip
- CXL coherent memory: Shared memory between control flow and dataflow processors
- PCIe accelerators: Dataflow cards (GPUs, FPGAs, custom ASICs) in control flow hosts
- Edge hybrids: Low-power dataflow units paired with control flow microcontrollers
Firefly's strategy: Rather than forcing programs into one paradigm, preserve the natural computational semantics in the hypergraph, then project to the appropriate execution model for each hardware target.
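The projection choice itself is just a dispatch on the target's execution model, driven by the same analysis result. A minimal Python sketch, with all target and pass names hypothetical:

```python
# Illustrative sketch: one analysis result, different projections per target.
# Target names and "pass" names are invented for this example.
def project(analysis: dict, target: str) -> str:
    """Choose an optimization projection for a hardware target."""
    noise = analysis["noise_max"]
    if target == "von_neumann_cpu":
        # Control flow projection: insert noise reduction sequentially.
        return f"sequential_noise_reduction(noise_max={noise})"
    elif target == "dataflow_accel":
        # Spatial projection: fuse noise reduction into the pipeline.
        return f"fused_noise_pipeline(noise_max={noise})"
    elif target == "hybrid_cpu_gpu":
        # Partition projection: split the computation across devices.
        return f"partitioned_noise_computation(noise_max={noise})"
    raise ValueError(f"unknown target: {target}")

analysis = {"noise_max": 12}
plans = {t: project(analysis, t)
         for t in ("von_neumann_cpu", "dataflow_accel", "hybrid_cpu_gpu")}
```

The point is the shape, not the names: the semantic analysis is computed once, and only the projection step is architecture-specific.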
Advanced PHG traversal:
Traditional compiler passes walk IRs with fixed traversal patterns (pre-order, post-order, etc.). PHG uses recursion schemes and bidirectional zippers for intelligent navigation.
Catamorphism (fold): Collapse structure bottom-up
// Constant propagation as catamorphism
let propagateConstants =
    cata (fun node children ->
        match node, children with
        | Add, [Constant a; Constant b] -> Constant (a + b)
        | _, _ -> node)

Anamorphism (unfold): Generate structure top-down
// Loop unrolling as anamorphism
let unrollLoop bound =
    ana (fun count ->
        if count >= bound then None
        else Some (iteration count, count + 1))

Hylomorphism (fold after unfold): Transform through intermediate structure
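All three schemes can be sketched over plain Python lists, the simplest recursive structure. This is a self-contained illustration of the pattern, not the PHG library; the helper names mirror the standard recursion-scheme vocabulary:

```python
# Sketch: recursion schemes over lists.
# cata folds bottom-up, ana unfolds top-down, hylo folds the unfold's result.
from typing import Callable, Optional, Tuple, TypeVar

A = TypeVar("A"); B = TypeVar("B"); S = TypeVar("S")

def cata(alg: Callable[[A, B], B], seed: B, xs: list) -> B:
    """Catamorphism: collapse a structure bottom-up (a right fold)."""
    acc = seed
    for x in reversed(xs):
        acc = alg(x, acc)
    return acc

def ana(coalg: Callable[[S], Optional[Tuple[A, S]]], state: S) -> list:
    """Anamorphism: generate a structure top-down (an unfold)."""
    out = []
    while (step := coalg(state)) is not None:
        x, state = step
        out.append(x)
    return out

def hylo(alg, seed, coalg, state):
    """Hylomorphism: fold after unfold, through the intermediate structure."""
    return cata(alg, seed, ana(coalg, state))

# Loop unrolling as ana: emit iterations 0..3 from a counter.
unrolled = ana(lambda i: (f"iter_{i}", i + 1) if i < 4 else None, 0)

# Constant folding as cata: sum a list of constants bottom-up.
total = cata(lambda x, acc: x + acc, 0, [1, 2, 3])

# hylo: unroll 0..4, then fold the iterations into a single sum.
sum_to_5 = hylo(lambda x, acc: x + acc, 0,
                lambda i: (i, i + 1) if i < 5 else None, 0)
```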
Bidirectional zipper: Navigate with context
type Zipper<'a> = {
    Focus: 'a               // Current location
    Left: 'a list           // Already visited (backward context)
    Right: 'a list          // Not yet visited (forward context)
    Up: Zipper<'a> option   // Parent context
}
// Intelligent traversal with learning
let traverseWithLearning zipper history =
    match lookupPattern zipper.Focus history with
    | Some learnedPath ->
        // Jump directly based on previous compilation
        moveToOptimal zipper learnedPath
    | None ->
        // Explore and record new pattern
        exploreAndLearn zipper

This enables event-sourced compilation intelligence - each compilation leaves a trail that future compilations leverage.
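The zipper itself is a small, concrete data structure. Here is a runnable Python sketch of a one-dimensional (list) zipper, omitting the parent context for brevity:

```python
# Sketch of a list zipper: O(1) movement in either direction, with the
# already-visited and not-yet-visited context carried alongside the focus.
from dataclasses import dataclass, field

@dataclass
class Zipper:
    focus: str
    left: list = field(default_factory=list)    # already visited (nearest first)
    right: list = field(default_factory=list)   # not yet visited

    def move_right(self) -> "Zipper":
        """Shift focus forward, pushing the old focus onto the left context."""
        if not self.right:
            return self
        return Zipper(self.right[0], [self.focus] + self.left, self.right[1:])

    def move_left(self) -> "Zipper":
        """Shift focus backward, pushing the old focus onto the right context."""
        if not self.left:
            return self
        return Zipper(self.left[0], self.left[1:], [self.focus] + self.right)

z = Zipper("a", [], ["b", "c"])
z2 = z.move_right()                 # focus "b", with "a" behind us
z3 = z2.move_right().move_left()    # forward to "c", then back to "b"
```

Because every move returns a new zipper, a traversal can be recorded, replayed, or resumed from any point, which is what makes the event-sourced, learning-guided navigation described above practical.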
As an MLIR developer, how does this affect you?
Today:
- Build dialects and analyses within MLIR's control flow framework
- Use dataflow analysis for optimization within that paradigm
- Lower to LLVM IR for CPU targets, extend for GPU/accelerators
Tomorrow (with hypergraph-aware compilers):
- Same MLIR dialects, but intermediate compilation through PHG
- Analysis results inform both control flow and dataflow projections
- Single source compiles optimally to heterogeneous targets
- Temporal learning improves compilation over time
Practical integration:
// MLIR dialect → PHG → Multiple targets
let compileThroughFirefly mlirModule =
    mlirModule
    |> parseMLIR
    |> liftToHypergraph       // MLIR ops → Hyperedges
    |> applyMLIRAnalysis      // Reuse MLIR dataflow analysis
    |> temporalOptimization   // Learn from history
    |> projectToTargets [     // Generate multiple outputs
        LLVM_IR               // Control flow (CPU)
        CUDA_PTX              // Hybrid (GPU)
        SambaNova_SDF         // Dataflow (CGRA)
        Groq_Config           // Pure dataflow
    ]

🔬 Control flow and dataflow are projection paradigms, not fundamental truths
Programs have natural computational semantics. Control flow (sequential instructions) and dataflow (data-driven execution) are execution models we impose. Hypergraphs preserve the semantics, letting the compiler choose the right projection.
🔬 Multi-way relationships are first-class citizens
Traditional graphs force N-way relationships into N binary edges, losing semantic information. Hyperedges maintain the natural arity of computational relationships, enabling better analysis and optimization.
🔬 Temporal learning bridges compilation and runtime
Traditional compilers forget everything between compilations. PHG accumulates knowledge about what optimizations work, building a library of proven patterns that accelerate future compilations and improve quality.
🔬 Heterogeneous architectures need unified abstractions
Modern systems mix CPUs, GPUs, FPGAs, custom accelerators. Rather than maintaining separate compilation paths for each, hypergraph representations compile naturally to all targets from a single unified semantic representation.
🔬 MLIR's dataflow analysis is the foundation, not the ceiling
The lattice theory, transfer functions, and fixed-point iteration you've learned in this tutorial apply regardless of execution paradigm. Hypergraph compilers extend these principles across both control flow and dataflow execution models.
- Hyping Hypergraphs - SpeakEZ Blog: Deep dive into Program Hypergraph architecture
- A Vision For Unified Cognitive Architecture - SpeakEZ Blog: Hypergraphs for AI and compilation convergence
- The Advent of Neuromorphic AI - SpeakEZ Blog: Dataflow architectures and forward gradient learning
- MLIR Dataflow Framework Documentation: mlir.llvm.org/docs/DataFlowAnalysis/
- Kildall's Algorithm (1973): "A Unified Approach to Global Program Optimization"
Previous: ← Tutorial 11: Lowering through LLVM Next: Tutorial 13: PDLL Patterns →