Conversation
| # cache allocated memrefs | ||
| self.gpu_memrefs = {} | ||
|
|
||
| def _allocate_array( |
There was a problem hiding this comment.
can we outline the allocation into a helper module?
| ): | ||
| """Transform schedule for matmul-like payload.""" | ||
| try: | ||
| mod = bundle_xepu_mlp_schedule( |
| def emit_gpu_copy(suffix: str, element_type: ir.Type, rank: int = 2): | ||
| """Emit GPU copy function.""" | ||
| dyn = ir.ShapedType.get_dynamic_size() | ||
| memref_dyn_t = ir.MemRefType.get(rank * (dyn,), element_type) | ||
|
|
||
| @func.func(memref_dyn_t, memref_dyn_t, name="gpu_copy_" + suffix) | ||
| def copy_func(src, dst): | ||
| gpu.memcpy(None, [], dst, src) | ||
|
|
||
| copy_func.func_op.attributes["llvm.emit_c_interface"] = ir.UnitAttr.get() |
There was a problem hiding this comment.
This pattern (@func.func + c_interface) could be a decorator provided somewhere in lighthouse (or even better the @func.func decorator could accept this flag or generally attributes)
| execution_engine: ExecutionEngine, | ||
| ) -> ctypes.Structure: | ||
| key = (name, dtype_str) | ||
| if key in self.gpu_memrefs: |
There was a problem hiding this comment.
Why allow the same name with different types?
| alloc_func = execution_engine.lookup("gpu_alloc_" + dtype_str) | ||
| mref = make_nd_memref_descriptor(len(shape), as_ctype(dtype))() | ||
| ptr_mref = ctypes.pointer(ctypes.pointer(mref)) | ||
| ptr_dims = [ctypes.pointer(ctypes.c_int32(d)) for d in shape] | ||
| alloc_func(get_packed_arg([ptr_mref] + ptr_dims)) | ||
| self.gpu_memrefs[key] = mref |
There was a problem hiding this comment.
Consider using execution_engine.invoke.
| # use integer values to avoid f16/f32 floating point discrepancies | ||
| def gen_random(shape, dtype): | ||
| # generate values in range [-3, 3] | ||
| a = np.round(6 * np.random.random_sample(shape)) - 3 |
There was a problem hiding this comment.
np.random.randint(-3, 4, shape)?
| if shape in self.param_db: | ||
| params = self.param_db[shape] | ||
| else: | ||
| raise ValueError(f"No parameters found for matmul shape {shape}") | ||
| parameters[f"layer_{i}"] = params |
There was a problem hiding this comment.
| if shape in self.param_db: | |
| params = self.param_db[shape] | |
| else: | |
| raise ValueError(f"No parameters found for matmul shape {shape}") | |
| parameters[f"layer_{i}"] = params | |
| if shape not in self.param_db: | |
| raise ValueError(f"No parameters found for matmul shape {shape}") | |
| parameters[f"layer_{i}"] = self.param_db[shape] |
| } | ||
|
|
||
|
|
||
| class ParameterOracleMLP: |
There was a problem hiding this comment.
Why is this a class and not just a function?
| "xegpu-inst", | ||
| "final", | ||
| ], | ||
| help="Dump kernel IR at different stages of lowering.", |
There was a problem hiding this comment.
Might also want to mention that it will do nothing other than lowering/dumping.
| parts = [ | ||
| f"b={args.batch_size}", | ||
| f"i={args.input_size}", | ||
| f"o={args.output_size}", | ||
| f"hs={list2str(hidden_sizes)}", | ||
| f"dt={ab_type},{c_type}", | ||
| f"time(us): {elapsed:.2f}", | ||
| f"GFLOPS: {gflops:.2f}", | ||
| ] | ||
| print(" ".join(parts)) |
There was a problem hiding this comment.
| parts = [ | |
| f"b={args.batch_size}", | |
| f"i={args.input_size}", | |
| f"o={args.output_size}", | |
| f"hs={list2str(hidden_sizes)}", | |
| f"dt={ab_type},{c_type}", | |
| f"time(us): {elapsed:.2f}", | |
| f"GFLOPS: {gflops:.2f}", | |
| ] | |
| print(" ".join(parts)) | |
| print( | |
| f"b={args.batch_size} " | |
| f"i={args.input_size} " | |
| f"o={args.output_size} " | |
| f"hs={list2str(hidden_sizes)} " | |
| f"dt={ab_type},{c_type} " | |
| f"time(us): {elapsed:.2f} " | |
| f"GFLOPS: {gflops:.2f}" | |
| ) |
| python mlp.py -b 128 -i 16384 -o 8192 --hidden-sizes 16384 16384 ... | ||
| ``` | ||
|
|
||
| which corresponds to | ||
|
|
||
| ```txt | ||
| MLP with 3 layers | ||
| Layer 0: M=128, N=16384, K=16384 | ||
| Layer 1: M=128, N=16384, K=16384 | ||
| Layer 2: M=128, N=8192, K=16384 | ||
| ``` |
There was a problem hiding this comment.
For clarity, consider using different values for every parameter (instead of 16384 for three parameters). Same above.
| @func.func(*inputs, name="gpu_alloc_" + suffix) | ||
| def alloc_func(*shape): | ||
| dims = [arith.index_cast(index_t, a) for a in shape] | ||
| alloc = gpu.alloc(memref_dyn_t, None, [], dims, []) | ||
| return alloc | ||
|
|
||
| alloc_func.func_op.attributes["llvm.emit_c_interface"] = ir.UnitAttr.get() |
There was a problem hiding this comment.
See comment in deleted file above.
| @linalg.generic( | ||
| [c_tensor], | ||
| [empty], | ||
| [id_map, id_map], | ||
| [par_iter, par_iter], | ||
| ) | ||
| def f(a, b): | ||
| return arith.extf(c_type, a) |
There was a problem hiding this comment.
Doesn't arith.extf/arith.truncf operate directly on tensors?
| emit_gpu_copy(suffix, element_type) | ||
|
|
||
|
|
||
| def emit_mlp_layer( |
There was a problem hiding this comment.
Maybe this could live in its own file, rather than in something that's called "matmul".
| return terminal | ||
|
|
||
|
|
||
| def generate_matmul_payload( |
There was a problem hiding this comment.
Consider a better name. Also see above.
| B = args[1] | ||
| C = args[-1] | ||
| bias = args[2] if has_bias else None | ||
| a_tensor = bufferization.to_tensor(tensor_a_t, A, restrict=True) |
| if to_dealloc is not None: | ||
| gpu.dealloc(None, [], to_dealloc) | ||
| to_dealloc = None | ||
| if i != nlayers - 1: | ||
| # deallocate after next layer | ||
| to_dealloc = c_memref |
There was a problem hiding this comment.
Use deallocate_memrefs_on_exit?
|
|
||
| class PipelineInterrupt(Exception): | ||
| """Exception to signal early termination of the transform schedule.""" | ||
|
|
There was a problem hiding this comment.
It is useful to have a dedicated exception to mark an intended pipeline interrupt. As such, it can be caught and differentiated from other, more severe exceptions.
Without such an interrupt exception, stopping the pipeline in the middle gets somewhat complicated — e.g., you'd have to return an extra boolean indicating "pipeline was interrupted".
There was a problem hiding this comment.
Sorry, yes, I meant remove the empty line :)
| loop.HoistLoopInvariantSubsetsOp(k_loop) | ||
|
|
||
| transform.apply_cse(func) | ||
| canonicalize(func) |
There was a problem hiding this comment.
Are all these canonicalizers needed?
| # A tile load layout | ||
| layout_load_a = { | ||
| "sg_layout": sg_layout, | ||
| "sg_data": sg_tile_a, | ||
| "inst_data": load_tile_a, | ||
| } | ||
| desc_op_a = xegpu.get_desc_op(tile_a) | ||
| # A tile load op anchor layout | ||
| load_op_a = transform.get_consumers_of_result(anytype, desc_op_a, 0) | ||
| xegpu.set_op_layout_attr(load_op_a, **layout_load_a) | ||
| # A tile dpas layout | ||
| layout_dpas_a = layout_load_a.copy() | ||
| layout_dpas_a["inst_data"] = dpas_shape_a | ||
| convert_layout(tile_a, layout_load_a, layout_dpas_a) | ||
|
|
||
| # B tile load layout | ||
| layout_load_b = { | ||
| "sg_layout": sg_layout, | ||
| "sg_data": sg_tile_b, | ||
| "inst_data": load_tile_b, | ||
| } | ||
| desc_op_b = xegpu.get_desc_op(tile_b) | ||
| # B tile load op anchor layout | ||
| load_op_b = transform.get_consumers_of_result(anytype, desc_op_b, 0) | ||
| xegpu.set_op_layout_attr(load_op_b, **layout_load_b) | ||
| # B tile dpas layout | ||
| layout_dpas_b = layout_load_b.copy() | ||
| layout_dpas_b["inst_data"] = dpas_shape_b | ||
| convert_layout(tile_b, layout_load_b, layout_dpas_b) |
`xegpu_mlp` example. Supports arbitrary MLP models. Optional ReLU (on all layers). Bias not yet supported. `no-accumulate-c` option to compute MLP-like `C=A*B` instead of `C+=A*B`. `lighthouse/ingress/gpu/matmul.py`. `lighthouse/schedule/xegpu/matmul_schedule.py`. Example: Run simplest KernelBench MLP: