A lightweight, general-purpose framework for evaluating GPU kernel correctness and performance.
- Three Evaluation Modes: Analyze, Compare, Benchmark
- Heterogeneous Hardware: AMD (HIP) and NVIDIA (CUDA) GPUs
- Execution Environments: Local, Sandbox Container, and Remote Ray Cluster
- Hardware Control: Hardware-aware kernel evaluation under controlled execution settings
- Auto GPU Selection: Benchmark mode picks idle GPU(s) before launching (AMD + NVIDIA)
- Trace Analysis: TraceLens integration for performance profiling analysis
- MCP Server: Model Context Protocol integration for AI agents
- Structured Reports: JSON output for pipeline integration
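Since reports are plain JSON, a downstream pipeline can gate on them with nothing but the standard library. A minimal sketch, assuming a hypothetical report shape (the field names `mode`, `kernel`, `correctness`, `performance` are illustrative, not Magpie's actual schema; inspect a real report for the true keys):

```python
import json

# Hypothetical structured report -- field names are illustrative only.
report_text = """
{
  "mode": "analyze",
  "kernel": "ck_gemm_add",
  "correctness": {"passed": true},
  "performance": {"mean_ms": 0.42}
}
"""

report = json.loads(report_text)

# Gate a CI step on correctness before reading the timing.
if report["correctness"]["passed"]:
    print(f'{report["kernel"]}: {report["performance"]["mean_ms"]} ms')
```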
- Python 3.10+
- AMD ROCm (HIP) or NVIDIA CUDA toolchain (for kernel compilation/profiling)
- `rocprof-compute` (AMD) or `ncu` (NVIDIA) if you enable performance profiling
- Docker (default for Benchmark mode when `run_mode` is `docker`; host execution uses `run_mode: local`)
# Basic installation
pip install git+https://github.com/AMD-AGI/Magpie.git
git clone https://github.com/AMD-AGI/Magpie.git
cd Magpie
# Editable install (recommended for development)
pip install -e .
# Or use make
make install

# Analyze a kernel using a config file
magpie analyze --kernel-config Magpie/kernel_config.yaml.example
# Compare kernels using a config file
magpie compare --kernel-config examples/ck_grouped_gemm_compare.yaml
# Benchmark vLLM (see examples/benchmarks/*.yaml)
magpie benchmark --benchmark-config examples/benchmarks/benchmark_vllm_dsr1.yaml
# GPU / toolchain summary
magpie --gpu-info
# Run MCP server
python -m Magpie.mcp

Note: You can use `python -m Magpie` instead of the `magpie` CLI for the same subcommands.
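The same subcommands can be driven from a wrapper script. A minimal sketch that only builds the argv (the subcommand and flag names come from the CLI examples above; the helper itself is illustrative, not part of Magpie):

```python
import subprocess

def magpie_cmd(mode: str, config: str, use_module: bool = False) -> list[str]:
    """Build the argv for a Magpie invocation.

    `mode` is one of the subcommands shown above (analyze/compare/benchmark);
    benchmark takes --benchmark-config, the kernel modes take --kernel-config.
    """
    flag = "--benchmark-config" if mode == "benchmark" else "--kernel-config"
    base = ["python", "-m", "Magpie"] if use_module else ["magpie"]
    return base + [mode, flag, config]

cmd = magpie_cmd("analyze", "Magpie/kernel_config.yaml.example")
# subprocess.run(cmd, check=True)  # run it where Magpie is installed
print(" ".join(cmd))
```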
| Mode | Description | Status |
|---|---|---|
| Analyze | Single kernel evaluation with testcase | ✅ |
| Compare | Multi-kernel comparison and ranking | ✅ |
| Benchmark | Framework-level benchmarking (vLLM/SGLang) with trace analysis | ✅ |
📖 See Benchmark mode for vLLM/SGLang usage.
📖 See Analyze vs Compare for kernel evaluation modes.
Key categories:
- `gpu`: device selection and hardware control (power/frequency).
- `scheduler`: local, container, or Ray execution and worker settings.
- `compiling`/`correctness`: default compile behavior, testcase vs Accordo, tolerances.
- `performance`: profiler backend (rocprof-compute, ncu, Metrix), timeouts, metric blocks.
- `compare`: perf metric weights and winner selection for compare mode.
- `benchmark`: InferenceX path, image mapping, default profiler flags.
- `logging`: log levels and optional file output.
See Magpie/kernel_config.yaml.example for full examples.
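The categories above map to top-level sections of a loaded config. A quick sanity-check sketch (section names are taken from the list above; the sample config values are placeholders, not the real schema):

```python
# Top-level sections named in the category list above.
EXPECTED_SECTIONS = {
    "gpu", "scheduler", "compiling", "correctness",
    "performance", "compare", "benchmark", "logging",
}

def missing_sections(config: dict) -> set[str]:
    """Return the expected sections absent from a loaded config dict."""
    return EXPECTED_SECTIONS - config.keys()

# Placeholder for the result of yaml.safe_load() on a config file.
config = {
    "gpu": {"device": 0},
    "scheduler": {"run_mode": "local"},
    "logging": {"level": "INFO"},
}
print(sorted(missing_sections(config)))
```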
Example configs live in examples/:
| Mode | Config File | Description |
|---|---|---|
| Analyze | `examples/ck_gemm_add.yaml` | Single kernel evaluation |
| Analyze | `examples/simple_hip_test/analyze_default.yaml` | Minimal HIP example |
| Compare | `examples/ck_grouped_gemm_compare.yaml` | Multi-kernel comparison |
| Benchmark | `examples/benchmarks/benchmark_vllm_dsr1.yaml` | vLLM (DeepSeek-R1-style) |
| Benchmark | `examples/benchmarks/benchmark_vllm_tracelens.yaml` | vLLM + TraceLens |
| Benchmark | `examples/benchmarks/benchmark_vllm_kimi_k2.yaml` | vLLM + gap analysis example |
| Benchmark | `examples/benchmarks/benchmark_sglang_dsr1.yaml` | SGLang benchmark |
| Benchmark | `examples/benchmarks/benchmark_vllm_*_ray.yaml` | vLLM on Ray |
MCP configuration example: Magpie/mcp/config.json
Available tools:
- `analyze` - Analyze kernel correctness and performance
- `compare` - Compare multiple kernel implementations
- `hardware_spec` - Query GPU hardware specifications
- `configure_gpu` - Configure GPU power and frequency
- `discover_kernels` - Scan a project and suggest analyzable kernels/configs
- `suggest_optimizations` - Suggest performance optimizations from analyze output
- `create_kernel_config` - Generate a kernel config YAML for analyze
- `benchmark` - Run vLLM/SGLang framework benchmark with optional profiling
- `gap_analysis` - Run gap analysis on existing torch profiler traces
- `list_benchmark_images` - List available Docker images per framework/arch
- `list_benchmark_results` - List previous benchmark workspaces and summaries
- `get_benchmark_result` - Read detailed results from a specific benchmark run
- `compare_benchmark_reports` - Compare TraceLens reports across benchmark runs
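An MCP client selects tools from this list by name. A minimal client-side dispatch sketch (tool names come from the list above; the handlers are stand-ins, not Magpie's server implementation):

```python
from typing import Any, Callable

# Tool names from the list above; the lambdas are placeholder handlers.
TOOLS: dict[str, Callable[..., Any]] = {
    "analyze": lambda **kw: {"tool": "analyze", "args": kw},
    "hardware_spec": lambda **kw: {"tool": "hardware_spec", "args": kw},
}

def call_tool(name: str, **kwargs: Any) -> Any:
    """Route a tool call by name, mirroring how an MCP client dispatches."""
    if name not in TOOLS:
        raise KeyError(f"unknown MCP tool: {name}")
    return TOOLS[name](**kwargs)

result = call_tool("analyze", kernel_config="examples/ck_gemm_add.yaml")
```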
For environments without MCP, install the Magpie skill; see docs/skills-install.md.
make install-dev
make lint
make format

├── README.md
├── LICENSE
├── .gitignore
├── pyproject.toml              # Package configuration (pip install)
├── requirements.txt
├── Makefile
├── examples/                   # Example configurations
├── docs/                       # Documentation
│   ├── benchmark.md            # Benchmark mode (vLLM / SGLang)
│   ├── analysis_compare.md     # Analyze vs Compare kernel modes
│   ├── skills-install.md       # Agent skill installation
│   └── images/                 # Architecture diagrams
└── Magpie/
    ├── __init__.py             # Package initialization
    ├── __main__.py             # Entry point for python -m Magpie
    ├── main.py                 # CLI implementation
    ├── config.yaml             # Framework configuration
    ├── kernel_config.yaml.example
    ├── config/                 # Configuration classes
    ├── core/                   # Core engine components
    ├── eval/                   # Evaluation pipeline
    ├── modes/                  # Evaluation modes
    │   ├── analyze_eval/       # Single kernel analysis
    │   ├── compare_eval/       # Multi-kernel comparison
    │   └── benchmark/          # Framework-level benchmarking
    │       ├── benchmarker.py  # Benchmark orchestration
    │       ├── config.py       # Benchmark configuration
    │       ├── tracelens.py    # TraceLens integration
    │       ├── gap_analysis.py # Kernel bottleneck report from torch traces
    │       └── result.py       # Result data structures
    ├── mcp/                    # MCP Server
    │   ├── __init__.py
    │   ├── __main__.py         # Entry point for python -m Magpie.mcp
    │   ├── server.py           # MCP server implementation
    │   └── config.json         # MCP client configuration
    └── utils/                  # Utility functions
MIT License. See LICENSE.


