A cycle-accurate hybrid C++/Rust GPU simulator investigating the physics of H100 Streaming Multiprocessors.
OmniTrace bridges the gap between high-level ML frameworks and low-level hardware execution. By simulating warp physics, SM bank conflicts, and tensor core latency, it provides a rigorous testbed for optimizing GPU kernels before they touch silicon.
The system utilizes a "Layer Cake" Architecture to balance raw performance with developer safety:
- 🚀 Core (C++20): High-performance simulation engine handling warp state, shared memory physics, and execution pipelines.
- 🛡️ Interface (Rust): Provides memory safety, robust FFI bindings, and a parallelized CLI for managing simulation tasks.
- 🧠 Agent (Python): An AI-driven optimizer that generates PTX kernels, analyzes latency feedback, and iteratively tunes memory access patterns.
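The Core/Interface split above implies a C ABI boundary between the C++ engine and the Rust wrapper. A minimal sketch of what such an FFI surface could look like (the function and type names here are hypothetical illustrations, not the actual contents of `omni_api.h`):

```cpp
#include <cstdint>

// Hypothetical opaque handle to the C++ simulation state.
// Rust would see this only as a raw pointer behind a safe wrapper type.
struct OmniSim {
    uint64_t cycle = 0;
};

extern "C" {
    // Functions exported with C linkage so the Rust side can declare
    // them in an `extern "C"` block without worrying about C++ mangling.
    OmniSim* omni_create() { return new OmniSim(); }

    // Advance the simulation and return the current cycle count.
    uint64_t omni_step(OmniSim* sim, uint32_t cycles) {
        sim->cycle += cycles;
        return sim->cycle;
    }

    void omni_destroy(OmniSim* sim) { delete sim; }
}
```

Keeping the boundary to plain C types (pointers and fixed-width integers) is what lets the Rust layer own lifetime management safely while the C++ core stays allocation-agnostic.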
The simulator accurately models the massive latency penalties incurred by Shared Memory Bank Conflicts.
| Kernel Strategy | Stride | Latency | Outcome |
|---|---|---|---|
| Unoptimized | 32 | 1024 Cycles | ❌ Massive Serialization Stalls |
| Optimized | 1 | 32 Cycles | ✅ Perfect Parallelism |
Result: The optimizer achieves a 32.0x speedup by realigning memory access patterns to eliminate bank conflicts.
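The penalty model behind these numbers can be sketched in a few lines. This is a simplified illustration assuming 32 banks, 4-byte words, and a 32-cycle base access latency; the constants and function name are chosen to match the table above, not taken from the engine:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr int kBanks = 32;        // banks per SM (assumed)
constexpr int kBaseLatency = 32;  // cycles for a conflict-free access (assumed)

// A warp's shared-memory access costs the base latency multiplied by
// the worst conflict degree: the most threads landing in any one bank.
int warp_access_latency(const std::array<uint32_t, 32>& byte_addrs) {
    std::array<int, kBanks> hits{};
    for (uint32_t a : byte_addrs) hits[(a / 4) % kBanks]++;
    int worst = *std::max_element(hits.begin(), hits.end());
    return kBaseLatency * worst;  // conflicts serialize, multiplying latency
}
```

With word-stride 1, the 32 threads touch 32 distinct banks (latency 32); with word-stride 32, every address maps to bank 0, producing 32-way serialization (latency 1024), matching the table.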
```
omnitrace/
├── cpp_core/        # The high-performance simulation engine (CMake)
│   ├── include/     # Public API headers (omni_api.h)
│   └── src/         # Core physics logic (sm_banks.cpp, tensor_core.cpp)
├── rust_tooling/    # The safe CLI wrapper and parser (Cargo)
│   ├── src/         # FFI bindings and command-line logic
│   └── Cargo.toml   # Rust dependency management
└── python_agent/    # The AI optimization logic
    └── agent.py     # Self-optimizing feedback loop script
```
- Rust Toolchain: `cargo` (for the CLI)
- C++ Compiler: `cmake` plus `g++` or `clang++` (supporting C++20)
- Python: Python 3.10+ (for the agent)
- Build the Simulator: Compile the C++ core and Rust bindings in release mode.

  ```shell
  cd rust_tooling
  cargo build --release
  ```

- Run the Test Suite: Verify the physics engine against known baselines.

  ```shell
  cargo test
  ```

- Launch the AI Agent: Run the self-optimizing feedback loop to demonstrate automatic conflict resolution.

  ```shell
  cd ..
  python python_agent/agent.py
  ```
The `SharedMemory` class in the C++ core simulates the 32-bank architecture of modern GPUs. It detects when multiple threads within a warp attempt to access different addresses mapping to the same bank, calculating the resulting serialization penalty.
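A minimal sketch of how such detection might work, including the broadcast case (threads reading the *same* address in a bank do not conflict; only distinct addresses serialize). This is an illustrative stand-in, not the actual `SharedMemory` implementation:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Illustrative conflict detector: counts distinct addresses per bank,
// so a warp-wide broadcast of one address costs no extra serialization.
class BankConflictDetector {
public:
    // Returns the serialization factor for one warp-wide access:
    // 1 means conflict-free (or broadcast), N means N-way serialized.
    int serialization(const std::vector<uint32_t>& byte_addrs) const {
        std::set<uint32_t> unique_per_bank[32];
        for (uint32_t a : byte_addrs)
            unique_per_bank[(a / 4) % 32].insert(a);
        int worst = 1;
        for (const auto& s : unique_per_bank) {
            int n = static_cast<int>(s.size());
            if (n > worst) worst = n;
        }
        return worst;
    }
};
```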
The `WarpState` struct maintains the program counter (PC), a simulated register file (32 threads × 64 registers), and an active mask to accurately model divergent execution paths.
Contributions are welcome! Please focus on:
- Tensor Core Modeling: Enhancing the `simulate_mma_sync` logic.
- Instruction Set: Expanding the parser to support more PTX instructions.
- Visualizations: Improving the reporting of the Python agent.
This project is licensed under the MIT License - see the LICENSE file for details.