Algebraic enhancements for GEMM & AI accelerators
Updated Feb 28, 2025 - Python
Minimal TPU implementation with 8x8 systolic array and PyTorch integration
Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
Hardware accelerator for 2D convolution using an 8×8 weight-stationary systolic array with split-kernel support, dual-port SRAM architecture, and DMA-based streaming
Modular systolic array with software interface
High-performance systolic array computing framework with AI agents and medical compliance.
AS501 AI Semiconductor Design Basics & Practice Final Project
Parameterized N×N output-stationary systolic array accelerator for INT8 neural network inference. Full RTL-to-GDS flow on ASAP7 7nm using Cadence Genus + Innovus. 667 MHz, 42.7 GOPS peak throughput, 0.33 mW/GOP. SystemVerilog RTL, synthesis, place-and-route and self-checking testbench included.
INT8 Systolic-Array AI Accelerator on Zynq SoC with HW-SW Co-Design and Roofline Performance Analysis
4×4 7-bit matrix multiplication hardware accelerator using a systolic array, with a Python driver for the Basys 3 FPGA and a systolic array UVC using UVM.
This project explores the implementation of Digital Signal Processing algorithms using systolic array architecture to enable parallel computation and pipelined data processing. The design demonstrates efficient hardware utilization and improved processing speed for DSP operations.
8x8 systolic array AI accelerator with non-pipelined, single-cycle MAC PEs. The array uses a wave-skew dataflow to move data between PEs.
4×4 systolic array accelerator in Verilog with cycle-level execution tracing and Python-based dataflow visualization
Parametric Verilog systolic implementation of Cannon's Matrix Multiplication on an M×M torus mesh.
A lightweight RISC-V SoC deployed on FPGA, featuring a custom 2x2 systolic array AI accelerator with wavefront scheduling and bare-metal C drivers built using LiteX.
Designed a Verilog-based systolic array accelerator for matrix and convolution-style computation, featuring parallel processing elements, streamed data flow, and hardware-oriented optimization for TPU-style workloads.
An AXI-native 8x8 systolic array accelerator in Verilog. Features pure dataflow pipelining, Q-format fixed-point arithmetic, and hardware validation on the Kria KV260 FPGA.
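Several of the projects above describe output-stationary arrays with skewed ("wavefront") input scheduling. As a rough illustration of that idea only, and not taken from any listed repository, here is a minimal cycle-level software model in pure Python. The function name `systolic_matmul` and the choice to model each PE as one accumulator plus two operand registers are assumptions made for this sketch.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array computing A x B.

    Hypothetical sketch: PE (i, j) keeps its own accumulator, A operands flow
    left-to-right, B operands flow top-to-bottom, and row i / column j inputs
    are delayed by i / j cycles (the input skew).
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0] * m for _ in range(n)]  # A operand currently held in each PE
    b_reg = [[0] * m for _ in range(n)]  # B operand currently held in each PE
    cycles = n + m + k - 2               # last operand pair meets at PE (n-1, m-1)

    for t in range(cycles):
        # Shift A operands one PE to the right; inject skewed values at the left edge.
        for i in range(n):
            s = t - i
            a_in = A[i][s] if 0 <= s < k else 0
            a_reg[i] = [a_in] + a_reg[i][:-1]

        # Shift B operands one PE down; inject skewed values at the top edge.
        top = [B[t - j][j] if 0 <= t - j < k else 0 for j in range(m)]
        b_reg = [top] + b_reg[:-1]

        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

In this model, PE (i, j) sees A[i][s] and B[s][j] together at cycle s + i + j, which is why the skew lines the operands up without any global control. Real designs add pipelining, fixed-width arithmetic, and a drain phase that this sketch leaves out.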