A VMIPS-like vector ISA simulator that models both correctness and performance for vectorized workloads.
Built to explore how low-level systems behave under contention and strict correctness constraints.
This project implements a full simulator for a vector processor, including:
- A functional simulator that executes assembly-level instructions
- A timing simulator that models pipeline execution, memory contention, and parallelism
The system is designed to explore how architectural decisions (e.g., pipeline depth, number of lanes, memory banks) impact performance and correctness under real workloads.
The simulator is split into two major components, the functional simulator and the timing simulator. The modeled pipeline consists of:
- Instruction fetch and decode
- Dependency tracking using register busy boards
- Dispatch queues for:
  - Vector compute
  - Vector memory
  - Scalar operations
- Parallel execution pipelines:
  - Add, multiply, divide, shuffle
  - Vector load/store
- Multi-lane processing units
- Banked vector memory with conflict detection
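Dependency tracking with a register busy board can be sketched as follows. This is a minimal illustration, not the project's actual code: the `BusyBoard` class, its method names, and the register numbering are all hypothetical. It assumes an instruction stalls whenever any of its source or destination registers has a pending write.

```python
class BusyBoard:
    """Tracks which registers have a pending write from an in-flight instruction."""

    def __init__(self, num_regs: int) -> None:
        self.busy = [False] * num_regs

    def can_issue(self, srcs, dsts) -> bool:
        # Stall on a busy source (RAW hazard) or a busy destination (WAW hazard).
        return not any(self.busy[r] for r in list(srcs) + list(dsts))

    def issue(self, dsts) -> None:
        # Mark destination registers busy when the instruction is dispatched.
        for r in dsts:
            self.busy[r] = True

    def complete(self, dsts) -> None:
        # Clear the busy bits once the result has been written back.
        for r in dsts:
            self.busy[r] = False


board = BusyBoard(num_regs=8)
board.issue(dsts=[1])                                    # e.g. a load writing register 1
raw_stall = not board.can_issue(srcs=[1, 2], dsts=[3])   # reads register 1, so it must wait
board.complete(dsts=[1])
ready = board.can_issue(srcs=[1, 2], dsts=[3])           # register 1 is free again
```

A fuller model might also mark source registers as in use to guard against write-after-read hazards; that is left out here to keep the sketch short.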
Key behaviors modeled:
- Data hazards and pipeline stalls
- Resource contention across execution units
- Memory bank conflicts and scheduling delays
- Deterministic execution under constrained resources
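To illustrate how bank conflicts turn into scheduling delays, here is a small sketch under stated assumptions: low-order interleaving (`bank = addr % num_banks`), one address issued per cycle in order, and a fixed `bank_busy_time` per access. The function name and parameters are hypothetical and not taken from the simulator.

```python
def bank_access_finish_cycle(addresses, num_banks, bank_busy_time):
    """Return the cycle at which the last access completes.

    Addresses are issued in order, at most one per cycle. An access
    stalls until its target bank is free, which delays all later accesses.
    """
    free_at = [0] * num_banks          # cycle at which each bank becomes free
    cycle = 0                          # earliest cycle the next access may start
    for addr in addresses:
        bank = addr % num_banks        # assumed low-order interleaving
        start = max(cycle, free_at[bank])
        free_at[bank] = start + bank_busy_time
        cycle = start + 1
    return max(free_at)


# Unit stride touches a different bank each cycle, so no access stalls.
unit_stride = bank_access_finish_cycle(range(8), num_banks=8, bank_busy_time=4)

# A stride equal to the bank count hits bank 0 every time, serializing accesses.
conflict_stride = bank_access_finish_cycle(range(0, 64, 8), num_banks=8, bank_busy_time=4)
```

With these inputs the unit-stride access finishes at cycle 11, while the conflicting stride finishes at cycle 32, even though both transfer eight elements.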
The timing simulator evaluates execution efficiency, measured in cycles per instruction (CPI), across three workloads:
- Dot product
- Fully connected layer
- Convolution
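CPI is total cycles divided by instructions retired. The helper below is a hypothetical sketch of that calculation, and the numbers in the usage line are made up for illustration; they are not measurements from this project.

```python
def cycles_per_instruction(total_cycles: int, instructions_retired: int) -> float:
    """Compute CPI from a cycle count and a retired-instruction count."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return total_cycles / instructions_retired


# Illustrative values only: 1200 cycles spent on 300 instructions.
example_cpi = cycles_per_instruction(1200, 300)
```

Because one vector instruction performs many element operations, CPI values are best compared between configurations running the same workload.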
Observed trends:
- Increasing parallel lanes improves throughput up to a saturation point
- Memory bank conflicts significantly impact performance
- Pipeline depth affects latency vs. throughput tradeoffs
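One possible way to see why adding lanes saturates is a first-order model in which a vector operation of length `vector_length` takes `pipeline_depth + ceil(vector_length / lanes) - 1` cycles. This is a simplifying assumption for illustration, not the simulator's timing model, and it ignores memory effects and dependencies. The function and parameter names are hypothetical.

```python
def vector_op_cycles(vector_length: int, lanes: int, pipeline_depth: int) -> int:
    """First-order cycle count for one vector operation on a pipelined, multi-lane unit."""
    element_groups = (vector_length + lanes - 1) // lanes   # ceil(vector_length / lanes)
    # The first group needs pipeline_depth cycles; each later group adds one cycle.
    return pipeline_depth + element_groups - 1


# Doubling lanes for a 64-element vector on a 4-stage pipeline.
lane_sweep = [vector_op_cycles(64, lanes, pipeline_depth=4)
              for lanes in (1, 2, 4, 8, 16, 32, 64, 128)]
```

In this model the cycle count falls from 67 with one lane to 4 with 64 lanes and does not improve further at 128 lanes, since the fixed pipeline latency dominates once every element fits in a single group. A deeper pipeline raises that floor, which is one face of the latency versus throughput tradeoff.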
This project focuses on building stateful, deterministic systems under resource constraints, similar to real-world high-frequency execution environments.
The project emphasizes:
- Correctness under concurrency
- Clean modeling of system bottlenecks
- Tying low-level mechanics to measurable performance outcomes
Common make targets:
make run # Functional simulator
make timing # Timing simulator
make verify # Validate outputs
make clean # Cleanup