The source code is organized into several sections: a compiler that parses Python model descriptions into accelerator instructions; synthesizable SystemVerilog RTL; scripts for setting up simulation environments for testing; UVM-based testbench source code for verifying the accelerator in simulation using Cocotb; and additional Python packages and scripts that serve as general development utilities and aids. The design is configurable through parameters such as the systolic array type, its height and width, and the input bitwidths.
Key takeaways:
- The repository contains source code for ML hardware architectures that need nearly half as many multiplier units to reach the same performance, because they execute alternative inner-product algorithms.
- A new algorithm, the Free-pipeline Fast Inner Product (FFIP), is introduced together with its hardware architecture; both improve on the fast inner-product algorithm (FIP) proposed by Winograd in 1968, which has remained under-explored.
- FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units.
- The source code organization includes a compiler for parsing Python model descriptions, synthesizable SystemVerilog RTL, scripts for setting up simulation environments, UVM-based testbench source code, and additional Python packages and scripts used in the project.
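
To make the multiplier savings mentioned in the takeaways concrete, here is a minimal Python sketch of the baseline FIP identity from Winograd (1968), not of FFIP and not of the repository's RTL or compiler. For even-length vectors, each pair of elements contributes one "cross" multiplication, `(x[i] + y[i+1]) * (x[i+1] + y[i])`, and two correction terms that each depend on only one of the vectors. In a matrix multiplication the correction terms can be computed once per row and once per column and then reused, which is where the roughly halved multiplier count comes from. The function names `fip_inner_product` and `fip_matmul` are hypothetical and chosen only for this illustration.

```python
def fip_inner_product(x, y):
    """Inner product of two even-length sequences via Winograd's FIP identity."""
    if len(x) != len(y) or len(x) % 2 != 0:
        raise ValueError("x and y must have the same even length")
    pairs = range(0, len(x), 2)
    # Correction terms: each depends on only one of the two vectors.
    x_corr = sum(x[i] * x[i + 1] for i in pairs)
    y_corr = sum(y[i] * y[i + 1] for i in pairs)
    # One "cross" multiplication per pair of elements (n/2 in total).
    cross = sum((x[i] + y[i + 1]) * (x[i + 1] + y[i]) for i in pairs)
    return cross - x_corr - y_corr


def fip_matmul(a, b):
    """Multiply a (M x K) by b (K x N) with K even, reusing correction terms."""
    k = len(b)
    if k % 2 != 0 or any(len(row) != k for row in a):
        raise ValueError("inner dimension must match and be even")
    cols = [[b[r][c] for r in range(k)] for c in range(len(b[0]))]
    pairs = range(0, k, 2)
    # Computed once per row of a and once per column of b, then shared
    # by every output element in that row or column.
    row_corr = [sum(row[i] * row[i + 1] for i in pairs) for row in a]
    col_corr = [sum(col[i] * col[i + 1] for i in pairs) for col in cols]
    return [
        [
            sum((row[i] + col[i + 1]) * (row[i + 1] + col[i]) for i in pairs)
            - rc
            - cc
            for col, cc in zip(cols, col_corr)
        ]
        for row, rc in zip(a, row_corr)
    ]
```

For example, `fip_inner_product([1, 2, 3, 4], [5, 6, 7, 8])` gives the same result as the ordinary sum of products. Each output element of `fip_matmul` needs only K/2 cross multiplications, at the price of extra additions; the sketch does not model the pipelining or fixed-point details that the FFIP architecture addresses.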