GitHub - trevorpogue/algebraic-nnhw: AI acceleration using matrix multiplication with half the multiplications

Feb 13, 2024 - github.com
This repository contains the source code for machine learning (ML) hardware architectures that use alternative inner-product algorithms to achieve the same performance with nearly half the number of multiplier units. The new algorithm, called the Free-pipeline Fast Inner Product (FFIP), and its hardware architecture improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968. The FFIP can be incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget.
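To make the "half the multiplications" claim concrete, here is a minimal Python sketch of the baseline FIP idea, not of FFIP or of the repository's RTL. It relies on Winograd's identity for an even-length inner product: each pair of terms x1*y1 + x2*y2 equals (x1 + y2)*(x2 + y1) - x1*x2 - y1*y2. In a matrix product, the x1*x2 corrections depend only on a row of A and the y1*y2 corrections only on a column of B, so they can be computed once and reused. The function names `fip_matmul` and `naive_matmul` and the multiplication counter are illustrative assumptions, not identifiers from the repository.

```python
def naive_matmul(A, B):
    """Conventional matrix product; uses m*n*p multiplications."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]


def fip_matmul(A, B):
    """Matrix product via Winograd's 1968 inner-product identity.

    A is m x n, B is n x p, and n must be even.
    Returns (C, mults), where mults counts scalar multiplications.
    Hypothetical helper for illustration only.
    """
    m, n, p = len(A), len(B), len(B[0])
    if n % 2 != 0:
        raise ValueError("inner dimension must be even")
    half = n // 2
    mults = 0

    # Correction term for each row of A: sum of products of adjacent pairs.
    row_corr = []
    for i in range(m):
        s = 0
        for j in range(half):
            s += A[i][2 * j] * A[i][2 * j + 1]
            mults += 1
        row_corr.append(s)

    # Correction term for each column of B, computed the same way.
    col_corr = []
    for k in range(p):
        s = 0
        for j in range(half):
            s += B[2 * j][k] * B[2 * j + 1][k]
            mults += 1
        col_corr.append(s)

    # Main loop: n/2 multiplications per output element instead of n.
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for k in range(p):
            s = 0
            for j in range(half):
                s += (A[i][2 * j] + B[2 * j + 1][k]) * (A[i][2 * j + 1] + B[2 * j][k])
                mults += 1
            C[i][k] = s - row_corr[i] - col_corr[k]
    return C, mults
```

Under these assumptions the multiplication count is (m + p + m*p) * n/2, compared with m*n*p for the conventional product, so the ratio approaches one half as the matrices grow. That is consistent with the "nearly half" wording above; the extra additions and the correction terms are the cost traded for the saved multipliers.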

The source code is organized into several sections: a compiler that parses Python model descriptions into accelerator instructions; synthesizable SystemVerilog RTL; scripts for setting up simulation environments for testing; UVM-based testbench source code for verifying the accelerator in simulation using Cocotb; and additional Python packages and scripts that serve as general development utilities. The repository also exposes configurable parameters such as the systolic array type, its height and width, and the input bitwidths.

Key takeaways:

  • The repository contains source code for ML hardware architectures that require nearly half the number of multiplier units to achieve the same performance, by executing alternative inner-product algorithms.
  • A new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture are introduced, which improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968.
  • FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units.
  • The source code organization includes a compiler for parsing Python model descriptions, synthesizable SystemVerilog RTL, scripts for setting up simulation environments, UVM-based testbench source code, and additional Python packages and scripts used in the project.