A systolic-array implementation of matrix multiplication in Verilog-2005

Hugina/Systolic-Array-Verilog

Matrix Multiplication Accelerator Project

Introduction

In this project, we tackled the inefficiency of matrix multiplication on CPUs, which stems from their general-purpose architectures and limited internal registers. Our objective was to design, verify, and synthesize a hardware accelerator that minimizes data transfers and uses parallel processing to multiply matrices efficiently. The accelerator is particularly suited to machine learning applications, where large-scale linear algebra operations are common.

Project Overview

Design Goals

Our design addresses the core challenge of matrix multiplication: the movement of data between main memory and the computing units. We focused on:

  • Implementing a hardware accelerator for matrix multiplication (A × B + C).
  • Minimizing data transfers by ensuring each data element is moved only once and is used in all relevant computations.
  • Achieving significant parallelism by utilizing systolic arrays for the processing elements (PEs).
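
To illustrate the goals above, here is a minimal cycle-level sketch in Python of how a systolic array can compute A × B + C. It is a behavioral model, not the RTL in this repository. It assumes an output-stationary arrangement (each PE keeps one result element and is preloaded with the matching element of C), and the function name `systolic_matmul` and the dimension names `n`, `k`, `m` are illustrative. Each element of A enters at the left edge once and is forwarded to the right; each element of B enters at the top edge once and is forwarded downward.

```python
def systolic_matmul(A, B, C):
    """Cycle-level model of an output-stationary systolic array computing A*B + C.

    A is n x k, B is k x m, C is n x m (lists of lists of ints).
    This is an illustrative sketch, not the repository's actual design.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [row[:] for row in C]              # PE(i, j) accumulator starts at C[i][j]
    a_reg = [[0] * m for _ in range(n)]      # operand each PE passes to its right neighbour
    b_reg = [[0] * m for _ in range(n)]      # operand each PE passes to the PE below
    cycles = n + m + k - 2                   # last operand pair reaches PE(n-1, m-1) at cycle n+m+k-3

    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i                # row i of A is delayed (skewed) by i cycles
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]   # value the left neighbour held last cycle
                if i == 0:
                    s = t - j                # column j of B is delayed (skewed) by j cycles
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]   # value the upper neighbour held last cycle
                acc[i][j] += a_in * b_in     # multiply-accumulate inside the PE
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b          # all PEs update their registers together
    return acc
```

In this model every PE performs one multiply-accumulate per cycle, so the n × m results are produced in n + m + k - 2 cycles rather than the n · m · k steps of a sequential loop, and operands are fetched from outside the array only at its left and top edges.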

Implementation Details

  • Hardware Design: The accelerator is designed as an array of multiply or multiply-accumulate processing elements arranged in a systolic array architecture for minimal data transfers.
  • Software Control: A software layer controls the hardware accelerator, handling data movement from memory to the design using AHB to APB transactions.
  • Verification and Synthesis: The design was verified and synthesized using HDL-Designer and QuestaSim, ensuring compatibility and performance.

Key Features

  • Reconfigurable Dimensions: Supports reconfigurable n, k, and m dimensions, with constraints to ensure efficient processing.
  • Intermediate Result Handling: Capable of holding and using intermediate results as needed.
  • Error Reporting: Includes mechanisms for detecting overflows and reporting errors through a specific output port.
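
As a rough illustration of overflow detection in a multiply-accumulate step, the Python sketch below models a signed two's-complement accumulator of fixed width and raises a flag when the exact result does not fit. The 16-bit default width, the wrap-around behavior, the sticky flag, and the names `mac_with_overflow` and `dot_with_overflow` are all assumptions made for this example; this README does not specify the accelerator's bit widths or how its error port behaves.

```python
def mac_with_overflow(acc, a, b, width=16):
    """Return (wrapped_result, overflow_flag) for acc + a*b in `width`-bit two's complement.

    Hypothetical helper: the width and wrap-around policy are assumptions.
    """
    lo = -(1 << (width - 1))                 # most negative representable value
    hi = (1 << (width - 1)) - 1              # most positive representable value
    exact = acc + a * b                      # exact result, unbounded Python int
    overflow = not (lo <= exact <= hi)       # flag set when the exact result does not fit
    wrapped = (exact - lo) % (1 << width) + lo   # what a `width`-bit register would hold
    return wrapped, overflow


def dot_with_overflow(xs, ys, init=0, width=16):
    """Accumulate a dot product and keep a sticky overflow flag across all steps."""
    acc, sticky = init, False
    for x, y in zip(xs, ys):
        acc, ovf = mac_with_overflow(acc, x, y, width)
        sticky = sticky or ovf               # once set, the flag stays set
    return acc, sticky
```

For example, with `width=8` the call `mac_with_overflow(100, 10, 3, width=8)` yields `(-126, True)`, because the exact value 130 exceeds the 8-bit signed maximum of 127.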

Submission Requirements and Evaluation

The project submission included:

  • A Verilog-2005 top-level module named matmul.
  • A comprehensive design document detailing the design's block diagram, functional description, flow-chart/state-machine diagram, and any deviations from standard design-checker rules.

The evaluation focused on RTL code quality, comprehensive documentation, and the design's ability to compile with various parameter values.

Conclusion

This project is a step towards faster matrix multiplication that is not bound by the limitations of general-purpose CPU architectures. By designing a specialized hardware accelerator, we demonstrated an approach to faster and more efficient matrix calculations, which are crucial for the computational demands of modern machine learning applications.

