In this project, we tackled the inefficiencies of matrix multiplication on CPUs due to their general-purpose architectures and limited internal registers. Our objective was to design, verify, and synthesize a hardware accelerator that minimizes data transfers and leverages parallel processing for efficient matrix multiplication. This accelerator is particularly suited for applications in machine learning, where large-scale linear algebra operations are common.
Our design aims to address the core challenge of matrix multiplication: the movement of data between main memory and the computing units. We focused on:
- Implementing a hardware accelerator for matrix multiplication (A × B + C).
- Minimizing data transfers by ensuring each data element is moved only once and is used in all relevant computations.
- Achieving significant parallelism by utilizing systolic arrays for the processing elements (PEs).
- Hardware Design: The accelerator is designed as an array of multiply or multiply-accumulate processing elements arranged in a systolic array, so operands are passed between neighbouring PEs rather than re-fetched from memory (a minimal PE sketch in Verilog follows this list).
- Software Control: A software layer controls the hardware accelerator, handling data movement from memory to the design through AHB-to-APB transactions (an illustrative APB register-interface sketch also follows this list).
- Verification and Synthesis: The design was verified and synthesized using HDL Designer and QuestaSim, confirming functional correctness and synthesizability.
- Reconfigurable Dimensions: Supports reconfigurable n, k, and m matrix dimensions, with constraints to ensure efficient processing.
- Intermediate Result Handling: Capable of holding and using intermediate results as needed.
- Error Reporting: Includes mechanisms for detecting overflows and reporting errors through a dedicated output port.
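
To make the data-flow idea concrete, the following is a minimal Verilog-2005 sketch of one possible multiply-accumulate PE, assuming the accelerator computes A × B + C with A sized n × k and B sized k × m. Everything in it (the module name `pe`, the `WIDTH` parameter, the `en` handshake, and the unsigned overflow check) is an illustrative assumption, not the project's actual RTL:

```verilog
// Minimal sketch of one systolic processing element (PE).
// All names, widths, and the enable handshake here are illustrative
// assumptions, not the project's actual RTL.
module pe #(
    parameter WIDTH = 16               // operand width (assumed)
) (
    input  wire                clk,
    input  wire                rst_n,
    input  wire                en,      // advances the systolic wavefront
    input  wire [WIDTH-1:0]    a_in,    // A element streamed in from the left
    input  wire [WIDTH-1:0]    b_in,    // B element streamed in from above
    output reg  [WIDTH-1:0]    a_out,   // forwarded to the right neighbour
    output reg  [WIDTH-1:0]    b_out,   // forwarded to the neighbour below
    output reg  [2*WIDTH-1:0]  acc,     // running partial sum for one result element
    output reg                 overflow // sticky unsigned-overflow flag
);
    wire [2*WIDTH-1:0] product = a_in * b_in;
    wire [2*WIDTH:0]   sum     = {1'b0, acc} + {1'b0, product};

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            a_out    <= 0;
            b_out    <= 0;
            acc      <= 0;
            overflow <= 1'b0;
        end else if (en) begin
            // Each operand enters the array once and is reused by every
            // PE it passes through, so no element is fetched twice.
            a_out    <= a_in;
            b_out    <= b_in;
            acc      <= sum[2*WIDTH-1:0];
            overflow <= overflow | sum[2*WIDTH]; // carry-out marks overflow
        end
    end
endmodule
```

In an array of such PEs, rows of A enter from the left and columns of B from the top, skewed by one cycle per row and column, so every element is moved into the array exactly once and reused by each PE it passes; the sticky `overflow` flags can be OR-reduced into the error-reporting port described above.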
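On the control side, the software layer would reach the accelerator through memory-mapped registers behind the AHB-to-APB bridge. Below is a hedged sketch of what such an APB slave interface could look like; the register map and the `dims`/`ctrl`/`status` registers are assumptions for illustration, and only the APB signal names follow the standard protocol:

```verilog
// Hedged sketch of an APB register interface for the accelerator.
// The register map (offsets 0x00/0x04/0x08 and their meanings) is an
// assumption made for illustration; only the APB signal names are standard.
module matmul_apb_regs (
    input  wire        pclk,
    input  wire        presetn,
    input  wire        psel,
    input  wire        penable,
    input  wire        pwrite,
    input  wire [7:0]  paddr,
    input  wire [31:0] pwdata,
    output reg  [31:0] prdata,
    output wire        pready,
    input  wire [31:0] status   // busy/done/overflow flags from the datapath
);
    reg [31:0] dims;  // packed n, k, m dimensions (assumed layout)
    reg [31:0] ctrl;  // start / mode bits (assumed)

    assign pready = 1'b1;  // zero-wait-state slave

    // APB write: data is captured in the ACCESS phase (PSEL & PENABLE & PWRITE)
    always @(posedge pclk or negedge presetn) begin
        if (!presetn) begin
            dims <= 32'd0;
            ctrl <= 32'd0;
        end else if (psel && penable && pwrite) begin
            case (paddr)
                8'h00:   dims <= pwdata;
                8'h04:   ctrl <= pwdata;
                default: ;   // writes to unmapped offsets are ignored
            endcase
        end
    end

    // APB read mux
    always @(*) begin
        case (paddr)
            8'h00:   prdata = dims;
            8'h04:   prdata = ctrl;
            8'h08:   prdata = status;
            default: prdata = 32'd0;
        endcase
    end
endmodule
```

A driver would then program the dimensions and a start bit with two APB writes and poll `status` until the done flag is set.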
The project submission included:
- A Verilog-2005 top-level module named `matmul`.
- A comprehensive design document detailing the design's block diagram, functional description, flow-chart/state-machine diagram, and any deviations from standard design-checker rules.
The evaluation focused on RTL code quality, comprehensive documentation, and the design's ability to compile with a range of parameter values, as illustrated by the sketch below.
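As one way to exercise that last requirement, a throwaway wrapper can instantiate the top level under several parameterizations and be run through compilation. The parameter names below (`N`, `K`, `M`, `WIDTH`) are assumptions, since the actual parameter list is defined in the design document:

```verilog
// Illustrative elaboration check only: instantiates the top-level matmul
// with two different parameter sets. The parameter names (N, K, M, WIDTH)
// are assumptions; substitute the parameter list the design actually uses.
module compile_check;
    matmul #(.N(4),  .K(4),  .M(4),  .WIDTH(8))  u_small ();
    matmul #(.N(16), .K(16), .M(16), .WIDTH(32)) u_large ();
endmodule
```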
This project represents a significant step towards optimizing matrix multiplication by moving it off general-purpose CPUs and their architectural limitations. By designing a specialized hardware accelerator, we demonstrated an approach to faster and more efficient matrix calculations, which is crucial for the computational demands of modern machine learning applications.