This repository accompanies the paper "Sparse Matrix Multiplication on Cerebras WSE-2: Evaluating SpMM Algorithms in Spatial Computing", in which we explore efficient sparse matrix multiplication (SpMM) algorithms on the Cerebras WSE-2, a cutting-edge spatial accelerator with massive parallelization capabilities.
Sparse matrix multiplication is a fundamental operation in many scientific disciplines, including computational physics, machine learning, and data analysis. Efficient and scalable SpMM algorithms are therefore essential for improving the performance of key computational methods and applications.
This work investigates the performance of SpMM on a novel platform, the Cerebras WSE-2, whose spatial architecture enables levels of parallelism beyond those examined in previous studies. We implement four SpMM algorithms, each using a different sparse storage format, optimize the implementations for overall performance on the WSE-2, and conduct a performance analysis that isolates and examines computational efficiency.
Our findings indicate that optimizing SpMM for overall performance on the WSE-2 can degrade the performance of the computation task itself, particularly at higher sparsity levels. This highlights the trade-off between global performance optimization and computational efficiency, and offers valuable insights for future hardware-aware SpMM optimizations on the Cerebras WSE-2.
Here are key references and learning materials related to the project:
- 🎥 Cerebras Hardware Video Playlist
- 🚀 Computational Fluid Dynamics Acceleration with Cerebras
- 🖥️ Cerebras SDK Overview Video
- 📄 Technical Overview of the Cerebras SDK (PDF)
- 📜 Graph Neural Networks (GNN) Paper
The repository is organized as follows:
- 📂 `summaries/` – Summaries of reviewed material
- 📂 `documents/` – Research papers and other relevant PDFs
- 📂 `src/` – Source code for various implementations
- 📂 `src/benchmarks/` – Benchmarking-related source code
- 📂 `test/` – Input and output files for testing
- 📂 `lib/` – External libraries used in the project
- 📂 `plots/` – Final plots and figures from benchmarking
To generate a random sparse matrix with specified dimensions and density, run:
```sh
./a.out A_height A_width A_density Py Px Format
```
| Parameter | Description |
|---|---|
| `A_height` | Number of rows in matrix A (N) |
| `A_width` | Number of columns in matrix A (K) |
| `A_density` | Matrix density (as a percentage) |
| `Py` | Number of processing element (PE) rows |
| `Px` | Number of processing element (PE) columns |
| `Format` | 0: CSC, 1: CSR, 2: Custom |
This command generates four output files prefixed with `tmp`.
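For example, the following invocation would generate a 1000×1000 matrix with 10% density in CSR format for a 4×4 PE grid (the values here are illustrative, not ones used in the paper):

```sh
./a.out 1000 1000 10 4 4 1
```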
To add padding to the converted files (prefix `tmp`), run:

```sh
python3 add_padding.py Format
```
| Parameter | Description |
|---|---|
| `Format` | 0: CSC, 1: CSR, 2: Custom |
The script appends `_pad` to the names of the new padded files by default.
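Continuing the illustrative example above, padding the CSR-format files would look like this:

```sh
python3 add_padding.py 1
```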
To execute a simulation on the WSE-2:
- Generate a sparse matrix and corresponding input files using `src/sparse_format_convertors/convertor.c`.
- Add padding to the generated files with `src/sparse_format_convertors/add_padding.py`.
- Move the input files to the simulator's `test_vectors/` folder.
- Modify `commands.sh` to configure the new input files (see the sketch after this list), including:
  - `Nt`, `Kt`, `M`
  - Array lengths
  - Prefix name (`A_prefix`) for the input files (`col_ptr`, `row_ptr`, `val`)
- Run the simulation with the adjusted parameters.
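A hypothetical excerpt of the values to adjust in `commands.sh` might look like the following; the meanings of `Nt`, `Kt`, and `M` and the `A_prefix` value are assumptions on our part, so treat this as an illustrative sketch rather than the script's actual contents:

```sh
# Illustrative values only -- adjust to match the generated matrix.
Nt=250            # assumed per-PE tile height (e.g., A_height / Py)
Kt=250            # assumed per-PE tile width (e.g., A_width / Px)
M=64              # assumed second dimension of the dense operand
A_prefix=tmp_pad  # prefix of the padded input files (col_ptr, row_ptr, val)
```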
For automated validation of implementations, run:
```sh
src/automated_testing/full_test.sh
```
Before running `full_test.sh`, compile `convertor.c` using GCC in `src/sparse_format_convertors/`.
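For example, assuming the default `a.out` output name that the generation command above relies on:

```sh
cd src/sparse_format_convertors/
gcc convertor.c   # produces ./a.out, the binary invoked in the generation step
```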
Contributions are welcome! If you have improvements or bug fixes, feel free to:
- Submit a pull request
- Report issues via GitHub Issues
For major changes, consider opening a discussion first.