[feature] integrate zeknox GPU-acceleration library into gnark #1332
base: master
Conversation
@ivokub, we need your help and review!
On it. Would it be possible to allow adding commits directly to the branch for easier review?
Sure, I've granted you push permission at https://github.com/okx/gnark/invitations. Let me delete those examples to keep the PR clean.
I'm not able to create a proof for now. In the debug logs, the last action I see is:
14:05:05 DBG Bs.MultiExp done MSMG2 5 took=0.86421 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=6
I guess it is probably a deadlock somewhere. Have you been able to run the end-to-end prover?
Hi Ivo, may I check: if you use the precompiled zeknox libraries, does your GPU have compute capability 8.6 or 8.9? Only these two are supported by our precompiled libraries. On our systems, the end-to-end example (go run -tags=zeknox examples/zeknox/main.go) works.
I'm using an AWS g4dn.xlarge instance, which according to the documentation has a T4 GPU, and that seems to be compute capability 7.5. Should it work if I compile the libraries myself? I started compiling them, but it took quite a bit of time and I didn't let it finish. When I benchmarked previously, g4dn was quite a good balance between performance and $-per-proof cost.
Yes, compiling them yourself should work. Compiling BN254 MSM G2 takes ~5 minutes on our device, so expect a long compile time.
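For anyone else hitting this, the rule discussed above (precompiled libraries cover compute capability 8.6 and 8.9 only, anything else needs a source build) can be sketched as a small check. This is only an illustration: `precompiledCaps` and `needsSourceBuild` are hypothetical names, not part of zeknox or gnark, and the capability string is assumed to be supplied by the user.

```go
package main

import (
	"fmt"
	"strings"
)

// precompiledCaps lists the compute capabilities that, according to this
// thread, the precompiled zeknox libraries support. Hypothetical name.
var precompiledCaps = map[string]bool{"8.6": true, "8.9": true}

// needsSourceBuild reports whether a GPU with the given compute capability
// (for example "7.5" for a T4) falls outside the precompiled set and would
// therefore need the libraries compiled locally. Hypothetical helper.
func needsSourceBuild(computeCap string) bool {
	return !precompiledCaps[strings.TrimSpace(computeCap)]
}

func main() {
	for _, cc := range []string{"7.5", "8.6", "8.9"} {
		fmt.Printf("compute capability %s: build from source = %v\n", cc, needsSourceBuild(cc))
	}
}
```

With a T4 (7.5) the check reports that a source build is needed, which matches the situation described in this thread.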
Indeed, I got it working and the speedup is similar to the one claimed in the PR (1.6x). I also had to build libblst. But now there seems to be an issue with the proof; I get an invalid proof:
I could try looking into it, but it would probably take a bit of time to compare the computed values against CPU execution. Would it be possible to try with another GPU and see if you hit the same problem?
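One way to approach the comparison against CPU execution is to serialize the intermediate values from both runs and look for the first point where they diverge. The sketch below assumes the values are already available as byte slices in the same order; `firstMismatch` is a hypothetical helper, and the sample data in `main` is placeholder bytes, not real prover output.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstMismatch returns the index of the first entry where the CPU and GPU
// results differ, or -1 if both slices agree. If one slice is a strict
// prefix of the other, the length of the shorter slice is returned.
func firstMismatch(cpu, gpu [][]byte) int {
	n := len(cpu)
	if len(gpu) < n {
		n = len(gpu)
	}
	for i := 0; i < n; i++ {
		if !bytes.Equal(cpu[i], gpu[i]) {
			return i
		}
	}
	if len(cpu) != len(gpu) {
		return n
	}
	return -1
}

func main() {
	// Placeholder values standing in for serialized intermediate results.
	cpu := [][]byte{{0x01}, {0x02}, {0x03}}
	gpu := [][]byte{{0x01}, {0x02}, {0xff}}
	if i := firstMismatch(cpu, gpu); i >= 0 {
		fmt.Printf("results diverge at step %d\n", i)
	} else {
		fmt.Println("CPU and GPU results agree")
	}
}
```

Knowing the first diverging step narrows the search to a single operation instead of the whole proof.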
This is an edge case. We found this bug, tried many methods to fix it, but it still happens... |
Description
This PR aims to integrate zeknox GPU-acceleration library into gnark. Specifically, this PR targets the GPU (NVIDIA CUDA) acceleration of groth16 backend over BN254. In addition, this PR adds a new example consisting of proving/verifying a batch of secp256r1 (P256) signatures. Our benchmarking shows 1.54-1.57X speedup of the CPU+GPU execution (with zeknox) compared to the default CPU-only execution.
In summary, we made the following additions:
- backend/groth16/bn254/zeknox folder.
- backend/groth16/bn254/prove.go printed in debug mode.
- examples/p256.
- README.md on how to run gnark with zeknox.
Type of change
How has this been tested?
We wrote new tests under backend/groth16/bn254/zeknox and examples/p256. In addition, we also ran the tests under backend/groth16/bn254.
How has this been benchmarked?
We ran the P256 example to prove/verify a batch of 10 secp256r1 keys. The steps to run:
cd examples
go build -tags zeknox ./examples
Results
The times below represent the proving time (in milliseconds) for 10 secp256r1 keys.
Checklist:
golangci-lint does not output errors locally