int8 experimental support #188
puppetm4st3r started this conversation in General
Hi! How can I test the experimental int8 support? I understand that int8 will work on any CUDA device, or is there a restriction on the GPU model? fp8 is for the H100 and greater, as I understand it.
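(Not from the thread, but a quick way to answer the hardware part of this question locally: a minimal sketch that checks the GPU's compute capability with PyTorch. The 8.9 threshold for fp8 — Ada/Hopper, e.g. H100 — is an assumption based on NVIDIA's tensor-core generations, not something confirmed by infinity_emb here.)

```python
# Minimal sketch: check whether the local GPU could support fp8.
# Assumption: int8 inference runs on any CUDA device, while fp8 tensor
# cores require compute capability >= 8.9 (Ada / Hopper, e.g. H100).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    print("fp8-capable:", (major, minor) >= (8, 9))
else:
    print("No CUDA device visible")
```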
Replies: 1 comment

This should do it:

```
pip install infinity_emb[all]
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype int8
# is only slightly faster than
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype float16
```

Performance is not significantly faster, and the memory savings are small due to the batch size. Also, all weights are loaded in fp32 in any case.
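(To verify the "only slightly faster" claim on your own hardware, here is a minimal timing sketch against a running server. It assumes infinity_emb's default port 7997 and its OpenAI-compatible /embeddings route; adjust the URL if your version differs. Start the server once with --dtype int8 and once with --dtype float16 and compare the numbers.)

```python
# Rough latency check against a running infinity_emb server.
# Assumptions: default port 7997 and the OpenAI-compatible /embeddings
# route; adjust URL/model if your setup differs.
import time
import requests

URL = "http://localhost:7997/embeddings"
payload = {"model": "BAAI/bge-small-en-v1.5", "input": ["some test sentence"] * 32}

requests.post(URL, json=payload).raise_for_status()  # warm-up

runs = 10
t0 = time.perf_counter()
for _ in range(runs):
    requests.post(URL, json=payload).raise_for_status()
print(f"{(time.perf_counter() - t0) / runs * 1000:.1f} ms per batch of 32")
```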