int8 experimental support #188
puppetm4st3r started this conversation in General
Hi! How can I test the experimental int8 support? I understand that int8 will work on any CUDA device, or is there a restriction on the GPU model? fp8 is for the H100 and greater, as I understand it.
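(Not from the thread, but a quick way to answer the hardware part of this question locally: a minimal sketch that checks the GPU's compute capability with PyTorch. The 8.9 threshold for fp8 — Ada/Hopper, e.g. H100 — is an assumption based on NVIDIA's tensor-core generations, not something confirmed by infinity_emb here.)

```python
# Minimal sketch: check whether the local GPU could support fp8.
# Assumption: int8 inference runs on any CUDA device, while fp8 tensor
# cores require compute capability >= 8.9 (Ada / Hopper, e.g. H100).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    print("fp8-capable:", (major, minor) >= (8, 9))
else:
    print("No CUDA device visible")
```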
Replies: 1 comment

This should do it:

```
pip install infinity_emb[all]
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype int8
# is only slightly faster than
infinity_emb --no-bettertransformer --model-name-or-path BAAI/bge-small-en-v1.5 --dtype float16
```

Performance is not significantly faster, and the memory savings are small due to the batch size. Also, all weights are loaded in fp32 in any case.
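(To verify the "only slightly faster" claim on your own hardware, here is a minimal timing sketch against a running server. It assumes infinity_emb's default port 7997 and its OpenAI-compatible /embeddings route; adjust the URL if your version differs. Start the server once with --dtype int8 and once with --dtype float16 and compare the numbers.)

```python
# Rough latency check against a running infinity_emb server.
# Assumptions: default port 7997 and the OpenAI-compatible /embeddings
# route; adjust URL/model if your setup differs.
import time
import requests

URL = "http://localhost:7997/embeddings"
payload = {"model": "BAAI/bge-small-en-v1.5", "input": ["some test sentence"] * 32}

requests.post(URL, json=payload).raise_for_status()  # warm-up

runs = 10
t0 = time.perf_counter()
for _ in range(runs):
    requests.post(URL, json=payload).raise_for_status()
print(f"{(time.perf_counter() - t0) / runs * 1000:.1f} ms per batch of 32")
```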