support for dimensions field like in OpenAI text-embedding-3, thanks #476
Comments
Hey, you are welcome to work on this. I see you have not contributed before, so here is a Add a This needs modifications in the batch handler for embed, audio, video.
There is also engine + syncengine, which needs three times the integration for the above. -> Add as a test in tests/end-to-end/test_dummyengine.py (will only be based on the numpy model). You also need to make the parameters / signature available to the Matryoshka dim. Support in the pydantic model:
Should be a small code change (~50 LOC in 10 files), but it's the first time a request-time parameter is being added and it therefore needs extensive testing.
Okay, let me work on this.
It would make sense to first learn the basics of git and the feature-branch pattern, and to be familiar with inheritance and unit testing in Python.
@ericg108 with the recent merge to master, you should now be able to truncate the returned embeddings by passing in a Let me know if that works for you.
@wirthual thank you for your effort. It took me a lot of time to learn the codebase, and I still didn't find the place to modify the
@ericg108 yes, that's the feature you requested.
@michaelfeil got it, thanks!
Yes, it's the same as passing return embeddings[..., :truncate_dim]. However, if you managed to pass truncate_dims to Sentence Transformers, this might be of interest for issue #482. Cheers
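For readers following along, the slicing mentioned above can be sketched with plain numpy. This is only an illustration, not the project's actual code: the helper name `truncate_embeddings` and the optional `renormalize` flag are assumptions made for this example. Re-normalizing after truncation is a common choice when cosine similarity is used downstream, but the thread itself only describes the slice.

```python
import numpy as np


def truncate_embeddings(
    embeddings: np.ndarray, truncate_dim: int, renormalize: bool = False
) -> np.ndarray:
    """Keep the first `truncate_dim` values along the last axis.

    Hypothetical helper for illustration; `renormalize` is an assumption
    and is not mentioned in the thread.
    """
    if truncate_dim <= 0:
        raise ValueError("truncate_dim must be a positive integer")
    # Same slice as in the comment above: embeddings[..., :truncate_dim]
    truncated = embeddings[..., :truncate_dim]
    if renormalize:
        # Scale each vector back to unit L2 norm; guard against zero vectors.
        norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
        truncated = truncated / np.clip(norms, 1e-12, None)
    return truncated


# Three toy "embeddings" with 4 dimensions each.
embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)
short = truncate_embeddings(embeddings, truncate_dim=2)
short_unit = truncate_embeddings(embeddings, truncate_dim=2, renormalize=True)
```

Because the slice acts on the last axis only, the same helper works for a single vector or a batch.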
Feature request
When initializing with SentenceTransformers, we can use the `truncate_dim` argument, like below:
`model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=dimensions)`
And when calling OpenAI text-embedding-3, we can also pass a `dimensions` argument to get variable-length embeddings:
`dimensions` (integer, optional): The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
See also: https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions
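To make the request shape concrete, here is a minimal sketch of an OpenAI-style embeddings request body with the optional `dimensions` field. The `model`, `input`, and `dimensions` field names follow the OpenAI documentation linked above. The helper name `build_embedding_request` is made up for this example, and the sketch only builds the JSON body without sending it anywhere.

```python
import json
from typing import List, Optional


def build_embedding_request(
    model: str, texts: List[str], dimensions: Optional[int] = None
) -> str:
    """Build the JSON body for an OpenAI-style embeddings request.

    Hypothetical helper for illustration only.
    """
    payload = {"model": model, "input": texts}
    if dimensions is not None:
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        # Only include the field when the caller asks for shorter embeddings.
        payload["dimensions"] = dimensions
    return json.dumps(payload)


body = build_embedding_request(
    "text-embedding-3-small", ["hello world"], dimensions=256
)
```

Leaving `dimensions` out of the body when it is not set keeps the request valid for models that do not support the field.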
Motivation
More and more embedding models support Matryoshka embeddings, which let users request embeddings of varying dimensions, e.g. mxbai-embed-large-v1, jina-embeddings-v3, etc.
This is very useful in scenarios with limited resources. I hope it can be supported. Thanks.
Your contribution
I guess it's not a big modification. I may be able to add this feature if someone tells me where to make the changes. Thanks.