perf(embedding): always request embedding creation as base64 #1312

Open
wants to merge 1 commit into
base: master

manekinekko

Requesting base64-encoded embeddings returns smaller response bodies: on average ~60% smaller than the float32-encoded equivalent. In other words, a response body containing float32 embeddings is ~2.3x larger than one containing base64-encoded embeddings.

Closes #1310

  • I understand that this repository is auto-generated and my pull request may not be merged

Changes being requested

We always request embeddings encoded as base64, then decode them to float32 depending on the encoding_format parameter provided by the user.

Additional context & links

After running a few benchmarks, I found that requesting base64-encoded embeddings returns smaller response bodies: on average ~60% smaller than the float32-encoded equivalent. In other words, a response body containing float32 embeddings is ~2.3x larger than one containing base64-encoded embeddings.

This performance improvement could translate to:

  • ✅ Faster HTTP responses
  • ✅ Less bandwidth used when generating multiple embeddings

This is the result of a request that creates an embedding from a 10 KB chunk, run 10 times (the numbers are response body sizes in KB, despite the "ms" column labels):

| Benchmark | Min (ms) | Max (ms) | Mean (ms) | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| float32 vs base64 | 41.742 | 19616.000 | 9848.819 | 40.094 (3.9%) | 8351.000 (57.4%) | 4206.126 (57.3%) |
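
As a rough back-of-the-envelope check on where the saving comes from: base64 spends 4 characters on every 3 bytes, so a 4-byte float32 costs about 5.33 characters, while a JSON array spends one character per printed digit, sign, decimal point, and separator. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the ratio you get depends entirely on how many digits the server prints per value, which is an assumption here rather than something measured in this PR.

```typescript
// Hypothetical helpers for illustration only; not part of the SDK.

// Characters needed to base64-encode `count` float32 values (4 bytes each).
// Base64 emits 4 characters per 3-byte group, padding the last group.
function base64Chars(count: number): number {
  return 4 * Math.ceil((count * 4) / 3);
}

// Characters needed to send the same values as a JSON array of decimals.
function jsonArrayChars(values: number[]): number {
  return JSON.stringify(values).length;
}

// How many times larger the JSON form is than the base64 form.
function sizeRatio(values: number[]): number {
  return jsonArrayChars(values) / base64Chars(values.length);
}

// Made-up sample values with 8 significant digits (an assumption).
const sample = [-0.012345678, 0.023456789, -0.034567891];
console.log(jsonArrayChars(sample), base64Chars(sample.length), sizeRatio(sample));
```

The per-embedding saving is only part of the measured body size, since the response also carries JSON keys and usage metadata that are the same in both formats.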

Read more #1310

@manekinekko requested a review from a team as a code owner on February 8, 2025, 17:09

Collaborator

@RobertCraigie left a comment


Thanks!

Comment on lines +20 to +22
// Force base64 encoding for vector embeddings creation
// See https://github.com/openai/openai-node/issues/1310
encoding_format: 'base64',

I don't think we want to always use base64. If the user explicitly asked for a different format, we should keep exactly the same behaviour as before this PR, which is to just let them use it.
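
One way to read this suggestion, as a minimal sketch rather than the SDK's actual implementation: default to base64 on the wire only when the caller left encoding_format unset, and pass any explicit choice through untouched. The `EncodingPlan` type and `planEncoding` function are hypothetical names introduced for illustration; only the `'float'` and `'base64'` values come from the API's encoding_format parameter.

```typescript
// Hypothetical sketch; names are illustrative and not taken from the SDK.
type EncodingFormat = 'float' | 'base64';

interface EncodingPlan {
  // Value sent to the API as encoding_format.
  wireFormat: EncodingFormat;
  // Whether the client should convert the base64 payload back to number[].
  decodeToFloats: boolean;
}

function planEncoding(userFormat?: EncodingFormat): EncodingPlan {
  if (userFormat !== undefined) {
    // Explicit choice: behave exactly as before, no client-side conversion.
    return { wireFormat: userFormat, decodeToFloats: false };
  }
  // No preference: use the smaller base64 payload and decode it for the caller.
  return { wireFormat: 'base64', decodeToFloats: true };
}

console.log(planEncoding(), planEncoding('float'), planEncoding('base64'));
```

With this split, callers who omit the parameter still receive number arrays, and callers who pass a value see no change in behaviour.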

console.log(embeddingBase64Obj);
const embeddingBase64Str = embeddingBase64Obj.embedding as unknown as string;
embeddingBase64Obj.embedding = Array.from(
new Float32Array(Buffer.from(embeddingBase64Str, 'base64').buffer),

Buffer is a Node.js-specific API; we need to use something that is available everywhere. It would be great to add a generic helper function in core.ts.
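
A runtime-agnostic helper could look roughly like the sketch below. It is not the helper that ended up in core.ts: the function names are hypothetical, and it assumes the payload is a sequence of little-endian float32 values. It uses only standard JavaScript built-ins (Uint8Array and DataView), decodes the base64 alphabet by hand so it does not depend on Buffer or atob, and reads through a DataView so the result does not depend on the host's byte order or on how the underlying buffer is pooled.

```typescript
// Hypothetical portable helpers; names are illustrative, not the SDK's API.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode a standard (RFC 4648) base64 string into raw bytes.
function base64ToBytes(base64: string): Uint8Array {
  const clean = base64.replace(/[=\s]/g, ''); // drop padding and whitespace
  const bytes = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let bitBuffer = 0;
  let bitCount = 0;
  let byteIndex = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error('Invalid base64 character at index ' + i);
    }
    bitBuffer = (bitBuffer << 6) | value; // append 6 new bits
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes[byteIndex++] = (bitBuffer >> bitCount) & 0xff; // emit one byte
      bitBuffer &= (1 << bitCount) - 1; // keep only the leftover bits
    }
  }
  return bytes;
}

// Interpret the decoded bytes as little-endian float32 values (assumption).
function base64ToFloats(base64: string): number[] {
  const bytes = base64ToBytes(base64);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const floats: number[] = [];
  for (let offset = 0; offset + 4 <= bytes.byteLength; offset += 4) {
    floats.push(view.getFloat32(offset, true)); // true = little-endian
  }
  return floats;
}

console.log(base64ToFloats('AACAPwAAAD8=')); // [ 1, 0.5 ]
```

Passing `byteOffset` and `byteLength` explicitly matters whenever the byte array is a view into a larger buffer, which is the case a bare `.buffer` access on a pooled Node.js Buffer would get wrong.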

return base64Response._thenUnwrap((response) => {
if (response && response.data) {
response.data.forEach((embeddingBase64Obj) => {
console.log(embeddingBase64Obj);

nit

Suggested change (remove this line)
console.log(embeddingBase64Obj);

Development

Successfully merging this pull request may close these issues.

Perf: Improve vector embeddings creation by 60%
2 participants