feat: Offline batched inference based on OpenAI offline batching API #47

gaocegege · 2025-01-31T01:35:17Z

This issue is created to track the conversation about the offline batch API.

gaocegege@

I was just wondering if there are any plans to add OpenAI API-compatible offline batch support in the router. I saw a comment about it in vllm-project/vllm#1636 (comment), and it looks like this feature needs file interfaces for uploads, which might not fit well with vLLM. It could be a nice addition to the router, acting as a bridge between users and vLLM.

ApostaC@

Seems like there are multiple design choices for handling file uploads. Will add this to the roadmap (which will be released as an issue in the project tomorrow). Maybe we can have a more detailed discussion there?

simon-mo@

I also agree this can be a lightweight optional component in the stack, given that in K8s you can easily provision a persistent volume or mount s3-fuse.

gaocegege · 2025-01-31T01:38:57Z

We could start by keeping the files in the local file system, especially since our router currently doesn't support multi-instance deployment. We can design a robust abstraction for file uploads to ensure extensibility, allowing us to support other storage backends (e.g., MinIO, S3) in the future, particularly for k8s environments.
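To make the idea concrete, here is a rough sketch of what such an abstraction could look like. It is only an illustration under assumptions, not the router's actual interface: the names `Storage`, `LocalFileStorage`, `save_file`, `get_file_content`, and `delete_file` are all hypothetical, and the `file-` ID prefix is just a convention chosen for this example.

```python
import os
import uuid
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical storage interface (assumption, not the router's real API)."""

    @abstractmethod
    def save_file(self, content: bytes) -> str:
        """Persist the content and return a generated file ID."""

    @abstractmethod
    def get_file_content(self, file_id: str) -> bytes:
        """Return the raw bytes stored under file_id."""

    @abstractmethod
    def delete_file(self, file_id: str) -> None:
        """Remove the file stored under file_id."""


class LocalFileStorage(Storage):
    """Keeps uploaded files in a directory on the local file system."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def _path(self, file_id: str) -> str:
        # Reject IDs containing path components so a caller cannot
        # read or write outside base_dir.
        if file_id in ("", ".", "..") or os.path.basename(file_id) != file_id:
            raise ValueError(f"invalid file id: {file_id!r}")
        return os.path.join(self.base_dir, file_id)

    def save_file(self, content: bytes) -> str:
        # Random IDs avoid collisions between uploads.
        file_id = f"file-{uuid.uuid4().hex}"
        with open(self._path(file_id), "wb") as f:
            f.write(content)
        return file_id

    def get_file_content(self, file_id: str) -> bytes:
        with open(self._path(file_id), "rb") as f:
            return f.read()

    def delete_file(self, file_id: str) -> None:
        os.remove(self._path(file_id))
```

With this shape, a MinIO or S3 backend would be another `Storage` subclass, and the router code handling uploads would only depend on the base class.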

ApostaC · 2025-01-31T22:04:21Z

@gaocegege Thanks for leading the discussion here!

> We could start by keeping the files in the local file system, especially since our router currently doesn't support multi-instance deployment.

The proposal looks good to me! Would you like to take a stab at designing the concrete interface in the router?

gaocegege · 2025-01-31T22:59:07Z

Yes! I'd be happy to.

gaocegege mentioned this issue Feb 1, 2025

feat: OpenAI batch API part 1 #52

Merged

ApostaC mentioned this issue Feb 3, 2025

[Roadmap] vLLM production stack roadmap for 2025 Q1 #26

Open

15 tasks

gaocegege added the "feature request" label Feb 7, 2025

gaocegege mentioned this issue Feb 11, 2025

[Router] Support Batch API part 2 #109

Merged
