[Roadmap] vLLM Roadmap Q1 2025 #11862

simon-mo (Collaborator) opened this issue Jan 8, 2025

This page is accessible via roadmap.vllm.ai

This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.

vLLM Core

These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.

Ship a performant and modular V1 architecture (#8779, #sig-v1)
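
As a concrete starting point, the V1 engine can already be opted into. A minimal sketch, assuming the `VLLM_USE_V1` environment variable from the V1 alpha (model and prompt are placeholders):

```python
# Minimal sketch: opt into the experimental V1 engine.
# Assumes the VLLM_USE_V1 opt-in flag; unset it to fall back to the V0 engine.
import os

os.environ["VLLM_USE_V1"] = "1"  # must be set before importing vllm

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```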

Support large and long context models

  • (P0) MoE optimizations: Data Parallel for Attention + Expert Parallel for MoE
  • (P1) Productionize Prefill Disaggregation
  • (P1) Productionize KV Cache offloading to CPU and disk (see the sketch after this list)
  • (Help Wanted) Investigate context parallelism
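
On the KV-cache offloading item: vLLM already reserves CPU swap space for preempted sequences, and the roadmap item is about productionizing a full CPU/disk offload path beyond that. A minimal sketch of the existing knob, assuming the `swap_space` engine argument (GiB of CPU swap per GPU):

```python
# Minimal sketch of today's CPU swap knob; the roadmap item generalizes this
# into a productionized CPU/disk KV-cache offload path.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    swap_space=8,               # GiB of CPU memory for swapped-out KV blocks
)
print(llm.generate(["Hello,"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```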

Improved performance in batch mode

  • (P0) Optimized vLLM in post training workflow (#sig-post-training)
  • (P1) Efficiency in batch inference and long generations (see the sketch below)
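
For the batch-efficiency item, the offline `LLM` entry point already batches internally when given all prompts at once; a minimal sketch of the intended usage pattern, with placeholder model and prompts:

```python
# Minimal sketch of offline batch inference: pass the whole batch to one
# generate() call so the scheduler can batch requests, instead of looping
# over llm.generate() one prompt at a time.
from vllm import LLM, SamplingParams

prompts = [f"Write a haiku about topic {i}." for i in range(32)]  # placeholder batch
llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```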

Others

  • (P0) Blackwell Support
  • (P1) Track vLLM Performance
  • (Help Wanted) Extensible sampler (see the sketch after this list)
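
On the extensible-sampler item: the existing per-request extension point is the `logits_processors` hook on `SamplingParams` (a callable from the generated token ids and the logits tensor to adjusted logits); the roadmap item aims at a more general mechanism. A minimal sketch of the existing hook, with a placeholder token id:

```python
# Minimal sketch of the existing sampler hook (logits_processors on
# SamplingParams); the roadmap item targets a more extensible design.
import torch
from vllm import LLM, SamplingParams

def ban_token(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
    # Illustrative processor: suppress one token id. 198 is a placeholder;
    # look up the actual id for your tokenizer.
    logits[198] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=16, logits_processors=[ban_token])
print(llm.generate(["Hello,"], params)[0].outputs[0].text)
```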

Model Support

Hardware Support

  • PagedAttention and Chunked Prefill on Trainium and Inferentia
  • Progress in Gaudi Support
  • Out-of-tree support for IBM Spyre, Ascend, and Tenstorrent ([RFC]: Hardware pluggable #11162; see the sketch after this list)
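
For out-of-tree hardware, RFC #11162 proposes registering platforms through a Python entry point. A sketch, assuming the `vllm.platform_plugins` entry-point group from that proposal; all package and class names here are hypothetical:

```python
# Sketch of an out-of-tree platform plugin per RFC #11162. Assumes the
# proposed "vllm.platform_plugins" entry-point group; every name below
# (package, SDK, class) is hypothetical.
#
# pyproject.toml of the plugin package:
#   [project.entry-points."vllm.platform_plugins"]
#   my_accelerator = "my_vllm_plugin:register"

def register() -> str | None:
    # Return the fully qualified name of the Platform subclass, or None
    # if this hardware is not present on the current machine.
    try:
        import my_accelerator_sdk  # hypothetical vendor SDK
    except ImportError:
        return None
    return "my_vllm_plugin.platform.MyAcceleratorPlatform"
```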

Optimizations

  • AsyncTP
  • FlashAttention3 (see the sketch after this list)
  • Design for sparse KV cache framework
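
On backend selection: vLLM already lets you pin the attention implementation through the `VLLM_ATTENTION_BACKEND` environment variable, which is where a FlashAttention-3 backend would slot in. A minimal sketch (accepted values vary by version and hardware):

```python
# Minimal sketch: pin the attention backend via the documented
# VLLM_ATTENTION_BACKEND environment variable (set before importing vllm).
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model
```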

CI and Developer Productivity

  • Wheel server
  • Multi-platform wheels and docker
  • Better performance tracker
  • Easier installation (optional dependencies, separate kernel packages)

Ecosystem Projects

These are independent projects that we would love to have native collaboration and integration with!

  • Distributed batch inference
  • Large scale serving
  • Prefix-aware router (see the sketch after this list)
  • Multi-modality output
  • Collaboration with HuggingFace
  • Collaboration with Ollama
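
On the prefix-aware router item: the idea is to send requests that share a prompt prefix to the same replica, so that replica's prefix cache keeps getting hits. A minimal, framework-agnostic sketch; the endpoints and prefix length are hypothetical, and this is not an existing vLLM API:

```python
# Hypothetical sketch of a prefix-aware router: requests sharing a prompt
# prefix hash to the same vLLM replica, maximizing prefix-cache hit rates.
import hashlib

REPLICAS = ["http://vllm-0:8000", "http://vllm-1:8000"]  # hypothetical endpoints
PREFIX_CHARS = 256  # route on the first N characters of the prompt

def pick_replica(prompt: str) -> str:
    digest = hashlib.sha256(prompt[:PREFIX_CHARS].encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

print(pick_replica("You are a helpful assistant. Summarize the following..."))
```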

If an item you want is not on the roadmap, your suggestions and contributions are strongly welcomed! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #9006, #5805, #3861, #2681, #244
