[Roadmap] vLLM Roadmap Q1 2025 #11862

simon-mo (Collaborator) opened this issue Jan 8, 2025

This page is accessible via roadmap.vllm.ai

This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.

vLLM Core

These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.

Ship a performant and modular V1 architecture (#8779, #sig-v1)
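
As a concrete starting point, the V1 engine can already be opted into. A minimal sketch, assuming the `VLLM_USE_V1` environment variable from the V1 alpha (model and prompt are placeholders):

```python
# Minimal sketch: opt into the experimental V1 engine.
# Assumes the VLLM_USE_V1 opt-in flag; unset it to fall back to the V0 engine.
import os

os.environ["VLLM_USE_V1"] = "1"  # must be set before importing vllm

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```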

Support large and long context models

  • (P0) MoE optimizations: Data Parallel for Attention + Expert Parallel for MoE
  • (P1) Productionize Prefill Disaggregation
  • (P1) Productionize KV Cache offloading to CPU and disk (see the sketch after this list)
  • (Help Wanted) Investigate context parallelism
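
On the KV-cache offloading item: vLLM already reserves CPU swap space for preempted sequences, and the roadmap item is about productionizing a full CPU/disk offload path beyond that. A minimal sketch of the existing knob, assuming the `swap_space` engine argument (GiB of CPU swap per GPU):

```python
# Minimal sketch of today's CPU swap knob; the roadmap item generalizes this
# into a productionized CPU/disk KV-cache offload path.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    swap_space=8,               # GiB of CPU memory for swapped-out KV blocks
)
print(llm.generate(["Hello,"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```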

Improved performance in batch mode

  • (P0) Optimized vLLM in post training workflow (#sig-post-training)
  • (P1) Efficiency in batch inference and long generations (see the sketch below)
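
For the batch-efficiency item, the offline `LLM` entry point already batches internally when given all prompts at once; a minimal sketch of the intended usage pattern, with placeholder model and prompts:

```python
# Minimal sketch of offline batch inference: pass the whole batch to one
# generate() call so the scheduler can batch requests, instead of looping
# over llm.generate() one prompt at a time.
from vllm import LLM, SamplingParams

prompts = [f"Write a haiku about topic {i}." for i in range(32)]  # placeholder batch
llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```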

Others

  • (P0) Blackwell Support
  • (P1) Track vLLM Performance
  • (Help Wanted) Extensible sampler (see the sketch after this list)
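
On the extensible-sampler item: the existing per-request extension point is the `logits_processors` hook on `SamplingParams` (a callable from the generated token ids and the logits tensor to adjusted logits); the roadmap item aims at a more general mechanism. A minimal sketch of the existing hook, with a placeholder token id:

```python
# Minimal sketch of the existing sampler hook (logits_processors on
# SamplingParams); the roadmap item targets a more extensible design.
import torch
from vllm import LLM, SamplingParams

def ban_token(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
    # Illustrative processor: suppress one token id. 198 is a placeholder;
    # look up the actual id for your tokenizer.
    logits[198] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=16, logits_processors=[ban_token])
print(llm.generate(["Hello,"], params)[0].outputs[0].text)
```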

Model Support

Hardware Support

  • PagedAttention and Chunked Prefill on Trainium and Inferentia
  • Progress in Gaudi Support
  • Out-of-tree support for IBM Spyre, Ascend, and Tenstorrent ([RFC]: Hardware pluggable #11162; see the sketch after this list)
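
For out-of-tree hardware, RFC #11162 proposes registering platforms through a Python entry point. A sketch, assuming the `vllm.platform_plugins` entry-point group from that proposal; all package and class names here are hypothetical:

```python
# Sketch of an out-of-tree platform plugin per RFC #11162. Assumes the
# proposed "vllm.platform_plugins" entry-point group; every name below
# (package, SDK, class) is hypothetical.
#
# pyproject.toml of the plugin package:
#   [project.entry-points."vllm.platform_plugins"]
#   my_accelerator = "my_vllm_plugin:register"

def register() -> str | None:
    # Return the fully qualified name of the Platform subclass, or None
    # if this hardware is not present on the current machine.
    try:
        import my_accelerator_sdk  # hypothetical vendor SDK
    except ImportError:
        return None
    return "my_vllm_plugin.platform.MyAcceleratorPlatform"
```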

Optimizations

  • AsyncTP
  • FlashAttention3 (see the sketch after this list)
  • Design for sparse KV cache framework
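
On backend selection: vLLM already lets you pin the attention implementation through the `VLLM_ATTENTION_BACKEND` environment variable, which is where a FlashAttention-3 backend would slot in. A minimal sketch (accepted values vary by version and hardware):

```python
# Minimal sketch: pin the attention backend via the documented
# VLLM_ATTENTION_BACKEND environment variable (set before importing vllm).
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model
```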

CI and Developer Productivity

  • Wheel server
  • Multi-platform wheels and docker
  • Better performance tracker
  • Easier installation (optional dependencies, separate kernel packages)

Ecosystem Projects

These are independent projects that we would love to have native collaboration and integration with!

  • Distributed batch inference
  • Large scale serving
  • Prefix-aware router (see the sketch after this list)
  • Multi-modality output
  • Collaboration with HuggingFace
  • Collaboration with Ollama
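
On the prefix-aware router item: the idea is to send requests that share a prompt prefix to the same replica, so that replica's prefix cache keeps getting hits. A minimal, framework-agnostic sketch; the endpoints and prefix length are hypothetical, and this is not an existing vLLM API:

```python
# Hypothetical sketch of a prefix-aware router: requests sharing a prompt
# prefix hash to the same vLLM replica, maximizing prefix-cache hit rates.
import hashlib

REPLICAS = ["http://vllm-0:8000", "http://vllm-1:8000"]  # hypothetical endpoints
PREFIX_CHARS = 256  # route on the first N characters of the prompt

def pick_replica(prompt: str) -> str:
    digest = hashlib.sha256(prompt[:PREFIX_CHARS].encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

print(pick_replica("You are a helpful assistant. Summarize the following..."))
```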

If an item you want is not on the roadmap, your suggestions and contributions are strongly welcomed! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #9006, #5805, #3861, #2681, #244
