[Doc] split table for features and devices and added best-of, beam search and guided decoding

Signed-off-by: Wallas Santos <[email protected]>
wallashss committed Sep 25, 2024
1 parent f89582d commit 19e1482
Showing 1 changed file with 155 additions and 12 deletions.
167 changes: 155 additions & 12 deletions docs/source/serving/compatibility_matrix.rst
@@ -3,25 +3,44 @@
Compatibility Matrix
====================

The table below shows mutually exclusive features along with support for some device types.
The tables below show mutually exclusive features and their support on some hardware.

Feature x Feature
^^^^^^^^^^^^^^^^^^

.. raw:: html

<style>
/* Shrink table text to improve readability */
td, th {
font-size: 0.8rem;
}
</style>

.. list-table::
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8 8 8
:widths: auto

* - Feature
- Chunked Prefill
- APC
- LoRA
- Prompt Adapter
- Speculative decoding
- SD
- CUDA Graphs
- Encoder/Decoder
- Enc/Dec
- Logprobs
- Prompt Logprobs
- Async Output
- Multi-step
- Multimodal
- MM
- Best-of
- Beam Search
- Guided Decoding
* - Chunked Prefill
-
-
@@ -35,7 +54,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - APC
-
-
-
* - APC [#apc]_
- ✅
-
-
@@ -48,6 +70,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - LoRA
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L1558>`__
- ✅
@@ -61,6 +86,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Prompt Adapter
- ✅
- ✅
@@ -74,7 +102,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - Speculative decoding
-
-
-
* - SD [#sd]_
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L1200>`__ `[T] <https://github.com/vllm-project/vllm/issues/5016>`__
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/spec_decode/spec_decode_worker.py#L86-L87>`__
@@ -87,6 +118,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - CUDA Graphs
- ✅
- ✅
@@ -100,13 +134,19 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - Encoder/Decoder
-
-
-
* - Enc/Dec [#enc_dec]_
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L25>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L17>`__ `[T] <https://github.com/vllm-project/vllm/issues/7366>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L35>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L55>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L47>`__ `[T] <https://github.com/vllm-project/vllm/issues/7366>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L51>`__ `[T] <https://github.com/vllm-project/vllm/issues/7447>`__
- ✅
-
-
-
-
-
-
@@ -126,6 +166,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Prompt Logprobs
- ✅
- ✅
@@ -139,6 +182,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Async Output
- ✅
- ✅
@@ -152,6 +198,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Multi-step
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/7de49aa86c7f169eb0962b6db29ad53fff519ffb/vllm/engine/arg_utils.py#L944>`__
- ✅
@@ -165,19 +214,97 @@ The table below shows mutually exclusive features along with support for some de
- ✅
-
-
* - Multimodal
-
-
-
* - MM [#mm]_
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/8346>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/8348>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/7199>`__
- ?
- ?
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/main/vllm/inputs/preprocess.py#L300>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/260d40b5ea48df9421325388abcc8d907a560fc5/vllm/inputs/preprocess.py#L300>`__
- ✅
- ✅
- ?
- ?
-
-
-
-
* - Best-of
- ✅
- ✅
- ✅
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/6137>`__
- ✅
- ✅
- ✅
- ✅
- ?
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/7968>`__
- ?
-
-
-
* - Beam Search
- ✅
- ✅
- ✅
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/6137>`__
- ✅
- ✅
- ✅
- ✅
- ?
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/7968>`__
- ?
- ✅
-
-
* - Guided Decoding
- ✅
- ✅
- ?
- ?
- ✅
- ✅
- ?
- ✅
- ✅
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/multi_step_model_runner.py#L655>`__
- ?
- ✅
- ✅
-

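As an illustration of how the matrix above reads, the pairwise exclusions could be encoded as data and queried. This is a hypothetical sketch, not vLLM code; the feature names and the two incompatible pairs shown (speculative decoding with chunked prefill, and with LoRA) come from the visible table rows, while the helper itself is purely illustrative.

```python
# Hypothetical sketch: the Feature x Feature matrix as a set of
# unordered incompatible pairs. Not vLLM code.
INCOMPATIBLE = {
    frozenset({"chunked_prefill", "spec_decode"}),  # marked with an X [C][T] above
    frozenset({"lora", "spec_decode"}),             # marked with an X [C] above
}

def compatible(a: str, b: str) -> bool:
    """Return True unless the pair is marked incompatible in the table."""
    return frozenset({a, b}) not in INCOMPATIBLE

print(compatible("apc", "spec_decode"))              # True: a check-mark cell
print(compatible("chunked_prefill", "spec_decode"))  # False: an X cell
```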
Feature x Hardware
^^^^^^^^^^^^^^^^^^

.. list-table::
:header-rows: 1
:widths: auto

* - Feature
- Chunked Prefill
- APC
- LoRA
- Prompt Adapter
- SD
- CUDA Graphs
- Enc/Dec
- Logprobs
- Prompt Logprobs
- Async Output
- Multi-step
- MM
- Best-of
- Beam Search
- Guided Decoding
* - NVIDIA
- ✅
- ✅
@@ -191,6 +318,9 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✅
- ✅
- ✅
- ✅
- ✅
* - CPU
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/executor/cpu_executor.py#L337>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/executor/cpu_executor.py#L345>`__
@@ -204,6 +334,9 @@ The table below shows mutually exclusive features along with support for some de
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L370>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/8477>`__
- ✅
- ✅
- ✅
- ✅
* - AMD
- ✅
- ✅
@@ -217,16 +350,26 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/8472>`__
- ✅
- ✅
- ✅
- ✅

For NVIDIA, this table is valid for the Ampere architecture and newer (compute capability ≥ 8.0). Older architectures were not tested and are not guaranteed to work.
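
The capability requirement above amounts to a simple tuple comparison. The helper below is purely illustrative (its name and the module layout are assumptions, not a vLLM API); `(8, 0)` is the compute capability of the Ampere generation.

```python
# Hypothetical helper illustrating the "compute capability >= 8.0" rule.
# Not vLLM code; names are made up for this sketch.
MIN_CAPABILITY = (8, 0)  # Ampere

def is_tested_capability(major: int, minor: int) -> bool:
    """True if the GPU generation is covered by the table above."""
    return (major, minor) >= MIN_CAPABILITY

print(is_tested_capability(8, 0))  # True  (Ampere, e.g. A100)
print(is_tested_capability(7, 5))  # False (Turing, untested)
```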

Note:
^^^^^

- [C] stands for a code check: a check performed at runtime that verifies whether the combination is valid and either raises an error or logs a warning while disabling the feature.
- [T] stands for a tracking issue or pull request in the vLLM repository.
- ? marks combinations whose status is unknown or untested.
- APC stands for Automatic Prefix Caching.
- Async output processing requires CUDA Graphs to be enabled; a code check enforces this, and it is the only ✅ annotated with a [C].
- Encoder/decoder models currently do not work with CUDA Graphs, and are therefore also incompatible with async output processing.
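
To make the [C] behavior concrete, here is a minimal sketch of the two shapes such a check can take: warn-and-disable versus raise. The function names, arguments, and messages are hypothetical, not taken from the vLLM code base.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical sketch of a [C]-style runtime check. Not vLLM code.
def check_async_output(use_cuda_graphs: bool, async_output: bool) -> bool:
    """Return the effective async-output setting after validation."""
    if async_output and not use_cuda_graphs:
        # Warn-and-disable path: log a warning and fall back instead of failing.
        logger.warning("Async output processing requires CUDA Graphs; disabling it.")
        return False
    return async_output

def check_spec_decode(chunked_prefill: bool, spec_decode: bool) -> None:
    """Raise path: reject the invalid combination outright."""
    if chunked_prefill and spec_decode:
        raise ValueError("Speculative decoding is not compatible with chunked prefill.")
```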

Legend:
^^^^^^^

.. [#apc] Automatic Prefix Caching
.. [#sd] Speculative Decoding
.. [#enc_dec] Encoder/Decoder Models
.. [#mm] Multimodal Models
..
TODO: Add support for remaining devices.
