[Doc] split table for features and devices and added best-of, beam search and guided decoding

Signed-off-by: Wallas Santos <[email protected]>
wallashss committed Sep 25, 2024
1 parent f89582d commit 19e1482
Showing 1 changed file with 155 additions and 12 deletions.
167 changes: 155 additions & 12 deletions docs/source/serving/compatibility_matrix.rst
@@ -3,25 +3,44 @@
Compatibility Matrix
====================

The table below shows mutually exclusive features along with support for some device types.
The tables below show mutually exclusive features and their support on some hardware.

Feature x Feature
^^^^^^^^^^^^^^^^^^

.. raw:: html

<style>
/* Shrink table text to improve readability */
td, th {
font-size: 0.8rem;
}
</style>

.. list-table::
:header-rows: 1
:widths: 20 8 8 8 8 8 8 8 8 8 8 8 8
:widths: auto

* - Feature
- Chunked Prefill
- APC
- LoRA
- Prompt Adapter
- Speculative decoding
- SD
- CUDA Graphs
- Encoder/Decoder
- Enc/Dec
- Logprobs
- Prompt Logprobs
- Async Output
- Multi-step
- Multimodal
- MM
- Best-of
- Beam Search
- Guided Decoding
* - Chunked Prefill
-
-
@@ -35,7 +54,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - APC
-
-
-
* - APC [#apc]_
- ✅
-
-
@@ -48,6 +70,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - LoRA
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L1558>`__
- ✅
@@ -61,6 +86,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Prompt Adapter
- ✅
- ✅
@@ -74,7 +102,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - Speculative decoding
-
-
-
* - SD [#sd]_
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L1200>`__ `[T] <https://github.com/vllm-project/vllm/issues/5016>`__
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/spec_decode/spec_decode_worker.py#L86-L87>`__
@@ -87,6 +118,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - CUDA Graphs
- ✅
- ✅
@@ -100,13 +134,19 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
* - Encoder/Decoder
-
-
-
* - Enc/Dec [#enc_dec]_
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L25>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L17>`__ `[T] <https://github.com/vllm-project/vllm/issues/7366>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L35>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L55>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L47>`__ `[T] <https://github.com/vllm-project/vllm/issues/7366>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/utils.py#L51>`__ `[T] <https://github.com/vllm-project/vllm/issues/7447>`__
- ✅
-
-
-
-
-
-
@@ -126,6 +166,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Prompt Logprobs
- ✅
- ✅
@@ -139,6 +182,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Async Output
- ✅
- ✅
@@ -152,6 +198,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
-
-
-
* - Multi-step
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/7de49aa86c7f169eb0962b6db29ad53fff519ffb/vllm/engine/arg_utils.py#L944>`__
- ✅
@@ -165,19 +214,97 @@ The table below shows mutually exclusive features along with support for some de
- ✅
-
-
* - Multimodal
-
-
-
* - MM [#mm]_
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/8346>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/8348>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/pull/7199>`__
- ?
- ?
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/main/vllm/inputs/preprocess.py#L300>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/260d40b5ea48df9421325388abcc8d907a560fc5/vllm/inputs/preprocess.py#L300>`__
- ✅
- ✅
- ?
- ?
-
-
-
-
* - Best-of
- ✅
- ✅
- ✅
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/6137>`__
- ✅
- ✅
- ✅
- ✅
- ?
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/7968>`__
- ?
-
-
-
* - Beam Search
- ✅
- ✅
- ✅
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/6137>`__
- ✅
- ✅
- ✅
- ✅
- ?
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/7968>`__
- ?
- ✅
-
-
* - Guided Decoding
- ✅
- ✅
- ?
- ?
- ✅
- ✅
- ?
- ✅
- ✅
- ✅
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/worker/multi_step_model_runner.py#L655>`__
- ?
- ✅
- ✅
-

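As an illustration of how the matrix above reads, the pairwise exclusions could be encoded as data and queried. This is a hypothetical sketch, not vLLM code; the feature names and the two incompatible pairs shown (speculative decoding with chunked prefill, and with LoRA) come from the visible table rows, while the helper itself is purely illustrative.

```python
# Hypothetical sketch: the Feature x Feature matrix as a set of
# unordered incompatible pairs. Not vLLM code.
INCOMPATIBLE = {
    frozenset({"chunked_prefill", "spec_decode"}),  # marked with an X [C][T] above
    frozenset({"lora", "spec_decode"}),             # marked with an X [C] above
}

def compatible(a: str, b: str) -> bool:
    """Return True unless the pair is marked incompatible in the table."""
    return frozenset({a, b}) not in INCOMPATIBLE

print(compatible("apc", "spec_decode"))              # True: a check-mark cell
print(compatible("chunked_prefill", "spec_decode"))  # False: an X cell
```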
Feature x Hardware
^^^^^^^^^^^^^^^^^^

.. list-table::
:header-rows: 1
:widths: auto

* - Feature
- Chunked Prefill
- APC
- LoRA
- Prompt Adapter
- SD
- CUDA Graphs
- Enc/Dec
- Logprobs
- Prompt Logprobs
- Async Output
- Multi-step
- MM
- Best-of
- Beam Search
- Guided Decoding
* - NVIDIA
- ✅
- ✅
@@ -191,6 +318,9 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✅
- ✅
- ✅
- ✅
- ✅
* - CPU
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/executor/cpu_executor.py#L337>`__
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/executor/cpu_executor.py#L345>`__
@@ -204,6 +334,9 @@ The table below shows mutually exclusive features along with support for some de
- ✗ `[C] <https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/vllm/config.py#L370>`__
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/8477>`__
- ✅
- ✅
- ✅
- ✅
* - AMD
- ✅
- ✅
@@ -217,16 +350,26 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✗ `[T] <https://github.com/vllm-project/vllm/issues/8472>`__
- ✅
- ✅
- ✅
- ✅

For NVIDIA, this table is valid for the Ampere architecture and newer (compute capability ≥ 8.0). Older architectures were not tested and are not guaranteed to work.
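
The capability requirement above amounts to a simple tuple comparison. The helper below is purely illustrative (its name and the module layout are assumptions, not a vLLM API); `(8, 0)` is the compute capability of the Ampere generation.

```python
# Hypothetical helper illustrating the "compute capability >= 8.0" rule.
# Not vLLM code; names are made up for this sketch.
MIN_CAPABILITY = (8, 0)  # Ampere

def is_tested_capability(major: int, minor: int) -> bool:
    """True if the GPU generation is covered by the table above."""
    return (major, minor) >= MIN_CAPABILITY

print(is_tested_capability(8, 0))  # True  (Ampere, e.g. A100)
print(is_tested_capability(7, 5))  # False (Turing, untested)
```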

Note:
^^^^^

- [C] stands for a code check: a check performed at runtime that verifies whether the combination is valid and either raises an error or logs a warning while disabling the feature.
- [T] stands for a tracking issue or pull request in the vLLM repository.
- ? marks combinations whose status is unknown or untested.
- APC stands for Automatic Prefix Caching.
- Async output processing requires CUDA Graphs to be enabled; a code check enforces this, and it is the only ✅ annotated with a [C].
- Encoder/decoder models currently do not work with CUDA Graphs, and are therefore also incompatible with async output processing.
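
To make the [C] behavior concrete, here is a minimal sketch of the two shapes such a check can take: warn-and-disable versus raise. The function names, arguments, and messages are hypothetical, not taken from the vLLM code base.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical sketch of a [C]-style runtime check. Not vLLM code.
def check_async_output(use_cuda_graphs: bool, async_output: bool) -> bool:
    """Return the effective async-output setting after validation."""
    if async_output and not use_cuda_graphs:
        # Warn-and-disable path: log a warning and fall back instead of failing.
        logger.warning("Async output processing requires CUDA Graphs; disabling it.")
        return False
    return async_output

def check_spec_decode(chunked_prefill: bool, spec_decode: bool) -> None:
    """Raise path: reject the invalid combination outright."""
    if chunked_prefill and spec_decode:
        raise ValueError("Speculative decoding is not compatible with chunked prefill.")
```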

Legend:
^^^^^^^

.. [#apc] Automatic Prefix Caching
.. [#sd] Speculative Decoding
.. [#enc_dec] Encoder/Decoder Models
.. [#mm] Multimodal Models
..
TODO: Add support for remaining devices.
