diff --git a/docs/source/serving/compatibility_matrix.rst b/docs/source/serving/compatibility_matrix.rst
index 42480d849e9f8..18ee1f9403cde 100644
--- a/docs/source/serving/compatibility_matrix.rst
+++ b/docs/source/serving/compatibility_matrix.rst
@@ -3,25 +3,44 @@
Compatibility Matrix
====================
-The table below shows mutually exclusive features along with support for some device types.
+The tables below show which features are mutually exclusive and which features are supported on selected hardware.
+
+Feature x Feature
+^^^^^^^^^^^^^^^^^^
+
+.. raw:: html
+
+
.. list-table::
:header-rows: 1
- :widths: 20 8 8 8 8 8 8 8 8 8 8 8 8
+ :widths: auto
* - Feature
- Chunked Prefill
- APC
- LoRA
- Prompt Adapter
- - Speculative decoding
+ - SD
- CUDA Graphs
- - Encoder/Decoder
+ - Enc/Dec
- Logprobs
- Prompt Logprobs
- Async Output
- Multi-step
- - Multimodal
+ - MM
+ - Best-of
+ - Beam Search
+ - Guided Decoding
* - Chunked Prefill
-
-
@@ -35,7 +54,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
- * - APC
+ -
+ -
+ -
+ * - APC [#apc]_
- ✅
-
-
@@ -48,6 +70,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - LoRA
- ✗ `[C] `__
- ✅
@@ -61,6 +86,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - Prompt Adapter
- ✅
- ✅
@@ -74,7 +102,10 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
- * - Speculative decoding
+ -
+ -
+ -
+ * - SD [#sd]_
- ✗ `[C] `__ `[T] `__
- ✅
- ✗ `[C] `__
@@ -87,6 +118,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - CUDA Graphs
- ✅
- ✅
@@ -100,13 +134,19 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
- * - Encoder/Decoder
+ -
+ -
+ -
+ * - Enc/Dec [#enc_dec]_
- ✗ `[C] `__
- ✗ `[C] `__ `[T] `__
- ✗ `[C] `__
- ✗ `[C] `__
- ✗ `[C] `__ `[T] `__
- - ✗ `[C] `__ `[T] `__
+ - ✅
+ -
+ -
+ -
-
-
-
@@ -126,6 +166,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - Prompt Logprobs
- ✅
- ✅
@@ -139,6 +182,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - Async Output
- ✅
- ✅
@@ -152,6 +198,9 @@ The table below shows mutually exclusive features along with support for some de
-
-
-
+ -
+ -
+ -
* - Multi-step
- ✗ `[C] `__
- ✅
@@ -165,19 +214,97 @@ The table below shows mutually exclusive features along with support for some de
- ✅
-
-
- * - Multimodal
+ -
+ -
+ -
+ * - MM [#mm]_
- ✗ `[T] `__
- ✗ `[T] `__
- ✗ `[T] `__
- ?
- ?
- ✅
- - ✗ `[C] `__
+ - ✗ `[C] `__
+ - ✅
+ - ✅
+ - ?
+ - ?
+ -
+ -
+ -
+ -
+ * - Best-of
+ - ✅
+ - ✅
+ - ✅
+ - ✅
+ - ✗ `[T] `__
+ - ✅
+ - ✅
- ✅
- ✅
- ?
+ - ✗ `[T] `__
- ?
-
+ -
+ -
+ * - Beam Search
+ - ✅
+ - ✅
+ - ✅
+ - ✅
+ - ✗ `[T] `__
+ - ✅
+ - ✅
+ - ✅
+ - ✅
+ - ?
+ - ✗ `[T] `__
+ - ?
+ - ✅
+ -
+ -
+ * - Guided Decoding
+ - ✅
+ - ✅
+ - ?
+ - ?
+ - ✅
+ - ✅
+ - ?
+ - ✅
+ - ✅
+ - ✅
+ - ✗ `[C] `__
+ - ?
+ - ✅
+ - ✅
+ -
+
+Feature x Hardware
+^^^^^^^^^^^^^^^^^^
+
+.. list-table::
+ :header-rows: 1
+ :widths: auto
+
+ * - Feature
+ - Chunked Prefill
+ - APC
+ - LoRA
+ - Prompt Adapter
+ - SD
+ - CUDA Graphs
+ - Enc/Dec
+ - Logprobs
+ - Prompt Logprobs
+ - Async Output
+ - Multi-step
+ - MM
+ - Best-of
+ - Beam Search
+ - Guided Decoding
* - NVIDIA
- ✅
- ✅
@@ -191,6 +318,9 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✅
- ✅
+ - ✅
+ - ✅
+ - ✅
* - CPU
- ✗ `[C] `__
- ✗ `[C] `__
@@ -204,6 +334,9 @@ The table below shows mutually exclusive features along with support for some de
- ✗ `[C] `__
- ✗ `[T] `__
- ✅
+ - ✅
+ - ✅
+ - ✅
* - AMD
- ✅
- ✅
@@ -217,16 +350,26 @@ The table below shows mutually exclusive features along with support for some de
- ✅
- ✗ `[T] `__
- ✅
+ - ✅
+ - ✅
+ - ✅
+For NVIDIA, this table applies to the Ampere architecture and newer (compute capability ≥ 8). Older architectures were not tested and are not guaranteed to work.
Note:
+^^^^^
- [C] stands for code checks: a runtime check verifies whether the combination is valid and either raises an error or logs a warning and disables the feature.
- [T] stands for tracking issues or pull requests in the vLLM repository.
-- APC stands for Automatic Prefix Caching.
- Async output processing requires CUDA Graphs to be enabled; the corresponding table entry carries a [C] link to the code check that enforces this. It is the only ✅ with a [C].
- Encoder/decoder models currently do not work with CUDA Graphs and are therefore not compatible with async output processing either.
+Legend:
+^^^^^^^
+.. [#apc] Automatic Prefix Caching
+.. [#sd] Speculative Decoding
+.. [#enc_dec] Encoder/Decoder Models
+.. [#mm] Multimodal Models
..
TODO: Add support for remaining devices.
\ No newline at end of file
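
The Note describes a [C] code check as a runtime check that either raises an error or logs a warning and disables a feature. The following is a minimal sketch of that pattern, not vLLM's actual implementation: ``EngineOptions`` and ``apply_code_checks`` are hypothetical names invented for illustration. The two combinations come from the tables (Multi-step with Chunked Prefill is marked ✗ [C]; Async Output requires CUDA Graphs), but which one raises and which one only warns is an assumption.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("compat_sketch")


@dataclass
class EngineOptions:
    """Hypothetical option bundle; NOT a vLLM class."""
    chunked_prefill: bool = False
    multi_step: bool = False
    cuda_graphs: bool = True
    async_output: bool = True


def apply_code_checks(opts: EngineOptions) -> EngineOptions:
    """Illustrative [C]-style checks over a feature combination."""
    # Hard check (assumed behavior): an unsupported pair raises an error.
    if opts.multi_step and opts.chunked_prefill:
        raise ValueError(
            "Multi-step is not compatible with Chunked Prefill."
        )

    # Soft check (assumed behavior): log a warning and disable the feature.
    if opts.async_output and not opts.cuda_graphs:
        logger.warning(
            "Async output processing needs CUDA Graphs; disabling it."
        )
        opts.async_output = False

    return opts
```

Calling ``apply_code_checks(EngineOptions(cuda_graphs=False))`` in this sketch returns options with ``async_output`` switched off, whereas enabling both ``multi_step`` and ``chunked_prefill`` raises ``ValueError``.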