From 19e14828f36bf06e61441295ba7bfa95be4fc3c7 Mon Sep 17 00:00:00 2001 From: Wallas Santos Date: Wed, 25 Sep 2024 19:55:52 -0300 Subject: [PATCH] [Doc] split table for features and devices and added best-of, beam search and guided decoding Signed-off-by: Wallas Santos --- docs/source/serving/compatibility_matrix.rst | 167 +++++++++++++++++-- 1 file changed, 155 insertions(+), 12 deletions(-) diff --git a/docs/source/serving/compatibility_matrix.rst b/docs/source/serving/compatibility_matrix.rst index 42480d849e9f8..18ee1f9403cde 100644 --- a/docs/source/serving/compatibility_matrix.rst +++ b/docs/source/serving/compatibility_matrix.rst @@ -3,25 +3,44 @@ Compatibility Matrix ==================== -The table below shows mutually exclusive features along with support for some device types. +The tables below show which features are mutually exclusive and which features each hardware platform supports. + +Feature x Feature +^^^^^^^^^^^^^^^^^^ + +.. raw:: html + + .. list-table:: :header-rows: 1 - :widths: 20 8 8 8 8 8 8 8 8 8 8 8 8 + :widths: auto * - Feature - Chunked Prefill - APC - LoRA - Prompt Adapter - - Speculative decoding + - SD - CUDA Graphs - - Encoder/Decoder + - Enc/Dec - Logprobs - Prompt Logprobs - Async Output - Multi-step - - Multimodal + - MM + - Best-of + - Beam Search + - Guided Decoding * - Chunked Prefill - - @@ -35,7 +54,10 @@ The table below shows mutually exclusive features along with support for some de - - - - * - APC + - + - + - + * - APC [#apc]_ - ✅ - - @@ -48,6 +70,9 @@ The table below shows mutually exclusive features along with support for some de - - - + - + - + - * - LoRA - ✗ `[C] `__ - ✅ @@ -61,6 +86,9 @@ The table below shows mutually exclusive features along with support for some de - - - + - + - + - * - Prompt Adapter - ✅ - ✅ @@ -74,7 +102,10 @@ The table below shows mutually exclusive features along with support for some de - - - - * - Speculative decoding + - + - + - + * - SD [#sd]_ - ✗ `[C] `__ `[T] `__ - ✅ - ✗ `[C] `__ @@ -87,6 +118,9 @@ The table below 
shows mutually exclusive features along with support for some de - - - + - + - + - * - CUDA Graphs - ✅ - ✅ @@ -100,13 +134,19 @@ The table below shows mutually exclusive features along with support for some de - - - - * - Encoder/Decoder + - + - + - + * - Enc/Dec [#enc_dec]_ - ✗ `[C] `__ - ✗ `[C] `__ `[T] `__ - ✗ `[C] `__ - ✗ `[C] `__ - ✗ `[C] `__ `[T] `__ - - ✗ `[C] `__ `[T] `__ + - ✅ + - + - + - - - - @@ -126,6 +166,9 @@ The table below shows mutually exclusive features along with support for some de - - - + - + - + - * - Prompt Logprobs - ✅ - ✅ @@ -139,6 +182,9 @@ The table below shows mutually exclusive features along with support for some de - - - + - + - + - * - Async Output - ✅ - ✅ @@ -152,6 +198,9 @@ The table below shows mutually exclusive features along with support for some de - - - + - + - + - * - Multi-step - ✗ `[C] `__ - ✅ @@ -165,19 +214,97 @@ The table below shows mutually exclusive features along with support for some de - ✅ - - - * - Multimodal + - + - + - + * - MM [#mm]_ - ✗ `[T] `__ - ✗ `[T] `__ - ✗ `[T] `__ - ? - ? - ✅ - - ✗ `[C] `__ + - ✗ `[C] `__ + - ✅ + - ✅ + - ? + - ? + - + - + - + - + * - Best-of + - ✅ + - ✅ + - ✅ + - ✅ + - ✗ `[T] `__ + - ✅ + - ✅ - ✅ - ✅ - ? + - ✗ `[T] `__ - ? - + - + - + * - Beam Search + - ✅ + - ✅ + - ✅ + - ✅ + - ✗ `[T] `__ + - ✅ + - ✅ + - ✅ + - ✅ + - ? + - ✗ `[T] `__ + - ? + - ✅ + - + - + * - Guided Decoding + - ✅ + - ✅ + - ? + - ? + - ✅ + - ✅ + - ? + - ✅ + - ✅ + - ✅ + - ✗ `[C] `__ + - ? + - ✅ + - ✅ + - + +Feature x Hardware +^^^^^^^^^^^^^^^^^^ + +.. 
list-table:: + :header-rows: 1 + :widths: auto + + * - Feature + - Chunked Prefill + - APC + - LoRA + - Prompt Adapter + - SD + - CUDA Graphs + - Enc/Dec + - Logprobs + - Prompt Logprobs + - Async Output + - Multi-step + - MM + - Best-of + - Beam Search + - Guided Decoding * - NVIDIA - ✅ - ✅ @@ -191,6 +318,9 @@ The table below shows mutually exclusive features along with support for some de - ✅ - ✅ - ✅ + - ✅ + - ✅ + - ✅ * - CPU - ✗ `[C] `__ - ✗ `[C] `__ @@ -204,6 +334,9 @@ The table below shows mutually exclusive features along with support for some de - ✗ `[C] `__ - ✗ `[T] `__ - ✅ + - ✅ + - ✅ + - ✅ * - AMD - ✅ - ✅ @@ -217,16 +350,26 @@ The table below shows mutually exclusive features along with support for some de - ✅ - ✗ `[T] `__ - ✅ + - ✅ + - ✅ + - ✅ +For NVIDIA, this table applies to the Ampere architecture and newer (compute capability ≥ 8). Older architectures have not been tested, so these combinations are not guaranteed to work on them. Note: +^^^^^ - [C] stands for code checks, that is, there is a checking on running that verify if the combinations is valid and raises and error or log a warning disabling the feature. - [T] stands for tracking issues or pull requests on vLLM Repo. -- APC stands for Automatic Prefix Caching. - Async output processing needs CUDA Graphs activated to work, there is a code check in the table to inform that. It is the only ✅ with a [C]. - Encoder/decoder currently does not work with CUDA Graphs, therefore it is not compatible with Async output processing as well. +Legend: +^^^^^^^ +.. [#apc] Automatic Prefix Caching +.. [#sd] Speculative Decoding +.. [#enc_dec] Encoder/Decoder Models +.. [#mm] Multimodal Models .. TODO: Add support for remaining devices. \ No newline at end of file
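The [C] entries in the notes above refer to runtime checks that validate a feature combination and then either raise an error or log a warning and disable a feature. The sketch below is only a rough, hypothetical illustration of that idea and is not vLLM's actual configuration code. It encodes a handful of the ✗ [C] cells from the Feature x Feature table as unordered pairs, rejects any enabled pair, and applies the "Async Output needs CUDA Graphs" rule as a warn-and-disable step. The names `INCOMPATIBLE` and `check_features`, and the choice of which conflicts raise and which ones warn, are assumptions made for this example.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical subset of the ✗ [C] cells in the Feature x Feature table.
# Compatibility is symmetric, so each pair is stored as an unordered frozenset.
INCOMPATIBLE = {
    frozenset({"LoRA", "Chunked Prefill"}),
    frozenset({"SD", "Chunked Prefill"}),
    frozenset({"SD", "LoRA"}),
    frozenset({"Enc/Dec", "Chunked Prefill"}),
    frozenset({"Enc/Dec", "APC"}),
    frozenset({"Multi-step", "Chunked Prefill"}),
}


def check_features(enabled: set[str]) -> set[str]:
    """Return the features that stay enabled after the (hypothetical) checks.

    Raises ValueError if any incompatible pair is enabled together.
    Logs a warning and disables Async Output when CUDA Graphs are off.
    """
    features = set(enabled)
    for pair in INCOMPATIBLE:
        if pair <= features:  # both features of the pair are enabled
            a, b = sorted(pair)
            raise ValueError(f"{a} is not compatible with {b}")
    # Async output processing needs CUDA Graphs, so it is turned off otherwise.
    if "Async Output" in features and "CUDA Graphs" not in features:
        logger.warning("Async Output requires CUDA Graphs; disabling it.")
        features.discard("Async Output")
    return features


print(sorted(check_features({"LoRA", "Async Output"})))  # prints ['LoRA']
```

A set of unordered pairs keeps the lookup symmetric, which mirrors how a ✗ in row A, column B also means a ✗ in row B, column A.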