Add support for nvidia modelopt fp8 kv cache #3223

Edwardf0t1 · 2025-01-30T22:53:40Z

Motivation

Add support for the ModelOpt FP8 KV cache so that SGLang can run ModelOpt-quantized models with the FP8 KV cache enabled.

Modifications

Add a ModelOptFp8KVCacheMethod class.
Handle name matching for the KV cache scaling factors.
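For reviewers unfamiliar with the name-matching step, here is a minimal sketch of the idea, not the code in this PR. It assumes the checkpoint stores the KV cache scaling factors under the projection layers (e.g. `...k_proj.k_scale`) while the model registers them on the attention layer (e.g. `...attn.k_scale`). Both suffix patterns and the helper name `remap_kv_scale_name` are hypothetical and only illustrate the technique.

```python
from typing import Iterable, Optional

# Hypothetical suffix pairs: (suffix in the checkpoint, suffix the model expects).
# The real names depend on the checkpoint format and the model definition.
_KV_SCALE_REMAP = (
    (".k_proj.k_scale", ".attn.k_scale"),
    (".v_proj.v_scale", ".attn.v_scale"),
)


def remap_kv_scale_name(name: str, known_params: Iterable[str]) -> Optional[str]:
    """Map a checkpoint tensor name to the model's parameter name.

    Returns the remapped name for KV cache scale tensors, None if the
    remapped name is not a parameter of the model (the caller can then
    skip the tensor), and the unchanged name for every other tensor.
    """
    known = set(known_params)
    for src, dst in _KV_SCALE_REMAP:
        if name.endswith(src):
            candidate = name[: -len(src)] + dst
            return candidate if candidate in known else None
    return name


if __name__ == "__main__":
    params = ["model.layers.0.self_attn.attn.k_scale"]
    print(remap_kv_scale_name("model.layers.0.self_attn.k_proj.k_scale", params))
```

Returning None for an unmatched scale lets the weight loader skip tensors the model has no slot for, instead of raising on an unknown key.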

Test

The nvidia/Llama-3.1-8B-Instruct-FP8 checkpoint will be updated with the FP8 KV cache enabled.

import sglang as sgl

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="nvidia/Llama-3.1-8B-Instruct-FP8", quantization="modelopt")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

if __name__ == "__main__":
    main()

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling.

Edwardf0t1 marked this pull request as ready for review January 30, 2025 22:54

Edwardf0t1 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners January 30, 2025 22:54

Edwardf0t1 added 3 commits February 1, 2025 00:08

fix format

ca781bd

fix format

a8babf8

fix format

12f49e1

Edwardf0t1 force-pushed the zhiyu/enable-modelopt-fp8-kv-cache branch from eb0d651 to 12f49e1 Compare February 1, 2025 00:09
