Skip to content

Commit

Permalink
chore: bump v0.4.1.post5 (#2840)
Browse files Browse the repository at this point in the history
  • Loading branch information
zhyncs authored Jan 11, 2025
1 parent f0e15dc commit f624901
Show file tree
Hide file tree
Showing 6 changed files with 13 additions and 11 deletions.
4 changes: 3 additions & 1 deletion benchmark/deepseek_v3/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ The SGLang and DeepSeek teams collaborated to get DeepSeek V3 FP8 running on NVI

Special thanks to Meituan's Search & Recommend Platform Team and Baseten's Model Performance Team for implementing the model, and DataCrunch for providing GPU resources.

For optimizations made on the DeepSeek series models regarding SGLang, please refer to https://sgl-project.github.io/references/deepseek.html

## Hardware Recommendation
- 8 x NVIDIA H200 GPUs

Expand All @@ -29,7 +31,7 @@ For high QPS scenarios, add the `--enable-dp-attention` argument to boost throug
### Using pip
```bash
# Installation
pip install "sglang[all]>=0.4.1.post3" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer
pip install "sglang[all]>=0.4.1.post5" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer

# Launch
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
Expand Down
2 changes: 1 addition & 1 deletion docker/Dockerfile.rocm
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Usage (to build SGLang ROCm docker image):
# docker build --build-arg SGL_BRANCH=v0.4.1.post4 -t v0.4.1.post4-rocm620 -f Dockerfile.rocm .
# docker build --build-arg SGL_BRANCH=v0.4.1.post5 -t v0.4.1.post5-rocm620 -f Dockerfile.rocm .

# default base image
ARG BASE_IMAGE="rocmshared/vllm-rocm:20241031-tuned"
Expand Down
4 changes: 2 additions & 2 deletions docs/developer/setup_github_runner.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,9 @@ docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
# Nvidia
docker run --shm-size 128g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash
# AMD
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.4.1.post4-rocm620 /bin/bash
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.4.1.post5-rocm620 /bin/bash
# AMD just the last 2 GPUs
docker run --rm --device=/dev/kfd --device=/dev/dri/renderD176 --device=/dev/dri/renderD184 --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.4.1.post4-rocm620 /bin/bash
docker run --rm --device=/dev/kfd --device=/dev/dri/renderD176 --device=/dev/dri/renderD184 --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.4.1.post5-rocm620 /bin/bash
```

### Step 2: Configure the runner by `config.sh`
Expand Down
10 changes: 5 additions & 5 deletions docs/start/install.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Note: Please check the [FlashInfer installation doc](https://docs.flashinfer.ai/
## Method 2: From source
```
# Use the last release branch
git clone -b v0.4.1.post4 https://github.com/sgl-project/sglang.git
git clone -b v0.4.1.post5 https://github.com/sgl-project/sglang.git
cd sglang
pip install --upgrade pip
Expand All @@ -26,7 +26,7 @@ Note: To AMD ROCm system with Instinct/MI GPUs, do following instead:

```
# Use the last release branch
git clone -b v0.4.1.post4 https://github.com/sgl-project/sglang.git
git clone -b v0.4.1.post5 https://github.com/sgl-project/sglang.git
cd sglang
pip install --upgrade pip
Expand All @@ -51,7 +51,7 @@ docker run --gpus all \
Note: To AMD ROCm system with Instinct/MI GPUs, it is recommended to use `docker/Dockerfile.rocm` to build images, example and usage as below:

```bash
docker build --build-arg SGL_BRANCH=v0.4.1.post4 -t v0.4.1.post4-rocm620 -f Dockerfile.rocm .
docker build --build-arg SGL_BRANCH=v0.4.1.post5 -t v0.4.1.post5-rocm620 -f Dockerfile.rocm .

alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --ipc=host \
--shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
Expand All @@ -60,11 +60,11 @@ alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/d
drun -p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
v0.4.1.post4-rocm620 \
v0.4.1.post5-rocm620 \
python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000

# Till flashinfer backend available, --attention-backend triton --sampling-backend pytorch are set by default
drun v0.4.1.post4-rocm620 python3 -m sglang.bench_one_batch --batch-size 32 --input 1024 --output 128 --model amd/Meta-Llama-3.1-8B-Instruct-FP8-KV --tp 8 --quantization fp8
drun v0.4.1.post5-rocm620 python3 -m sglang.bench_one_batch --batch-size 32 --input 1024 --output 128 --model amd/Meta-Llama-3.1-8B-Instruct-FP8-KV --tp 8 --quantization fp8
```

## Method 4: Using docker compose
Expand Down
2 changes: 1 addition & 1 deletion python/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "sglang"
version = "0.4.1.post4"
version = "0.4.1.post5"
description = "SGLang is yet another fast serving framework for large language models and vision language models."
readme = "README.md"
requires-python = ">=3.8"
Expand Down
2 changes: 1 addition & 1 deletion python/sglang/version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "0.4.1.post4"
__version__ = "0.4.1.post5"

0 comments on commit f624901

Please sign in to comment.