Merge branch 'main' into dev/onevision
kcz358 committed Aug 14, 2024
2 parents 2f3a03f + 67c0d83 commit 9350ddd
Showing 22 changed files with 869 additions and 541 deletions.
3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -12,6 +12,7 @@ body:
- label: 2. The bug has not been fixed in the latest version.
- label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- label: 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- label: 5. Please use English; otherwise, it will be closed.
- type: textarea
attributes:
label: Describe the bug
@@ -31,7 +32,7 @@ body:
attributes:
label: Environment
description: |
Please provide necessary environment information here with `python3 -m sglang.check_env`.
Please provide necessary environment information here with `python3 -m sglang.check_env`. Otherwise the issue will be closed.
placeholder: Environment here.
validations:
required: true
6 changes: 6 additions & 0 deletions .github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -3,6 +3,12 @@ description: Suggest an idea for this project
title: "[Feature] "

body:
- type: checkboxes
attributes:
label: Checklist
options:
- label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- label: 2. Please use English; otherwise, it will be closed.
- type: textarea
attributes:
label: Motivation
13 changes: 7 additions & 6 deletions .github/pull_request_template.md
@@ -1,15 +1,16 @@
Thank you for your contribution, we really appreciate it. The following instructions will help improve your pull request and make it easier to receive feedback. If there are any items you don't understand, don't worry. Just submit the pull request and ask the maintainers for help.
<!-- Thank you for your contribution, we really appreciate it. The following instructions will help improve your pull request and make it easier to receive feedback. If there are any items you don't understand, don't worry. Just submit the pull request and ask the maintainers for help. -->

## Motivation

Please explain the motivation behind this PR and the goal you aim to achieve with it.
<!-- Please explain the motivation behind this PR and the goal you aim to achieve with it. -->

## Modification

Briefly describe the changes made in this PR.
<!-- Briefly describe the changes made in this PR. -->

## Checklist

1. Ensure pre-commit `pre-commit run --all-files` or other linting tools are used to fix potential lint issues.
2. Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
3. Modify documentation as needed, such as docstrings or example tutorials.
- [ ] At a minimum, make sure the PR passes verification in your local development environment before submitting it for review.
- [ ] Run `pre-commit run --all-files` or other linting tools to fix potential lint issues.
- [ ] Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
- [ ] Modify documentation as needed, such as docstrings or example tutorials.
2 changes: 1 addition & 1 deletion .github/workflows/accuracy-test.yml
@@ -43,4 +43,4 @@ jobs:
run: |
cd test/srt
python3 test_eval_accuracy_large.py
timeout-minutes: 20
timeout-minutes: 10
3 changes: 3 additions & 0 deletions .github/workflows/e2e-test.yml
@@ -39,13 +39,16 @@ jobs:
run: |
cd test/srt
python3 -m unittest test_serving_throughput.TestServingThroughput.test_default
timeout-minutes: 10

- name: Benchmark Serving Throughput (w/o RadixAttention)
run: |
cd test/srt
python3 -m unittest test_serving_throughput.TestServingThroughput.test_default_without_radix_cache
timeout-minutes: 10

- name: Benchmark Serving Throughput (w/ ChunkedPrefill)
run: |
cd test/srt
python3 -m unittest test_serving_throughput.TestServingThroughput.test_default_with_chunked_prefill
timeout-minutes: 10
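Each step above passes a dotted `module.Class.method` name to `python3 -m unittest`, which runs only that one test method. A minimal sketch of the same selection through the standard `unittest` loader API; `TestDemo` and the throwaway `demo_mod` module are hypothetical stand-ins for `test_serving_throughput.TestServingThroughput`, which needs a live GPU server to run.

```python
import io
import types
import unittest


class TestDemo(unittest.TestCase):
    # Hypothetical stand-in for TestServingThroughput.
    def test_default(self):
        self.assertEqual(1 + 1, 2)

    def test_default_without_radix_cache(self):
        self.assertTrue(True)


# Attach the class to a module object so the dotted name resolves the same
# way `python3 -m unittest demo_mod.TestDemo.test_default` would resolve it.
demo_mod = types.ModuleType("demo_mod")
demo_mod.TestDemo = TestDemo

suite = unittest.TestLoader().loadTestsFromName("TestDemo.test_default", demo_mod)
count = suite.countTestCases()  # only the selected method is in the suite

# Run quietly; test_default_without_radix_cache is not executed.
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(count, result.testsRun, result.wasSuccessful())
```

Splitting the benchmarks into one step per method, as the workflow does, lets each get its own `timeout-minutes` budget and shows up as a separate pass/fail line in CI.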
13 changes: 9 additions & 4 deletions .github/workflows/moe-test.yml
@@ -36,7 +36,12 @@ jobs:
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall

- name: Benchmark MOE Serving Throughput
run: |
cd test/srt
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
uses: nick-fields/retry@v3
with:
timeout_minutes: 15
max_attempts: 2
retry_on: error
command: |
cd test/srt
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default
python3 -m unittest test_moe_serving_throughput.TestServingThroughput.test_default_without_radix_cache
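The `nick-fields/retry@v3` step above wraps the two MoE benchmarks so that a flaky run gets a second chance. A rough Python model of how we understand its inputs to interact: `timeout_minutes` bounds each attempt separately, `max_attempts` caps the total number of runs, and `retry_on: error` retries only a non-zero exit, not a timeout. `run_with_retry` is a hypothetical helper, not part of the action or of SGLang; the action's own README is the authority on its exact behavior.

```python
import os
import subprocess
import sys
import tempfile


def run_with_retry(command, timeout_s, max_attempts, retry_on="error"):
    """Hypothetical model of the retry step; returns the attempt that passed.

    retry_on follows our reading of the action's inputs:
    "error" retries non-zero exits, "timeout" retries timeouts, "any" both.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        last = attempt == max_attempts
        try:
            # The timeout bounds this single attempt, not the whole step.
            proc = subprocess.run(command, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            if retry_on in ("timeout", "any") and not last:
                continue
            raise
        if proc.returncode == 0:
            return attempt
        if retry_on in ("error", "any") and not last:
            continue
        raise subprocess.CalledProcessError(proc.returncode, command)


# A flaky command: fails on its first run, succeeds once the marker file exists.
marker = os.path.join(tempfile.mkdtemp(), "seen")
flaky = [
    sys.executable,
    "-c",
    "import os, sys\n"
    f"p = {marker!r}\n"
    "if os.path.exists(p):\n"
    "    sys.exit(0)\n"
    "open(p, 'w').close()\n"
    "sys.exit(1)\n",
]
attempts_used = run_with_retry(flaky, timeout_s=60, max_attempts=2)
print(attempts_used)  # 2
```

Under this reading, a benchmark that hangs past the per-attempt limit fails the job immediately, while one that exits with an error is rerun once.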
2 changes: 2 additions & 0 deletions .github/workflows/unit-test.yml
@@ -41,8 +41,10 @@ jobs:
run: |
cd test/srt
python3 run_suite.py --suite minimal
timeout-minutes: 15

- name: Test Frontend Language
run: |
cd test/lang
python3 run_suite.py --suite minimal
timeout-minutes: 10
2 changes: 1 addition & 1 deletion README.md
@@ -88,7 +88,7 @@ docker run --gpus all \
2. Execute the command `docker compose up -d` in your terminal.

### Common Notes
- If you cannot install FlashInfer, check out its [installation](https://docs.flashinfer.ai/installation.html#) page. If you still cannot install it, you can use the slower Triton kernels by adding `--disable-flashinfer` when launching the server.
- [FlashInfer](https://github.com/flashinfer-ai/flashinfer) is currently a required dependency of SGLang. If you are using NVIDIA GPU devices below sm80, such as T4, you can't use SGLang for the time being. We expect to resolve this issue soon, so please stay tuned. If you encounter any FlashInfer-related issues on sm80+ devices (e.g., A100, L40S, H100), consider switching to the Triton kernels with `--disable-flashinfer --disable-flashinfer-sampling` and open an issue.
- If you only need to use the OpenAI backend, you can avoid installing other dependencies by using `pip install "sglang[openai]"`.
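The README note above boils down to a compute-capability check plus two fallback flags. A small sketch of that decision, assuming capabilities are given as `(major, minor)` tuples; `flashinfer_supported` and `build_launch_cmd` are hypothetical helpers, and `path/to/model` is a placeholder. Only the two `--disable-*` flags come from the note itself.

```python
from typing import List, Tuple

# Minimum compute capability named in the README note: sm80 (e.g. A100).
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def flashinfer_supported(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability is sm80 or newer."""
    # Tuples compare element-wise: (7, 5) < (8, 0) <= (8, 9) < (9, 0).
    return capability >= MIN_CAPABILITY


def build_launch_cmd(model_path: str, use_triton_fallback: bool = False) -> List[str]:
    """Assemble a server launch command (hypothetical helper)."""
    cmd = ["python3", "-m", "sglang.launch_server", "--model-path", model_path]
    if use_triton_fallback:
        # The two flags from the README: route attention and sampling
        # through the slower Triton / non-FlashInfer paths.
        cmd += ["--disable-flashinfer", "--disable-flashinfer-sampling"]
    return cmd


print(flashinfer_supported((7, 5)))  # False (T4)
print(flashinfer_supported((9, 0)))  # True (H100)
print(build_launch_cmd("path/to/model", use_triton_fallback=True))
```

On a machine with PyTorch and a GPU, the capability tuple can be read with `torch.cuda.get_device_capability()`.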

## Backend: SGLang Runtime (SRT)
3 changes: 3 additions & 0 deletions benchmark/gsm8k/bench_sglang.py
@@ -88,6 +88,9 @@ def few_shot_gsm8k(s, question):
for i in range(len(states)):
preds.append(get_answer_value(states[i]["answer"]))

# print(f"{preds=}")
# print(f"{labels=}")

# Compute accuracy
acc = np.mean(np.array(preds) == np.array(labels))
invalid = np.mean(np.array(preds) == INVALID)
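The two `np.mean` lines in the GSM8K benchmark average 0/1 comparison arrays. Accuracy is the fraction of predictions equal to their label, and `invalid` is the fraction the answer parser could not turn into a number. An invalid prediction therefore also counts as wrong. A NumPy-free sketch of the same arithmetic; the `INVALID` value below is an assumed sentinel and `score` is a hypothetical helper.

```python
INVALID = -9999999  # assumed sentinel for unparseable answers (hypothetical value)


def score(preds, labels):
    """Return (accuracy, invalid_rate), mirroring the two np.mean lines."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    n = len(preds)
    if n == 0:
        raise ValueError("need at least one prediction")
    # Each comparison is a bool; summing bools counts the True values.
    correct = sum(p == l for p, l in zip(preds, labels))
    invalid = sum(p == INVALID for p in preds)
    return correct / n, invalid / n


acc, invalid = score([18, 3, INVALID, 70], [18, 3, 70000, 5])
print(acc, invalid)  # 0.5 0.25
```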
1 change: 1 addition & 0 deletions python/sglang/bench_latency.py
@@ -221,6 +221,7 @@ def correctness_test(

# Prepare inputs
input_ids, reqs = prepare_inputs_for_correctness_test(bench_args, tokenizer)
rank_print(f"{input_ids=}")

if bench_args.cut_len > 0:
# Prefill
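The new `rank_print(f"{input_ids=}")` line uses the self-documenting `=` specifier in f-strings (Python 3.8+), which emits the expression text followed by the value's `repr`. The sketch below assumes `rank_print` prints only on tensor-parallel rank 0, so multi-GPU runs do not print duplicate lines. `make_rank_print` is a hypothetical helper, not SGLang's definition.

```python
import contextlib
import io


def make_rank_print(tp_rank: int):
    """Return print on rank 0 and a no-op elsewhere (assumed behavior)."""
    if tp_rank == 0:
        return print
    return lambda *args, **kwargs: None


input_ids = [[1, 2, 3], [4, 5]]

outputs = {}
for rank in (0, 1):
    rank_print = make_rank_print(rank)
    buf = io.StringIO()
    # Capture what each rank would write to stdout.
    with contextlib.redirect_stdout(buf):
        rank_print(f"{input_ids=}")
    outputs[rank] = buf.getvalue()

print(outputs)  # {0: 'input_ids=[[1, 2, 3], [4, 5]]\n', 1: ''}
```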
1 change: 0 additions & 1 deletion python/sglang/srt/layers/activation.py
@@ -14,7 +14,6 @@
"""Fused operators for activation layers."""

import torch
import torch.nn as nn
import torch.nn.functional as F
from flashinfer.activation import silu_and_mul
from vllm.model_executor.custom_op import CustomOp
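`activation.py` imports FlashInfer's fused `silu_and_mul` kernel. As we understand the convention shared with vLLM's `SiluAndMul`, the last dimension of the input holds two halves, gate then up, and the output is `silu(gate) * up` at half the width. A plain-Python reference of that math for a single row, useful for sanity-checking a fused kernel's output; `silu_and_mul_ref` is hypothetical and makes no claim about FlashInfer's internals.

```python
import math
from typing import List


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # small for very negative x, never overflows
    return x * e / (1.0 + e)


def silu_and_mul_ref(row: List[float]) -> List[float]:
    """Reference for one row: silu(first half) * second half."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]


print(silu_and_mul_ref([0.0, 1.0, 5.0, 2.0]))
```

In PyTorch, the equivalent unfused reference is typically written `F.silu(x[..., :d]) * x[..., d:]`. The fused kernel exists to avoid materializing the intermediate `silu(gate)` tensor.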
1 change: 1 addition & 0 deletions python/sglang/srt/layers/fused_moe/__init__.py
@@ -0,0 +1 @@
from sglang.srt.layers.fused_moe.layer import FusedMoE, FusedMoEMethodBase
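The new `fused_moe/__init__.py` re-exports `FusedMoE` and `FusedMoEMethodBase` from `fused_moe/layer.py`, so callers can import them from the package itself. A self-contained sketch of that re-export pattern: it builds a throwaway package named `demo_fused_moe` with dummy classes, all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_fused_moe"
pkg.mkdir()
# layer.py holds the actual definitions.
(pkg / "layer.py").write_text(
    "class FusedMoEMethodBase:\n    pass\n\n"
    "class FusedMoE:\n    pass\n"
)
# __init__.py re-exports them, mirroring the one-line file in the diff.
(pkg / "__init__.py").write_text(
    "from demo_fused_moe.layer import FusedMoE, FusedMoEMethodBase\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_fused_moe import FusedMoE  # short path via the package
import demo_fused_moe.layer as layer  # full path to the defining module

print(FusedMoE is layer.FusedMoE)  # True: both names refer to one class
```

The trade-off is that importing the package now eagerly imports `layer.py` as well.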