
[Bug] constant errors + hangs using sglang + deepseek v3 + AMD (httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)) #3198

Open · pseudotensor opened this issue Jan 28, 2025 · 4 comments
Labels: deepseek, help wanted (Extra attention is needed)

pseudotensor commented Jan 28, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Much of the time it is fine, but there is an abrupt termination of the streaming with:

httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

using the OpenAI API endpoint. E.g. I see about 250 of those failures over the course of 12 hours (and many more requests fail underneath, since we mask failures with 3 retries and exponential backoff). Interestingly, these events occur in clusters, suggesting the entire sglang server hangs while holding the 8 simultaneous requests.

Perhaps even worse, sometimes the response just gets totally stuck and hangs for an hour.
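
For context, here is a minimal sketch of the client-side pattern that hits this, assuming the standard openai Python client pointed at the sglang OpenAI-compatible endpoint; the base_url, model name, timeout value, and retry helper are illustrative, not our exact production code:

import time

import httpx
from openai import OpenAI

# Illustrative client setup; base_url/api_key are placeholders for a
# local sglang OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")

def stream_with_retries(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="deepseek-ai/DeepSeek-V3",
                messages=messages,
                stream=True,
                # Client-side guard against the hour-long hangs;
                # the value is illustrative.
                timeout=httpx.Timeout(600.0, connect=10.0),
            )
            # Concatenate streamed deltas; some chunks carry no
            # choices/content.
            return "".join(
                chunk.choices[0].delta.content or ""
                for chunk in stream
                if chunk.choices
            )
        except httpx.RemoteProtocolError:
            # httpx surfaces the underlying httpcore.RemoteProtocolError
            # ("incomplete chunked read") when the server drops the
            # connection mid-stream; back off exponentially and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError("stream failed after %d retries" % max_retries)

Note that the per-request timeout is only a client-side guard: it bounds the hour-long hangs but does nothing about the server-side stall itself.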

Reproduction

image: lmsysorg/sglang:v0.4.2-rocm620

command:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --host 0.0.0.0 --port 5000 --trust-remote-code  --context-length 65536 --tp 8 --random-seed 1234 --download-dir /root/.cache/huggingface/hub/

There's no easy repro. The usage pattern is a ~14k-token system prompt plus a query, followed by a good number of chat turns. In some cases the large context is also filled for RAG, etc.

But I have shared logs, attached below: the entire logs, from start to finish, of a run during which these issues occurred.

logs.zip

Environment

root@ef5e23d28c0e:/sgl-workspace# python3 -m sglang.check_env
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.4)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
WARNING 01-28 22:04:48 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/pydantic/_internal/_config.py:341: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Python: 3.9.19 (main, May  6 2024, 19:43:03) [GCC 11.2.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.2.41133-dd7f95766
ROCM Driver Version: 6.7.0
PyTorch: 2.5.0+git13a0629
flashinfer: Module Not Found
triton: 3.0.0
transformers: 4.46.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.10.10
fastapi: 0.115.4
hf_transfer: 0.1.9
huggingface_hub: 0.26.2
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.1
psutil: 6.1.0
pydantic: 2.9.2
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.32.0
uvloop: 0.21.0
vllm: 0.6.3.post2.dev1+g1ef171e0.d20250114
openai: 1.60.1
anthropic: 0.45.0
decord: 0.6.0
AMD Topology:


============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU1   XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU2   XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI
GPU3   XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI
GPU4   XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI
GPU5   XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI
GPU6   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI
GPU7   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0
================================== End of ROCm SMI Log ===================================

ulimit soft: 1048576
zhaochenyang20 (Collaborator) commented

cc @zhyncs

zhaochenyang20 self-assigned this on Jan 29, 2025
zhaochenyang20 added the deepseek and help wanted (Extra attention is needed) labels on Jan 29, 2025
pseudotensor (Author) commented Jan 31, 2025

FYI, I used sglang on 8×H200 and saw no such crashes or hangs, so this seems to be a purely AMD issue. Also, AMD is 3x slower than H100 or H200; sglang + AMD is no better than vLLM. Given the hardware specs, I'm guessing the AMD path is very under-optimized, probably because it's less well supported. No hope for AMD then.

#3196

pseudotensor changed the title from "[Bug] constant errors + hangs using sglang + deepseek v3 (httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read))" to "[Bug] constant errors + hangs using sglang + deepseek v3 + AMD (httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read))" on Jan 31, 2025
zhaochenyang20 (Collaborator) commented

We shall work more on AMD. Stay tuned!

andyluo7 commented Feb 2, 2025

@pseudotensor, can you try "export HSA_NO_SCRATCH_RECLAIM=1" when you launch the sglang server? It reduces the latency quite a bit. Meanwhile, there is a new PR, #3255, which gives a ~20% perf improvement on 8×MI300X. See the sketch below.
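
For reference, applying that suggestion to the launch command from the reproduction above would look like this (the env var is the only change; all flags are taken verbatim from the original report):

# Set the suggested env var before launching; flags unchanged from the repro.
export HSA_NO_SCRATCH_RECLAIM=1
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --host 0.0.0.0 --port 5000 --trust-remote-code --context-length 65536 --tp 8 --random-seed 1234 --download-dir /root/.cache/huggingface/hub/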
