1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submit lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. Please use English; otherwise the issue will be closed.
Describe the bug
Much of the time it is fine, but occasionally the stream terminates abruptly with:
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
using the OpenAI API endpoint. E.g. I see about 250 of these failures over the course of 12 hours (and many more underlying attempts fail, since we do 3 retries with exponential backoff). Interestingly, these events occur in clusters, suggesting the entire sglang server hangs up with the 8 simultaneous requests.
Perhaps even worse, sometimes the response gets completely stuck and hangs for an hour.
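The client-side retry pattern mentioned above (3 retries with exponential backoff around the streaming call) can be sketched as follows. This is a minimal illustration, not the reporter's actual code; the function names are hypothetical, and in practice the caught exception would be `httpcore.RemoteProtocolError` rather than the stand-in `ConnectionError` used here.

```python
import time

def with_retries(fn, max_retries=3, base_delay=1.0):
    """Call fn(), retrying on connection-level errors with exponential backoff.

    In the real setup, fn would be the streaming OpenAI API call and the
    exception caught would be httpcore.RemoteProtocolError.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                # All retries exhausted; surface the failure to the caller.
                raise
            # Back off exponentially: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```

Note that retries only help with the abrupt-termination case; the hang-for-an-hour case additionally needs a client-side read timeout so a stuck stream fails fast instead of blocking indefinitely.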
There's no easy repro. The usage pattern is a ~14k-token system prompt plus a query, followed by a good number of chat turns. In some cases a large context is also filled for RAG, etc.
But I shared logs. These are the complete logs, from start to finish, over the period in which these issues occurred.
FYI, I used sglang on 8*H200 and see no such crashes or hangs, so this seems to be a purely AMD issue. Also, AMD is 3x slower than H100 or H200; sglang + AMD is no better than vLLM. Given the hardware specs, I'm guessing the AMD path is very under-optimized, probably because it's less well supported. No hope for AMD then.
@pseudotensor, can you try "export HSA_NO_SCRATCH_RECLAIM=1" when you launch sglang serve? It will reduce latency quite a bit. Meanwhile, there is a new PR #3255 which should give a ~20% perf improvement on 8xMI300X.
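A sketch of applying the suggested environment variable when launching the server. The model path is a placeholder, and the launch flags shown are assumptions (not taken from this thread); only `HSA_NO_SCRATCH_RECLAIM=1` is the suggestion being made here.

```shell
# Set the suggested ROCm/HSA workaround before starting the server.
export HSA_NO_SCRATCH_RECLAIM=1

# Placeholder launch command; substitute your actual model path and flags.
python3 -m sglang.launch_server \
    --model-path "$MODEL_PATH" \
    --tp 8
```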
Reproduction
image: lmsysorg/sglang:v0.4.2-rocm620
command:
logs.zip
Environment