1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submit lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. Please use English; otherwise the issue will be closed.
Describe the bug
Much of the time it is fine, but occasionally the stream terminates abruptly with:
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
using the OpenAI API endpoint. E.g. I see about 250 of these failures over the course of 12 hours (and many more underlying attempts fail, since we do 3 retries with exponential backoff). Interestingly, these events occur in clusters, suggesting the entire sglang server hangs up with the 8 simultaneous requests.
Perhaps even worse, sometimes the response gets completely stuck and hangs for an hour.
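The client-side retry pattern mentioned above (3 retries with exponential backoff around the streaming call) can be sketched as follows. This is a minimal illustration, not the reporter's actual code; the function names are hypothetical, and in practice the caught exception would be `httpcore.RemoteProtocolError` rather than the stand-in `ConnectionError` used here.

```python
import time

def with_retries(fn, max_retries=3, base_delay=1.0):
    """Call fn(), retrying on connection-level errors with exponential backoff.

    In the real setup, fn would be the streaming OpenAI API call and the
    exception caught would be httpcore.RemoteProtocolError.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                # All retries exhausted; surface the failure to the caller.
                raise
            # Back off exponentially: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
```

Note that retries only help with the abrupt-termination case; the hang-for-an-hour case additionally needs a client-side read timeout so a stuck stream fails fast instead of blocking indefinitely.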
There's no easy repro. The usage pattern is a ~14k-token system prompt plus a query, followed by a good number of chat turns. In some cases a large context is also filled for RAG, etc.
But I shared logs. These are the complete logs, from start to finish, over the period in which these issues occurred.
FYI, I used sglang on 8*H200 and see no such crashes or hangs, so this seems to be a purely AMD issue. Also, AMD is 3x slower than H100 or H200; sglang + AMD is no better than vLLM. Given the hardware specs, I'm guessing the AMD path is very under-optimized, probably because it's less well supported. No hope for AMD then.
@pseudotensor, can you try "export HSA_NO_SCRATCH_RECLAIM=1" when you launch sglang serve? It will reduce latency quite a bit. Meanwhile, there is a new PR #3255 which should give a ~20% perf improvement on 8xMI300X.
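A sketch of applying the suggested environment variable when launching the server. The model path is a placeholder, and the launch flags shown are assumptions (not taken from this thread); only `HSA_NO_SCRATCH_RECLAIM=1` is the suggestion being made here.

```shell
# Set the suggested ROCm/HSA workaround before starting the server.
export HSA_NO_SCRATCH_RECLAIM=1

# Placeholder launch command; substitute your actual model path and flags.
python3 -m sglang.launch_server \
    --model-path "$MODEL_PATH" \
    --tp 8
```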
Reproduction
image: lmsysorg/sglang:v0.4.2-rocm620
command:
logs.zip
Environment