[BUG] Websockets hang on recv() #487

Open
3 tasks done
dleber opened this issue Jan 23, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@dleber

dleber commented Jan 23, 2025

Please check the following items and answer all the questions when reporting a bug,
otherwise it will be closed immediately.

  • This is NOT a site-related "bug", e.g. some site blocks me when using curl_cffi,
    UNLESS it has been verified that the reason is missing pieces in the impersonation.
  • A code snippet that can reproduce this bug is provided, even if it's a one-liner.
  • Version information will be pasted as below.

Describe the bug

Using websockets, when iterating through received messages, websockets hang on recv() most of the time. This occurs for both async and non-async implementations.

When running the code below, call either asyncio.run(ws_curl_cffi_async()) or ws_curl_cffi() multiple times to discover the issue, as occasionally it works as expected (usually on the first run).

The code also includes a snippet using Python's websocket-client library, which reliably works each time.

Separately, it would be great if a message timeout could be configured, such that recv() doesn't hang indefinitely.

To Reproduce

import asyncio

from curl_cffi.requests import AsyncSession, WebSocket

async def ws_curl_cffi_async():
    async with AsyncSession() as s:
        ws = await s.ws_connect("wss://echo.websocket.org")
        await asyncio.gather(*[ws.send_str(f"Hello {i}") for i in range(20)])
        async for message in ws:
            print(message)

def ws_curl_cffi():
    ws = WebSocket().connect("wss://echo.websocket.org")
    for i in range(20):
        ws.send(f"Hello {i}")
    for i in range(20):
        print(ws.recv())
    ws.close()


if __name__ == "__main__":
    # asyncio.run(ws_curl_cffi_async())
    ws_curl_cffi()

The example below uses the websocket-client library to demonstrate the expected behavior. It works reliably each time.

pip install websocket-client==1.6.4

from websocket import create_connection

def ws_websocket():
    ws = create_connection("wss://echo.websocket.org")
    for i in range(20):
        ws.send(f"Hello {i}")
    
    for i in range(20):
        print(ws.recv())

    ws.close()

if __name__ == "__main__":
    ws_websocket()

Expected behavior

All 20 messages should be received and printed.

Versions

  • OS: Linux aarch64 (python:3.12.2 Docker image)
  • curl-cffi==v0.8.1b9

Additional Information

This is possibly an issue related to the queuing of response messages. If each message is read immediately after it is sent, it works more reliably, but that approach only applies to this "echo" example. For example:

def ws_websocket():
    ws = create_connection("wss://echo.websocket.org")
    for i in range(20):
        ws.send(f"Hello {i}")
        print(ws.recv())

    ws.close()

@dleber added the bug (Something isn't working) label Jan 23, 2025

@dleber
Author

dleber commented Jan 24, 2025

I've isolated the issue to this line of code:

rlist, _, _ = select([sock_fd], [], [], 5.0)

At first, select detects some of the received messages, which allows the subsequent calls to self.recv_fragment(). However, it usually reaches a point where it fails to detect the remaining messages in the socket buffer, and hangs forever.

If I comment out both the select call (rlist, _, _ = select([sock_fd], [], [], 5.0)) and the if rlist: check, the previously posted example (ws_curl_cffi()) works correctly every time. I'm not suggesting this as a solution; it merely demonstrates that all messages are being received.

I've tried replacing select with other event listeners including select.poll() and selectors.DefaultSelector() (I believe this uses select.epoll() on Linux), however the same issues were present.

I also came across curl_multi_wait but didn't try it.
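
One possible explanation for the behavior described above, offered as an assumption and not confirmed anywhere in this thread, is that a layer between the socket and the caller has already read the remaining messages out of the kernel buffer into its own user-space buffer. In that case select() on the file descriptor has nothing left to report, whichever selector is used. The sketch below reproduces that general effect with only the standard library (a socketpair and a buffered reader); it does not use curl_cffi, and the function name is made up for illustration.

```python
import select
import socket


def demo_hidden_buffered_data():
    """Show that select() can miss data already pulled into a user-space buffer."""
    sender, receiver = socket.socketpair()
    reader = receiver.makefile("rb")  # buffered reader on top of the socket
    try:
        # Three small messages arrive before the reader looks at the socket.
        sender.sendall(b"msg0\nmsg1\nmsg2\n")

        # Reading one message drains all pending bytes from the kernel
        # into the reader's own buffer.
        first = reader.readline()

        # The kernel buffer is now empty, so select() reports nothing,
        # even though two messages are still waiting in user space.
        readable, _, _ = select.select([receiver], [], [], 0.1)

        # This read is served from the reader's buffer, not the socket.
        second = reader.readline()
        return first, bool(readable), second
    finally:
        reader.close()
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo_hidden_buffered_data())
```

If something similar happens here, gating every read on select() would stall as soon as the pending frames have left the kernel buffer, which would match the observation that removing the select call lets all messages through.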

@lexiforest
Owner

We will revisit the ws implementation later, perhaps adding your case to the unit tests. Thanks for your experiments.

@dleber
Author

dleber commented Jan 25, 2025

Not a problem, looking forward to the updates.

Separately, adding an optional message timeout parameter to the WebSocket class would be very useful. When dealing with the select bug, it would prevent indefinite hanging. More generally, if a server isn't sending data, I wouldn't want my scripts waiting too long. Let me know if you would like me to create a feature request for this.
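
Until something like that exists, a caller-side stopgap could look like the sketch below. It is only an illustration: call_with_timeout is a hypothetical helper, not part of curl_cffi, and it runs the blocking call in a daemon thread so the caller can give up after a deadline. The abandoned thread keeps blocking in the background, so the connection should be treated as unusable after a timeout.

```python
import queue
import threading


def call_with_timeout(fn, timeout, *args):
    """Run a blocking callable in a daemon thread; raise TimeoutError if it is too slow.

    Hypothetical helper for illustration. The worker thread is not stopped
    on timeout; it is simply abandoned.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put((True, fn(*args)))
        except Exception as exc:  # hand errors back to the caller
            result.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()

    try:
        ok, value = result.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no result within {timeout} seconds") from None

    if not ok:
        raise value
    return value
```

With the ws object from the ws_curl_cffi() example above, this would be used as call_with_timeout(ws.recv, 5.0), closing the connection if TimeoutError is raised.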
