server fails over time (2~3 hours): "RuntimeError: dictionary changed size during iteration" #820

Closed
odorikakeru opened this issue Oct 23, 2022 · 3 comments

@odorikakeru

Description

  • Library Version:
  • ROS Version: ROS2 foxy (similar result observed on noetic, but data not available, will attempt to reproduce in future)
  • Platform / OS: Ubuntu 20.04 (based on Docker image ros:foxy)

Steps To Reproduce

  • Build Docker image using following Dockerfile

    FROM ros:foxy
    
    RUN apt-get update && apt-get upgrade -y && \
      apt-get install -y git python3-pip
    
    RUN pip install setproctitle~=1.3.2
    
    WORKDIR /ros2_ws
    
    RUN git clone https://github.com/RobotWebTools/rosbridge_suite.git
    RUN rosdep install -i -y --from-paths ./rosbridge_suite
    RUN . /opt/ros/${ROS_DISTRO}/setup.sh && \
      colcon build
    
    COPY ./scripts/launcher.sh /
    RUN chmod +x /launcher.sh
    
    CMD ["/launcher.sh"]

    • launcher.sh:
      #!/bin/bash
      set -e
      
      # run rosbridge_server (ROS2)
      unset ROS_DISTRO
      source install/setup.bash
      ros2 launch rosbridge_server rosbridge_websocket_launch.xml
  • Subscribe to /rosout topic through roslibpy (other libraries not tested)

  • Leave running for 2~3 hours
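
For reference, the subscription step can be sketched with roslibpy roughly as follows. This is not the client used in the original report: the host, the port (9090 is the rosbridge default), and the `format_log` helper are assumptions for illustration. `rcl_interfaces/msg/Log` is the ROS 2 message type of /rosout.

```python
import time


def format_log(message):
    """Hypothetical helper: render one /rosout message (a dict) as a line."""
    return "[{}] {}".format(message.get("name", "?"), message.get("msg", ""))


def run_listener(host="localhost", port=9090):
    """Subscribe to /rosout and print messages until interrupted."""
    # Imported here so format_log can be used without roslibpy installed.
    import roslibpy

    client = roslibpy.Ros(host=host, port=port)
    client.run()  # starts a non-blocking event loop and connects

    listener = roslibpy.Topic(client, "/rosout", "rcl_interfaces/msg/Log")
    listener.subscribe(lambda message: print(format_log(message)))

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        client.terminate()
```

Calling `run_listener()` against the container and leaving it running should approximate the reproduction steps above.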

Expected Behavior
Behavior of rosbridge_server unaffected by time

Actual Behavior
Following error eventually occurs:

[rosbridge_websocket-1] ERROR:tornado.application:Exception in callback <function main.<locals>.<lambda> at 0x7fe008a8cd30>
[rosbridge_websocket-1] Traceback (most recent call last):
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 1229, in _run
[rosbridge_websocket-1]     return self.callback()
[rosbridge_websocket-1]   File "/ros2_ws/install/rosbridge_server/lib/rosbridge_server/rosbridge_websocket", line 330, in <lambda>
[rosbridge_websocket-1]     spin_callback = PeriodicCallback(lambda: rclpy.spin_once(node, timeout_sec=0.01), 1)
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/__init__.py", line 171, in spin_once
[rosbridge_websocket-1]     executor.spin_once(timeout_sec=timeout_sec)
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 720, in spin_once
[rosbridge_websocket-1]     raise handler.exception()
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
[rosbridge_websocket-1]     self._handler.send(None)
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 431, in handler
[rosbridge_websocket-1]     await call_coroutine(entity, arg)
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 356, in _execute_subscription
[rosbridge_websocket-1]     await await_or_execute(sub.callback, msg)
[rosbridge_websocket-1]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 118, in await_or_execute
[rosbridge_websocket-1]     return callback(*args)
[rosbridge_websocket-1]   File "/ros2_ws/install/rosbridge_library/lib/python3.8/site-packages/rosbridge_library/internal/subscribers.py", line 214, in callback
[rosbridge_websocket-1]     for callback in callbacks:
[rosbridge_websocket-1] RuntimeError: dictionary changed size during iteration
[rosbridge_websocket-1] [WARN] [1666546846.570707140] [rosbridge_websocket]: WebSocketClosedError: Tried to write to a closed websocket
[rosbridge_websocket-1] ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7fe008015160>, <Future finished exception=WebSocketClosedError()>)
[rosbridge_websocket-1] Traceback (most recent call last):
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 758, in _run_callback
[rosbridge_websocket-1]     ret = callback()
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
[rosbridge_websocket-1]     return fn(*args, **kwargs)
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 779, in _discard_future_result
[rosbridge_websocket-1]     future.result()
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/gen.py", line 326, in wrapper
[rosbridge_websocket-1]     yielded = next(result)
[rosbridge_websocket-1]   File "/ros2_ws/install/rosbridge_server/lib/python3.8/site-packages/rosbridge_server/websocket_handlerne 197, in prewrite_message
[rosbridge_websocket-1]     future = self.write_message(message, binary)
[rosbridge_websocket-1]   File "/usr/lib/python3/dist-packages/tornado/websocket.py", line 259, in write_message
[rosbridge_websocket-1]     raise WebSocketClosedError()
[rosbridge_websocket-1] tornado.websocket.WebSocketClosedError
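
The RuntimeError in the first traceback is what CPython raises when a dict gains or loses keys while it is being iterated. Here that would mean the subscriber's callback table was modified during dispatch, for example by a client subscribing or disconnecting. The sketch below reproduces the failure and shows one common mitigation: iterating over a snapshot taken under a lock. `CallbackRegistry` and its method names are hypothetical; this is neither the rosbridge_library code nor the change made in #803.

```python
import threading


class CallbackRegistry:
    """Hypothetical stand-in for a subscriber's per-client callback table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = {}

    def register(self, client_id, callback):
        with self._lock:
            self._callbacks[client_id] = callback

    def unregister(self, client_id):
        with self._lock:
            self._callbacks.pop(client_id, None)

    def dispatch_unsafe(self, msg):
        # Iterates the live dict: any register/unregister that happens
        # during the loop raises "dictionary changed size during iteration".
        for client_id in self._callbacks:
            self._callbacks[client_id](msg)

    def dispatch(self, msg):
        # Copy the callbacks while holding the lock, then call them
        # outside it so a callback may safely (un)register clients.
        with self._lock:
            snapshot = list(self._callbacks.values())
        for callback in snapshot:
            callback(msg)


registry = CallbackRegistry()
received = []


def one_shot(msg):
    """Callback that removes itself, like a client disconnecting mid-dispatch."""
    received.append(msg)
    registry.unregister("a")


registry.register("a", one_shot)
registry.register("b", received.append)

try:
    registry.dispatch_unsafe("first")
    unsafe_failed = False
except RuntimeError:
    unsafe_failed = True

# Re-register the removed client and dispatch through the snapshot path.
registry.register("a", one_shot)
received.clear()
registry.dispatch("second")
```

The trade-off of the snapshot approach is that a callback removed during a dispatch may still be invoked once for the message already in flight.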

Can connect to server after error, but service calls fail:
(example from roslibpy)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/ros.py", line 495, in get_params
    result = service.call(ServiceRequest(), callback,
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/core.py", line 368, in call
    call_results = self.ros.call_sync_service(message, timeout)
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/ros.py", line 251, in call_sync_service
    return self.blocking_call_from_thread(get_call_results, timeout)
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/ros.py", line 214, in blocking_call_from_thread
    return self.factory.manager.blocking_call_from_thread(callback, timeout)
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/comm/comm_autobahn.py", line 218, in blocking_call_from_thread
    return threads.blockingCallFromThread(reactor, callback, result_placeholder)
  File "/usr/lib/python3/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 395, in convertCancelled
    return toCall(value, timeout)
  File "/usr/local/lib/python3.8/dist-packages/roslibpy/comm/comm_autobahn.py", line 230, in raise_timeout_exception
    raise Exception('No service response received')
Exception: No service response received
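
The "connected but services time out" state described above can be told apart from a dropped connection with a small probe. A minimal sketch, assuming a roslibpy-style client with an `is_connected` property and the blocking `get_params()` seen in the traceback; `probe_bridge` and its return strings are hypothetical names chosen for illustration:

```python
def probe_bridge(client):
    """Classify a roslibpy-style client as 'disconnected', 'unresponsive' or 'ok'.

    Assumes `client` exposes `is_connected` and a blocking `get_params()`.
    """
    if not client.is_connected:
        return "disconnected"
    try:
        client.get_params()
    except Exception:
        # The traceback above shows a plain Exception being raised
        # when no service response arrives before the timeout.
        return "unresponsive"
    return "ok"
```

With a real connection this would be called as `probe_bridge(client)` on a `roslibpy.Ros` instance after `client.run()`.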
@achim-k achim-k self-assigned this Oct 24, 2022
@achim-k
Contributor

achim-k commented Oct 24, 2022

Hi @odorikakeru, which version are you on (commit hash)? It seems that you are not on the latest version, which should have a fix for that (added in #803).

ROS Version: ROS2 foxy (similar result observed on noetic, but data not available, will attempt to reproduce in future)

It would be good if you could reproduce the issue on noetic, as the noetic version does not include #803.

@achim-k achim-k removed their assignment Oct 24, 2022
@odorikakeru
Author

Thank you for pointing me to #803. I will try rebuilding the image and see if the problem is fixed.
This was with commit 1bc9c68.

As for noetic, I have been unable to reproduce the problem consistently. I had assumed it was related because the observed symptoms are similar (after a set amount of time, calls from the client side time out, and restarting the server rectifies this); however, no error messages are produced by the server.

@github-actions

This issue has been marked as stale because there has been no activity in the past 12 months. Please add a comment to keep it open.

@github-actions github-actions bot added the stale label Oct 25, 2023
@github-actions github-actions bot closed this as not planned Nov 24, 2023