Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exceptions: myactuator_rmd.can.BusError and ProtocolException on repeated path-following commands #11

Open
TheIndoorDad opened this issue Jul 15, 2024 · 4 comments

Comments

@TheIndoorDad
Copy link

Hi Tobit,

I am still investigating, but when using closed-loop position control for path-following (with commands every time step), my system tends to accrue many myactuator_rmd.can.BusError exceptions regardless of the time step size used (typically 2 ms but with other time step sizes also). I'm handling the exceptions and it doesn't appear to have a significant effect on path following performance, but after executing a path later individual commands often return a myactuator_rmd_py.ProtocolException: Unexpected response of the kind used by the path following, e.g.:

>>> motor[5].setAcceleration(5000, pos_accel)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
myactuator_rmd_py.ProtocolException: Unexpected response '0xa4'

In this case monitoring the CAN socket reveals a correct (0x43) message and response, and the motor does execute the command to write the new acceleration, but it is problematic when you need to read the correct response (e.g., with getMultiTurnAngle()). It is as though responses to commands which receive a BusError accrue in a buffer and then are inappropriately read by later commands.

Also, though more rarely, a BusError will occur on a command that is not repeated each time step, which I think may cause the command to fail to be sent to the motor (to be confirmed) - potentially disrupting a program. Anecdotally, I think the likelihood of this may increase as more BusError's have been accrued (e.g., following execution of a path).

As I mentioned I'm still investigating, so I'm not certain whether the bus errors are a result of something in my system or something in the code. I haven't yet dug into your code sufficiently to see what might cause the bus errors there, but maybe you have a better idea? In any case, I wonder if there is a way to clear the response buffer so later commands don't get unexpected responses?

I will continue to look into this on my return at the end of the month, but just wanted to bring it up now in case you have a chance to start thinking about it.

Cheers,
Stefan

@2b-t
Copy link
Owner

2b-t commented Jul 15, 2024

Hi Stefan,
What is your setup like? What operating system are you running? Have you tried another CAN adapter? Are you able to provide an example script that allows me to replicate this easily?

I have not been able to generate error messages like these previously. I have tried to go for a very simple blocking design throughout the code: So there should always be a request followed by its response and only then a new request can be sent. The only problem where I could imagine this happening is when trying to access the class from multiple threads contemporarily. The code is currently not thread-safe and if somebody tried to send a position set point (0x43) while changing e.g. the acceleration values, throwing these errors might be expected behaviour. If that was the case you would have to protect the actuator interface with a mutex or similar to make sure only one thread sends a command at a time. I could guarantee thread safety on a level of the CAN node but have not done so yet as I wanted this to be a simple light-weight driver.

@TheIndoorDad
Copy link
Author

Hi Tobit,

I'm running Ubuntu 20.04 with a low-latency kernel patch (5.15.0-117-lowlatency) to send commands through a multi-channel usb-CAN adapter to multiple motors distributed over several channels (3-4 motors to each channel). I don't have another CAN adapter that is socketCAN compatible, but the one I'm using is a performance commercial model. Each channel bus is terminated at each end with 120 ohm, and all motors on the channel are essentially 'daisy-chained' (branch length is negligible). My motors are all either RMD-X8-Pro-H 1:6 (V3) or RMD-X8-Pro 1:9 (V2) by the older naming convention.

The problem is consistent every time I am controlling multiple motors (whether on a single CAN channel or multiple), but less so for a single motor. I can sometimes replicate either the ProtocolException alone or both it and the BusError on a single motor by using a small time step (e.g., 1 ms) in the following script:

import os
import time
import math
import myactuator_rmd_py as rmd

# Define variables:
can_ID = 3 # replace with motor ID.
max_angle = 90 # deg from starting pos.
freq = 0.5 # Hz
t = 0
dt = 0.001 # s

print('Bringing up CAN0...')
os.system("sudo ip link set can0 up type can bitrate 1000000")
time.sleep(0.1) 

can = rmd.CanDriver("can0")
motor = rmd.ActuatorInterface(can, can_ID)

error_no = 0

# Read actuator startup position for later calculaltions of zero offset. 
startup_pos = motor.getMultiTurnAngle()

# Set position planning accelerations to zero for dynamic tracking.
pos_accel = rmd.actuator_state.AccelerationType(0)
pos_decel = rmd.actuator_state.AccelerationType(1)
motor.setAcceleration(0, pos_accel)
motor.setAcceleration(0, pos_decel)

def sin_pos(t):
    return startup_pos + max_angle/2 * (1 - math.cos(2 * math.pi * freq * t))

# Sinusoidal path following until KeyboardInterrupt (^C) or 10 s.
try:
    while t < 10:
        try:
            motor.sendPositionAbsoluteSetpoint(sin_pos(t), 500)
        except rmd.can.BusError:
            error_no += 1
            print("Bus Error:", str(error_no))
        time.sleep(dt)
        t += dt
except KeyboardInterrupt:
    pass

time.sleep(2)

# Reset accelerations and return to startup positions for shutdown.
try:
    motor.setAcceleration(5000, pos_accel)
except rmd.ProtocolException as exc:
    print(exc)
except rmd.can.BusError:
    error_no += 1
    print("Bus Error resetting acceleration:", str(error_no))
try:
    motor.setAcceleration(5000, pos_decel)
except rmd.ProtocolException as exc:
    print(exc)
except rmd.can.BusError:
    error_no += 1
    print("Bus Error resetting deceleration:", str(error_no))
try:
    motor.sendPositionAbsoluteSetpoint(startup_pos, 100)
except rmd.ProtocolException as exc:
    print(exc)
except rmd.can.BusError:
    error_no += 1
    print("Bus Error returning to startup pos:", str(error_no))

When I am controlling multiple motors I make motor, startup_pos, and max_angle dictionaries with the motor IDs as keys and add an id parameter to the sin_pos() function so I can iterate at each time step e.g.:

def sin_pos(t, id):
    return startup_pos[id] + max_angle[id]/2 * (1 - math.cos(2 * math.pi * freq * t))

motor_IDs = list(range(1, 7))
try:
    while t < 10:
        try:
            for id in motor_IDs:
                motor[id].sendPositionAbsoluteSetpoint(sin_pos(t, id), 500)
        except rmd.can.BusError:
            error_no += 1
            print("Bus Error:", str(error_no))
        time.sleep(dt)
        t += dt
except KeyboardInterrupt:
    pass

I don't know much about multi-threading, but I am only running one script at a time. I suppose I was assuming that would mean it would be single-threaded and each request/response would complete before another request was sent as you say, but I could be wrong.

@2b-t
Copy link
Owner

2b-t commented Aug 7, 2024

Hi again Stefan,
I am currently on holidays and can't test your code on my own actuator until 10 days from now.
The code above looks fine to me apart from a minor imprecision that jumps to my eye: In your example code above the time t is not correctly incremented as the commands that interface the motor (e.g. sendPositionAbsoluteSetpoint) are blocking and will take almost 2 ms to return each (see here). The actual time t is therefore the sum of dt and the time it took the motor to respond (ideally determined by measuring the time it takes for the function to respond). I do not think though that this is related to the issue that you are experiencing.
I have not tested the Python bindings much but I have done the very same thing like your code above with the C++ code (which is the underlying code for the bindings and therefore should be exhibit the very same behavior) for several hours and never experienced any issues.
I personally would therefore suspect either the actuator or the CAN adapter to be the problem. Ideally you would test your code on another computer with another CAN adapter to exclude these causes. As it seems as if that is not easily possible, I would test the following (and ideally record a Wireshark log of it):

  • Run the script for a single actuator with a single actuator on the CAN bus for an extended period of time for every single one of your actuators and check whether it happens just with some of your actuators while others work perfectly fine. If that was the case, this would indicate a firmware issue of the actuator itself.
  • Run multiple actuators daisy-chained with the code for a single actuator and test if the behavior only emerges in case of multiple actuators being present on the CAN bus. I could not imagine this being the reason but only having access to a single actuator, I have never tested this. If that was the case it would likely be an issue with my code.
  • Get yourself another USB to CAN adapter and see if it exhibits the same behavior. The Makerbase CANable USB to CAN adapter that I am using goes for around 15 Euros and takes around a week to ship from China.

Best wishes
Tobit

@TheIndoorDad
Copy link
Author

Ah that makes sense about the blocking time. I will have to work to correct that, thanks.

Thank you also for the suggestions, I'll continue troubleshooting.

Enjoy your holiday!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants