Connection reset by peer (Errno::ECONNRESET) sending messages in a loop #29

rgaufman opened this issue May 1, 2018 · 10 comments

@rgaufman

rgaufman commented May 1, 2018

Hi there,

I have this code:

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'paho-mqtt'

$total = ARGV[0].to_i

$client = PahoMqtt::Client.new(username: 'testuser', password: 'testpasswd')
$client.connect('127.0.0.1', 1883)

$count = 0
# Publish one QoS 2 message to a randomly chosen timebox topic
def send_message
  $count += 1
  i = rand(1..$total)

  topic = "to/timebox#{i}/cameras"
  msg = "message #{$count} #{Time.now} for timebox#{i}"
  # puts "Publishing topic: #{topic} msg: #{msg}"

  # mosquitto_pub -p 1883 -q 2 -u testuser -P testpasswd -t 'to/timebox1/cameras' -m "message for timebox1"
  $client.publish(topic, msg, true, 2) # retain: true, QoS: 2
end

# Chain publishes: send the next message as soon as the previous QoS 2 handshake completes.
# $client.on_puback do # QoS 1
$client.on_pubcomp do # QoS 2
  send_message
end
send_message # kick off the first publish

loop { sleep 1 }

When I run this code, it sends about 80-90 messages and then I get:

$ bin/publisher-main.rb 500
#<Thread:0x00007fdd5912a640@/usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:124 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	8: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:128:in `block in daemon_mode'
	7: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:175:in `mqtt_loop'
	6: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:161:in `loop_read'
	5: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:161:in `times'
	4: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:163:in `block in loop_read'
	3: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/handler.rb:41:in `receive_packet'
	2: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:48:in `read'
	1: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:298:in `read_byte'
/usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:298:in `read': Connection reset by peer (Errno::ECONNRESET)
Traceback (most recent call last):
	8: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:128:in `block in daemon_mode'
	7: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:175:in `mqtt_loop'
	6: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:161:in `loop_read'
	5: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:161:in `times'
	4: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/client.rb:163:in `block in loop_read'
	3: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/handler.rb:41:in `receive_packet'
	2: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:48:in `read'
	1: from /usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:298:in `read_byte'
/usr/local/lib/ruby/gems/2.6.0/gems/paho-mqtt-1.0.7/lib/paho_mqtt/packet/base.rb:298:in `read': Connection reset by peer (Errno::ECONNRESET)
$

Adding persistent: true works around this, but I'm not seeing any errors in mosquitto, so it's strange that this is happening.
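
For reference, the only change for that workaround is in the client construction; a minimal sketch, with everything else in the script above staying the same:

$client = PahoMqtt::Client.new(
  username:   'testuser',
  password:   'testpasswd',
  persistent: true # let the client reconnect automatically instead of shutting down on a dropped socket
)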

Any ideas?

@rgaufman
Author

rgaufman commented May 1, 2018

After looking more closely at the server logs, I am seeing messages like this; maybe it helps point to the issue:

1525137059: Received PUBREL from testuser (Mid: 178)
1525137059: Warning: Received PUBREL from testuser for an unknown packet identifier 178.
1525137059: Sending PUBCOMP to testuser (Mid: 178)
1525137059: Received PUBREL from testuser (Mid: 179)
1525137059: Warning: Received PUBREL from testuser for an unknown packet identifier 179.
1525137059: Sending PUBCOMP to testuser (Mid: 179)
1525137059: Received PUBREL from testuser (Mid: 180)
1525137059: Warning: Received PUBREL from testuser for an unknown packet identifier 180.
1525137059: Sending PUBCOMP to testuser (Mid: 180)
1525137059: Received PUBLISH from testuser (d0, q2, r1, m62983, 'to/timebox888/cameras', ... (55 bytes))
1525137059: New connection from ::1 on port 1883.
1525137059: Received PUBLISH from testuser (d0, q2, r1, m62984, 'to/timebox701/cameras', ... (55 bytes))
1525137059: Client testuser already connected, closing old connection.
1525137059: Client testuser disconnected.
1525137059: New client connected from ::1 as testuser (c0, k60, u'testuser').
1525137059: Sending CONNACK to testuser (0, 0)

@p-goudet
Contributor

p-goudet commented May 1, 2018

I had the same warning message when I was doing failure tests on the server side. My understanding is that when the server shuts down, it may lose the ids of the packets that were still waiting for an acknowledgement (in QoS 2).
When the client keeps publishing with QoS 2, then after a reset the server receives a PUBREL packet, which is the third step of the transmission, and warns that the current session never handled the first two steps (PUBLISH and PUBREC). As the PUBREL packet is still valid, the server follows the warning by sending a PUBCOMP packet to complete the transmission.
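As a rough sketch of how to watch that exchange from the client side (using the gem's publish-acknowledgement callbacks with no block arguments, like the on_pubcomp call in the script above):

# For a publishing client the QoS 2 exchange is:
#   PUBLISH (client) -> PUBREC (server) -> PUBREL (client) -> PUBCOMP (server)
$client.on_pubrec  { puts "PUBREC received: server stored the PUBLISH, client now sends PUBREL" }
$client.on_pubcomp { puts "PUBCOMP received: QoS 2 transmission complete" }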

@p-goudet
Contributor

p-goudet commented May 1, 2018

I ran this code for about 2 seconds and had about 153 messages acknowledged. I don't fully understand what happened, but here is my thinking: while you were testing, the server did some kind of reset, or something else closed the socket to the client. That is why the client noticed the connection-close event and shut down. In persistent mode, the client is able to reconnect to the server once it shows up again. However, because the server had reset, it lost the ids of the earlier transmissions that the client was still waiting on.
It also seems that there is some delay between receiving an acknowledgement packet and flushing it from the queue of packets awaiting acknowledgement, which could explain why the client resends more than one packet.
I will investigate this delay for the packets that are waiting to be acknowledged.

@rgaufman
Author

rgaufman commented May 1, 2018

Ah, that is possible. I am testing mosquitto in bridge mode, so it could be the bridge recovering. This is the code I wrote to test mosquitto bridges: https://github.com/xanview/mosquitto-test

@rgaufman
Author

rgaufman commented May 3, 2018

Yes, this appears to be the case: once I added a random client_id, the problem went away.

However, if I use the same ID, kill the paho process, and then start it again, it starts failing. Paho should handle this case without throwing an exception, no?

@p-goudet
Contributor

p-goudet commented May 6, 2018

Ah, sorry, my meaning was a little different: I was talking about the packet id (used per transmission), not the client id (used per session).
Could you tell me when you generate a new client id? I would like to know whether you generate a new client id for each session, or generate it once at the beginning of the test and then keep it the same.

@rgaufman
Author

rgaufman commented May 9, 2018

Generating the ID was just me testing things, e.g.:

client_id = "client_#{(rand*1_000_000).to_i}"
$client = PahoMqtt::Client.new(username: 'testuser', password: 'testpasswd', client_id: client_id)

@p-goudet
Contributor

I think I have found the cause of those warnings.
In case of a server failure, the client tries to reconnect if persistent mode is enabled. When the client reconnects, it resends the packets that were not properly acknowledged. However, the packets were being resent twice, which generates the warnings in the mosquitto log.

I will soon release a new version (1.0.9) that should correct this bug. With this change, the code you provided no longer generates warnings in my environment, and packets are resent properly.
Once it is released, could you tell me whether version 1.0.9 solves your issue?
Thank you again for raising this bug.

@rgaufman
Author

Will do, thank you :)

@p-goudet
Contributor

Version 1.0.9 has just been released!

Could you try it and tell me whether it fixes this issue?
