Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support IP changes of statsd hostname #284

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Goltergaul
Copy link

@Goltergaul Goltergaul commented Sep 27, 2023

we have the issue that the ip address of our statsd host is changing regularly on our kubernetes cluster due to some automated updates. In this case the current implementation continues to send UDP packets to the old IP address.

This pr addresses this issue by resolving the hostname to its ip all 60 seconds. in case the ip changed, it simply reconnects the UDP socket.

This is a bit of a draft to get your input. Please let me know if this is something you would be willing to merge

FYI this currently failes the allocation specs as I did not adapt those yet

@Goltergaul Goltergaul changed the title Check all 60 seconds if the statsd hostname resolves to a different i… Support IP changes of statsd hostname Sep 27, 2023
@Goltergaul
Copy link
Author

Todo: Disable this by default and make interval configurable

@TheWudu
Copy link

TheWudu commented Sep 28, 2023

Looks good to me :-)

@erikpaperik
Copy link

Looks good 👍

@remeh
Copy link
Contributor

remeh commented Sep 28, 2023

Hey @Goltergaul

Thanks for you contribution.

A problem with this implementation is that the Resolv.getaddress call can be slow (even if returning the same IP). It will block the thread during the DNS resolve, and thus the application calling the library. Even if fast, the DNS resolve could start to fail for some reasons and nothing would indicate an error with the current implementation + the IP would not be known anymore by the client instance (while still being possibly reachable).

A mitigation to your problem which can be considered with the dogstatsd-ruby client and which would mimick your implementation would be that every 60s, your application calls the flush method and then re-creates the client instance. WDYT?

@andreaseger
Copy link

andreaseger commented Sep 28, 2023

my 2 cents, IMHO this should be handled (if enabled) in the statsd library. statsd is often used indirectly by other libraries to report metrics and for example not directly in our applications.
But I can see your concern about slow dns resolution. Wondering if the DNS resolve could maybe be moved into a fiber(more lightweight than a thread and should be ideal here as we're waiting for IO) so it won't block the send_message operation.

Also the other concerns you mention like DNS errors while the old statsd ip is still reachable can be handled with some added error handling.

@Goltergaul
Copy link
Author

Goltergaul commented Sep 28, 2023

@remeh Thanks for your comment. While your proposed mitigation works I would expect from such a library that it can reconnect itself automatically in such a case, as library user I don't want to take care of this myself.

As for the problems:
For when the dns resolution fails, it could be ignored and the socket left intact. Then retry the resolution only again at the next interval. So it would not change in terms of behavior compared to what it is doing now.

As for speed considerations when the dns resolution is slow. Maybe add a low timeout, e.g.

Resolv::DNS.open do |dns|
  dns.timeouts = 0.1
  ip = dns.getaddress(hostname)
end

This would mean the thread is blocked at max 100ms every minute (not sure if that is enough though). Also this behavior could be made configurable with custom timeouts or even disable it completely. Additionally i assume the dns lookup via resolv should also profit from some low level dns caching? Someone here knows more about this? At least it is pretty fast if I try to resolve 10000 times the same domain

Or as suggested above, we could make it async with a Fiber or so (sounds like the most robust one) I'm not an expert on Fibers, but I think this will work only with ruby 3.

@Goltergaul Goltergaul force-pushed the reconnect-on-ip-change branch from f922b42 to 9378e6d Compare October 10, 2023 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants