-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A few optimizations to lower CPU usage #7898
base: RC_2_0
Are you sure you want to change the base?
Conversation
- Prevents unneccessary `std::copy`ing - Moves `session_settings::get_bool` out of nested loops - Uses a `std::map` instead of a `std::vector` for socket endpoints TODO: This (searching the map) is still one of the biggest CPU hotspots, is there any way to use a `std::unordered_map` or even avoid searching so much, since, at least on my machine, all trackers have the same 5 endpoints.
I can get a bigger boost in Lines 9922 to 9930 in c31d90b
with std::vector<timer_state>::iterator start = listen_socket_states.begin();
for (auto const& aep : t.endpoints)
{
std::vector<timer_state>::iterator aep_state_iter;
if (start == listen_socket_states.end())
{
listen_socket_states.emplace_back(aep.socket);
aep_state_iter = listen_socket_states.end() - 1;
start = listen_socket_states.end();
}
else if (start->socket == aep.socket)
{
aep_state_iter = start++;
}
else
{
aep_state_iter = std::find_if(start + 1, listen_socket_states.end()
, [&](timer_state const& s) { return s.socket == aep.socket; });
if (aep_state_iter == listen_socket_states.end())
{
listen_socket_states.emplace_back(aep.socket);
aep_state_iter = listen_socket_states.end() - 1;
start = listen_socket_states.begin();
}
} It works by shifting the search's starting point whenever an endpoint was found, meaning it doesn't keep searching the same endpoints each time. In the best case, the tracker's endpoints are in the same order as This assumes that a tracker will never have duplicate endpoints. Should I add this to the PR instead of the change to I'd also like to add something similar in |
Also, the calls to |
Another thing I've found is that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some preliminary review comments. The project author may have different opinions though.
@@ -400,12 +401,21 @@ namespace libtorrent { | |||
|
|||
bool is_i2p_url(std::string const& url) | |||
{ | |||
static std::unordered_map<std::string, bool> cache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure my concerns are valid but anyway:
- What do you think about putting an upper limit on how many entries in the cache? IMO it isn't ideal for it to be unlimited. I suppose
1024
would suffice for most cases. Also this implies it needs some kind of cache replacement policy, for example LRU. Maybe libtorrent/boost provides such container? - If I understand correctly, using
static
means it will be shared across all libtorrent sessions which isn't ideal either. It should be local to a specific torrent/session or whatever scope that make sense. - Maybe this function requires to be thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It looks like libtorrent implements an LRU cache in src/file_view_pool.cpp, but that's specific to the file pool. Boost also has an implementation, but it's under ::compute, so it can't be used here. I could implement one, but I'm not sure if it's needed - my qbittorrent uses 242 tracker urls in total, being ~8kb in text. Round it up to 10kb to account for
std::unordered_map
's overhead, and even 1024 urls should be only ~40kb.
2 & 3. I hadn't considered multiple threads, but I think I could just use thread_local
instead of static
.
I think we could get rid of this cache if announce_entry
stored the result of is_i2p_url
along with it's url, although I'm not sure if that could be added, since it looks like announce_entry
is part of the ABI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this cache is the only controversial part of this patch. It seems more natural to record whether a tracker is i2p or not once, and not call this function repeatedly. It's only 1 bit of information.
The ABI issue is a challenge though. in master
the announce entries have been separated into internal and exported types, so it's easy to add internal fields without affecting the ABI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's only one bit, could it be put in padding, like verified
? Or could that cause some issues?
src/torrent.cpp
Outdated
bool announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers); | ||
bool announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this repo policy on const
but I would suggest adding them anyway:
bool announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers); | |
bool announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers); | |
bool const announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers); | |
bool const announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I'm used to rust, where things are const by default.
Currently, I'm seeing a fair amount of CPU usage in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good apart from the is_i2p_url()
cache, I think it needs some more considerations
@@ -12310,7 +12312,10 @@ namespace { | |||
{ | |||
announce_with_tracker(r.event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this call fails, update_tracker_timer()
won't be called, with your patch. I'm not sure exactly what the consequence of that is, but it's possible the tracker will be tried over and over again in quick succession.
I would feel better about this change if you would check the return value here and call update_tracker_timer()
if it fails. Or perhaps even better, make announce_with_tracker()
update the timer unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
torrent::announce_with_tracker
returns void
, so there's no way to know if it fails.
In all cases where torrent::announce_with_tracker
returns early (fails), there are either no trackers, or libtorrent isn't announcing. In both cases, torrent::update_tracker_timer
would do nothing anyway, so I don't think it matters.
@@ -400,12 +401,21 @@ namespace libtorrent { | |||
|
|||
bool is_i2p_url(std::string const& url) | |||
{ | |||
static std::unordered_map<std::string, bool> cache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this cache is the only controversial part of this patch. It seems more natural to record whether a tracker is i2p or not once, and not call this function repeatedly. It's only 1 bit of information.
The ABI issue is a challenge though. in master
the announce entries have been separated into internal and exported types, so it's easy to add internal fields without affecting the ABI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the last commit appears to be independent of the other changes. It should probably be a separate PR. Also, the same pattern repeats 3 times, it would be good to factor that out somehow
src/torrent.cpp
Outdated
@@ -2915,21 +2915,36 @@ namespace { | |||
|
|||
// update the endpoint list by adding entries for new listen sockets | |||
// and removing entries for non-existent ones | |||
std::vector<aux::announce_endpoint>::iterator start = aeps.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<aux::announce_endpoint>::iterator start = aeps.begin(); | |
auto start = aeps.begin(); |
src/torrent.cpp
Outdated
} | ||
else | ||
{ | ||
for (std::vector<aux::announce_endpoint>::iterator aep = start + 1; aep != aeps.end(); ++aep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (std::vector<aux::announce_endpoint>::iterator aep = start + 1; aep != aeps.end(); ++aep) | |
for (auto aep = start + 1; aep != aeps.end(); ++aep) |
Good point, I've dropped the commit, I'll open a new PR with it when this one closes. |
As mentioned in #7894, libtorrent's CPU usage is higher than it could be. This PR will get rid of some inefficiencies:
is_i2p_url
is called thousands of times per second, with the same urls, mainly bytorrent::announce_with_tracker
andrefresh_endpoint_list
.is_i2p_url
parses the given url, just to check the end of the hostname. This PR speeds it up by ~70% for cached urls (most of them).Note:
torrent::announce_with_tracker
callsrefresh_endpoint_list
right before callingis_i2p_url
itself, essentially meaningis_i2p_url
is always called twice. We could halve it's CPU usage again by just calling it once and passing the result torefresh_endpoint_list
, although this would require adding an argument that only exists withTORRENT_USE_I2P
(I'm not sure how this repo feels about that, but it would increase efficiency).Speaking of
refresh_endpoint_list
, I noticed that the firststd::swap
was often (pretty much all of the time) called to swap an element with itself. This caused a fairly large amount of CPU usage, since it doesn't look likestd::swap
checks for this case. I've added a simple check to prevent this.Another thing mentioned in the issue is that
torrent::announce_with_tracker
andtorrent::update_tracker_timer
callsession_settings::get_bool
many times, nested 3 loops deep. This was horrible, sincesession_settings::get_bool
locks and unlocks a mutex on every call. I've moved these to before the loop to remedy this.Finally, the biggest issue now is in
torrent::update_tracker_timer
andtorrent::announce_with_tracker
, which search throughlisten_socket_states
for every endpoint of every tracker. I've made this slightly better by using astd::map
(can't use astd::unordered_map
sincestd::weak_ptr
can't be hashed) instead of astd::vector
. It's still the biggest issue, and I'd like to see if there's a way to reduce the amount of searching that goes on, since, at least on my machine, every tracker has the same 5 endpoints anyway.