Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few optimizations to lower CPU usage #7898

Open
wants to merge 5 commits into
base: RC_2_0
Choose a base branch
from

Conversation

bolshoytoster
Copy link

@bolshoytoster bolshoytoster commented Mar 1, 2025

As mentioned in #7894, libtorrent's CPU usage is higher than it could be. This PR will get rid of some inefficiencies:

is_i2p_url is called thousands of times per second, with the same urls, mainly by torrent::announce_with_tracker and refresh_endpoint_list. is_i2p_url parses the given url, just to check the end of the hostname. This PR speeds it up by ~70% for cached urls (most of them).

Note: torrent::announce_with_tracker calls refresh_endpoint_list right before calling is_i2p_url itself, essentially meaning is_i2p_url is always called twice. We could halve it's CPU usage again by just calling it once and passing the result to refresh_endpoint_list, although this would require adding an argument that only exists with TORRENT_USE_I2P (I'm not sure how this repo feels about that, but it would increase efficiency).

Speaking of refresh_endpoint_list, I noticed that the first std::swap was often (pretty much all of the time) called to swap an element with itself. This caused a fairly large amount of CPU usage, since it doesn't look like std::swap checks for this case. I've added a simple check to prevent this.

Another thing mentioned in the issue is that torrent::announce_with_tracker and torrent::update_tracker_timer call session_settings::get_bool many times, nested 3 loops deep. This was horrible, since session_settings::get_bool locks and unlocks a mutex on every call. I've moved these to before the loop to remedy this.

Finally, the biggest issue now is in torrent::update_tracker_timer and torrent::announce_with_tracker, which search through listen_socket_states for every endpoint of every tracker. I've made this slightly better by using a std::map (can't use a std::unordered_map since std::weak_ptr can't be hashed) instead of a std::vector. It's still the biggest issue, and I'd like to see if there's a way to reduce the amount of searching that goes on, since, at least on my machine, every tracker has the same 5 endpoints anyway.

- Prevents unneccessary `std::copy`ing
- Moves `session_settings::get_bool` out of nested loops
- Uses a `std::map` instead of a `std::vector` for socket endpoints
  TODO: This (searching the map) is still one of the biggest CPU hotspots,
  		is there any way to use a `std::unordered_map` or even avoid searching
		so much, since, at least on my machine, all trackers have the same 5
		endpoints.
@bolshoytoster
Copy link
Author

bolshoytoster commented Mar 1, 2025

I can get a bigger boost in torrent::update_tracker_timer/torrent::announce_with_tracker by sticking to a std::vector and playing on the fact that most trackers use the same list of endpoints in the same order, by replacing

libtorrent/src/torrent.cpp

Lines 9922 to 9930 in c31d90b

for (auto const& aep : t.endpoints)
{
auto aep_state_iter = std::find_if(listen_socket_states.begin(), listen_socket_states.end()
, [&](timer_state const& s) { return s.socket == aep.socket; });
if (aep_state_iter == listen_socket_states.end())
{
listen_socket_states.emplace_back(aep.socket);
aep_state_iter = listen_socket_states.end() - 1;
}

with

			std::vector<timer_state>::iterator start = listen_socket_states.begin();
			for (auto const& aep : t.endpoints)
			{
				std::vector<timer_state>::iterator aep_state_iter;
				if (start == listen_socket_states.end())
				{
					listen_socket_states.emplace_back(aep.socket);
					aep_state_iter = listen_socket_states.end() - 1;
					start = listen_socket_states.end();
				}
				else if (start->socket == aep.socket)
				{
					aep_state_iter = start++;
				}
				else
				{
					aep_state_iter = std::find_if(start + 1, listen_socket_states.end()
						, [&](timer_state const& s) { return s.socket == aep.socket; });
					if (aep_state_iter == listen_socket_states.end())
					{
						listen_socket_states.emplace_back(aep.socket);
						aep_state_iter = listen_socket_states.end() - 1;
						start = listen_socket_states.begin();
					}
				}

It works by shifting the search's starting point whenever an endpoint was found, meaning it doesn't keep searching the same endpoints each time. In the best case, the tracker's endpoints are in the same order as listen_socket_states, and only runs the else if block. In the worst case, the endpoints are different, and it falls back to std::find__if.

This assumes that a tracker will never have duplicate endpoints.

Should I add this to the PR instead of the change to std::map, or is the code too messy?

I'd also like to add something similar in refresh_endpoint_list, since the aep.socket != s is fairly expensive and this could cut down on those.

@bolshoytoster
Copy link
Author

bolshoytoster commented Mar 1, 2025

Also, the calls to listen_socket_handle::is_ssl() in refresh_endpoint_list are quite expensive, due to the std::weak_ptr::lock. Perhaps listen_socket_t::ssl could be cached in listen_socket_handle if it never changes, or maybe listen_socket_handle could store a std::shared_ptr instead.

@bolshoytoster
Copy link
Author

Another thing I've found is that torrent::tracker_request_error calls torrent::announce_with_tracker then torrent::update_tracker_timer even though torrent::announce_with_tracker already calls torrent::update_tracker_timer. Effectively meaning it's called twice for no reason. I've prevented this by only calling torrent::update_tracker_timer if torrent::announce_with_tracker wasn't called.

Copy link
Contributor

@Chocobo1 Chocobo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary review comments. The project author may have different opinions though.

@@ -400,12 +401,21 @@ namespace libtorrent {

bool is_i2p_url(std::string const& url)
{
static std::unordered_map<std::string, bool> cache;
Copy link
Contributor

@Chocobo1 Chocobo1 Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure my concerns are valid but anyway:

  1. What do you think about putting an upper limit on how many entries in the cache? IMO it isn't ideal for it to be unlimited. I suppose 1024 would suffice for most cases. Also this implies it needs some kind of cache replacement policy, for example LRU. Maybe libtorrent/boost provides such container?
  2. If I understand correctly, using static means it will be shared across all libtorrent sessions which isn't ideal either. It should be local to a specific torrent/session or whatever scope that make sense.
  3. Maybe this function requires to be thread-safe.

Copy link
Author

@bolshoytoster bolshoytoster Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It looks like libtorrent implements an LRU cache in src/file_view_pool.cpp, but that's specific to the file pool. Boost also has an implementation, but it's under ::compute, so it can't be used here. I could implement one, but I'm not sure if it's needed - my qbittorrent uses 242 tracker urls in total, being ~8kb in text. Round it up to 10kb to account for std::unordered_map's overhead, and even 1024 urls should be only ~40kb.

2 & 3. I hadn't considered multiple threads, but I think I could just use thread_local instead of static.

I think we could get rid of this cache if announce_entry stored the result of is_i2p_url along with it's url, although I'm not sure if that could be added, since it looks like announce_entry is part of the ABI.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cache is the only controversial part of this patch. It seems more natural to record whether a tracker is i2p or not once, and not call this function repeatedly. It's only 1 bit of information.
The ABI issue is a challenge though. in master the announce entries have been separated into internal and exported types, so it's easy to add internal fields without affecting the ABI

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's only one bit, could it be put in padding, like verified? Or could that cause some issues?

src/torrent.cpp Outdated
Comment on lines 3117 to 3118
bool announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers);
bool announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this repo policy on const but I would suggest adding them anyway:

Suggested change
bool announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers);
bool announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers);
bool const announce_to_all_tiers = settings().get_bool(settings_pack::announce_to_all_tiers);
bool const announce_to_all_trackers = settings().get_bool(settings_pack::announce_to_all_trackers);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I'm used to rust, where things are const by default.

@bolshoytoster
Copy link
Author

bolshoytoster commented Mar 2, 2025

Currently, I'm seeing a fair amount of CPU usage in announce_infohash::can_announce, mostly because it's having to convert between time_point and time_point32. This could be improved if we just chose one and switched all APIs to it, but I recognize that may not be possible.

Copy link
Owner

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good apart from the is_i2p_url() cache, I think it needs some more considerations

@@ -12310,7 +12312,10 @@ namespace {
{
announce_with_tracker(r.event);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this call fails, update_tracker_timer() won't be called, with your patch. I'm not sure exactly what the consequence of that is, but it's possible the tracker will be tried over and over again in quick succession.
I would feel better about this change if you would check the return value here and call update_tracker_timer() if it fails. Or perhaps even better, make announce_with_tracker() update the timer unconditionally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

torrent::announce_with_tracker returns void, so there's no way to know if it fails.

In all cases where torrent::announce_with_tracker returns early (fails), there are either no trackers, or libtorrent isn't announcing. In both cases, torrent::update_tracker_timer would do nothing anyway, so I don't think it matters.

@@ -400,12 +401,21 @@ namespace libtorrent {

bool is_i2p_url(std::string const& url)
{
static std::unordered_map<std::string, bool> cache;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cache is the only controversial part of this patch. It seems more natural to record whether a tracker is i2p or not once, and not call this function repeatedly. It's only 1 bit of information.
The ABI issue is a challenge though. in master the announce entries have been separated into internal and exported types, so it's easy to add internal fields without affecting the ABI

Copy link
Owner

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last commit appears to be independent of the other changes. It should probably be a separate PR. Also, the same pattern repeats 3 times, it would be good to factor that out somehow

src/torrent.cpp Outdated
@@ -2915,21 +2915,36 @@ namespace {

// update the endpoint list by adding entries for new listen sockets
// and removing entries for non-existent ones
std::vector<aux::announce_endpoint>::iterator start = aeps.begin();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<aux::announce_endpoint>::iterator start = aeps.begin();
auto start = aeps.begin();

src/torrent.cpp Outdated
}
else
{
for (std::vector<aux::announce_endpoint>::iterator aep = start + 1; aep != aeps.end(); ++aep)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (std::vector<aux::announce_endpoint>::iterator aep = start + 1; aep != aeps.end(); ++aep)
for (auto aep = start + 1; aep != aeps.end(); ++aep)

@bolshoytoster
Copy link
Author

the last commit appears to be independent of the other changes. It should probably be a separate PR. Also, the same pattern repeats 3 times, it would be good to factor that out somehow

Good point, I've dropped the commit, I'll open a new PR with it when this one closes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants