CPU usage/efficiency #7894

Open

bolshoytoster opened this issue Feb 27, 2025 · 5 comments

Comments

bolshoytoster commented Feb 27, 2025

I'm a user of qBittorrent, and I've been experiencing high CPU usage (30-50%) recently, mostly from libtorrent's network thread.

I decided to investigate this with callgrind, and I found that ~17% of the total CPU usage was in parse_url_components, specifically in the calls to std::find, which, it seems, aren't being properly inlined/optimized (yes, I'm using -O3).

I modified this to use raw pointers and memchr instead of iterators and std::find, and I've been able to get a 1.6x speed improvement (according to my crude benchmarking):

std::tuple<std::string, std::string, std::string, int, std::string>
	parse_url_components(std::string url, error_code& ec)
{
	std::string hostname; // hostname only
	std::string auth; // user:pass
	std::string protocol; // http or https for instance
	int port = -1;

	const char *at;
	const char *colon;
	const char *slash;
	const char *question_mark;
	const char *hash;
	const char *port_pos;

	// PARSE URL
	const char *start = url.data();
	const char *str_end = start + url.length();
	// remove white spaces in front of the url
	while (start != str_end && is_space(*start))
		++start;
	const char *end = (const char *) memchr(start, ':', str_end - start);
	// check for a missing ':' before using end as an iterator
	if (!end)
	{
		ec = errors::unsupported_url_protocol;
		goto exit;
	}
	protocol.assign(start, end);
	++end;
	if (end == str_end || *end != '/')
	{
		ec = errors::unsupported_url_protocol;
		goto exit;
	}
	++end;
	if (end == str_end || *end != '/')
	{
		ec = errors::unsupported_url_protocol;
		goto exit;
	}
	++end;
	start = end;

	colon = (const char *) memchr(start, ':', str_end - start);
	at = colon ? (const char *) memchr(colon, '@', str_end - colon) : nullptr;
	slash = (const char *) memchr(start, '/', str_end - start);
	question_mark = (const char *) memchr(start, '?', str_end - start);
	hash = (const char *) memchr(start, '#', str_end - start);
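	// the end of the "authority" part of the URL is the earliest of
	// '/', '?' and '#' that was actually found, or the end of the string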
	end = str_end;
	if (slash) end = std::min(end, slash);
	if (question_mark) end = std::min(end, question_mark);
	if (hash) end = std::min(end, hash);

	if (at && at < end)
	{
		auth.assign(start, at);
		start = at;
		++start;
	}

	// this is for IPv6 addresses
	if (start != str_end && *start == '[')
	{
		port_pos = (const char *) memchr(start, ']', str_end - start);
		if (!port_pos)
		{
			ec = errors::expected_close_bracket_in_address;
			goto exit;
		}
		// strip the brackets
		hostname.assign(start + 1, port_pos);
		port_pos = (const char *) memchr(port_pos, ':', str_end - port_pos);
	}
	else
	{
		port_pos = (const char *) memchr(start, ':', str_end - start);
		if (port_pos && port_pos < end) hostname.assign(start, port_pos);
		else hostname.assign(start, end);
	}

	if (port_pos && port_pos < end)
	{
		++port_pos;
		for (auto i = port_pos; i < end; ++i)
		{
			if (is_digit(*i)) continue;
			ec = errors::invalid_port;
			goto exit;
		}
		port = std::atoi(std::string(port_pos, end).c_str());
	}

	start = end;
exit:
	std::string path_component(start, str_end);
	if (path_component.empty()
		|| path_component.front() == '?'
		|| path_component.front() == '#')
	{
		path_component.insert(path_component.begin(), '/');
	}

	return std::make_tuple(std::move(protocol)
		, std::move(auth)
		, std::move(hostname)
		, port
		, std::move(path_component));
}

There are probably faster implementations than this, but it brings parse_url_components down to ~6% of the total (still room for improvement).

Another thing I noticed is that parse_url_components is called hundreds of times per second (mostly by is_i2p_url) with the same URLs, so would it be worth caching the result and/or optimizing is_i2p_url, since it doesn't actually require parsing the full URL?
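
For illustration, the cache could be as simple as this sketch (is_i2p_url_cached is a hypothetical wrapper around the existing check; thread-safety and cache eviction are deliberately ignored here):

#include <string>
#include <unordered_map>

// hypothetical memoizing wrapper; is_i2p_url() is the existing,
// uncached check, and locking/eviction are skipped for brevity
bool is_i2p_url_cached(std::string const& url)
{
	static std::unordered_map<std::string, bool> cache;
	auto const it = cache.find(url);
	if (it != cache.end()) return it->second;
	bool const result = is_i2p_url(url);
	cache.emplace(url, result);
	return result;
}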

I'd like to know if this would be welcome as a PR, perhaps along with an optimization to is_i2p_url.

Another performance issue I'm looking into is the digest32 comparisons in the std::find_if in dht::routing_table::node_failed, which looks like it could be fixed if routing_table_node::live_nodes were a hashmap instead of a std::vector.

Chocobo1 commented Feb 28, 2025

> I decided to investigate this with callgrind, and I found that ~17% of the total CPU usage was in parse_url_components, specifically in the calls to std::find, which, it seems, aren't being properly inlined/optimized (yes, I'm using -O3).

Interesting! If I were to improve it, I would try these:

  1. Try std::string::find() rather than std::find().
     IMO raw pointers are a bit too low-level. They should be the last option after trying all other alternatives.
  2. If C++17 is available, use std::string_view (see the sketch after this list).
  3. Use regex. But I've heard std::regex is horrible, and I have no experience with boost::regex.
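
For reference, a minimal sketch of the std::string_view approach for just the scheme, assuming C++17 (find_protocol is a hypothetical helper, error handling elided):

#include <string_view>

// extract the scheme of a URL without copies or raw pointers
std::string_view find_protocol(std::string_view url)
{
	auto const pos = url.find(':');
	if (pos == std::string_view::npos) return {}; // no scheme
	return url.substr(0, pos);
}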

bolshoytoster commented Feb 28, 2025

@Chocobo1

> Try std::string::find() rather than std::find()

I just tried that, and I must've done something wrong, because somehow my std::string::find() implementation is actually ~40% slower than the original version. Maybe that's because it uses indices instead of iterators.

> If C++17 is available, use std::string_view

I also tried this, but the performance gain is negligible.

> Use regex. But I heard std::regex is horrible and I have no experience with boost::regex.

Regex seems a bit overkill for this.

I have, however, been able to speed up is_i2p_url by ~25% by manually inlining parse_url_components and removing unnecessary bits. But the real performance improvement comes from caching the results of is_i2p_url, giving a ~70% improvement for calls with cached URLs (most of them).

I would look into why is_i2p_url is called so often, but 70% is good enough for me.

I could open a PR with this caching, and it would remove the need to speed up parse_url_components.

Also, is there a reason the nodes in routing_table_node can't/shouldn't be hashmaps? I suppose it would make iteration slightly slower, and it would no longer be ordered. But I've just changed bucket_t to a std::unordered_map<node_id, node_entry> and updated everything relevant, and it seems fine (although I think my implementation is a bit buggy).
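
For illustration, the change I mean is roughly this shape (it assumes a std::hash<node_id> specialization, which would have to be provided):

#include <unordered_map>

// hypothetical: key buckets by node id instead of scanning a vector;
// requires a std::hash<node_id> specialization somewhere
using bucket_t = std::unordered_map<node_id, node_entry>;

// node_failed() could then replace its std::find_if scan with a
// constant-time lookup:
// auto const it = bucket.find(id);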

bolshoytoster commented Feb 28, 2025

After optimizing those, another issue has appeared: ~20% of the remaining total CPU usage is in session_settings::get_bool, specifically in the locking it does. The locking in itself should be fine, but session_settings::get_bool is called several times within three loops in update_tracker_timer. There's some low-hanging fruit here: those calls can simply be hoisted out of the loops.
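
The hoisting itself is mechanical; roughly this shape (sett, announce_all and the loop are illustrative, not the actual update_tracker_timer code):

// before: one locked settings lookup per iteration
for (auto const& t : trackers)
{
	if (sett.get_bool(settings_pack::announce_to_all_trackers)) { /* ... */ }
}

// after: a single locked lookup, hoisted out of the loop
bool const announce_all = sett.get_bool(settings_pack::announce_to_all_trackers);
for (auto const& t : trackers)
{
	if (announce_all) { /* ... */ }
}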

Chocobo1 commented Mar 1, 2025

I don't have much to add; I just found this interesting post: https://gms.tf/stdfind-and-memchr-optimizations.html#is-stdfind-faster-as-memchr

@bolshoytoster
Could you share your compiler version?

> I'd like to know if this would be welcome as a PR

IMO feel free to submit it. The important bits can be discussed in the PR.

bolshoytoster commented Mar 1, 2025

> Could you share your compiler version?

$ g++ --version
g++ (GCC) 14.2.1 20250207

And I build libtorrent with b2 optimization=speed inlining=on

I've opened a PR with a few more optimizations: #7898
