Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use get_slicec instead of get_slice for single character splitters #99321

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AThousandShips
Copy link
Member

This method exists for this dedicated purpose and is faster for these kinds of input

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me.

@AThousandShips AThousandShips force-pushed the use_get_slicec branch 2 times, most recently from ee50098 to 8d2cbf7 Compare January 1, 2025 12:39
Copy link
Contributor

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@syntaxerror247 syntaxerror247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@AThousandShips AThousandShips requested a review from a team as a code owner January 8, 2025 18:36
@AThousandShips AThousandShips modified the milestones: 4.4, 4.5 Jan 28, 2025
@Repiteo
Copy link
Contributor

Repiteo commented Feb 14, 2025

I don't doubt that this is a performance improvement, as this is swapping to the function specialized for this input, but we should really make a habit of including metrics with performance PRs (and/or the metrics of the PR adding this function)

@AThousandShips
Copy link
Member Author

AThousandShips commented Feb 14, 2025

These are difficult to measure in situ, as they are scattered over the codebase, we lack proper tools to perform such performance tests, and proper documentation for the process, so until that exists that's hard to request (the tools and documentation has been requested on multiple occasions but hasn't surfaced yet)

If such existed and was provided it would be trivial to do so and would do so

@AThousandShips AThousandShips force-pushed the use_get_slicec branch 2 times, most recently from 8dc5b1b to cac6b15 Compare February 14, 2025 14:33
@Ivorforce
Copy link
Contributor

Ivorforce commented Feb 14, 2025

we lack proper tools to perform such performance tests

You could always show the general performance difference between get_slice and get_slicec with a short benchmark. Personally, I think that's sufficient to show what kind of difference to expect :)

@AThousandShips
Copy link
Member Author

AThousandShips commented Feb 14, 2025

For that the test is trivial and the test is trivial, and runs with a clear improvement:

String.get_slice():  2678840 usec for 5000000 iterations
String.get_slicec(): 1557136 usec for 5000000 iterations

Using MRP:
test-string-get_slicec.zip

PC specifications:
  • OS: Windows 11 (build 22631)
  • GPU: AMD Radeon RX 6750 XT (Advanced Micro Devices, Inc.; 31.0.24002.92)
  • CPU: AMD Ryzen 9 7900X 12-Core Processor (24 threads)
  • RAM: 32 GB

Tested with an editor build, will test with a release build as well

For a release build (using beta3 release templates) the result is:

String.get_slice():  398462 usec for 5000000 iterations
String.get_slicec(): 345584 usec for 5000000 iterations

@Ivorforce
Copy link
Contributor

Ivorforce commented Feb 14, 2025

Well, this benchmark isn't entirely fair because you're calling from GDScript, i.e. String::get_slice(String) instead of String::get_slice(char*) which you're replacing in this PR. A lot of the difference from your test can be attributed to String allocation. I'm expecting more of a 1.2x difference to be realistic.

@AThousandShips
Copy link
Member Author

This doesn't involve any string allocations, as it is a string literal in GDScript no string should be allocated on calling this (if it is that's a major bug in GDScript)

@Ivorforce
Copy link
Contributor

Ivorforce commented Feb 14, 2025

This doesn't involve any string allocations, as it is a string literal in GDScript no string should be allocated on calling this (if it is that's a major bug in GDScript)

Interesting, that changes things!
Still, I'd caution to call code directly from C++ - or rather, in the exact same way as is replaced in the PR. Otherwise, you always risk unaccounted factors. I usually put code in test_main.cpp like so:

// Setup
String s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

auto t0 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 100000; i ++) {
	String s1 = s.get_slice(",", 0);
}
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";

@AThousandShips
Copy link
Member Author

I don't have such a setup prepared at the time being so can't test it until I set something up that would be reasonably isolated

@Ivorforce
Copy link
Contributor

Ivorforce commented Feb 14, 2025

I don't have such a setup prepared at the time being so can't test it until I set something up that would be reasonably isolated

We don't need to discuss your setup here - but I would be amiss not to mention that anywhere you can compile and run code is fine to benchmark on!

Anyway, I'm happy to fill in - i ran this benchmark (debug build, -O2):

Test Code
	String s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

	// warmup
	s.get_slice(",", 0);

	{
		auto t0 = std::chrono::high_resolution_clock::now();
		for (int i = 0; i < 20000000; i ++) {
			String s1 = s.get_slice(",", 0);
		}
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}

	// warmup
	s.get_slicec(',', 0);

	{
		auto t0 = std::chrono::high_resolution_clock::now();
		for (int i = 0; i < 20000000; i ++) {
			String s1 = s.get_slicec(',', 0);
		}
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}

I got 1291ms for get_slice and 986ms for get_slicec. So the performance diff for in-situ code can be about 1.3x :)

@AThousandShips
Copy link
Member Author

Thank you for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants