Cannot constant fold functions from <cwchar> #100929

Open
philnik777 opened this issue Jul 28, 2024 · 4 comments

Comments

@philnik777
Contributor

Clang has builtins for a lot of the <cwchar> functions and can constant evaluate them when using the __builtin_ variants, but the backend can't constant fold them. There doesn't seem to be much of a technical reason, since their <cstring> counterparts can often be constant folded, and some even have passes dedicated to them.

For example,

#include <cstddef>

auto test(wchar_t *ptr, size_t count) {
  wchar_t buffer[] = L"Banane";
  return __builtin_wmemchr(buffer, L'a', __builtin_wcslen(buffer)) != nullptr;
}

static_assert([] {
  wchar_t buffer[] = L"Banane";
  return __builtin_wmemchr(buffer, L'a', __builtin_wcslen(buffer)) != nullptr;
}());

compiles just fine, but test generates horrible code compared to what should be possible: https://godbolt.org/z/xP3vM9jPn
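For comparison, here is a minimal sketch that puts the narrow-character version next to the wide one, using the standard `<cstring>` and `<cwchar>` functions instead of the builtins. The names `testNarrow` and `testWide` are made up for illustration. Going by the observation above that `<cstring>` calls can often be folded, one would expect `testNarrow` to reduce to a constant at `-O2` while `testWide` keeps its library calls; that has not been checked against a specific Clang version here.

```cpp
#include <cstring>
#include <cwchar>

// Narrow version: strlen/memchr on a constant local buffer.
// Expected (not verified here) to fold to `return true` when optimized.
bool testNarrow() {
  char buffer[] = "Banane";
  return std::memchr(buffer, 'a', std::strlen(buffer)) != nullptr;
}

// Wide version: the same logic with wcslen/wmemchr.
// This is the case the issue reports as not being folded.
bool testWide() {
  wchar_t buffer[] = L"Banane";
  return std::wmemchr(buffer, L'a', std::wcslen(buffer)) != nullptr;
}
```

Both functions return `true` at run time; the only difference under discussion is whether the optimizer can prove that at compile time.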

@AaronBallman
Collaborator

Given how much code on Windows uses wchar_t, improving this would probably have some nice performance wins in practice.

@nikic
Contributor

nikic commented Nov 3, 2024

From a quick look, the only wchar_t function which is optimized right now is wcslen:

Value *LibCallSimplifier::optimizeWcslen(CallInst *CI, IRBuilderBase &B) {
  Module &M = *CI->getModule();
  unsigned WCharSize = TLI->getWCharSize(M) * 8;
  // We cannot perform this optimization without wchar_size metadata.
  if (WCharSize == 0)
    return nullptr;
  return optimizeStringLength(CI, B, WCharSize);
}
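The `WCharSize` check matters because the width of `wchar_t` is a property of the target: it is 16 bits on Windows and 32 bits on typical Linux targets. A small sketch of the same "bytes times 8" computation on the host side, with the hypothetical helper name `wcharSizeInBits`, might look like this:

```cpp
#include <climits>

// Hypothetical helper: width of wchar_t in bits for the compiler's target,
// computed the same way as `getWCharSize(M) * 8` in the snippet above
// (assuming 8-bit bytes there; CHAR_BIT is used here to stay general).
constexpr unsigned wcharSizeInBits() {
  return sizeof(wchar_t) * CHAR_BIT;
}
```

The optimizer cannot use `sizeof(wchar_t)` like this, since the host and target may disagree, which is presumably why it relies on the `wchar_size` metadata and bails out when that is missing.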

@nikic
Contributor

nikic commented Nov 3, 2024

@s-barannikov It occurs to me that this may also be a good way to phase in support for non-8-bit bytes in SLC. If we support wchar functions, which take 16 or 32 bit characters, then supporting 16/32-bit char would be a trivial change on top of that.

@s-barannikov
Contributor

s-barannikov commented Nov 4, 2024

The main issue with constant folding wcs* functions is that it is often required to examine contents of the string and we don't have a data structure for this. For char we use StringRef assuming that host and target chars have the same bitwidth, see getConstantStringInfo.

To support folding wcs* functions, getConstantStringInfo would need to return a view with arbitrary-sized elements (APStringRef / TargetStringRef / TStringRef?). The same data structure could be used to fold the ordinary str* functions when CHAR_BIT != 8. I implemented it downstream, but it wasn't good enough to put up for review. In #106541 / #106542 I just bail out when it comes to examining the contents of a string.
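To make the idea of a view with arbitrary-sized elements concrete, here is a rough sketch. It is not the downstream implementation mentioned above and not an LLVM API: the class name `TargetStringView`, its methods, and the helper `encodeLittleEndian` are all hypothetical, and the sketch assumes little-endian element storage and elements of at most 8 bytes, where real code would consult the target's data layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical view over constant string data whose elements are
// `elemBytes` wide (1 for char, 2 or 4 for wchar_t depending on target).
// Assumption: elements are stored little-endian and elemBytes <= 8.
class TargetStringView {
public:
  TargetStringView(const std::uint8_t *data, std::size_t numElements,
                   unsigned elemBytes)
      : data_(data), numElements_(numElements), elemBytes_(elemBytes) {}

  std::size_t size() const { return numElements_; }

  // Reassemble element i from its bytes.
  std::uint64_t operator[](std::size_t i) const {
    std::uint64_t value = 0;
    for (unsigned b = 0; b < elemBytes_; ++b)
      value |= std::uint64_t(data_[i * elemBytes_ + b]) << (8 * b);
    return value;
  }

  // wcslen-like query: index of the first zero element, if there is one.
  std::optional<std::size_t> length() const {
    for (std::size_t i = 0; i < numElements_; ++i)
      if ((*this)[i] == 0)
        return i;
    return std::nullopt;
  }

  // wmemchr-like query: index of the first `ch` among the first `count`
  // elements. The caller must ensure count <= size().
  std::optional<std::size_t> find(std::uint64_t ch, std::size_t count) const {
    for (std::size_t i = 0; i < count; ++i)
      if ((*this)[i] == ch)
        return i;
    return std::nullopt;
  }

private:
  const std::uint8_t *data_;
  std::size_t numElements_;
  unsigned elemBytes_;
};

// Hypothetical helper for demonstrations: lay out elements as
// little-endian bytes of the given width.
std::vector<std::uint8_t>
encodeLittleEndian(const std::vector<std::uint64_t> &elems,
                   unsigned elemBytes) {
  std::vector<std::uint8_t> bytes;
  for (std::uint64_t e : elems)
    for (unsigned b = 0; b < elemBytes; ++b)
      bytes.push_back(std::uint8_t(e >> (8 * b)));
  return bytes;
}
```

With such a view, the `wmemchr(buffer, L'a', wcslen(buffer))` pattern from the original report becomes `view.find('a', *view.length())` on the constant initializer, regardless of whether the target's `wchar_t` is 2 or 4 bytes.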

> ... then supporting 16/32-bit char would be a trivial change on top of that.

Supporting wchar_t needs more work in SimplifyLibCalls.cpp than supporting non-8-bit chars: we need to add wide char functions to TLI so that we can fold e.g., wcschr -> wmemchr. Look for calls to various emit* functions in this file.
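As a source-level illustration of the wcschr -> wmemchr fold, the sketch below expresses `wcschr` through `wmemchr` once the string length is known. The function name `wcschrViaWmemchr` is hypothetical; in the optimizer the equivalent rewrite would happen on IR calls and would need `wmemchr` to be known to TLI, as described above.

```cpp
#include <cwchar>

// Hypothetical illustration: wcschr(s, c) is equivalent to searching the
// first wcslen(s) + 1 elements with wmemchr. The + 1 includes the
// terminator, so that looking for L'\0' finds it, as wcschr does.
const wchar_t *wcschrViaWmemchr(const wchar_t *s, wchar_t c) {
  return std::wmemchr(s, c, std::wcslen(s) + 1);
}
```

When `s` is a constant string, `wcslen(s)` is already foldable via `optimizeWcslen`, so the remaining work is emitting and then folding the `wmemchr` call.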
