Add multibyte support #25
Take a look at my fork, https://github.com/msteveb/linenoise, which has support for utf-8 |
Do you really need all those functions? I'm not very familiar with this area, but I easily fixed some of the weird problems by using mbstowcs() instead of strlen() wherever the length of the string is assumed to equal the number of characters in it. But I couldn't find a way to fix deleting wide characters with backspace. |
The approach here is to avoid any reliance on system support for utf-8. For example, I have systems running uClibc without locale support which can still happily run a utf-8 console over a serial port. Of course you are welcome to take a different approach. |
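For readers wondering what "no reliance on system support" can look like, here is a minimal sketch, not code from the fork. It counts UTF-8 code points by skipping continuation bytes, so it needs neither setlocale() nor the wide-character functions. The name utf8_codepoints is hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Count UTF-8 code points in the first len bytes of s.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte that
 * does NOT match that pattern starts a new code point.
 * Assumption: s holds valid UTF-8; no validation is attempted.
 * The function name is hypothetical, not part of linenoise. */
size_t utf8_codepoints(const char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Note that a code point count is still not a column count: wide CJK characters take two columns and combining marks take none, so this only covers the simplest case.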
I have a similar issue; I tried out linenoise for a shell implementation. If I want coloured prompts, the escape codes end up being included in the length calculation. A simpler, easier fix is to either:
|
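One possible way to keep escape codes out of the prompt length, sketched here as an assumption rather than as the fix proposed in this comment: skip ANSI CSI sequences (ESC, '[', parameter bytes, one final byte) while counting. The name visible_len is hypothetical, and the remaining prompt text is assumed to be plain ASCII, one column per byte.

```c
#include <stddef.h>

/* Visible width of a prompt, ignoring ANSI CSI sequences such as
 * "\x1b[1;32m". Assumptions: only CSI-style sequences occur, and all
 * other bytes are ASCII occupying one column each.
 * The function name is hypothetical, not part of linenoise. */
size_t visible_len(const char *s)
{
    size_t cols = 0;
    size_t i = 0;
    while (s[i] != '\0') {
        if (s[i] == '\x1b' && s[i + 1] == '[') {
            i += 2;
            /* Parameter and intermediate bytes lie in 0x20..0x3F. */
            while (s[i] >= 0x20 && s[i] <= 0x3F)
                i++;
            /* The final byte lies in 0x40..0x7E; consume it if present. */
            if (s[i] >= 0x40 && s[i] <= 0x7E)
                i++;
        } else {
            cols++;
            i++;
        }
    }
    return cols;
}
```

With this, a green "> " prompt wrapped in colour codes would be measured as two columns instead of the full byte length of the string.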
I found this via the mongo shell's code. I'm always annoyed that in more and more of the CLI tools I use (mongo, redis-cli, node) the cursor moves weirdly when there are multibyte characters. I don't know if the others are using linenoise or something else, but I'd like to see this get fixed :-) |
I've made a modified linenoise that lets you specify the width yourself, so it's extra work for the application, but at least possible; I've been using it for about 3 months with no problems. I'll turn it into a pull request, perhaps. |
'utf-8 support' branch on my fork fixed the following UTF-8 problems that appear in the latest linenoise version 1.0:
I first tried https://github.com/msteveb/linenoise. But it is not based on the latest linenoise which supports the fantastic multiline mode. Also it doesn't support CJK wide characters and multi-code characters... |
Hello, I'm thinking about going the following route with this issue:
This way we obtain that linenoise simplicity remains almost untouched, but optionally it is both possible to support multi byte chars both with Makes sense to you? Thanks. |
@antirez, Thanks for paying attention to the multi-byte code users! The idea that you presented totally makes sense to me. I am even happier because if the linenoise library itself can give the extensibility, we could easily add other multi-byte encoding support. As you can see in my fork, the most important concept for enabling 'multi byte' support is to make a clear distinction between 'byte position/width' in text buffer and 'column position/width' on screen. Here are some examples in UTF-8:
Once we come to know the difference, it's pretty easy to handle multi-byte code correctly. You can grasp the idea from changes in the 1st commit. I applied the same principle to prompt text in the 2nd commit as well. The only place where we need to be careful is the multiline mode handling code. For instance, when the last character is wide and there is only 1 column left on the current row, that wide character doesn't fit the remaining space. So the wide character must be displayed at the beginning of the next line. This code handles it. One more thing that I did is to skip all the ANSI escape sequence characters when calculating column position/width in the 3rd commit. This change enables us to use color in the prompt text. I am really excited to see the new API in the near future. Please let me know if you have any questions on this matter. I am sure that you will do a fantastic job!! |
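To make the byte-versus-column distinction and the multiline edge case concrete, here is a rough sketch. It is not the code from the commits mentioned above, and the struct and function names are hypothetical. Each character is described by its byte length and its column width (for instance 'a' is 1 byte and 1 column, 'Ş' is 2 bytes and 1 column, 'あ' is 3 bytes and 2 columns), and a wide character that does not fit in the remaining space is moved to the start of the next row.

```c
#include <stddef.h>

/* Hypothetical description of one character: size in the buffer
 * versus size on screen. */
struct glyph {
    unsigned char bytes; /* bytes occupied in the text buffer */
    unsigned char cols;  /* columns occupied on screen (1 or 2) */
};

struct layout {
    size_t bytes; /* total byte length of the text */
    size_t row;   /* zero-based row of the cursor after the text */
    size_t col;   /* column of the cursor within that row */
};

/* Walk the glyphs and track the buffer position and screen position
 * separately. Assumption: term_cols >= 2, so a wide character always
 * fits on an empty row. */
struct layout measure(const struct glyph *g, size_t n, size_t term_cols)
{
    struct layout out = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        out.bytes += g[i].bytes;
        if (out.col + g[i].cols > term_cols) {
            /* Not enough room left: the character starts the next row. */
            out.row++;
            out.col = 0;
        }
        out.col += g[i].cols;
    }
    return out;
}
```

For the text "aŞaあ" on a 4-column terminal, this yields 7 bytes and a cursor on row 1 at column 2, because the wide 'あ' cannot fit in the single column left on row 0.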
After researching more about dependencies between the linenoise code and the UTF-8 encoding code according to your design goal, I realized that only three functions are needed when adding other encoding support. Based on the research, I have updated my branch. Here is the diff between the linenoise head and the utf8-support branch. As you could see there, I got rid of all UTF-8 specific code completely from Here is a snippet of my current experimental API:
typedef size_t (linenoisePrevCharLen)(const char *buf, size_t buf_len, size_t pos, size_t *col_len);
typedef size_t (linenoiseNextCharLen)(const char *buf, size_t buf_len, size_t pos, size_t *col_len);
typedef size_t (linenoiseReadCode)(int fd, char *buf, size_t buf_len, int *c);
void linenoiseSetEncodingFunctions(
    linenoisePrevCharLen *prevCharLenFunc,
    linenoiseNextCharLen *nextCharLenFunc,
    linenoiseReadCode *readCodeFunc);
If users don't call Hope the post will be helpful when you design the new encoding API. I am really looking forward to it!! |
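As an illustration of how callbacks with the PrevCharLen/NextCharLen signatures above might be written for UTF-8, here is a sketch under stated assumptions. It assumes pos is a byte offset into buf, that the return value is the byte length of the adjacent character, and that *col_len receives its column width. The function names are hypothetical, and the width is simplified to 1 for every character, which is wrong for wide or combining characters. This is not the fork's implementation.

```c
#include <stddef.h>

/* Byte length of the UTF-8 character starting at buf[pos].
 * Returns 0 when pos is at or past the end of the buffer. */
size_t sketchNextCharLen(const char *buf, size_t buf_len, size_t pos,
                         size_t *col_len)
{
    size_t end = pos;
    if (end < buf_len)
        end++; /* lead byte */
    while (end < buf_len && ((unsigned char)buf[end] & 0xC0) == 0x80)
        end++; /* continuation bytes (10xxxxxx) */
    if (col_len)
        *col_len = (end > pos) ? 1 : 0; /* simplification: width 1 */
    return end - pos;
}

/* Byte length of the UTF-8 character ending just before buf[pos].
 * Returns 0 when pos is 0. This is what a backspace handler would need
 * in order to delete a whole character instead of a single byte. */
size_t sketchPrevCharLen(const char *buf, size_t buf_len, size_t pos,
                         size_t *col_len)
{
    size_t start = pos;
    (void)buf_len; /* unused in this sketch */
    while (start > 0) {
        start--;
        if (((unsigned char)buf[start] & 0xC0) != 0x80)
            break; /* found the lead byte */
    }
    if (col_len)
        *col_len = (pos > start) ? 1 : 0; /* simplification: width 1 */
    return pos - start;
}
```

Both functions have the same parameter list as the typedefs in the snippet, so pointers to them could presumably be passed to linenoiseSetEncodingFunctions together with a matching read function.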
@yhirose that's a fantastic work!!! :-) I'm going to check the code and merge it. Thank you for this. |
Not merged yet? |
@antirez any progress on merging it? |
I have modified my fork (https://github.com/yhirose/linenoise/tree/utf8-support) to catch up with the recent changes made in the original linenoise such as 'hints' feature. |
Thank you very much @yhirose. You have made good code better! and my On Mon, 27 Jun 2016 18:56:45 -0700, yhirose wrote:
(https://github.com/yhirose/linenoise/tree/utf8-support) to catch up |
My fork (https://github.com/yhirose/linenoise/tree/utf8-support) now supports Unicode 9.0. |
@antirez Will you have free time in the near future to merge @yhirose's multi-byte support? Or should we switch https://github.com/hoelzro/lua-linenoise to use @yhirose's fork until then? ✌️ |
My fork (https://github.com/yhirose/linenoise/tree/utf8-support) now supports Unicode 11.0 and includes all the recent changes made in antirez/linenoise. |
My fork (https://github.com/yhirose/linenoise/tree/utf8-support) now supports Unicode 12.1. |
My fork (https://github.com/yhirose/linenoise/tree/utf8-support) now supports Unicode 13.0. |
@yhirose can jgriffiths' solution for Win32 support in #8 be merged into the utf-8 support branch? Also, you may consider merging the UTF-8 support into your main branch or moving the project into a different repository. A lot of us use it! |
@mcfriend99, thanks for your suggestion, but I am not interested in merging the Win32 specific code into this branch. My intention with this patch is to make the current linenoise code UTF-8 compatible with the smallest possible effort while keeping the original linenoise code structure as much as possible. As for moving to the main branch, I'll take a look at it. |
My fork (https://github.com/yhirose/linenoise/tree/utf8-support) now supports Unicode 16.0 and includes all the recent changes made in antirez/linenoise. |
The current code doesn't support multibyte strings, e.g. strings containing Unicode characters beyond the ASCII range. The column shifts for refreshLine are calculated using strlen(), which returns 2 instead of 1 for a 2-byte character like 'Ş' in Turkish.
The library should use mbstowcs() or other functions to get the number of characters instead of number of bytes for column processing (up, down arrows, erasing a character, etc.).
And also as those functions are LC_CTYPE dependent, either you or the applications using linenoise should call setlocale(LC_ALL, "") to set the application's locale to the system locale.
Thanks.
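A small sketch of the difference described above, assuming a POSIX-like system where mbstowcs() accepts a NULL destination and only reports the character count. The helper name mb_char_count is hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Number of characters in a multibyte string under the current LC_CTYPE
 * locale, or (size_t)-1 if the bytes are not valid in that locale.
 * Assumption: mbstowcs() with a NULL destination only counts, which is
 * the behaviour on POSIX systems.
 * The function name is hypothetical, not part of linenoise. */
size_t mb_char_count(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```

An application would call setlocale(LC_ALL, "") once at startup. In a UTF-8 locale, strlen("Ş") is 2 while mb_char_count("Ş") would be expected to return 1; in a locale that cannot decode the bytes the result may be (size_t)-1, so callers need to check for that.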