wrong cursor position with non-ascii input #1
Comments
@jsteemann Thanks for the bug report. I have tested with the original linenoise library and confirmed that it also has the same problem on Mac OS X.
@yhirose: yes, the original linenoise is broken in this regard as well (maybe it was even intentional, to reduce the lines of code in linenoise). I think it's quite complex to solve this issue properly. There is a fork of linenoise that supports UTF-8, but AFAIK it does not include all the multiline functionality from the original repository. I looked into all this a bit and think it would be quite a challenge to bring these two repos together. Feel free to close this issue if you like.
@jsteemann, I made a fork of the original linenoise and added UTF-8 support to the utf-support branch on my fork. I have tested with Here is the diff between the original code and mine: I would really appreciate it if you could test the branch on your Linux boxes. If you think it works on Linux, I'll port it to my cpp-linenoise as well. (Since cpp-linenoise has to support Windows, which doesn't use UTF-8 as the default encoding in the command prompt, I have to port it carefully.) I have also posted a comment regarding this change on antirez/linenoise#25. You gave me quite a challenge, but it was really enjoyable. :)
I have tried the utf8-support branch. Entering
By the way, thanks for all the work you did! I have also looked into what MongoDB did with linenoise. They have rewritten it in C++, and their version supports multiline editing and UTF-8 and seems to work. They have made an extension to linenoise for handling Unicode that's licensed under AGPL. Here's the directory containing their version: https://github.com/mongodb/mongo/tree/master/src/mongo/shell To try it out on Linux, one can simply use the files linenoise* and mk_* and glue them together. That produces a working linenoise with UTF-8 support and multiline editing. It should also work on Windows; however, I did not try this myself.
@jsteemann, thanks for taking the time to test the branch. I am also trying to reproduce the problem with Here are the steps that I took: Here is a screenshot that I got after step 7 took place. Am I missing something needed to reproduce the problem? I would really appreciate any information that you think could be helpful. Thanks for your help again! The MongoDB version looks very good. I haven't tested it yet, but the code handles UTF-8 encoding very well, even including wide characters and composed characters (characters with diacritics). I am actually amazed by their code quality. My only concern is that the version doesn't look like the original C-based linenoise code anymore, since they rewrote much of the original in C++. Of course I understand their circumstances, because MongoDB is a C++ project.
I can confirm it works for me. I was probably on the wrong branch before, though I had intended to check out the utf8-support branch. But as there have been no new commits on the utf8-support branch since Sunday, I really seem to have been on the wrong path. Sorry for that! To summarize: the branch works fine with the above example I provided, and also with several other scripts I tried (Western European, Hangul, Cyrillic, Chinese and some more Japanese texts I pasted from some tests we had). あなたの素晴らしい仕事のためにありがとうございました (Thank you very much for your wonderful work.)
Thanks for testing! When I port the UTF-8 support to cpp-linenoise, I'll post another message on this thread.
Thanks!
I have ported it. It works on Linux and Mac OS X at this point. When I have time, I'll try to support it on Windows as well.
Thanks.
That is really impressive work! I am thinking of abandoning the current old codebase (antirez/linenoise, adoxa/ansicon, MSOpenTech/redis) in cpp-linenoise completely and rebuilding it on top of this linenoise-ng in the future. Thanks again for sharing the good news.
Whenever the input contains non-ASCII characters that are encoded as multiple bytes, the cursor position is wrong.
For example, when entering ö (an umlaut character), the cursor moves to the right by 2 positions, but it should advance by only one character. The reason for this is that the ö consists of two bytes in UTF-8, and entering this character causes two calls to read() inside linenoiseEdit(), and the linenoise version used does not seem to be Unicode-aware at all.
So for UTF-8 multibyte characters it is broken, at least on all the Linuxes I tried.
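To illustrate the mismatch described above, here is a minimal sketch of counting characters rather than bytes in a UTF-8 buffer. It is not the fix from the utf8-support branch or from cpp-linenoise; the helper name utf8CodePointCount is hypothetical. It assumes valid UTF-8 input and treats every code point as one column, so it does not handle wide (two-column) or combining (zero-column) characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of linenoise: counts UTF-8 code points
// by skipping continuation bytes, which have the bit pattern 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t utf8CodePointCount(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        // Lead bytes and ASCII bytes start a new code point;
        // continuation bytes (0x80..0xBF) do not.
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the ö from the report, encoded as the bytes 0xC3 0xB6, the string's size() is 2 while this helper counts 1. A byte-oriented editor that advances the cursor once per byte read would therefore move two columns instead of one.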