make unicode_width() understand more Unicode characters #679

rolandwalker · 2017-07-19T01:27:25Z

Synced up to a newer version of Markus Kuhn's wcwidth().

* several more width-2 characters
* many more width-0 characters
* change control characters to width-0
* don't change NUL but make it explicit with notes

Example improvements

︖ "PRESENTATION FORM FOR VERTICAL QUESTION MARK" was width 1, now 2
ט֑ Tet composed with "HEBREW ACCENT ETNAHTA" was width 2, now 1
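
For readers who want to check the two examples above, here is a minimal Python sketch. It is not the C code in this patch: it swaps the hand-maintained interval tables of wcwidth() for Python's standard `unicodedata` module, and the names `sketch_char_width` and `sketch_string_width` are hypothetical. It assumes combining marks and control characters take zero columns and East Asian Wide/Fullwidth characters take two, which only approximates what a given terminal will do.

```python
import unicodedata


def sketch_char_width(ch):
    """Approximate column width of a single code point (illustrative only)."""
    # Assumption: nonspacing marks (Mn), enclosing marks (Me) and control
    # characters (Cc) occupy no columns, as the PR description states.
    if unicodedata.category(ch) in ("Mn", "Me", "Cc"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is treated as a single column.
    return 1


def sketch_string_width(text):
    """Sum the per-code-point widths of a string."""
    return sum(sketch_char_width(ch) for ch in text)


if __name__ == "__main__":
    # U+FE16 PRESENTATION FORM FOR VERTICAL QUESTION MARK
    print(sketch_char_width("\ufe16"))
    # U+05D8 HEBREW LETTER TET followed by U+0591 HEBREW ACCENT ETNAHTA
    print(sketch_string_width("\u05d8\u0591"))
```

The sketch ignores special cases that real tables handle, such as format characters and NUL, so treat it as a way to reason about the examples rather than a reference.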

jonas · 2017-07-19T02:15:10Z

It looks like this also "further" fixes the emoji test. (OK, I didn't QA that one very well.)

BTW, I looked into switching to https://github.com/JuliaLang/utf8proc at some point. Do you know that one? While it is quite large/heavy, it would also improve support for islower, isupper, etc.

rolandwalker · 2017-07-19T02:24:02Z

Odd that I can't duplicate the Travis failure on OS X. Force-pushing a hack now where control characters are left as before.

I haven't used utf8proc. libicu is very complete, which is nice because Unicode is entirely made of edge cases. But libicu is mostly not UTF-8-oriented; you have to convert to/from UTF-16.

rolandwalker · 2017-07-19T16:05:21Z

Your emoji test is great because it catches the issue which is now worked around and commented BUG. It may be a difficult bug to solve, but it shouldn't be hard to narrow it down to a TODO test. I have noted some platforms, but it could easily be related to the libiconv version, locale environment variables, etc.

In the meantime this patch should only improve correctness, where correctness is guessing what the terminal is going to do.

jonas · 2017-07-20T12:50:48Z

This is amazing. Most of the Unicode/UTF-8 code was copied from ELinks with very minimal changes. Very nice to have this improved.

rolandwalker force-pushed the unicode-width-update branch from 10b6b24 to 42173c7 on July 19, 2017 02:14

rolandwalker force-pushed the unicode-width-update branch 2 times, most recently from c7c9091 to e38be72 on July 19, 2017 03:38

rolandwalker changed the title ~~make unicode_width() understand more Unicode characters~~ WIP make unicode_width() understand more Unicode characters Jul 19, 2017

make unicode_width() understand more characters

9c80109

* several more width-2 characters
* many more width-0 characters
* change control characters to width-0
* don't change NUL but make it explicit with notes
* doc some apparent bugs

rolandwalker force-pushed the unicode-width-update branch from e38be72 to 9c80109 on July 19, 2017 15:35

rolandwalker changed the title ~~WIP make unicode_width() understand more Unicode characters~~ make unicode_width() understand more Unicode characters Jul 19, 2017

jonas merged commit a090093 into jonas:master Jul 20, 2017

rolandwalker deleted the unicode-width-update branch July 20, 2017 12:56

This was referenced Jul 21, 2017

Remove variation selector workaround #682

Merged

incorporate xterm wcwidth.c to compute unicode widths #691

Merged

rolandwalker mentioned this pull request Aug 12, 2017

RFC libicu Unicode support #722

Closed
