mk_wcwidth will return outdated widths when glibc 2.26 (unicode 9.0) is out #720

@dequis

Unicode 9.0 changes the width of characters with emoji presentation to 2. The transition is going to suck in general, but it's not too bad for us. glibc 2.26 implements it and should be out in August or so.

mk_wcwidth implements Unicode 5.0 but returns a width of 1 for unknown characters, which is a great guess and an important improvement over glibc's wcwidth. Since the recent versions added no new characters that are two columns wide (East_Asian_Width W or F), as far as I know (I haven't checked everything), this works fine up to Unicode 8.0.

The few things that depend on width calculation will be wrong if those characters are present. What I've seen is unaligned /names lists when using bitlbee-discord with utf8_nicks on (given big enough discord servers you'll get a handful of nicks with emoji, every time). Not a big deal. I haven't checked if this affects sideways splits.

We could:

  • Make this a setting to let people pick between both implementations.
  • Do a test call of the libc wcwidth() with a character that should return 2 in Unicode 9.0 and 1 in 8.0 and lower. If it returns 2, use the libc wcwidth(), wrapped to turn -1 (unknown character) into 1 (to be like mk_wcwidth).
  • Both, with "auto" as the default setting.
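
A rough sketch of how the three options could fit together, not a proposed patch. The names here (`wcwidth_mode`, `pick_wcwidth`, `libc_wcwidth_wrapped`, `PROBE_CHAR`) are made up for illustration, and the probe character U+1F600 is an assumption: it should be a character that is 1 wide before Unicode 9.0 and 2 wide from 9.0 on. It also assumes a 32-bit `wchar_t` and that `setlocale()` has already selected a UTF-8 locale, since libc's `wcwidth()` depends on LC_CTYPE.

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth() declaration */
#include <wchar.h>

/* Signature shared by libc's wcwidth() and mk_wcwidth(). */
typedef int (*wcwidth_fn)(wchar_t);

/* Hypothetical setting values; "auto" would be the default. */
enum wcwidth_mode { WCWIDTH_AUTO, WCWIDTH_LIBC, WCWIDTH_BUILTIN };

/* Assumed probe: U+1F600, expected to be 2 wide only with Unicode 9.0+ tables. */
#define PROBE_CHAR ((wchar_t)0x1F600)

/* libc wcwidth(), but -1 becomes 1, like mk_wcwidth does for unknown
 * characters. Note that this also maps control characters to 1; real
 * code may want to handle those separately. */
int libc_wcwidth_wrapped(wchar_t ucs)
{
    int w = wcwidth(ucs);
    return w < 0 ? 1 : w;
}

/* Nonzero if libc reports the Unicode 9.0 width for the probe character. */
int libc_has_unicode9_widths(void)
{
    return wcwidth(PROBE_CHAR) == 2;
}

/* Pick an implementation. 'builtin' is whatever the fallback is
 * (mk_wcwidth in our case); it is passed in to keep the sketch standalone. */
wcwidth_fn pick_wcwidth(enum wcwidth_mode mode, wcwidth_fn builtin)
{
    switch (mode) {
    case WCWIDTH_LIBC:
        return libc_wcwidth_wrapped;
    case WCWIDTH_BUILTIN:
        return builtin;
    case WCWIDTH_AUTO:
    default:
        return libc_has_unicode9_widths() ? libc_wcwidth_wrapped : builtin;
    }
}
```

The probe would run once at startup (or when the setting changes), after the locale is set, and the returned pointer would be used wherever mk_wcwidth is called now.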
