Record unicode cmap encodings when one glyph is in multiple slots #4586

skef · 2021-01-21T10:06:50Z

The problem addressed by this PR is referenced in multiple bugs, including #1534, #1831, #3085 and possibly #3080. However, the scope of the problem and the complete fix is as yet unclear.

skef · 2021-01-21T10:16:27Z

@HinTak

I know there is a complex and perhaps unpleasant history with this family of issues but I'm hoping you can help me out with taking a look at a potential fix. Keep in mind that I know next to nothing about CID mapped fonts so you might have to slowly walk me through steps like verifying problems and/or fixes in the UI.

Since you've made at least one PR I assume you can compile FontForge. The code for this PR is in this branch: https://github.com/skef/fontforge/tree/bg1831 , which you can look at by cloning https://github.com/skef/fontforge and doing a git checkout bg1831 "inside" the repository.

Based on preliminary testing (of Open/Generate pairs) this seems to resolve the noted problem in #1831 and I suspect it gets most or all of the way with #1534. But again I'm very green with this CID stuff.

HinTak · 2021-01-21T16:23:21Z

The code seems to be going in the right direction. You seems to have gone towards recording the multiples in a one-way linked list. That's probably good enough as these cases are supposed to be rare.

It is a complicated problem though as some operations would de-duplicate - especially given that fontforge is a glyph-level editor; e.g. If one wants to create a new glyph for one of a pair; and the reverse (one want to drop one of the glyphs and code two code points to the same other glyph). And how to present this in the UI level.

After you have made the recording / playback and simple transcoding robust, you can start thinking about how one might modifying such info in a scripting way, then onto GUI?

I just realize while writing the above that even simple transcoding is not simple: new encoding may be smaller and not have one of the mappings, so do you keep a dangling slot linked...

There is also how you might save this info in sfd(?).

skef · 2021-01-22T00:07:34Z

@HinTak I should have supplied more context about this fix.

The "altuni" functionality and interface already exists. The main UI is in the "Alternate Unicode Encodings" section of the "Unicode" (first) tab of the "Glyph Info..." dialog. The python interface is described here. (I don't know if there is a native FontForge scripting interface -- at first glance there doesn't seem to be.)

All this PR does is add the code to make use of it on ttf/otf import. I don't know why it wasn't doing so before except that looking in the repository history the code was very early and perhaps GW never got around to addressing the case.

I'm pretty sure there's already some "deduplication" code present, much like there is for single-encoding cases. You'll have to play with it a bit. Regardless, most of what you've mentioned is already in the code base.

skef · 2021-01-22T00:14:58Z

There's more info about the UI here. I may not currently be setting all the variables correctly. (It looks like 0 would have been a better choice than -1 for Variation Selector? Or maybe not given that this is a CID font being loaded? It's a little hard to say because it's a bit of one and a bit of the other.).

(Also, to make this a complete response altuni data are already stored in the sfd file.)

HinTak · 2021-01-22T14:27:24Z

Okay... A few years ago when I looked into this I saw the altuni struct and couldn't figure out what it is supposed to do. Maybe it is just unfinished code.

skef · 2021-01-31T16:00:06Z

@HinTak These CID/altuni problems are unlikely to ever be fixed without test cases, or at least detailed guidance, from someone with more familiarity with the (practical) issues and problems. I've already run into at least one gap in the altuni implementation ( #4598 ) but I don't have a good picture for what more, if anything, is missing.

You seem to be the main interested party so far. If we work together I think it's likely we could make some progress. Otherwise this is likely to sit in the PR queue and eventually get deleted when @ctrlcctrlv does a cleanup pass.

HinTak · 2021-02-01T11:44:37Z

It is relatively simple to test the correctness of the change: the Adobe source cjk fonts are known to be exhibit this kind of behavior, and you should be able to get at the multiple mappings via the new code and compare what my freetype-py script does. I just don't have either the time nor the inclination to build fontforge from source at the moment.

frank-trampe · 2021-04-14T17:18:27Z

@skef, this pull request seems to fix real problems, and I see no significant risk of regression from it. Is there any reason we couldn't merge it now?

skef · 2021-04-14T20:58:56Z

Yeah, I guess the likelihood of this causing problems is low. The risk is not tracking what else is needed but that's always a risk.

It would be much better if we had a full idea of how the altuni system should work with test cases.

HinTak · 2021-04-14T22:22:07Z

It should be possible to rewrite part of, and compare with the other freetype-py dependent script I wrote without freetype-py .

skef · 2021-11-03T04:40:47Z

I looked at this a bit more. Fixing the altuni stuff more completely probably requires an overhaul of the Encoding code. At the very least it would require going through that stuff with a fine tooth comb. So by the "improves things and doesn't cause obvious harm" standard this could probably go in.

HinTak · 2021-11-03T09:16:44Z

As I replied earlier some months ago, it should be possible to re-write my freetype-py dependent script with the new functionality and remove the freetype-py dependency, and also run the two side by side to verify that the new functionality works correctly (if you provide a python binding to the new functionality).

skef · 2021-11-03T09:30:41Z

@HinTak The current issues have less to do with bottom-level glyph state than with having the GUI update correctly when manipulated via scripting.

fontforge/parsettf.c

jtanx · 2022-12-31T10:21:24Z

Change looks fine but needs to be rebased to fix conflicts with master

skef · 2022-12-31T11:10:24Z

Rebased

skef mentioned this pull request Feb 27, 2021

(Generally) improved cmap "format 4" heuristics #4585

Merged

skef marked this pull request as ready for review November 3, 2021 04:41

jtanx reviewed Dec 31, 2022

View reviewed changes

fontforge/parsettf.c Outdated Show resolved Hide resolved

jtanx approved these changes Dec 31, 2022

View reviewed changes

skef added 4 commits December 31, 2022 02:28

Record unicode cmap encodings when one glyph is in multiple slots

b0d2f1b

Add comment about glyph names and encodings

41902be

Correct guard and check for mapping duplicates

8c34fd6

Improve unicodeenc setting logic

e5c1b8f

skef force-pushed the bg1831 branch from b891eb9 to e5c1b8f Compare December 31, 2022 10:40

jtanx merged commit 5311711 into fontforge:master Dec 31, 2022

Record unicode cmap encodings when one glyph is in multiple slots #4586

Record unicode cmap encodings when one glyph is in multiple slots #4586

Uh oh!

Conversation

skef commented Jan 21, 2021

Uh oh!

skef commented Jan 21, 2021

Uh oh!

HinTak commented Jan 21, 2021

Uh oh!

skef commented Jan 22, 2021

Uh oh!

skef commented Jan 22, 2021

Uh oh!

HinTak commented Jan 22, 2021

Uh oh!

skef commented Jan 31, 2021

Uh oh!

HinTak commented Feb 1, 2021 via email

Uh oh!

frank-trampe commented Apr 14, 2021

Uh oh!

skef commented Apr 14, 2021

Uh oh!

HinTak commented Apr 14, 2021

Uh oh!

skef commented Nov 3, 2021

Uh oh!

HinTak commented Nov 3, 2021 via email

Uh oh!

skef commented Nov 3, 2021

Uh oh!

Uh oh!

jtanx commented Dec 31, 2022

Uh oh!

skef commented Dec 31, 2022

Uh oh!

Uh oh!