-
Notifications
You must be signed in to change notification settings - Fork 44
Reimplement fontconfig backend with FFI #378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm less keen on requiring non-Rust deps, but using the standard examples/generic_families.rs
That's a lot more matches than before, which means more possible fallbacks. I'm not convinced that's a good thing though; it depends on whether there are performance implications for fallback. I suspect (provided that there is some level of filtering) most of these additional fonts won't normally do anything (neither matching glyphs nor causing overhead) anyway, but the latter is contigent on having some other level of filtering (like #377). Additional note: Testing shows that #377 merges cleanly with this, yielding more possible matches but in a manner that's okay. E.g. here's the matches for arrows (shorter than the matches for Latin):
This does solve matching |
Wanted to note that this doesn't necessarily have to be instead of the existing linux backend. We could have both and let people choose. Not saying we necessarily want to to do that in this case, but just raising it as a possibility. We could probably even abstract the backends out behind a trait if we wanted to (making it easier for users to bring their own custom backends). |
|
The fontconfig bindings themselves are entirely Rust; they don't require integrating C into the build process. You also don't need to ship fontconfig with whatever application uses Parley; it's a platform library that we can expect to be available on all graphical Linux platforms. It's similar to DirectWrite or CoreText in that regard.
Fontconfig has an option when sorting fonts by Unicode coverage to remove any fonts later in the list that don't provide any more coverage than earlier fonts. That option is already enabled in this PR, so I think all the matched fonts do genuinely provide at least a bit more Unicode coverage.
I don't want to do that in this particular case; the previous backend was very half-baked. One possibility is using the existing folder-scanning implementation (used for the Android and macOS backends) to implement a much simpler Linux backend. I don't think there is any benefit to an implementation that parses fontconfig files but deliberately avoids calling into fontconfig; AFAIK, its ABI has been stable since its introduction in the 90s and there aren't really going to be any systems that have fontconfig files but no fontconfig library. I do have a 20%-finished Rust reimplementation of fontconfig on the backburner, but even then I think dynamically linking to the platform's fontconfig is better, because there's no risk of fontconfig adding a new config option that the application's bundled implementation can't parse, or updating their cache file format. Of particular note is that the version of |
I think that this makes sense, as lame as it is aesthetically (in the abstract I mean; your code is fine!). Thank you for providing the patch. I'll give it my own review but since it is significant I'd like to see more than one reviewer. :+ ) |
I don’t have time at the moment to do a deep review but the integration looks nice and clean at a glance. I have no objections to landing this once others, particularly those that actively use *nix platforms, are satisfied with the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I've run this on a few systems and it seems to improve font selection generally, especially for scripts that I have specific preferences for.
The code all looks about right, and I appreciate the attention to detail in implementing real iterators and such.
fontique/src/backend/fontconfig.rs
Outdated
// TODO(valadaptive): this seems incomplete. check fcname.c for their | ||
// constants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this, I think they are also matched case-insensitive in fcname.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was the weights, slants, and widths, then it would be something like
const SUFFIXES: &[&str] = &[
// Weights.
" Thin",
" ExtraLight",
" UltraLight",
" Light",
" Book",
" Regular",
" Medium",
" DemiBold",
" SemiBold",
" Bold",
" ExtraBold",
" UltraBold",
" Black",
" Heavy",
// Slants.
" Roman",
" Italic",
" Oblique",
// Widths.
" UltraCondensed",
" ExtraCondensed",
" Condensed",
" SemiCondensed",
" Normal",
" SemiExpanded",
" Expanded",
" ExtraExpanded",
" UltraExpanded",
];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And name.strip_suffix
would probably end up being name.trim_end_matches
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm starting to question if this is really the right approach. For instance, Monaspace Krypton's extra-light weight is named "ExLight", which isn't anywhere in fontconfig's list. Sinkin Sans includes the numeric weight in the RBIZ suffix. Lato has a "Hairline" weight.
Fontconfig provides the FC_STYLE
property on fonts which should provide the RBIZ suffix, but for e.g. Lato Hairline Italic, the family names returned from fontconfig are "Lato" and "Lato Hairline", but the FC_STYLE
s are "Hairline Italic" and "Italic"--there is nothing there that lets us strip just the "Hairline" suffix.
Looking at Lato Hairline Italic in otexplorer, its name table seems to have multiple FAMILY_NAME
s that differ by platform/encoding/language ID:
│ platform_id: 1 1 │ 00 01 │
│ encoding_id: 0 │ 00 00 │
│ language_id: 0 │ 00 00 │
│ name_id: FAMILY_NAME
│ length: 4 │ 00 04 │
│ string_offset: +195 │ 00 C3 │
│ Lato │ │
│ platform_id: 3 18 │ 00 03 │
│ encoding_id: 1 │ 00 01 │
│ language_id: 1033 │ 04 09 │
│ name_id: FAMILY_NAME
│ length: 26 │ 00 1A │
│ string_offset: +2300 │ 08 FC │
│ Lato Hairline │ │
That level of detail doesn't help us with fontconfig, which just returns both names with no other information, but does explain why it does so. I am not well-versed in OpenType but I think what we really want is the "typographic family name", which fontconfig doesn't give us.
Any ideas or preferences for what we should do here? I'm leaning towards just writing up this state of affairs in the TODO comment and punting on it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think writing up is the correct move, especially if there's no regression. FontConfig is slowly being rewritten in Rust with Fontations, so this roll cake is being eaten two slices at a time. Eventually we'll meet in the middle I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FontConfig is slowly being rewritten in Rust with Fontations
Well that's interesting, surprising and good news! Is there an announcement/plan of this anywhere, or is it just something that's quietly happening?
This isn’t a complete replacement. We’re simply providing an alternative to the FreeType component that is used to build the font database. The motivation is that we’d like to completely remove the FreeType dependency from Chromium.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing out CSS font matching in Chrome and Firefox,
font-family: "Lato Heavy"
works butfont-family: "Lato Bold"
doesn't match any fonts. Maybe we should just strip "Bold" and "Italic"? I can't find a list of which suffixes are actually stripped or which code does so.
I don’t think it’s necessary to strip Bold and Italic since those already fit into the RBIZ model.
Since browsers seem to use whatever fontconfig gives them perhaps we should do the same and just remove this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed strip_rbiz
, but noticed that FamilyNameMap::add_alias
doesn't actually alias the given family name to the given ID--it creates a new FamilyId
for the map entry. Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. That looks like a bug to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put up a separate PR for that.
Discovered this as part of #378; splitting it off because it affects all the fontique backends. When adding font aliases, the code was not actually using the family ID we passed in, but creating a fresh family ID each time. This meant that font aliases were all mapped to nonexistent family IDs, and all queries for a font by one of its aliases would fail. I'm a bit concerned that this has the potential to cause regressions or other general weirdness, since all font aliases went unused due to this bug. Not sure how to test that other than to just give it a spin.
fontique/src/backend/fontconfig.rs
Outdated
|
||
/// Fontconfig seems to force RBIZ (regular, bold, italic, bold italic) when | ||
/// categorizing fonts. This removes those suffixes from family names so that | ||
/// we can match on all attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don’t think this comment is true. The first family name in both font files below is the typographic family name (name ID 16) and second one is the RBIZ family name (name ID 1), and similarly the first style is the typographic style name and the second one is the RBIZ style name:
$ fc-query Lato-Hairline.ttf
Pattern has 28 elts (size 32)
family: "Lato"(s) "Lato Hairline"(s)
style: "Hairline"(s) "Regular"(s)
fullname: "Lato Hairline"(s)
$ fc-query Lato-HairlineItalic.ttf
Pattern has 28 elts (size 32)
family: "Lato"(s) "Lato Hairline"(s)
style: "Hairline Italic"(s) "Italic"(s)
fullname: "Lato Hairline Italic"(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense actually. I guess I'd assumed fontconfig added all the family names with name ID 1 (see the comment thread above). But looking into fontconfig's source, it seems it adds the typographic family first and then the RBIZ family. I think in the case of multiple names under a given ID, it adds them all and deduplicates them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the case of multiple names under a given ID, it adds them all and deduplicates them?
This usually happens with localized names, but I haven’t seen duplicated names with the same language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think waiting for a second thorough review here is viable.
I've given this a test with Xilem, and it doesn't appear to have caused any regressions. Thanks!
Yup, this should be merged. |
I added this abstraction later and forgot to clean this up
Web browsers let you use named fonts with various suffixes. For instance, `font-family: "Lato Hairline"` works just fine. We should match that behavior, which lets us get rid of this hack.
Track duplicate families by ID instead of name. Also use filter_map to add all the generic families in one go. This also lets us use the question mark operator in the iterator body, leading to much cleaner code.
GenericFamily::SystemUi
should fall back tosans-serif
#323.This was discussed on Zulip. Essentially, the current fontconfig backend only roughly approximates fontconfig's actual font matching behavior, which is very intricate and difficult to implement. This has resulted in many fonts failing to match.
This PR reimplements it entirely using FFI calls to the fontconfig library, under the assumption that any platform with fontconfig configuration files probably also has the fontconfig library installed. For bindings, I'm using
yeslogic-fontconfig-sys
with my own binding layer on top of it. Yeslogic also provides a higher-levelfontconfig
crate but it's very barebones and not useful here.I've enabled the
dlopen
feature ofyeslogic-fontconfig-sys
; this should mean that Parley-using applications should work even if fontconfig is not installed. We could probably enable the fontconfig backend on other Unixes (various BSDs, illumos, etc). The bindings are not generated at build time, so I don't think there are any issues. I am, however, unsure ifdlopen
ing libraries runs into any sandboxing concerns with Flatpak or something.I've implemented family and fallback matching to the best of my ability (I've done some manual testing, and
("Hani", None)
returns a Japanese font whereas("Hani", Some("zh"))
returns a Chinese font). However, I'm still not sure I fully understand fallback matching. If, for instance, you pass a nonexistent script or locale, it'll just return whatever font ends up as the global super-duper-fallback (Bitstream Vera on my machine).In fact, I believe Fontconfig matching will always return at least some font. For family matches, I've filtered out all families whose names don't match, but for fallback matches, I believe it will always return something. I'm not sure if there's logic somewhere else that expects nonexistent scripts/locales to return nothing and breaks if we return a bogus font instead.
All that being said, I think this is a good improvement over the current implementation.