Skip to content

Conversation

tomasz1986
Copy link
Member

@tomasz1986 tomasz1986 commented Aug 29, 2024

gui: Fix incorrect UI language auto detection (fixes #9668)

Currently, the code only checks whether the detected language partially
matches one of the available languages. This means that if the detected
language is "fi" but among the available languages there is only "fil"
and no "fi", then it will match "fi" with "fil", even though the two are
completely different languages.

With this change, the matching is only done when there is a hyphen in
the language code, e.g. "en" will match with "en-US".

Signed-off-by: Tomasz Wilczyński twilczynski@naver.com

Currently, the code only checks whether the detected language partially
matches one of the available languages. This means that if the detected
language is "fi" but among the available languages there is only "fil"
and no "fi", then it will match "fi" with "fil", even though the two are
completely different languages.

With this change, the matching is only done when there is a hyphen in
the language code, e.g. "en" will match with "en-US", and the other way
round, "en-US" will match with "en" (if "en-US" is not available).

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Copy link
Member

@acolomb acolomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to just strip off anything following a - character, and then doing an exact comparison?

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
@tomasz1986
Copy link
Member Author

Wouldn't it be easier to just strip off anything following a - character, and then doing an exact comparison?

Done, I think so.

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
acolomb
acolomb previously approved these changes Aug 29, 2024
@tomasz1986 tomasz1986 marked this pull request as draft August 29, 2024 11:32
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
@acolomb
Copy link
Member

acolomb commented Aug 29, 2024

Until now, we did not fall back to a lesser specified variant automatically. For example, if a browser is configured to prefer "en-US", then we should not fall back to "en" automatically, just as we don't fall back from one to another "zh-*" variant. I'd expect browsers to either automatically specify the broader codes ("en") in addition, or the user to configure them next to the more specific variant. This avoids wrong conclusions and allows an actual preference ordering. We should not introduce this fallback direction IMHO.

At least that's how I understand the existing code comment: Find the first language in the list provided by the user's browser that is a prefix of a language we have available. The actual bug is that "prefix" here means strictly a substring, while the first token before a dash should always be considered in full.

I do have some ideas how to simplify the code, but first we need to agree on that goal. Should our "en" language be matched by any "en-XX" variant that the browser sent? I think it should not.

@tomasz1986
Copy link
Member Author

tomasz1986 commented Aug 29, 2024

Until now, we did not fall back to a lesser specified variant automatically.

The current code is broken though, i.e. it does specifically that. The browser requests en-GB, but the GUI displays en (aka en-US).

For example, if a browser is configured to prefer "en-US", then we should not fall back to "en" automatically, just as we don't fall back from one to another "zh-*" variant.

It already does fall back to en (as the default language). With zh-*, the situation is a bit different, but if there is no fall back, it will just display the UI in English.

I do have some ideas how to simplify the code, but first we need to agree on that goal. Should our "en" language be matched by any "en-XX" variant that the browser sent? I think it should not.

Why not? This should happen only if the specific variant of en-XX isn't available. On the other hand, I can see the point why zh-TW should not fall back to zh-CN (as the current iteration of the PR does).

Edit: Basically, what happens in real life is that the browser requests ko, but our UI is available in ko-KR only. I think it should fall back to ko-KR in this case. In Chromium browsers, ko is the only available variant of Korean, so the user cannot specify ko-KR even if they wanted to.

@acolomb acolomb dismissed their stale review August 29, 2024 21:38

Rethink the goal

@acolomb
Copy link
Member

acolomb commented Aug 29, 2024

The current code is broken though, i.e. it does specifically that. The browser requests en-GB, but the GUI displays en (aka en-US).

That should be fixed while we're at it, IMHO.

Edit: Basically, what happens in real life is that the browser requests ko, but our UI is available in ko-KR only. I think it should fall back to ko-KR in this case.

Yes, this direction is fine. The preference basically says "Korean, but I don't care about the exact variant", and we give it the one we have.

If the preference were fr-CA, which we don't have, it's questionable whether any other French variant should be preferred over the last-resort English fallback.

If the browser says the user prefers Canadian French (fr-CA) but also accepts any other French variation, then it's okay to serve fror fr-FR if we don't have fr-CA specifically. But that choice should be left to the user (browser configuration) whether the broader variant (and therefore any other specific variant as well) is acceptable.

I for example have configured English (US) and German in Firefox. The resulting preference includes en in 2nd place automatically, after en-US. According to https://manytools.org/http-html-text/browser-language/

See here for the related specification: https://www.rfc-editor.org/rfc/rfc4647.html#section-3.3.1

@tomasz1986
Copy link
Member Author

tomasz1986 commented Aug 29, 2024

Just to get things straight then:

  1. Requested en-AU which is not available. en is available but do not fall back to it automatically. Similarly, do not fall back from zh-HK to zh, etc.
  2. Requested ko which is not available but ko-KR is available. Fall back to ko-KR automatically. If multiple variants available, go by the alphabetical order.

Is this direction correct? From less specific to more specific only?

I for example have configured English (US) and German in Firefox. The resulting preference includes en in 2nd place automatically, after en-US.

Yeah, it does this for English specifically but not for other languages. For example, if you select Chinese (Hong Kong), it requests just zh-HK. Does it also request German in Firefox when configured like that? In Chromium, only the first language from the list is requested.

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
@tomasz1986
Copy link
Member Author

I've rewritten the whole thing. Now it only matches en-GB with en-GB exactly or en with en-GB (if no en found). It doesn't match en-GB with just en.

I've also changed some variable names, mainly lang to browserLang, and matching to matchingLang, as with the previous names, I was constantly confused which one was which… I hope this is fine.

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
@tomasz1986 tomasz1986 marked this pull request as ready for review August 30, 2024 11:29
Copy link
Member

@acolomb acolomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting too complicated with the reverse sort order and all. Let's take it back to a prototype and test concrete scenarios outside of the Syncthing GUI, then discuss any results that need changing. Here's my experimentation code, tested on https://www.jsplayground.dev/

function foo(langs) {
    // Find the first language in the list provided by the user's browser
    // that is a prefix of a language we have available. That is, "en"
    // sent by the browser will match "en" or "en-US", while "zh-TW" will
    // match only "zh-TW" and not "zh" or "zh-CN".

    var i,
        browserLang,
        matching,
        locale = _defaultLocale;  // Fallback if nothing matched

    for (i = 0; i < langs.length; i++) {
        browserLang = langs[i];
        if (browserLang.length < 2) {
            continue;
        }
        matching = _availableLocales.find(function (possibleLang) {
            // The langs returned by the /svc/langs call will be in lower
            // case. We compare to the lowercase version of the language
            // code we have as well.
            possibleLang = possibleLang.toLowerCase();
            if (possibleLang.indexOf(browserLang) !== 0) {
                // Prefix does not match
                return false;
            }
            console.log("compare", browserLang, possibleLang);
            if (possibleLang.length > browserLang.length) {
                // Must match up to the next hyphen separator
                return possibleLang[browserLang.length] === '-';
            }
            console.log("same length, exact match");
            // Same length, exact match
            return true;
        });

        if (matching) {
            console.log("matching", browserLang, matching);
            locale = matching;
            break;
        }
    }
    useLocale(locale);
}

var _availableLocales = ["ar","bg","ca","ca@valencia","cs","da","de","el","en","en-GB","es","eu","fil","fr","fy","ga","hi","hu","id","it","ja","ko-KR","lt","nl","pl","pt-BR","pt-PT","ro-RO","ru","sk","sl","sv","tr","uk","zh-CN","zh-HK","zh-TW"];
var _defaultLocale = "en";
function useLocale(locale) { console.log("Result:", locale, "\n") }
foo(["zh-tw", "ko", "en-gb!"]);
foo(["zh!", "ko", "en-us"]);
foo(["zh", "ko!", "en-us"]);
foo(["pt", "!en", "de"]);

I can make a separate PR with these changes if you like and find it functional in your testing. Otherwise, please provide parameter sets that fail and what you think the result should be. Of course, you can integrate that variant into this PR as well.

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
Copy link
Member

@acolomb acolomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I'm not super happy with using .filter() because it seems enough to abort on first match. But in this case at least it's easier to see all possible matches and validate the algorithm works as intended.

Some comment fixes, see below.

Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com>
@acolomb acolomb merged commit 9aa2d2c into syncthing:main Aug 31, 2024
21 checks passed
@tomasz1986 tomasz1986 deleted the tomasz86/prs/gui/fix-language-matching-#9668 branch September 3, 2024 07:19
calmh added a commit to calmh/syncthing that referenced this pull request Sep 11, 2024
* main:
  gui, man, authors: Update docs, translations, and contributors
  gui: Actually filter scope ID out of IPv6 address when using Remote GUI (ref syncthing#8084) (syncthing#9688)
  Revert "lib/fs: Put the caseFS as the outermost layer (syncthing#9648)"
  lib/api: Correct ordering of Accept-Language codes by weight (fixes syncthing#9670) (syncthing#9671)
  gui, man, authors: Update docs, translations, and contributors
  gui: Fix incorrect UI language auto detection (fixes syncthing#9668) (syncthing#9669)
  lib/model, lib/protocol: Index sending/receiving debugging (syncthing#9657)
  lib/upgrade: Send OS version header to upgrade server (syncthing#9663)
  lib/upgrade: Send OS version header to upgrade server (syncthing#9663)
  gui, man, authors: Update docs, translations, and contributors
@calmh calmh added this to the v1.27.13 milestone Sep 11, 2024
@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Sep 1, 2025
@syncthing syncthing locked and limited conversation to collaborators Sep 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The Web UI defaults to Filipino on Finnish browsers.
4 participants