Skip to content

Fix language detection for UI #20376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 22, 2023
Merged

Fix language detection for UI #20376

merged 4 commits into from
Feb 22, 2023

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented Feb 17, 2023

Description:

We had been using Common::extractLanguageCodeFromBrowserLanguage in some places for detecting the language. As this never includes the region, this kind of disabled detection for languages like zh-cn, pt-br,...
Using Common::extractLanguageAndRegionCodeFromBrowserLanguage solves that problem.

In addition I've also improved the extracting of language & region from language codes. Before we were not handling codes including the script at all. So if someone only provided the language code zh_Hans_CN, it would have falled back to en, as zh only is not available. The script will now be ignored/removed when detecting languages, so zh_Hans_CN results in zh-cn

fixes #20374
fixes #20388

Review

@sgiehl sgiehl added not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Needs Review PRs that need a code review labels Feb 17, 2023
@sgiehl sgiehl added this to the 4.13.4 milestone Feb 17, 2023
@sgiehl sgiehl requested a review from tsteur February 17, 2023 08:47
Copy link
Member

@tsteur tsteur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one. Thanks for this 🎉

I assume https://github.com/matomo-org/matomo/blob/4.13.3/core/Common.php#L945-L950 the extractLanguageCodeFromBrowserLanguage method needs to stay and only return 2 letter code? Do we maybe need to check the users of that method if they maybe also need to call a different method or maybe we did already? (I see you already changed it in few places). Is that that in tracking we only log 2 character language code?

Be also good to add various tests.

I've got a few potential cases below. May be able to provide next week more if needed

            ['en', 'en-US,en;q=0.8,de-DE;q=0.6,de;q=0.4'],
            ['de', 'de-DE;q=0.6,de;q=0.4'],
            ['zh-cn', 'zh-CN,zh;q=0.9'],
            ['fr', 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3'],
            ['fr', 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7'],
            ['de', 'de,en-US;q=0.7,en;q=0.3'],
            ['pt-br', 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7,ru;q=0.6'],
            ['hi', 'hi;en-US,en;q=0.8,q=0.6'],
            ['ta', 'ta-IN,ta'],
            ['hi', 'hi-in,hi,en-us,en'],
            ['th', 'th-TH,th;q=0.9,en;q=0.8'],
            ['zh-tw', 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7'],
            ['fa', 'fa,en-US;q=0.9,en;q=0.8,fa-IR;q=0.7'],
            ['nl', 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7'],
            ['it', 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7'],

@tsteur
Copy link
Member

tsteur commented Feb 17, 2023

Also not sure if we need to make the browser language string lowercase maybe?

@sgiehl
Copy link
Member Author

sgiehl commented Feb 20, 2023

Yes extractLanguageCodeFromBrowserLanguage is still used for archiving. As we only use two letter codes for the language reports.

Where ever the language is detected for UI I changed it to use extractLanguageAndRegionCodeFromBrowserLanguage instead. That should fix any issues around language codes containing a region, like pt-BR or zh-TW

@sgiehl sgiehl requested a review from tsteur February 20, 2023 13:22
Copy link
Contributor

@bx80 bx80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the method description for Common::extractLanguageCodeFromBrowserLanguage() incorrectly states that it returns the language and region string. Could this method perhaps be deprecated or a note added suggesting that extractLanguageAndRegionCodeFromBrowserLanguage() should normally be used instead?

core/Common.php Outdated
@@ -982,10 +982,6 @@ public static function extractLanguageAndRegionCodeFromBrowserLanguage($browserL
if (in_array($langIso639 . $regionIso3166, $validLanguages)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the testExtractLanguageAndRegionCodeFromBrowserLanguage tests are currently failing because the test data doesn't include the language.region in the valid language array, so the method is unable on this line.

For example, data from the first test is:

// browser language, valid languages, expected
['fr-ca',            ['fr'],          'fr-ca'],

The test should return 'fr-ca' but is unable to because this is not in the valid languages array, so it returns 'fr' instead and the test fails.

The test data could be updated to ['fr-ca', ['fr', 'fr-ca'], 'fr-ca'] to ensure it choose the region version correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the use cases of the method extractLanguageAndRegionCodeFromBrowserLanguage are a bit weird.
It accepts a validLanguages parameters, but still might return any of the provided languages with an additional region.
I've now changed that behaviour. If valid languages are provided only a code can be returned that exactly matches one of the entries. Otherwise it returns a code if the language is in the list of all valid languages, possibly extended by the region.

@sgiehl sgiehl requested a review from bx80 February 21, 2023 10:41
Copy link
Contributor

@bx80 bx80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgiehl sgiehl merged commit 98e1c3a into 4.x-dev Feb 22, 2023
@sgiehl sgiehl deleted the fixlangdetection branch February 22, 2023 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Development

Successfully merging this pull request may close these issues.

@$_SERVER['HTTP_ACCEPT_LANGUAGE'] cause error in php >8 Language is not correctly detected automatically by Matomo
3 participants