-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix language detection for UI #20376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one. Thanks for this 🎉
I assume https://github.com/matomo-org/matomo/blob/4.13.3/core/Common.php#L945-L950 the extractLanguageCodeFromBrowserLanguage
method needs to stay and only return 2 letter code? Do we maybe need to check the users of that method if they maybe also need to call a different method or maybe we did already? (I see you already changed it in few places). Is that that in tracking we only log 2 character language code?
Be also good to add various tests.
I've got a few potential cases below. May be able to provide next week more if needed
['en', 'en-US,en;q=0.8,de-DE;q=0.6,de;q=0.4'],
['de', 'de-DE;q=0.6,de;q=0.4'],
['zh-cn', 'zh-CN,zh;q=0.9'],
['fr', 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3'],
['fr', 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7'],
['de', 'de,en-US;q=0.7,en;q=0.3'],
['pt-br', 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7,ru;q=0.6'],
['hi', 'hi;en-US,en;q=0.8,q=0.6'],
['ta', 'ta-IN,ta'],
['hi', 'hi-in,hi,en-us,en'],
['th', 'th-TH,th;q=0.9,en;q=0.8'],
['zh-tw', 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7'],
['fa', 'fa,en-US;q=0.9,en;q=0.8,fa-IR;q=0.7'],
['nl', 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7'],
['it', 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7'],
Also not sure if we need to make the browser language string lowercase maybe? |
Yes Where ever the language is detected for UI I changed it to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the method description for Common::extractLanguageCodeFromBrowserLanguage()
incorrectly states that it returns the language and region string. Could this method perhaps be deprecated or a note added suggesting that extractLanguageAndRegionCodeFromBrowserLanguage()
should normally be used instead?
core/Common.php
Outdated
@@ -982,10 +982,6 @@ public static function extractLanguageAndRegionCodeFromBrowserLanguage($browserL | |||
if (in_array($langIso639 . $regionIso3166, $validLanguages)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the testExtractLanguageAndRegionCodeFromBrowserLanguage
tests are currently failing because the test data doesn't include the language.region in the valid language array, so the method is unable on this line.
For example, data from the first test is:
// browser language, valid languages, expected
['fr-ca', ['fr'], 'fr-ca'],
The test should return 'fr-ca' but is unable to because this is not in the valid languages array, so it returns 'fr' instead and the test fails.
The test data could be updated to ['fr-ca', ['fr', 'fr-ca'], 'fr-ca']
to ensure it choose the region version correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the use cases of the method extractLanguageAndRegionCodeFromBrowserLanguage
are a bit weird.
It accepts a validLanguages
parameters, but still might return any of the provided languages with an additional region.
I've now changed that behaviour. If valid languages are provided only a code can be returned that exactly matches one of the entries. Otherwise it returns a code if the language is in the list of all valid languages, possibly extended by the region.
e16452d
to
568c911
Compare
568c911
to
c154de2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Functional review done
- Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
- Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
- Security review done
- n/a Wording review done
- Code review done
- Tests updated Tests were added if useful/possible
- none Reviewed for breaking changes
- n/a Developer changelog updated if needed
- n/a Documentation added if needed
- n/a Existing documentation updated if needed
Description:
We had been using
Common::extractLanguageCodeFromBrowserLanguage
in some places for detecting the language. As this never includes the region, this kind of disabled detection for languages likezh-cn
,pt-br
,...Using
Common::extractLanguageAndRegionCodeFromBrowserLanguage
solves that problem.In addition I've also improved the extracting of language & region from language codes. Before we were not handling codes including the script at all. So if someone only provided the language code
zh_Hans_CN
, it would have falled back toen
, aszh
only is not available. The script will now be ignored/removed when detecting languages, sozh_Hans_CN
results inzh-cn
fixes #20374
fixes #20388
Review