-
Notifications
You must be signed in to change notification settings - Fork 119
support Python3 #242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support Python3 #242
Conversation
Hi everybody. I am very happy that Problem DefinitionAs it is, when a user has a mismatched version of python available under SolutionMost python developers that I have seen will always specify the python version in the shebang: Doing it like this automatically leads to a relevant error message when the needed version of python is not available. And as far as I know, all distributions provide files named SuggestionMy suggestion is to use |
@sjbronner Thanks for your detailed comment. Honestly I think that the problem you are describing isn't really one.
No, the python3 script is written in a way so that you can run in with python2 and will see the proper error message (which of course needs to be worded better). And as much as I am a fan of writing linux-scripts, your solution doesn't solve the issue you describe as:
Of course adding a shebang still won't hurt, but that would be a seperate change and not have anything to do with python2/python3 |
Just had an idea on how to avoid the weird dict-related bugs: |
Thanks for your work, @Findus23! Only a bit more than 3 months left until Python 2 support is dropped: https://pythonclock.org/ After that, there will be no more security updates for Python 2, which makes running it on a production server a (growing) security risk. Regarding the dicts: Using OrderedDict for ordered dictionaries sounds like a good plan 🙂 |
this allows multiple log_writers[] in the config.ini.php
I think the script has now reached a point where it should work for all common usecases. Notable changes: Two non-python-3 related bugs were fixed (as otherwise I couldn't test it): aaf0a2d configparser is strict by default which breaks when there are multiple identical keys in the config.ini.php (like 67b7b1f the updatetoken.php is relative to config.ini.php and not relative to the python script (which can be stored anywhere) |
Okay, I am utterly confused why this test is now randomly failing: |
Do you want some help on this PR. Maybe we could add more environments test(ubuntu, mac, windows) using github actions instead of travis-ci. |
urllib2.HTTPSHandler(**https_handler_args)) | ||
opener = urllib.request.build_opener( | ||
self.RedirectHandlerWithLogging(), | ||
urllib.request.HTTPSHandler(**https_handler_args)) | ||
response = opener.open(request, timeout = timeout) | ||
result = response.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to .decode() the response, as otherwise it would be returned as bytes, which e.g. json.loads() cannot use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbay are you able to point out a test case where the current implementation would fail? Tests are currently running, so would be good to add a test if possible...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current test suite only tests a small part of the import script. It doesn't actually try to import hits to a Matomo instance, which is where the issue lies.
Actually adding such tests would be a massive task, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I see. Would that be covered by the tests done on Matomo side? It uses the log importer in some test fixtures to import log files: https://github.com/matomo-org/matomo/blob/3.x-dev/tests/PHPUnit/Fixtures/ManySitesImportedLogs.php#L110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so!
I've also tested the script myself today (with the 2 aforementioned changes), it seems to work fine, at least for my use case. It has successfully parsed/recorded several millions lines. Using Python 3.4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome. Would you mind to add your changes as suggestions, so we can simply apply them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know you could do suggestions on code reviews! That's nice.
Due to https://bugs.python.org/issue13989, it won't work, you can change it to:
|
|
||
self.code = code | ||
|
||
class RedirectHandlerWithLogging(urllib2.HTTPRedirectHandler): | ||
|
||
class MatomoHttpUrllib(MatomoHttpBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why making 2 classes (MatomoHttpBase
and MatomoHttpUrllib
)? No other class inherits MatomoHttpBase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I planned to create a second (far simpler) Http class that uses requests for sending HTTP requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's much to gain having 2 similar implementations. In any case, inheriting just for the Error
class seems pointless, that class could be global anyway.
try: | ||
line = line.decode(config.options.encoding) | ||
except UnicodeDecodeError: | ||
invalid_line(line, 'invalid encoding') | ||
continue | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep that code and open the files in binary mode instead, which lets us keep the encoding
option and the invalid encoding
statistic.
Co-Authored-By: Cyril B. <cbay@alwaysdata.com>
guess we can close this one, as all changes are already included in 4.x-dev. Not sure if we plan to support Python3 for older Matomo version as well... |
@sgiehl I think it should be possible to send data from the latest 4.x-dev log analytics script to a 3.x Matomo instance, so one can use python3 with Matomo 3. |
followup to #205 (done from scratch)
fixes #3
fixes #243
Making the code python3-compatible is easy: 2to3 does most of the tedious things, some minor changes have to be applied and nosetests gets replaced with pytest
One could of course say "all tests are passing, so this can be merged", but I feel like this may get much more complex and could surface some hard-to-notice bugs:
For example somewhere in this function it is assumed that dictionaries are always returned in the same order.
But unlike in PHP, python dictionaries are always unordered and therefore this code may print
1\n2
, but may also print2\n1
.While this difference was mostly in theory for python2, it also happens in practice with python3 and therefore this test randomly fails to correctly decode the query string in 3.5.
Funfact: Starting with 3.7 dicts are officially ordered in python.
I cought this issue randomly as it occurs in the test on travis, but I have no idea how many similar cases there are in the rest of the code.
As I have never used loganalytics and I honestly don't understand large portions of the code, this needs to be tested extensively (with all 69 options, as things may break in subtle ways as e.g. GET parameters tracked slightly incorrect) and ideally read by someone with far more python (and especially python2) experience.
BTW: I targeted Python 3.5, 3.6 and 3.7 as these are the versions mostly used and are easy to support at the same time.