Skip to content

support Python3 #242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 36 commits into from
Closed

support Python3 #242

wants to merge 36 commits into from

Conversation

Findus23
Copy link
Collaborator

@Findus23 Findus23 commented Feb 14, 2019

followup to #205 (done from scratch)
fixes #3
fixes #243

Making the code python3-compatible is easy: 2to3 does most of the tedious things, some minor changes have to be applied and nosetests gets replaced with pytest
One could of course say "all tests are passing, so this can be merged", but I feel like this may get much more complex and could surface some hard-to-notice bugs:

For example somewhere in this function it is assumed that dictionaries are always returned in the same order.
But unlike in PHP, python dictionaries are always unordered and therefore this code may print 1\n2, but may also print 2\n1.

example = {1: "one", 2: "two"}
for number, name in example.items():
    print(number)

While this difference was mostly in theory for python2, it also happens in practice with python3 and therefore this test randomly fails to correctly decode the query string in 3.5.

Funfact: Starting with 3.7 dicts are officially ordered in python.

I cought this issue randomly as it occurs in the test on travis, but I have no idea how many similar cases there are in the rest of the code.

As I have never used loganalytics and I honestly don't understand large portions of the code, this needs to be tested extensively (with all 69 options, as things may break in subtle ways as e.g. GET parameters tracked slightly incorrect) and ideally read by someone with far more python (and especially python2) experience.

BTW: I targeted Python 3.5, 3.6 and 3.7 as these are the versions mostly used and are easy to support at the same time.

@sjbronner
Copy link

sjbronner commented Feb 20, 2019

Hi everybody. I am very happy that python3 support is seeing progress. Thanks @Findus23 for taking that on. I would like to offer a suggestion to solving the "shebang problem".

Problem Definition

As it is, when a user has a mismatched version of python available under /usr/bin/python, python will complain loudly about syntax errors or other supposed issues with the script.

Solution

Most python developers that I have seen will always specify the python version in the shebang: #!/usr/bin/python3 or #!/usr/bin/python2. The reasoning behind that is that python 3 and python 2 are considered to be different languages. While they are similar, and one grew out of the other, they are entirely incompatible and thus different languages. That leads to the notion that specifying just #!/usr/bin/python does not actually specify the language that the script should be interpreted as. It is not a sufficient specification of the interpreter needed to run the script.

Doing it like this automatically leads to a relevant error message when the needed version of python is not available. And as far as I know, all distributions provide files named /usr/bin/python2 and /usr/bin/python3. Both Ubuntu and Arch Linux do, anyway.

Suggestion

My suggestion is to use #!/usr/bin/python3 as the shebang in this script and to change the shebang in the old script intended for use in python 2 to #!/usr/bin/python2.

@Findus23
Copy link
Collaborator Author

@sjbronner Thanks for your detailed comment.

Honestly I think that the problem you are describing isn't really one.

As it is, when a user has a mismatched version of python available under /usr/bin/python, python will complain loudly about syntax errors or other supposed issues with the script.

No, the python3 script is written in a way so that you can run in with python2 and will see the proper error message (which of course needs to be worded better).

And as much as I am a fan of writing linux-scripts, your solution doesn't solve the issue you describe as:

  • it doesn't stop anyone from running the python3 script with python2 (the shebang is only read by the system, not by python)
  • it assumes python can be found in ``/usr/bin/python3` (there is nothing that forces this)
  • a common workaround is to instead use /usr/bin/env python3, but this also assumes that env is installed (in this path)
  • this also assumes that this is the python version the user wants to use (instead of a venv somewhere else)

Of course adding a shebang still won't hurt, but that would be a seperate change and not have anything to do with python2/python3

@Findus23
Copy link
Collaborator Author

Just had an idea on how to avoid the weird dict-related bugs:
Since python 3.1 the OrderedDict data-type is supported. So we could simply everywhere where the logic depends on dicts being ordered, replace the dicts with ordereddicts.

@dbrgn
Copy link

dbrgn commented Sep 5, 2019

Thanks for your work, @Findus23!

Only a bit more than 3 months left until Python 2 support is dropped: https://pythonclock.org/ After that, there will be no more security updates for Python 2, which makes running it on a production server a (growing) security risk.

Regarding the dicts: Using OrderedDict for ordered dictionaries sounds like a good plan 🙂

@Findus23
Copy link
Collaborator Author

I think the script has now reached a point where it should work for all common usecases.

Notable changes:
f5a1e28 Test on Python 3.8 beta (should work in theory)
777ff8f use ordereddict in the one case where the order of the dict is important
e9dedce, c20c576, 145ca5c, 14075e8, 60732cd comment updates and str <-> bytes fixes
3156367 major change! optparse isn't really maintained anymore and the modern replacement argparse supports some things that were originally hacks out of the box. The --help output is a bit nicer formatted by default. The arguments accepting dictionaries (like --w3c-map-field) are a mess and hardly documented. But they still work (at least getting a dict when specifying --w3c-map-field "a=1" --w3c-map-field "b=2")

Two non-python-3 related bugs were fixed (as otherwise I couldn't test it):

aaf0a2d configparser is strict by default which breaks when there are multiple identical keys in the config.ini.php (like log_writers[])

67b7b1f the updatetoken.php is relative to config.ini.php and not relative to the python script (which can be stored anywhere)

@Findus23
Copy link
Collaborator Author

Okay, I am utterly confused why this test is now randomly failing:
https://travis-ci.org/matomo-org/matomo-log-analytics/jobs/584257747#L228-L229
Especially as I changed nothing close to it.
Maybe this hack isn't that reliable:
https://github.com/Findus23/piwik-log-analytics/blob/67b7b1f1d1160aea483957f4ebd6e91082d5dd7f/import_logs.py#L2416-L2425

@benabel
Copy link

benabel commented Dec 11, 2019

Python 2.7 will not be maintained past 2020. Originally, there was no official date. Recently, that date has been updated to January 1, 2020.

Do you want some help on this PR. Maybe we could add more environments test(ubuntu, mac, windows) using github actions instead of travis-ci.

urllib2.HTTPSHandler(**https_handler_args))
opener = urllib.request.build_opener(
self.RedirectHandlerWithLogging(),
urllib.request.HTTPSHandler(**https_handler_args))
response = opener.open(request, timeout = timeout)
result = response.read()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to .decode() the response, as otherwise it would be returned as bytes, which e.g. json.loads() cannot use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbay are you able to point out a test case where the current implementation would fail? Tests are currently running, so would be good to add a test if possible...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current test suite only tests a small part of the import script. It doesn't actually try to import hits to a Matomo instance, which is where the issue lies.

Actually adding such tests would be a massive task, I'm afraid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I see. Would that be covered by the tests done on Matomo side? It uses the log importer in some test fixtures to import log files: https://github.com/matomo-org/matomo/blob/3.x-dev/tests/PHPUnit/Fixtures/ManySitesImportedLogs.php#L110

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so!

I've also tested the script myself today (with the 2 aforementioned changes), it seems to work fine, at least for my use case. It has successfully parsed/recorded several millions lines. Using Python 3.4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome. Would you mind to add your changes as suggestions, so we can simply apply them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you could do suggestions on code reviews! That's nice.

@cbay
Copy link
Contributor

cbay commented Jan 13, 2020

open_func = gzip.open

Due to https://bugs.python.org/issue13989, it won't work, you can change it to:

open_func = lambda filename, mode: io.TextIOWrapper(gzip.open(filename))

@sgiehl sgiehl mentioned this pull request Jan 13, 2020
@tsteur tsteur changed the base branch from master to 3.x-dev January 13, 2020 22:45

self.code = code

class RedirectHandlerWithLogging(urllib2.HTTPRedirectHandler):

class MatomoHttpUrllib(MatomoHttpBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why making 2 classes (MatomoHttpBase and MatomoHttpUrllib)? No other class inherits MatomoHttpBase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I planned to create a second (far simpler) Http class that uses requests for sending HTTP requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much to gain having 2 similar implementations. In any case, inheriting just for the Error class seems pointless, that class could be global anyway.

Comment on lines -2370 to -2375
try:
line = line.decode(config.options.encoding)
except UnicodeDecodeError:
invalid_line(line, 'invalid encoding')
continue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep that code and open the files in binary mode instead, which lets us keep the encoding option and the invalid encoding statistic.

@sgiehl
Copy link
Member

sgiehl commented Aug 26, 2020

guess we can close this one, as all changes are already included in 4.x-dev. Not sure if we plan to support Python3 for older Matomo version as well...

@Findus23
Copy link
Collaborator Author

@sgiehl I think it should be possible to send data from the latest 4.x-dev log analytics script to a 3.x Matomo instance, so one can use python3 with Matomo 3.

@Findus23 Findus23 closed this Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

python2's bz2-module does not support multi-stream files import_logs.py needs upgrade to Python 3 or updated shebang line
6 participants