Closed
Description
Hi, thanks for providing this great tool.
While using it to import a batch of Apache log files (common/ncsa_extended), I noticed that a very large number of lines were being ignored (using Matomo 3.10.0):
$ python import_logs.py --url="https://###" --idsite=5 --log-hostname='###' --enable-http-errors --enable-http-redirects --recorders=2 --log-format-name=ncsa_extended --strip-query-string ~/domains/###/logs/*2019.tar*
This would import the following files:
Aug-2019.tar.gz Aug-2019.tar.gz.13 Aug-2019.tar.gz.2 Aug-2019.tar.gz.7 Sep-2019.tar.gz.10 Sep-2019.tar.gz.15 Sep-2019.tar.gz.2 Sep-2019.tar.gz.4 Sep-2019.tar.gz.9
Aug-2019.tar.gz.1 Aug-2019.tar.gz.14 Aug-2019.tar.gz.3 Aug-2019.tar.gz.8 Sep-2019.tar.gz.11 Sep-2019.tar.gz.16 Sep-2019.tar.gz.20 Sep-2019.tar.gz.5
Aug-2019.tar.gz.10 Aug-2019.tar.gz.15 Aug-2019.tar.gz.4 Aug-2019.tar.gz.9 Sep-2019.tar.gz.12 Sep-2019.tar.gz.17 Sep-2019.tar.gz.21 Sep-2019.tar.gz.6
Aug-2019.tar.gz.11 Aug-2019.tar.gz.16 Aug-2019.tar.gz.5 Sep-2019.tar.gz Sep-2019.tar.gz.13 Sep-2019.tar.gz.18 Sep-2019.tar.gz.22 Sep-2019.tar.gz.7
Aug-2019.tar.gz.12 Aug-2019.tar.gz.17 Aug-2019.tar.gz.6 Sep-2019.tar.gz.1 Sep-2019.tar.gz.14 Sep-2019.tar.gz.19 Sep-2019.tar.gz.3 Sep-2019.tar.gz.8
Logs import summary
-------------------
1114 requests imported successfully
317 requests were downloads
53042 requests ignored:
0 HTTP errors
0 HTTP redirects
42658 invalid log lines
0 filtered log lines
0 requests did not match any known site
0 requests did not match any --hostname
9950 requests done by bots, search engines...
434 requests to static resources (css, js, images, ico, ttf...)
0 requests to file downloads did not match any --download-extensions
After setting the --dry-run and -ddd options, I realised this was because it was not uncompressing the files. It took me a while to figure this out because some of the lines did make it through.
Based on the logic around import_logs.py#L2299, I assume this is because the file names don't cleanly end with .gz. There is a numerical suffix after it.
It'd be great if these were supported as well, or if there was a way for me to specify that it should open them as gzip files.
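To illustrate the kind of behaviour I have in mind, here is a minimal sketch that decides whether a file is gzip-compressed by looking at its first two bytes instead of its name, so a file like Aug-2019.tar.gz.13 would still be opened as gzip. This is not the code in import_logs.py; the helper names `is_gzip` and `open_log` are hypothetical, and the UTF-8 decoding with replacement is an assumption on my part.

```python
import gzip

# Every gzip stream starts with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_log(path):
    """Open a log file as text, decompressing it if its content is gzip.

    Hypothetical helper: detection relies on the file content, so a
    numerical suffix after ".gz" does not matter.
    """
    if is_gzip(path):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

With something like this, `open_log("Aug-2019.tar.gz.13")` and `open_log("access.log")` would both yield readable text lines. An explicit command-line option to force gzip handling would work for me just as well.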