
Allow wildcards for importing logs #178

Merged
merged 4 commits into from
Jun 22, 2020

geographika
Contributor

@geographika geographika commented Jun 26, 2017

Related to #104 - the docs say this is possible, but I can't see from the code how this is implemented.

@mattab mattab added this to the Current sprint milestone Jun 26, 2017
@mattab
Member

mattab commented Jun 26, 2017

@geographika I thought it was implemented by:


        for excluded_path in config.options.excluded_paths:
            if fnmatch.fnmatch(hit.path, excluded_path):
                return False

in https://github.com/piwik/piwik-log-analytics/blob/master/import_logs.py#L1990-L1992 or was I maybe mistaken?

@geographika
Contributor Author

geographika commented Jun 26, 2017

Thanks for the quick reply! Apologies for the first incorrect pull request (I left a print statement in).

That only seems to be checking the --excluded_paths option, whereas the docs seem to imply you can do the following to import all .log files in a folder (which didn't work for me prior to the changes in this pull request):

python import_logs.py *.log

The files to process are gathered by:

self.options, self.filenames = option_parser.parse_args(sys.argv[1:])

And then looped through later on with:

        for filename in config.filenames:
            parser.parse(filename)

The check_path function (that uses excluded_path) seems to be called when parsing the lines in a file.
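
To illustrate the distinction, here is a minimal sketch (the helper name `is_excluded` and the sample paths are hypothetical, not taken from import_logs.py). `fnmatch` only compares a string against a pattern and never looks at the filesystem, so it can filter hit paths but cannot by itself turn `*.log` into a list of files:

```python
import fnmatch


def is_excluded(path, excluded_patterns):
    """Return True if path matches any shell-style pattern (hypothetical helper)."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in excluded_patterns)


# Sample patterns and paths are made up for illustration.
patterns = ["/static/*", "*.css"]
print(is_excluded("/static/logo.png", patterns))
print(is_excluded("/index.html", patterns))
```

Nothing in this check touches the command-line file arguments, which is why an unexpanded `*.log` would reach `parser.parse()` as a literal file name.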

@mattab
Member

mattab commented Jun 26, 2017

I see - it was probably already working here because bash on Linux expands the *.log itself and passes the actual file names to the script.

Would it be possible to use the fnmatch instead of glob for consistency with the other wildcards?

@geographika
Contributor Author

That would explain it. Yes I'm running this on a Windows server.

I'd prefer glob as it has the advantage of being able to get files in subdirectories, whereas with fnmatch this would all need to be coded by hand using os.listdir and some recursion (glob uses fnmatch internally).
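
As a rough comparison of the two approaches, here is a sketch only: both function names are hypothetical, the hand-written version uses `os.walk` rather than `os.listdir` plus recursion, and the `**` form with `recursive=True` assumes Python 3.5 or later.

```python
import fnmatch
import glob
import os


def find_with_glob(root, pattern="*.log"):
    """Collect matching files under root using glob's recursive '**' form."""
    # "**" matches zero or more directory levels when recursive=True.
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))


def find_with_fnmatch(root, pattern="*.log"):
    """Collect the same files by walking the tree and filtering with fnmatch."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

The two can differ on dot-files: glob's `*` skips names starting with a dot, while `fnmatch.filter` does not treat them specially.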

The check for a * wildcard is a bit hacky though. Could any log filenames on Linux contain a *? This isn't possible on Windows, but it is technically possible on Unix - https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words

import_logs.py Outdated
@@ -824,7 +825,10 @@ def _parse_args(self, option_parser):
        if not self.filenames:
            print(option_parser.format_help())
            sys.exit(1)

        if len(self.filenames) == 1 and '*' in self.filenames[0]:
Member


Why check len(self.filenames) == 1? Could we apply the glob when more than one pattern is specified?
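
One way this could look, as a sketch rather than the code that was eventually merged (`expand_filenames` is a hypothetical helper): run every argument through `glob.glob` and keep the argument unchanged when nothing matches. That removes the need for both the `len(...) == 1` check and the explicit `'*' in ...` test.

```python
import glob


def expand_filenames(args):
    """Expand shell-style wildcards in every argument, preserving argument order."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # No match: keep the argument as given, so a missing file is still
        # reported when the caller later tries to open it.
        expanded.extend(matches if matches else [arg])
    return expanded
```

On a shell that has already expanded the wildcards, each argument is a plain file name that globs to itself, so the result is unchanged. On Unix, a file whose name literally contains `*` is still found, because `*` in the pattern also matches a literal `*`, though other files matching the pattern would be picked up as well.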

@tsteur tsteur changed the base branch from master to 3.x-dev January 13, 2020 22:46
@diosmosis diosmosis merged commit 7150e74 into matomo-org:3.x-dev Jun 22, 2020
@innocraft-automation innocraft-automation removed this from the Current sprint milestone Jan 24, 2023

4 participants