
Improve regex for date and time in w3c formats #180


Merged: 2 commits, Mar 2, 2020


sgiehl
Member

@sgiehl sgiehl commented Jun 30, 2017

Currently, date and time need to be directly next to each other; otherwise the regex breaks.
This change allows date and time to appear anywhere in the field order, and also allows them to be wrapped in double quotes (").

might help with #179
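To make the described behavior concrete, here is a minimal sketch of building a line regex from a `#Fields:` header, with `date` and `time` as separate, optionally quoted groups. It is not the code from this pull request: `FIELD_PATTERNS` and `build_line_regex` are hypothetical names, and the fallback pattern for unknown fields is an assumption.

```python
import re

# Hypothetical mapping from W3C field names to regex fragments.
# The importer's real table is larger and may look different.
FIELD_PATTERNS = {
    "date": r'"?(?P<date>\d{4}-\d{2}-\d{2})"?',
    "time": r'"?(?P<time>\d{2}:\d{2}:\d{2})"?',
}

# Assumed fallback for any other field: a quoted string or a bare token.
ANY_FIELD = r'(?:"[^"]*"|\S+)'


def build_line_regex(fields_line):
    """Build one regex from a '#Fields:' header line."""
    names = fields_line[len("#Fields:"):].split()
    parts = [FIELD_PATTERNS.get(name, ANY_FIELD) for name in names]
    return re.compile(r"\s+".join(parts))


# Date and time are separated by another field and wrapped in quotes.
regex = build_line_regex("#Fields: time c-ip date")
match = regex.match('"03:12:49" 125.125.125.125 "2017-06-27"')
print(match.group("date"), match.group("time"))
```

Because each field contributes its own fragment, the position of `date` and `time` in the header no longer matters in this sketch.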

@dinasty02091994

I've tried this commit and receive the following after making the changes:

Traceback (most recent call last):
  File "/var/www/html/piwik/misc/log-analytics/import_logs_new.py", line 2392, in <module>
    main()
  File "/var/www/html/piwik/misc/log-analytics/import_logs_new.py", line 2363, in main
    parser.parse(filename)
  File "/var/www/html/piwik/misc/log-analytics/import_logs_new.py", line 2181, in parse
    full_path=format.get('path'),
  File "/var/www/html/piwik/misc/log-analytics/import_logs_new.py", line 214, in get
    raise BaseFormatException("Cannot find group '%s'." % key)
__main__.BaseFormatException: Cannot find group 'path'.

@sgiehl
Member Author

sgiehl commented Jul 4, 2017

@dinasty02091994 Would you mind sharing a few lines of your log file?

@dinasty02091994

dinasty02091994 commented Jul 4, 2017

Sure, no problem.

Here are a couple of lines from the log. Please note that the #Date line will be matched as well, so for testing purposes I deleted that line from the log.

#Software: Incapsula LOGS API
#Version: 1.1
#Date: 27/Jun/2017 03:17:01
#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(DEer-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-statDE s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName
"2017-06-27" "03:12:49" "7ad149a8-27b8-41cb-9013-6531c2c381ef" "Feedly" "Feed Fetcher" "false" "false" "125.125.125.125" "" "e6600383e6c84e650133b864ba218e8730f56f3140fadb7dd84bc7a9a7e2201c33d2d3f67e6080c4ae2fbc8ca16df94a800d87ed90268942fe8c43f9307a3c7b35a9cd871db32aa4fc7b8a9463366cc4" "NA" "774502" "Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)" "623000200429048660" "44850949" "DE" "HH" "Stuttgart" "www.example.de" "44.7157" "44.7157" "example" "www.example.se/Filter/RssFeed" "" "HTTP" "REQ_CACHED_VALIDATED" "784516273901863242" "" "125.125.125.125" "80" "GET" "filterType=&preFilteredCategories=0" "304" "125.125.125.125" "0" "1498533169758" "" "" "" "" ""
"2017-06-27" "03:13:10" "5ffb2089-9843-4beb-8674-d07b5edd177e" "Microsoft Internet Security and Acceleration Server" "Someone Behind Proxy" "false" "false" "125.125.125.125" "" "dbf0f4349cf3d78de49f594c45d18aea7847e1f721f430076071545f8f44159522fd7aed92a4cc0ef2b32938efa0f7cac69c39c41779836a96f4ef83d38e261d22e4a42cd214641a4d14dd7cabb5cc83" "NA" "774502" "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)" "143001420321778179" "44850949" "DE" "DE" "Stuttgart" "www.example.se" "44.7504" "44.7504" "Company" "www.example.se/contentassets/20.example.pdf" "" "HTTP" "REQ_PASSED" "524691810687060197" "" "125.125.125.125" "80" "GET" "" "200" "125.125.125.125" "121170" "1498533190398" "" "" "" "" ""
"2017-06-27" "03:13:10" "5ffb2089-9843-4beb-8674-d07b5edd177e" "Microsoft Internet Security and Acceleration Server" "Someone Behind Proxy" "false" "false" "125.125.125.125" "" "dbf0f4349cf3d78de49f594c45d18aea7847e1f721f430076071545f8f44159522fd7aed92a4cc0ef2b32938efa0f7cac69c39c41779836a96f4ef83d38e261d22e4a42cd214641a4d14dd7cabb5cc83" "NA" "774502" "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)" "143001420321778179" "44850949" "DE" "DE" "Stuttgart" "www.example.se" "44.7504" "44.7504" "Company" "www.example.se/contentassets/21.-example.pdf" "" "HTTP" "REQ_PASSED" "330679285999143136" "" "125.125.125.125" "80" "GET" "" "200" "125.125.125.125" "45831" "1498533190436" "" "" "" "" ""

@sgiehl
Member Author

sgiehl commented Jul 4, 2017

thx. will check that later

@sgiehl
Member Author

sgiehl commented Jul 4, 2017

@dinasty02091994 The error is unrelated to this change.
Your field list does not contain a field that provides only the path. I nevertheless tried to import the log after changing that, but it failed because the resulting regex was too big and ran forever.
I will check whether that can be improved, but that will be handled in another issue.
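The "Cannot find group 'path'" error can be illustrated with a small check of which regex groups a field list would provide. This is only a sketch: `FIELD_TO_GROUP` is a hypothetical subset of a field-to-group mapping (in particular, the assumption that only `cs-uri-stem` yields `path`), and `missing_groups` is not a function of the importer.

```python
# Group the traceback shows being requested via format.get('path').
REQUIRED_GROUPS = {"path"}

# Hypothetical subset of a W3C field -> regex group mapping (assumption).
FIELD_TO_GROUP = {
    "date": "date",
    "time": "time",
    "c-ip": "ip",
    "cs-uri-stem": "path",
    "cs-uri-query": "query_string",
    "cs-method": "method",
}


def missing_groups(fields_line):
    """Return the required groups that no field in the header provides."""
    names = fields_line.split(":", 1)[1].split()
    provided = {FIELD_TO_GROUP[name] for name in names if name in FIELD_TO_GROUP}
    return REQUIRED_GROUPS - provided


# Shortened version of the posted header: it has cs-uri and cs-uri-query,
# but no field that maps to 'path' under the assumed mapping.
print(missing_groups("#Fields: date time c-ip cs-uri cs-method cs-uri-query"))
```

Under these assumptions the posted header lacks a `path` provider, which matches the explanation that the error does not come from the date/time change.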

@sgiehl sgiehl mentioned this pull request Jul 5, 2017
@mattab mattab modified the milestone: Current sprint Jul 10, 2017
Member

@mneudert mneudert left a comment


LGTM. Perhaps a second log with a flipped or otherwise disconnected date/time combination (tested manually) could be added, but that might already come with other log formats.

As this creates a new match group, would it be worth checking for a relevant parsing time impact here? (I would not expect one, but it is regex after all...)
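One way to check the parsing time impact is a quick `timeit` comparison of a pattern with one combined date/time group against a pattern with two separate, optionally quoted groups. Both patterns and the sample line below are hypothetical stand-ins, not the regexes from this pull request, so the numbers only indicate a rough trend.

```python
import re
import timeit

# Hypothetical sample line with adjacent, unquoted date and time.
LINE = "2017-06-27 03:12:49 125.125.125.125 GET /index.html 200"

# Stand-in for the old behavior: date and time in a single group.
combined = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<ip>\S+)"
)

# Stand-in for the new behavior: separate groups, optional quotes.
separate = re.compile(
    r'"?(?P<date>\d{4}-\d{2}-\d{2})"? "?(?P<time>\d{2}:\d{2}:\d{2})"? (?P<ip>\S+)'
)


def time_pattern(pattern, number=50_000):
    """Time repeated matching of LINE against a compiled pattern."""
    return timeit.timeit(lambda: pattern.match(LINE), number=number)


t_combined = time_pattern(combined)
t_separate = time_pattern(separate)
print(f"combined: {t_combined:.3f}s, separate: {t_separate:.3f}s")
```

A more meaningful measurement would run the importer against a full log file before and after the change.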

@mattab
Member

mattab commented Dec 11, 2017

Feedback:

  • add new test cases with disconnected date/time

Should be good to merge then
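A test for disconnected date/time could, for example, iterate over every field order, both quoted and unquoted. The sketch below is not the test added in this pull request; `FRAGMENTS`, `VALUES`, and `check_order` are hypothetical and use three fields only.

```python
import itertools
import re

# Hypothetical per-field fragments; each value may be wrapped in quotes.
FRAGMENTS = {
    "date": r'"?(?P<date>\d{4}-\d{2}-\d{2})"?',
    "time": r'"?(?P<time>\d{2}:\d{2}:\d{2})"?',
    "c-ip": r'"?(?P<ip>[\d.]+)"?',
}
VALUES = {"date": "2017-06-27", "time": "03:12:49", "c-ip": "125.125.125.125"}


def check_order(order, quoted):
    """Build pattern and log line for one field order, then verify the groups."""
    pattern = re.compile(r"\s+".join(FRAGMENTS[f] for f in order) + r"$")
    wrap = '"{}"' if quoted else "{}"
    line = " ".join(wrap.format(VALUES[f]) for f in order)
    match = pattern.match(line)
    return (
        match is not None
        and match.group("date") == VALUES["date"]
        and match.group("time") == VALUES["time"]
    )


# All 6 field orders, each with and without quotes.
results = {
    (order, quoted): check_order(order, quoted)
    for order in itertools.permutations(FRAGMENTS)
    for quoted in (False, True)
}
assert all(results.values())
```

Covering all permutations includes the flipped case (time before date) as well as orders where another field sits between the two.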

@sgiehl
Member Author

sgiehl commented Dec 13, 2017

I've added a simple test

@tsteur tsteur changed the base branch from master to 3.x-dev January 13, 2020 22:46
@sgiehl sgiehl merged commit 5eaeb20 into 3.x-dev Mar 2, 2020
@sgiehl sgiehl deleted the w3cdatetime branch March 2, 2020 14:14
@innocraft-automation innocraft-automation removed this from the Current sprint milestone Jan 24, 2023
5 participants