Problem importing w3c logs #179

@sgiehl

Original Issue: matomo-org/matomo#11824

Hello

I am having problems importing W3C logs from the Incapsula service into Piwik. Below is the command I use to import the log file. IP addresses and domain names have been changed for privacy.

/usr/bin/python /var/www/html/piwik/misc/log-analytics/import_logs.py --url=http://10.1.2.3 --idsite=8 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-name=w3c_extended --w3c-fields='#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName' /root/web.log --debug --debug

Debug output below

2017-06-28 11:21:17,172: [DEBUG] Accepted hostnames: all
2017-06-28 11:21:17,172: [DEBUG] Piwik Tracker API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] Piwik Analytics API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] No token-auth specified
2017-06-28 11:21:17,172: [DEBUG] No credentials specified, reading them from "/var/www/html/piwik/config/config.ini.php"
2017-06-28 11:21:17,240: [DEBUG] Authentication token token_auth is: 90871c8584ddf2265f54553a305b6ae1
2017-06-28 11:21:17,240: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
Parsing log /root/web.log...
2017-06-28 11:21:17,345: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+"?(?P<ip>[\w*.:-]*)"?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<user_agent>".*?"|\S*)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<query_string>\S*)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)
2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Software: Incapsula LOGS API

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Version: 1.1

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Date: 28/Jun/2017 07:28:59

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName

2017-06-28 11:21:17,351: [DEBUG] Invalid line detected (line did not match): "2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""

Logs import summary

0 requests imported successfully
0 requests were downloads
5 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    5 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

0 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 0.0 requests per second

Original logfile example below.

#Software: Incapsula LOGS API
#Version: 1.1
#Date: 28/Jun/2017 07:28:59
#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName
"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""
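
To make the layout of the log easier to inspect, here is a minimal sketch that pairs the names from the '#Fields:' line with the quoted values of a data line. It is not part of import_logs.py; the helper name parse_w3c_line is hypothetical, and the header and data line are shortened to the first four fields of the example above.

```python
import shlex

# Shortened copies of the header and data line from the log above
# (only the first four fields are kept for brevity).
FIELDS_LINE = '#Fields: date time cs-vid cs-clapp'
DATA_LINE = '"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome"'


def parse_w3c_line(fields_line, data_line):
    """Hypothetical helper: map field names to the values of one log line."""
    names = fields_line[len('#Fields:'):].split()
    # shlex.split honours double quotes, so values containing spaces
    # (such as the user agent) stay in one piece and "" becomes ''.
    values = shlex.split(data_line)
    if len(names) != len(values):
        raise ValueError('field count does not match value count')
    return dict(zip(names, values))


record = parse_w3c_line(FIELDS_LINE, DATA_LINE)
print(record['date'], record['time'], record['cs-clapp'])
```

Every value in this log is wrapped in double quotes, including the date and time.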

I guess the problem is something in the regex? Any help would be appreciated. I have no knowledge of regex myself.
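
One way to test the regex guess is to try only the leading date/time part of the pattern against the start of a log line. The sketch below assumes the computed regex begins with a combined date/time group of the form shown in the code (the group name date is an assumption), and the helper leading_date is hypothetical. It suggests that the pattern does not allow for the double quotes that Incapsula puts around the date and time.

```python
import re

# Assumed leading portion of the computed regex: date and time captured
# together, with no provision for surrounding double quotes.
DATE_TIME = re.compile(r'(?P<date>\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+')

# Start of the Incapsula line as logged, and the same text without quotes
# around the date and time.
quoted = '"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6"'
unquoted = '2017-06-28 07:26:35 "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6"'


def leading_date(line):
    """Hypothetical helper: return the date/time match at line start, or None."""
    match = DATE_TIME.match(line)
    return match.group('date') if match else None


print(leading_date(quoted))    # None
print(leading_date(unquoted))  # 2017-06-28 07:26:35
```

If that is the cause, the line is rejected at its very first character, which would be consistent with the "line did not match" message in the debug output.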

Regards
Magnus
