
Problem importing W3C logs #11824

@magnus-84

Hello

I have problems importing W3C logs from the Incapsula service into Piwik. Below is the command I use to import the log file. IP addresses and domain names have been changed for privacy.

/usr/bin/python /var/www/html/piwik/misc/log-analytics/import_logs.py --url=http://10.1.2.3 --idsite=8 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-name=w3c_extended --w3c-fields='#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName' /root/web.log --debug --debug

Debug output:

2017-06-28 11:21:17,172: [DEBUG] Accepted hostnames: all
2017-06-28 11:21:17,172: [DEBUG] Piwik Tracker API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] Piwik Analytics API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] No token-auth specified
2017-06-28 11:21:17,172: [DEBUG] No credentials specified, reading them from "/var/www/html/piwik/config/config.ini.php"
2017-06-28 11:21:17,240: [DEBUG] Authentication token token_auth is: 90871c8584ddf2265f54553a305b6ae1
2017-06-28 11:21:17,240: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
Parsing log /root/web.log...
2017-06-28 11:21:17,345: [DEBUG] Based on 'Fields:' line, computed regex to be (?P<date>\d+[-\d+]+\s+[\d+:]+)[.\d]*?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+"?(?P<ip>[\w*.:-]*)"?\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<user_agent>".*?"|\S*)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?P<query_string>\S*)\s+(?P<status>\d+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)\s+(?:".*?"|\S+)
2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Software: Incapsula LOGS API

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Version: 1.1

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Date: 28/Jun/2017 07:28:59

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName

2017-06-28 11:21:17,351: [DEBUG] Invalid line detected (line did not match): "2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""

Logs import summary

0 requests imported successfully
0 requests were downloads
5 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    5 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

0 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 0.0 requests per second
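Assuming the importer tests each line against the computed regex starting at the first character (the "line did not match" messages point that way), the double quotes around every Incapsula value are enough to make it fail. The minimal check below applies two sub-patterns copied from the debug output, the date/time part and the `sc-status` part, to the quoted values from the log line and to the same values without quotes:

```python
import re

# Sub-patterns as they appear in the "computed regex" debug message
# (named-group labels left out; only matching matters here).
DATE_TIME = re.compile(r"\d+[-\d+]+\s+[\d+:]+")
STATUS = re.compile(r"\d+")

# Start of the Incapsula line: every value is wrapped in double quotes.
quoted = '"2017-06-28" "07:26:35"'
unquoted = "2017-06-28 07:26:35"

print(DATE_TIME.match(quoted))            # None: the pattern needs a digit at position 0
print(DATE_TIME.match(unquoted).group())  # 2017-06-28 07:26:35
print(STATUS.fullmatch('"200"'))          # None: quotes are not digits
print(STATUS.fullmatch("200").group())    # 200
```

Both patterns accept only bare digits, so a line that begins with `"2017-06-28"` is rejected at its very first character.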

Example from the original log file:

#Software: Incapsula LOGS API
#Version: 1.1
#Date: 28/Jun/2017 07:28:59
#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName
"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""

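To confirm that the data itself is complete and only the quoting differs from what the importer expects, the line can be split with Python's standard `shlex` module, which follows shell quoting rules (`"a b"` becomes `a b`, `""` becomes an empty string). This is a minimal diagnostic sketch; `parse_incapsula_line` is a hypothetical helper, not part of import_logs.py:

```python
import shlex

FIELDS_LINE = (
    "#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip "
    "cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag "
    "cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action "
    "s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start "
    "cs-rule cs-severity cs-attacktype cs-attackid s-ruleName"
)

# Example line from the issue, split into adjacent literals at field boundaries.
SAMPLE_LINE = (
    '"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" '
    '"false" "true" "123.123.123.123" "" '
    '"62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" '
    '"NA" "774502" '
    '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" '
    '"452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" '
    '"Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" '
    '"124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""'
)


def parse_incapsula_line(fields_line, line):
    """Map each '#Fields:' name to its value (hypothetical helper)."""
    names = fields_line[len("#Fields: "):].split()
    values = shlex.split(line)  # shell-style split: quotes removed, '""' -> ''
    if len(names) != len(values):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


record = parse_incapsula_line(FIELDS_LINE, SAMPLE_LINE)
for name in ("date", "time", "c-ip", "cs-uri", "cs-uri-query", "sc-status"):
    print(name, "=", repr(record[name]))
```

On the example line all 41 header fields get a value. Note that Incapsula's `cs-uri` holds the host and path together (`www.example.com/artiklar/x/y/z/`) and that the query string field is empty.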
I guess the problem is something in the regex? Any help would be appreciated; I have no knowledge of regex myself.
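One possible workaround, sketched here under assumptions, is to pre-process the file so that values without whitespace lose their quotes and empty values become `-`. The per-field patterns below mirror the computed regex from the debug output (named groups dropped, since only matching matters here), assuming the generic field pattern is `".*?"|\S+`. `unquote_line` and `build_regex` are hypothetical helpers written for this example, not import_logs.py functions, and the check assumes lines are matched from their first character:

```python
import re
import shlex

FIELDS_LINE = (
    "#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip "
    "cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag "
    "cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action "
    "s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start "
    "cs-rule cs-severity cs-attacktype cs-attackid s-ruleName"
)

SAMPLE_LINE = (
    '"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" '
    '"false" "true" "123.123.123.123" "" '
    '"62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" '
    '"NA" "774502" '
    '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" '
    '"452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" '
    '"Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" '
    '"124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""'
)

# Fields the computed regex treats specially; every other field uses GENERIC
# (assumed: a quoted string or a run of non-space characters).
KNOWN = {
    "date": r"\d+[-\d+]+",
    "time": r"[\d+:]+[.\d]*?",
    "c-ip": r'"?[\w*.:-]*"?',
    "cs(User-Agent)": r'(?:".*?"|\S*)',
    "cs-uri-query": r"\S*",
    "sc-status": r"\d+",
}
GENERIC = r'(?:".*?"|\S+)'


def build_regex(fields_line):
    """One sub-pattern per '#Fields:' name, separated by whitespace patterns."""
    names = fields_line[len("#Fields: "):].split()
    return re.compile(r"\s+".join(KNOWN.get(name, GENERIC) for name in names))


def unquote_line(line):
    """Hypothetical pre-processing step (not an import_logs.py feature).

    Strips quotes from values without whitespace, keeps quotes around values
    that contain spaces (e.g. the User-Agent) and writes empty values as '-'.
    Assumes no value contains a literal double quote.
    """
    out = []
    for value in shlex.split(line):
        if value == "":
            out.append("-")
        elif any(ch.isspace() for ch in value):
            out.append('"' + value + '"')
        else:
            out.append(value)
    return " ".join(out)


regex = build_regex(FIELDS_LINE)
converted = unquote_line(SAMPLE_LINE)
print(regex.match(SAMPLE_LINE))            # None
print(regex.match(converted) is not None)  # True
```

Even if the rewritten lines match, the computed regex captures no request path, because the `#Fields:` line has no path-only field (the path sits inside `cs-uri` together with the host). The importer may therefore still reject or mis-record these hits.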

Regards
Magnus

Metadata

Assignees

No one assigned

Labels

answered (for when a question was asked and we referred to the forum or answered it)
