
archiving job runs forever due to MultiChannelConversionAttribution #17185

@Pilot001

Description

After upgrading our testing system (which has very little traffic) from 3.14.1 to 4.1.1, our monitoring system alerted us to a rapid increase in the database binary logs. Investigation showed that the archiving cron job (which runs once an hour and usually finishes within a few seconds) was running for about 2 hours (with 2GB max_memory) and then stopped because it ran out of memory.
The next time the job started, it invalidated the previously created archive and started over. We already tried invalidating it manually, but that did not help.
Running the job manually with -vvv, we see some strange things.
Here is the debug output of the run (just the last lines):

DEBUG [2021-02-03 12:36:33] 91274  Starting archiving for ?module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1
DEBUG [2021-02-03 12:36:34] 91274  Running command: /usr/bin/php -q -c /srv/apache/php-fcgi/piwik.ini -d memory_limit=4G /srv/apache/htdocs/console climulti:request -q --matomo-domain='piwik.local' --superuser 'module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution&trigger=archivephp&pluginOnly=1&pid=577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390&runid=91274' > /srv/apache/htdocs/tmp/climulti/577e08ac369c7b794c3b4f77df3cd96c35b6d4edf4e70841f96167b72eba602030f37cfd84d04392163627384568ab62ef390.output 2>&1 &
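The sub-request in the debug lines above asks for period=year, date=2021-01-01 for site 5, restricted to the MultiChannelConversionAttribution plugin. A minimal Python sketch (standard library only) that decodes that query string and derives the date range such a request should be expected to touch. The helper names `request_params` and `expected_range` are hypothetical, and the sketch assumes a year period is truncated at the day the job runs.

```python
from datetime import date
from urllib.parse import parse_qs

# Query string copied from the "Starting archiving for ?..." debug line.
QUERY = (
    "module=API&method=CoreAdminHome.archiveReports&idSite=5&period=year"
    "&date=2021-01-01&format=json&plugin=MultiChannelConversionAttribution"
    "&trigger=archivephp&pluginOnly=1"
)


def request_params(query: str) -> dict:
    """Decode a query string into a flat dict (first value per key)."""
    return {key: values[0] for key, values in parse_qs(query).items()}


def expected_range(params: dict, today: date) -> tuple:
    """Hypothetical helper: dates a period=year request should cover.

    Assumption: a year period starting on `date` covers that calendar
    year, cut off at `today` while the year is still in progress.
    """
    if params["period"] != "year":
        raise ValueError("this sketch only handles period=year")
    start = date.fromisoformat(params["date"])
    end = min(date(start.year, 12, 31), today)
    return start, end


params = request_params(QUERY)
print(params["idSite"], params["period"], params["plugin"])
# 5 year MultiChannelConversionAttribution
print(expected_range(params, today=date(2021, 2, 3)))
```

Under that assumption, any archive row dated before 2021-01-01 should not be produced by this sub-request.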

Meanwhile, the job generates CSV files in the tmp directory, e.g. "archive_blob_2021_01-1e61491d2fbb205b6f8eeafd01da42d4.csv" (again, just a few lines):

"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_11_prior90"     "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior7"      "x�K�2���O�"
"129277"        "5"     "2020-10-09"    "2020-10-09"    "1"     "2021-02-03 12:48:30"   "MultiChannelConversionAttribution_channelTypes_12_prior30"     "x�K�2���O�"

First, the dates in columns 3 and 4 lie outside the time period the job should be working on (2021-01-01 to 2021-02-03). Second, as we keep watching these CSV files, the timestamp in column 6 is always just a few seconds behind the current time.
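To check the first observation across a whole temp file rather than by eye, here is a minimal Python sketch that lists rows whose dates fall outside the expected period. It assumes (not confirmed by Matomo documentation) that the files are tab-separated with double-quoted fields and backslash escapes, and that columns 3, 4, 6 and 7 hold the start date, end date, archive timestamp and record name, as the rows above suggest. The function name `out_of_range_rows` is hypothetical.

```python
import csv
from datetime import date
from typing import Iterable, Iterator, Tuple

# Assumed layout, inferred from the rows above (0-based indexes):
# date1 = column 3, date2 = column 4, archive timestamp = column 6,
# record name = column 7. Other columns are not interpreted here.
DATE1, DATE2, TS_ARCHIVED, NAME = 2, 3, 5, 6


def out_of_range_rows(
    lines: Iterable[str], start: date, end: date
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (name, date1, date2, ts_archived) for rows outside [start, end].

    Assumes tab-separated, double-quoted fields with backslash escapes.
    """
    reader = csv.reader(
        lines, delimiter="\t", quotechar='"', escapechar="\\", doublequote=False
    )
    for row in reader:
        if len(row) <= NAME:
            continue  # skip short or malformed rows
        d1 = date.fromisoformat(row[DATE1])
        d2 = date.fromisoformat(row[DATE2])
        if d1 < start or d2 > end:
            yield row[NAME], row[DATE1], row[DATE2], row[TS_ARCHIVED]


# First row copied from the file above (blob replaced by a placeholder);
# the second row is a made-up in-range row for comparison.
SAMPLE = [
    '"129277"\t"5"\t"2020-10-09"\t"2020-10-09"\t"1"\t"2021-02-03 12:48:30"\t'
    '"MultiChannelConversionAttribution_channelTypes_11_prior90"\t"blob"\n',
    '"129278"\t"5"\t"2021-01-15"\t"2021-01-15"\t"1"\t"2021-02-03 12:48:31"\t'
    '"hypothetical_in_range_record"\t"blob"\n',
]

hits = list(out_of_range_rows(SAMPLE, date(2021, 1, 1), date(2021, 2, 3)))
for hit in hits:
    print(*hit, sep="\t")
```

To run it against a real temp file, pass a file object opened with open(path, newline="", encoding="latin-1") as `lines`; latin-1 decodes every byte, so the compressed blob column does not raise decode errors. If the actual quoting differs, only the csv.reader arguments need adjusting.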