
Cron - Filesystem.php(430): Warning - filesize(): stat failed for [...] archive.sharedsiteids.pid #15865

@toredash

Description


Hi,

There seems to be a race condition when running multiple archiver jobs on the same node.

matomo/core/Filesystem.php

Lines 426 to 432 in 647ac56

if (!file_exists($pathToFile)) {
    return;
}
$filesize = filesize($pathToFile);
$factor = $units[$unit];
$converted = $filesize / $factor;

Notice that the file's existence is checked before filesize() is called, so another process can delete the file in the window between the two calls.
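This is a classic time-of-check-to-time-of-use (TOCTOU) pattern. A race-free version would stat the file exactly once and treat failure as "file is gone", rather than checking existence first. A minimal sketch in Python (the equivalent fix in Filesystem.php would be to call filesize() directly and handle a false/warning result; `safe_filesize` is an illustrative name, not Matomo code):

```python
import os

def safe_filesize(path):
    """Return the file size in bytes, or None if the file vanished.

    Stat exactly once instead of checking existence first, so a
    concurrent unlink between check and use cannot trigger a
    "stat failed" warning.
    """
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None
```

The key design point is that the single stat call is atomic from this process's perspective: either it succeeds, or the deletion is reported through the exception path instead of a warning.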

Two archive jobs are started in a bash as such:

CONCURRENT_ARCHIVERS=2
for i in $(seq 1 $CONCURRENT_ARCHIVERS)
do
  (/var/www/console core:archive -vvv --concurrent-archivers=$CONCURRENT_ARCHIVERS) &
  pids+=($!)
done

A few times a day, I get this error message for one of the two started processes:

WARNING [2020-04-27 03:39:26] 2654 /var/www/core/Filesystem.php(430): Warning - filesize(): stat failed for /var/www/tmp/climulti/archive.sharedsiteids.pid - Matomo 3.13.4

I'm not able to reproduce it on demand, but we know it happens a few times a day because we get a notification whenever an archive process exits with a non-zero code.

We only see this in our test environment, which has many sites but no new tracking data coming in. The run time for each archive job is under 3 seconds.

I looked through the code, and the only thing I could spot as a potential source is this:

/**
 * If there are multiple archiver running on the same node it makes sure only one of them performs an action and it
 * will wait until another one has finished. Any closure you pass here should be very fast as other processes wait
 * for this closure to finish otherwise. Currently only used for making multiple archivers at the same time work.
 * If a closure takes more than 5 seconds we assume it is dead and simply continue.
 *
 * @param \Closure $closure
 * @return mixed
 * @throws \Exception
 */
private function runExclusive($closure)
{
    $process = new Process('archive.sharedsiteids');
    while ($process->isRunning() && $process->getSecondsSinceCreation() < 5) {
        // wait max 5 seconds, such an operation should not take longer
        usleep(25 * 1000);
    }
    $process->startProcess();
    try {
        $result = $closure();
    } catch (Exception $e) {
        $process->finishProcess();
        throw $e;
    }
    $process->finishProcess();
    return $result;
}

Is there something here that can cause issues when multiple archivers each complete in less than 5 seconds?
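The interleaving I suspect can be forced deterministically with two threads: one process checks that the pid file exists and then stats it, while the other deletes the file (as finishProcess() would) in the gap between the two calls. A hypothetical Python reproduction of that window (names and events are illustrative only, not Matomo code):

```python
import os
import tempfile
import threading

def demo_race():
    """Deterministically reproduce the exists-then-stat race:
    thread B unlinks the file in the window between thread A's
    existence check and its stat call."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    checked = threading.Event()   # A has passed the existence check
    removed = threading.Event()   # B has unlinked the file
    outcome = {}

    def archiver_a():
        if os.path.exists(path):              # time-of-check
            checked.set()
            removed.wait()                    # widen the race window
            try:                              # time-of-use
                outcome["size"] = os.stat(path).st_size
            except FileNotFoundError:
                outcome["error"] = "stat failed"

    def archiver_b():
        checked.wait()
        os.unlink(path)   # like finishProcess() deleting the pid file
        removed.set()

    ta = threading.Thread(target=archiver_a)
    tb = threading.Thread(target=archiver_b)
    ta.start(); tb.start()
    ta.join(); tb.join()
    return outcome
```

In this sketch the stat always fails, which matches the "stat failed" warning from Filesystem.php(430); in production the window is only a few instructions wide, which would explain why it triggers just a few times a day.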

I don't have a suggestion for a fix right now. Attached is the output from the archive processes; one of them (b) produced the WARNING:
a.log
b.log

Labels: not-in-changelog (for issues or pull requests that should not be included in the release changelog on matomo.org)