-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Description
While looking at a database with a higher load I noticed an unexpected SQL query which you can find below:
INSERT IGNORE INTO logtmpsegment3a36c9f2b44ab187ef5eada20ff3a447 (idvisit) SELECT /* trigger = CronArchive */
distinct log_visit.idvisit as idvisit
FROM
log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit LEFT JOIN log_media AS log_media ON log_media.idvisit = log_visit.idvisit
WHERE
( log_visit.visit_last_action_time >= '2023-03-07 15:00:00'
AND log_visit.visit_last_action_time <= '2023-03-08 14:59:59'
AND log_visit.idsite IN ('1') )
AND
( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'foo.com', '%') AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
log_visit.idvisit
FROM
log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE
( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
AND
( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'staging', '%') AND type = 1 )) ) )) ) AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'app', '%') AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
log_visit.idvisit
FROM
log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE
( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
AND
( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'cover', '%') AND type = 1 )) ) )) ) AND log_media.watched_time > '0' AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'foo.com', '%') AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
log_visit.idvisit
FROM
log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE
( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
AND
( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'staging', '%') AND type = 1 )) ) )) ) AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'app', '%') AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
log_visit.idvisit
FROM
log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
WHERE
( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
AND
( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'cover', '%') AND type = 1 )) ) )) ) )
ORDER BY
log_visit.idvisit ASC
This is a query for this segment:
pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover
Besides the query looking complex with heaps of subqueries (hence why it ran so long) the actual problem is the used date range where it says: log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00'
vs in the original part it uses log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-08 14:59:59'
. I would have expected it should use this date range in all the subqueries too. It otherwise can't really match many entries when it only considers visits from 1 second!
The site's timezone is Asia/Tokyo
.
I've tried to reproduce this with a test as below but couldn't:
public function test_getSelectQuery()
{
$idSite = Fixture::createWebsite('2022-01-01 00:02:03', '1', 'rerer', 'foo.com', 1, null, 'Asia/Tokyo');
$start = Date::factory('2023-02-07 15:00:00');
$end =Date::factory('2023-02-08 14:59:59');
$segmentBind = array(
$start->toString(Date::DATE_TIME_FORMAT),
$end->toString(Date::DATE_TIME_FORMAT),
$idSite);
$segmentWhere = "log_visit.visit_last_action_time >= ?
AND log_visit.visit_last_action_time <= ?
AND log_visit.idsite IN (". Common::getSqlStringFieldsArray([$idSite]) . ")";
$logQueryBuilder = StaticContainer::get('Piwik\DataAccess\LogQueryBuilder');
$logQueryBuilder->forceInnerGroupBySubselect(LogQueryBuilder::FORCE_INNER_GROUP_BY_NO_SUBSELECT);
$segment = 'pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover';
$segment = new Segment($segment, $idSites = array(), $start, $end);
$query = $segment->getSelectQuery('distinct log_visit.idvisit as idvisit', 'log_visit', $segmentWhere, $segmentBind, 'log_visit.idvisit ASC');
var_export($query);
}
The issue is reproducible whenever I start to launch archiving there for this segment. That's with PHP 8.0.
I haven't tried to reproduce this locally except for the test but I would try to:
- create a segment with this definition in the database:
pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover
. * Then launch archiving eg via./console core:archive --concurrent-requests-per-website=1 --force-idsites=1 --force-idsegments=4
- You should see the query being executed. If it's not reproducible, something must cause it.