Skip to content

Wrong dates used in a specific segment query and as a result barely any results would match #20441

@tsteur

Description

@tsteur

While looking at a database with a higher load I noticed an unexpected SQL query which you can find below:

INSERT IGNORE INTO logtmpsegment3a36c9f2b44ab187ef5eada20ff3a447 (idvisit) SELECT /* trigger = CronArchive */
				distinct log_visit.idvisit as idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit LEFT JOIN log_media AS log_media ON log_media.idvisit = log_visit.idvisit
			WHERE
				( log_visit.visit_last_action_time >= '2023-03-07 15:00:00'
				AND log_visit.visit_last_action_time <= '2023-03-08 14:59:59'
				AND log_visit.idsite IN ('1') )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'foo.com', '%')  AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
			WHERE
				( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'staging', '%')  AND type = 1 )) ) )) ) AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'app', '%')  AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
			WHERE
				( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'cover', '%')  AND type = 1 )) ) )) ) AND log_media.watched_time > '0' AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'foo.com', '%')  AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
			WHERE
				( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'staging', '%')  AND type = 1 )) ) )) ) AND ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'app', '%')  AND type = 1 )) ) AND ( log_visit.idvisit NOT IN (SELECT /* trigger = CronArchive */
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit
			WHERE
				( log_visit.idsite IN ('1') AND log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' )
                AND
                ( ( log_link_visit_action.idaction_url IN (SELECT idaction FROM log_action WHERE ( name LIKE CONCAT('%', 'cover', '%')  AND type = 1 )) ) )) ) )
			ORDER BY
				log_visit.idvisit ASC

This is a query for this segment:

pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover

Besides the query looking complex with heaps of subqueries (hence why it ran so long) the actual problem is the used date range where it says: log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-07 15:00:00' vs in the original part it uses log_visit.visit_last_action_time >= '2023-03-07 15:00:00' AND log_visit.visit_last_action_time <= '2023-03-08 14:59:59'. I would have expected it should use this date range in all the subqueries too. It otherwise can't really match many entries when it only considers visits from 1 second!

The site's timezone is Asia/Tokyo.

I've tried to reproduce this with a test as below but couldn't:

    public function test_getSelectQuery()
    {
        $idSite = Fixture::createWebsite('2022-01-01 00:02:03', '1', 'rerer', 'foo.com', 1, null, 'Asia/Tokyo');

        $start = Date::factory('2023-02-07 15:00:00');
        $end  =Date::factory('2023-02-08 14:59:59');
        $segmentBind = array(
            $start->toString(Date::DATE_TIME_FORMAT),
            $end->toString(Date::DATE_TIME_FORMAT),
            $idSite);
        $segmentWhere = "log_visit.visit_last_action_time >= ?
				AND log_visit.visit_last_action_time <= ?
				AND log_visit.idsite IN (". Common::getSqlStringFieldsArray([$idSite]) . ")";
        $logQueryBuilder = StaticContainer::get('Piwik\DataAccess\LogQueryBuilder');
        $logQueryBuilder->forceInnerGroupBySubselect(LogQueryBuilder::FORCE_INNER_GROUP_BY_NO_SUBSELECT);
        $segment = 'pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover';
        $segment = new Segment($segment, $idSites = array(), $start, $end);
        $query = $segment->getSelectQuery('distinct log_visit.idvisit as idvisit', 'log_visit', $segmentWhere, $segmentBind, 'log_visit.idvisit ASC');
        var_export($query);
}

The issue is reproducible whenever I start to launch archiving there for this segment. That's with PHP 8.0.

I haven't tried to reproduce this locally except for the test but I would try to:

  • create a segment with this definition in the database: pageUrl=@foo.com;pageUrl!@staging;pageUrl=@app;pageUrl!@cover. * Then launch archiving eg via ./console core:archive --concurrent-requests-per-website=1 --force-idsites=1 --force-idsegments=4
  • You should see the query being executed. If it's not reproducible, something must cause it.

Metadata

Metadata

Assignees

Labels

RegressionIndicates a feature used to work in a certain way but it no longer does even though it should.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions