Skip to content

Improve performance of archiving queries when a segment uses "Not contains" or "not equals" #20467

@tsteur

Description

@tsteur

Context

We're seeing slow DB queries on the cloud that take a very long time to run when a segment uses one of the in the title mentioned comparisons/operators.

Example:
Given a segment pageUrl=@foo.com;pageUrl!@baz;pageUrl=@test;pageUrl!@hello Matomo produces below complicated and slow query:

			SELECT
				distinct log_visit.idvisit as idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit LEFT JOIN log_action AS log_action_segment_log_link_visit_actionidaction_url ON log_link_visit_action.idaction_url = log_action_segment_log_link_visit_actionidaction_url.idaction
			WHERE
				( log_visit.visit_last_action_time >= ?
				AND log_visit.visit_last_action_time <= ?
				AND log_visit.idsite IN (?) )
                AND
                ( (log_action_segment_log_link_visit_actionidaction_url.name LIKE ? AND log_action_segment_log_link_visit_actionidaction_url.type = '1') AND ( log_visit.idvisit NOT IN (
			SELECT
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit LEFT JOIN log_action AS log_action_segment_log_link_visit_actionidaction_url ON log_link_visit_action.idaction_url = log_action_segment_log_link_visit_actionidaction_url.idaction
			WHERE
				( log_visit.idsite IN (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ? )
                AND
                ( (log_action_segment_log_link_visit_actionidaction_url.name LIKE ? AND log_action_segment_log_link_visit_actionidaction_url.type = '1') )) ) AND (log_action_segment_log_link_visit_actionidaction_url.name LIKE ? AND log_action_segment_log_link_visit_actionidaction_url.type = '1') AND ( log_visit.idvisit NOT IN (
			SELECT
				log_visit.idvisit
			FROM
				log_visit AS log_visit LEFT JOIN log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit LEFT JOIN log_action AS log_action_segment_log_link_visit_actionidaction_url ON log_link_visit_action.idaction_url = log_action_segment_log_link_visit_actionidaction_url.idaction
			WHERE
				( log_visit.idsite IN (?) AND log_visit.visit_last_action_time >= ? AND log_visit.visit_last_action_time <= ? )
                AND
                ( (log_action_segment_log_link_visit_actionidaction_url.name LIKE ? AND log_action_segment_log_link_visit_actionidaction_url.type = '1') )) ) )
			ORDER BY
				log_visit.idvisit ASC

Expected Behavior

The query is optimised so it executes a lot faster.

Steps to Reproduce (for Bugs)

  • Define a segment as described above.
  • Launch the archiving (either via core:archive or if browser archiving is enabled by requesting it in the browsers)
  • You will see in the logaggregator the query to create the temporary archive table with the complicated query.

Your Environment

  • Matomo Version: 4.13.3
  • PHP Version: 8.0
  • Server Operating System: Linux

Metadata

Metadata

Assignees

Labels

c: PerformanceFor when we could improve the performance / speed of Matomo.not-in-changelogFor issues or pull requests that should not be included in our release changelog on matomo.org.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions