Skip to content

Improve performance of segment queries when multiple action segment parts are used #19897

@tsteur

Description

@tsteur

Given a segment like this:

pageUrl!@https%253A%252F%252Ftest.matomo.org%252F;pageUrl!@https%253A%252F%252Fmatomo.org%252Fevent;pageUrl!@https%253A%252F%252Fmatomo.org%252Fsearch;pageUrl!@https%253A%252F%252Fmatomo.org%252Forga;pageUrl!@https%253A%252F%252Fmatomo.org%252Ffoo;pageUrl!@https%253A%252F%252Fmatomo.org%252Fbar;pageUrl!@https%253A%252F%252Fmatomo.org%252Ftesdlweke;pageUrl!@https%253A%252F%252Fmatomo.org%252Fhello-world;pageUrl!@https%253A%252F%252Fmatomo.org%252Fvideo;pageUrl!@https%253A%252F%252Fmatomo.org%252Fblog;pageUrl!@https%253A%252F%252Fmatomo.org%252F...;pageUrl!@https%253A%252F%252Fmatomo.org%252Fabout;pageUrl!@https%253A%252F%252Fmatomo.org%252Fcart

The resulting query takes many hours to execute even on a smaller data set (on Aurora). In this case there are around:

  • 75K log_action entries
  • 20K log_visit entries
  • 1M log_link_visit_action entries

It's not too much data and wouldn't be expected that this query takes that long.

The resulting where clause looks roughly like this:

archive

See #19581 and #8850 and #8867 we know that action name queries are slow. However, we could see a big improvement for these kind of queries if we may be able to merge them into one queries instead of multiple full table scans.

Maybe we can do something like where idaction_url in (select idaction from log_action where type = 1 and (name like '%...%' or name like '%...%' or name like '%...%'...)) or if the segment definition was different like where idaction_url in (select idaction from log_action where type = 1 and (name not like '%...%' and name not like '%...%' and name not like '%...%'...)).

The goal would be to minimise the amount of full table scans as much as possible and group the queries together as much as possible so that the query completes a lot faster.

Above is running slow on a custom report with a segment filter attached but the same should be happening if this was a regular segment. Let me know if there are any questions or if anything is unclear.

Note:
As part of this PR we could remove the segment subquery cache logic enable_segments_subquery_cache (NOT the segment cache enable_segments_cache which is still needed). Since we have enable_segments_cache I can't really think of any benefit of the enable_segments_subquery_cache logic.

Note 2:
It may be easier to rewrite this to use joins instead of subqueries see the attempted solution in https://github.com/matomo-org/matomo/compare/m19897

Metadata

Metadata

Assignees

Labels

c: PerformanceFor when we could improve the performance / speed of Matomo.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions