-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Description
Given a segment like this:
pageUrl!@https%253A%252F%252Ftest.matomo.org%252F;pageUrl!@https%253A%252F%252Fmatomo.org%252Fevent;pageUrl!@https%253A%252F%252Fmatomo.org%252Fsearch;pageUrl!@https%253A%252F%252Fmatomo.org%252Forga;pageUrl!@https%253A%252F%252Fmatomo.org%252Ffoo;pageUrl!@https%253A%252F%252Fmatomo.org%252Fbar;pageUrl!@https%253A%252F%252Fmatomo.org%252Ftesdlweke;pageUrl!@https%253A%252F%252Fmatomo.org%252Fhello-world;pageUrl!@https%253A%252F%252Fmatomo.org%252Fvideo;pageUrl!@https%253A%252F%252Fmatomo.org%252Fblog;pageUrl!@https%253A%252F%252Fmatomo.org%252F...;pageUrl!@https%253A%252F%252Fmatomo.org%252Fabout;pageUrl!@https%253A%252F%252Fmatomo.org%252Fcart
The resulting query takes many hours to execute even on a smaller data set (on Aurora). In this case there are around:
- 75K log_action entries
- 20K log_visit entries
- 1M log_link_visit_action entries
It's not too much data and wouldn't be expected that this query takes that long.
The resulting where clause looks roughly like this:
See #19581 and #8850 and #8867 we know that action name queries are slow. However, we could see a big improvement for these kind of queries if we may be able to merge them into one queries instead of multiple full table scans.
Maybe we can do something like where idaction_url in (select idaction from log_action where type = 1 and (name like '%...%' or name like '%...%' or name like '%...%'...))
or if the segment definition was different like where idaction_url in (select idaction from log_action where type = 1 and (name not like '%...%' and name not like '%...%' and name not like '%...%'...))
.
The goal would be to minimise the amount of full table scans as much as possible and group the queries together as much as possible so that the query completes a lot faster.
Above is running slow on a custom report with a segment filter attached but the same should be happening if this was a regular segment. Let me know if there are any questions or if anything is unclear.
Note:
As part of this PR we could remove the segment subquery cache logic enable_segments_subquery_cache
(NOT the segment cache enable_segments_cache
which is still needed). Since we have enable_segments_cache
I can't really think of any benefit of the enable_segments_subquery_cache
logic.
Note 2:
It may be easier to rewrite this to use joins instead of subqueries see the attempted solution in https://github.com/matomo-org/matomo/compare/m19897