Skip to content

Matomo tracker select queries can get slow when most visits have same visitorId #16904

@tsteur

Description

@tsteur

Say you are tracking apps and you send a userId with every tracking request. Then by default the userId will be used to generate the visitorId. It may also be a problem if for every action a new visit is forced.

Queries like this will become very slow:

explain extended SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_seconds_since_first, visitor_seconds_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, idsite, profilable, visit_entry_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_url, config_browser_name, config_client_type, config_device_brand, config_device_model, config_device_type, visit_total_events, visit_total_time, location_ip, location_browser_lang, campaign_content, campaign_group, campaign_id, campaign_keyword, campaign_medium, campaign_name, campaign_placement, campaign_source, last_idlink_va, custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4, custom_dimension_5, custom_dimension_6, custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3, custom_var_v3, custom_var_k4, custom_var_v4, custom_var_k5, custom_var_v5  FROM log_visit FORCE INDEX (index_idsite_idvisitor)  WHERE idsite = '4' AND visit_last_action_time <= '2020-12-07 20:34:48' AND idvisitor = '...'
                ORDER BY visit_last_action_time DESC
                LIMIT 1

Explain:
image

or
image

To workaround this performance issue set [Tracker]enable_userid_overwrites_visitorid=0 in your config/config.ini.php if userId is used. If userId is not used, there is currently no workaround.

Ideally we find a way to not needing this setting though. Maybe we could at least limit time range by adding to the where visit_last_action_time > 1 year before upper limit . It won't make it much better but at least it be slightly faster especially when the table has over the years collected many hundred thousands or millions of requests for the same visitor. In our use case we have for example for the same idVisitor 1.6M entries which is roughly 30% of all visits for the same visitor.

We used to have this lower limit in the where clause but AFAIK we removed it for Cohorts. @diosmosis do you maybe remember? To make cohorts still useful, how far back would we need to look? 1 year? Or would it be fine to only look back 6 months? I suppose we'd need to look back kind of ideally at least 1 year for the reports to be fine when you look at months for cohorts.

Any other thoughts on how to make this use case faster be appreciated

UPDATE

See my comment further below. This issue is now about changing the existing index according to https://matomo.org/faq/troubleshooting/how-do-i-improve-the-load-time-of-the-find-visitor-sql-query-when-visitors-have-many-visits/

Metadata

Metadata

Assignees

No one assigned

    Labels

    MajorIndicates the severity or impact or benefit of an issue is much higher than normal but not critical.c: PerformanceFor when we could improve the performance / speed of Matomo.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions