Skip to content

Support optional sub-query substitution for COUNT(DISTINCT x) queries #19651

@bx80

Description

@bx80

TiDB currently cannot push COUNT(DISTINCT x) queries to TiFlash so on large datasets these queries can perform slowly and use a lot of memory. Reworking COUNT(DISTINCT x) queries to use sub-queries instead solves this issue for TiDB, but sub-queries perform ~2.5x slower on MySQL, so a hybrid solution will be required.

In order to support TiDB as an alternative database we need to provide the option for certain queries to be generated using sub-queries instead of COUNT(DISTINCT x). This could be a beneficial optimization for other databases too so it would be good to implement in a generic manner.

Suggested approach

  • Identify performance sensitive archiving queries that use COUNT(DISTINCT x) (33 occurrences in non-test code, mostly in core/DataAccess/LogAggregator).
  • Add a new core/Db/AdaptorInterface property preferSubqueries and implement it on the standard Matomo MySQL adapter to return false. A future TiDB PDO adapter can return true.
  • For each performance sensitive query using COUNT(DISTINCT x), rework the query generation code to optionally replace COUNT(DISTINCT x) with an appropriate sub-query if the PDO adapter preferSubqueries option is set.

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementFor new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions