Skip to content

Add option to force join order for segment queries #21636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 6, 2023

Conversation

bx80
Copy link
Contributor

@bx80 bx80 commented Dec 5, 2023

Description:

Fixes #21635

This PR adds two new config options:

[database]
enable_segment_first_table_join_prefix = 1
enable_first_table_join_prefix = 1

When enabled the first option will cause a query hint to be added to segment selection queries which will suggest the join order of the first table and prevent the MySQL query planner from choosing a different starting table for the join plan.

The second option does the same but for all queries built using LogAggregator::GenerateQuery()

Before:

SELECT 
    DISTINCT log_visit.idvisit AS idvisit
FROM log_visit AS log_visit
LEFT JOIN log_link_visit_action llva ON llva.idvisit = log_visit.idvisit
LEFT JOIN log_action la ON la.idaction = llva.idaction_url
-- etc

After:

SELECT /*+ JOIN_PREFIX(log_visit) */
    DISTINCT log_visit.idvisit AS idvisit
FROM log_visit AS log_visit
LEFT JOIN log_link_visit_action llva ON llva.idvisit = log_visit.idvisit
LEFT JOIN log_action la ON la.idaction = llva.idaction_url
-- etc

Notes:

  • This only applies to queries used to build segment temporary tables, it is not currently applied to any other queries.
  • Only the first table order is specified, the optimizer may choose any order for the other tables.
  • The config options default to disabled so this will have no affect unless enabled.
  • The hint should be ignored by database engines that do not support it, but it's not recommended to enable it unless running MySQL 8.0+.
  • A new DatabaseConfig class has been added to easily retrieve [database] config values.
  • The queryVisitsByDimension() and createSegmentTable() methods have both had their sql generation code moved to sub-methods to allow easier testing of the query sql.
  • Various tests have been added to check the addition of query hints based on config values.

Review

@bx80 bx80 added the Bug For errors / faults / flaws / inconsistencies etc. label Dec 5, 2023
@bx80 bx80 added this to the 5.0.0 milestone Dec 5, 2023
@bx80 bx80 self-assigned this Dec 5, 2023
Copy link
Contributor

@caddoo caddoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, I'm a bit nervous about there being no unit tests though for addJoinPrefixHintToQuery

Because this will be a opt-in piece of code maybe you could just add one test for this part:

$generalConfig = Config::getInstance()->General;
        if (empty($generalConfig['enable_segment_first_table_join_prefix']) || $generalConfig['enable_segment_first_table_join_prefix'] != "1") {
            return $sql;
        }

To make sure that the opt-in part is at least working

@michalkleiner
Copy link
Contributor

Code change looks good, just a question — would the hint use the actual table name or the aliased name (if there's an alias used)?

@bx80
Copy link
Contributor Author

bx80 commented Dec 5, 2023

@michalkleiner The hint uses the alias if the table has one:

If a table has an alias, hints must refer to the alias, not the table name.

It's a bit confusing here because the table alias we use is the same as the table name!

@bx80
Copy link
Contributor Author

bx80 commented Dec 5, 2023

I've updated the PR:

  • Added a second config option to apply the query join hint to all generated queries
  • Moved the segment temporary table sql generation code into a separate method to make it easier to test
  • Added three new tests to the LogAggregatorTest:
    • Test that the join hint is applied to segment table select queries if the config is enabled
    • Test that the join hint is NOT applied to segment table select queries if the config is disabled (we don't have any other tests for the segment table select so this seems necessary to ensure the hint isn't always being applied)
    • Test that the join hint is applied to a non-segment table generated query if the config is enabled

@bx80 bx80 requested review from michalkleiner and caddoo December 5, 2023 05:59
@bx80 bx80 added the Needs Review PRs that need a code review label Dec 5, 2023
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments for possible improvements. Besides that the changes look good to me 👍

Comment on lines 389 to 390
$generalConfig = Config::getInstance()->General;
if (!empty($generalConfig['enable_first_table_join_prefix']) && $generalConfig['enable_first_table_join_prefix'] == "1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified by using GeneralConfig::getConfigValue('enable_first_table_join_prefix')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now using the new DatabaseConfig::getConfigValue() method 👍

Comment on lines 435 to 436
$generalConfig = Config::getInstance()->General;
if (!empty($generalConfig['enable_segment_first_table_join_prefix']) && $generalConfig['enable_segment_first_table_join_prefix'] == "1") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified using GeneralConfig as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@@ -341,6 +341,71 @@ public function test_getSegmentTmpTableNameWithLongPrefix()
$this->assertEquals('logtmpsegmentcc2efa0acbd5f209', $this->logAggregator->getSegmentTmpTableName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe be worth to add a test that uses the new join feature as well as max execution time hint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the query generation code from queryVisitsByDimension to a new sub-method getQueryByDimensionSql to make it easier to test the generated sql. I've then added a new test to check the expected visits by dimension query with an origin hint, join prefix hint and a max execution time hint.


; Add a query hint for the order of the first table for all log table queries in MySQL.
enable_first_table_join_prefix = 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering if this would make more sense to be part of the DB or DbReader config. As it's more related to database than general config. But don't have a strong opinion here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that probably makes the most sense 👍
I've moved the settings to the [database] section, as part of this I've made the GeneralConfig class a bit easier to extend and added a new descendant class DatabaseConfig so we can easily get settings from the [database] section in the same way as we already do for [General]

@michalkleiner
Copy link
Contributor

@bx80 I like the new DatabaseConfig class approach, but may I suggest that both the DatabaseConfig and GeneralConfig extended abstract class SectionConfig (or similar name)? They (db config and general config) should be at the same level (siblings) rather than one extending the other.

@bx80
Copy link
Contributor Author

bx80 commented Dec 6, 2023

@michalkleiner Yeah, I thought about doing that but was trying to limit the rework scope, in retrospect it should be quick to do and a better starting point for any other new descendants so I'll make the change now 👍

@bx80
Copy link
Contributor Author

bx80 commented Dec 6, 2023

@sgiehl @michalkleiner @caddoo Thanks for all the feedback, this should be ready for a final review now 😃

Copy link
Contributor

@michalkleiner michalkleiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor formatting/text tweak that can be skipped for now.
Code changes look good, newly added tests are passing, although I haven't run the code locally to see the actual queries.

bx80 and others added 4 commits December 6, 2023 16:54
Co-authored-by: Michal Kleiner <michal@innocraft.com>
Co-authored-by: Michal Kleiner <michal@innocraft.com>
Co-authored-by: Michal Kleiner <michal@innocraft.com>
@bx80 bx80 merged commit 6d7be14 into 5.x-dev Dec 6, 2023
@bx80 bx80 deleted the m21635-segment-query-join-prefix branch December 6, 2023 04:21
@sgiehl sgiehl added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Dec 15, 2023
@tsteur
Copy link
Member

tsteur commented Dec 21, 2023

@matomo-org/core-reviewers just seeing some slow query and wonder if the join_prefix would be also expected to be set in the subqueries? I don't think it would have been useful in that particular case, and unsure how easy it is to add. Something to consider potentially as MySQL might still make suboptimal choices maybe in subqueries?( for independent subqueries that don't access the outer query)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Needs Review PRs that need a code review not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Development

Successfully merging this pull request may close these issues.

[Bug] Slow archiving query because of MySQL optimizer making wrong decision
5 participants