-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
join log_action in segment sql queries instead of using idaction IN (...) #20379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nts rather than many subqueries for performance improvement
…in to log_action directly now
…to customize exactly what SQL is generated to match a segment value and use by some action dimensions to include SQL that checks log_action.hash and log_action.type
…oin + add GoalName dimension/segment to allow querying by goal name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice @diosmosis
I've tested this and it improved some queries 20 times when there is an account with many actions. From > 40 seconds to 2 seconds for a simple select. 🎉
Can I suggest we add some more tests to ensure it behaves as expected and will also do this in the future? For example tests/PHPUnit/Integration/SegmentTest.php
test cases for getSelectQuery
to ensure the SQL and bind is as expected for:
- Multiple action related segments. eg entryPageUrl, pageUrl, pageTitle, ... in one. Ideally sometimes even the same segment multiple times present
- Multiple action related segments containing And / or segments.
- Also ensuring that when we use a
==
that it speeds up the query correctly where it effectively does awhere idaction = $id
.
For example, I did create a test using 'entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fjobs%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fsupport%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fchangelog%25252F;entryPageUrl!@https%25253A%25252F%25252Fdeveloper.matomo.org%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Ftranslations%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fweb-analytics-training%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Ftag-manager-training%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Ffaq%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fhelp%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fcontact%25252F;entryPageUrl!@https%25253A%25252F%25252Fmatomo.org%25252Fdocs%25252F'
and one thing I oddly noticed is that while in 4.x-dev
it would bind matomo.orgtranslations
with this patch it would bind https://matomo.org/translations
as value
And ideally, we would maybe also add a system test for some complicated segment if it's not present yet to ensure we get the correct value.
FYI some tests are failing also |
These two already exist in: https://github.com/matomo-org/matomo/blob/4.x-dev/tests/PHPUnit/Integration/SegmentTest.php#L1427. But they only use pageUrl. I'll add one that does something similar w/ different action segments.
Curently not doing that, but instead, This should also be covered by existing SegmentTest tests.
Will add a system test. |
@tsteur updated
Reverted this.
I was thinking the same thing, but it looks like a significant amount of work to look into so I would consider it out of scope here. I could work on it as a follow up PR though, if desired. |
Agreed 👍 So far things are looking good. I've been doing more testing and more performance testing too and just seen one query that it improves by 80 times! That's because so far according to The only issue I'm still seeing is that it used to behave differently for the segment I'm testing this with below test public function test_getSelectQuery_whenUrls()
{
$select = 'sum(log_visit.idvisit)';
$from = 'log_visit';
$where = 'log_visit.idsite = ?';
$bind = array(1);
$segment = 'entryPageUrl=@https%253A%252F%252Fmatomo.org%252Ffaq%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fdocs%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fcontact%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fhelp%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Ftag-manager-training%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fweb-analytics-training%252F,entryPageUrl=@https%253A%252F%252Fdeveloper.matomo.org%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Ftranslations%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fchangelog%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fsupport%252F,entryPageUrl=@https%253A%252F%252Fmatomo.org%252Fjobs%252F';
$segment = new Segment($segment, $idSites = array());
$query = $segment->getSelectQuery($select, $from, $where, $bind);
$this->assertQueryDoesNotFail($query);
$expected = array(
"sql" => "....",
"bind" => array('matomo.org...'));
$this->assertEquals($this->removeExtraWhiteSpaces($expected), $this->removeExtraWhiteSpaces($query));
} And getting these binds. This would not give us any results since the URLs have the protocol removed. Can you reproduce this @diosmosis ? |
Yes, I can reproduce. Before this logic was handled by the TableLogAction method, but that isn't invoked when joining the segment. I'll look into a fix. |
…if a segment contains or does not contain data
@tsteur should be fixed |
Awesome! Thanks for this @diosmosis . I've just tested this and works nicely. Seeing partially some great performance gains! This looks all good from my perspective. Not sure if @matomo-org/core-reviewers had any extra thoughts? We'll likely want to cherry pick this also into 4.x-dev as it doesn't seem to be breaking API and would bring us some great performance improvement. I'd create separate issues for this once merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments. Besides that the branch currently has conflicts. Would be good to merge in the latest changes and resolve the conflicts
@sgiehl updated except for the one comment that still has a question |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of suggestions for cs improvements and stuff that can be changed/removed.
Otherwise this looks fine to me.
@tsteur you had done some performance checks on the new queries, right? If needed I can also do some testing. Otherwise guess this could be merged, once the suggestions were applied.
@sgiehl applied review feedback |
@sgiehl yes I did performance checks here and they looked really good 👍 We're looking forward to this improvement a lot. |
Once merged we can also run this on production for a while and keep a closer look on how it's impacting things overall in case certain things are causing issues. |
Description:
Fixes #19897
This changes the SQL generated from segments to use joins instead of
IN ( SELECT * FROM ... )
expressions. SQL that previously looked like:will now look like:
Changes:
Segment::getCleanedExpression()
handleDimension::getDbColumnJoin()
metadata if the dimension has it. This is what adds joins to tables like log_action or goal.Dimension::$sqlFilterMatch
property that allows segments to customize SQL used to match segments. Used by idaction segments to add a condition on thehash
column.Segment::getAvailableSegments()
to index the result by the segment name.Review