Skip to content

Adds new hits metric to Actions.get report and All Websites Dashboard #22731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 11, 2024

Conversation

sgiehl
Copy link
Member

@sgiehl sgiehl commented Oct 31, 2024

Description:

This PR introduce a new record builder to archive the hits metric as part of the Actions plugin.
Furthermore this metric has been added to the Actions.get report and to all reports and the UI of the All Websites Dashboard.

To ensure this metric is correctly displayed without further interaction, the hits metric will be automatically invalidated through an update script for the current year.

fixes #18978

Note: This will require some updates to test in submodules. Will create PRs for those, once the review here was done.

Review

@sgiehl sgiehl added this to the 5.2.0 milestone Oct 31, 2024
@sgiehl sgiehl added the Needs Review PRs that need a code review label Oct 31, 2024
@sgiehl sgiehl requested a review from a team October 31, 2024 15:18
@sgiehl sgiehl requested a review from a team November 4, 2024 11:31
michalkleiner
michalkleiner previously approved these changes Nov 8, 2024
Copy link
Contributor

@michalkleiner michalkleiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally comparing an instance without the feature and with the feature, looked at the screenshots and some of the xml reports, ran the query manually for a custom date range to confirm getting the same number as in the UI.

@michalkleiner
Copy link
Contributor

@sgiehl can you please resolve the merge conflicts that probably come from submodule updates done in the interim.

Copy link
Contributor

@michalkleiner michalkleiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failing tests within the SystemTestsPlugins suites are from submodule plugins that need to be adjusted in separate repositories. The rest looks good, thanks @sgiehl!

@sgiehl sgiehl merged commit bc57393 into 5.x-dev Nov 11, 2024
23 of 26 checks passed
@sgiehl sgiehl deleted the dev-14058 branch November 11, 2024 21:17
@MichaelRoosz
Copy link
Contributor

MichaelRoosz commented Jan 22, 2025

@sgiehl this change generated over 8 million rows in the archive_invalidations table in my setup and basically broke archive generation. I removed the invalidations via
delete from piwik_archive_invalidations where report = 'Actions_hits';
and then had to clear the re-archive list via
delete from piwik.piwik_option where option_name = 'ReArchiveList';

@sgiehl
Copy link
Member Author

sgiehl commented Jan 22, 2025

We have already applied some improvements around that, but we are aware that too many invalidations can currently break archiving due to memory consumption issues.

@MichaelRoosz
Copy link
Contributor

@sgiehl thanks a lot for the quick feedback. for me memory consumption is fine,I have dedicated over 30 GB memory to the archive cron, but the SQL queries take too long and are too many. my database already has many cores and a lot of memory, so I guess this would need a software optimisation within Matomo

@sgiehl
Copy link
Member Author

sgiehl commented Jan 22, 2025

Archiving the hits metric only should actually not be causing too much trouble. Since this change this is also done during normal archiving anyway. The created invalidations during update were only added to make the metric also available for already created reports of the past

@MichaelRoosz
Copy link
Contributor

MichaelRoosz commented Jan 22, 2025

Yes, after removing the invalidations, along with the ReArchiveList, archiving seems to be running fine again.

I also think that in theory it should also be fine to run those invalidations as its just for the current month, but this seems to be a weak point of the invalidation code currently.

We had exactly the same issue (invalidations overloading the archiver) when we upgraded the MultiChannelConversionAttribution plugin a few weeks ago. Only fix was to remove its invalidations, since it had created over half a million.

@sgiehl
Copy link
Member Author

sgiehl commented Jan 22, 2025

Ok. Thanks for that valuable feedback. I guess we need to create such automated invalidations with more caution in the future, but also need to improve our code to handle that better.
It would anyway help to understand why it made your archiving hanging. Do you by any chance have more information around that. Like are you running multiple archives concurrently? Or do you still know the queries that were taking too long and maybe blocking other archivers?

@MichaelRoosz
Copy link
Contributor

MichaelRoosz commented Jan 22, 2025

I am running 6 archivers concurrently (log says 12, but it detects one process twice due to two lines for each archiver in the process list shell output).

Here is the log output of one archiver which ran about 12 hours before I stopped it:

INFO      [2025-01-22 01:10:01] 193  INIT
INFO      [2025-01-22 01:10:01] 193  Running Matomo 5.2.1 as Super User
INFO      [2025-01-22 01:10:01] 193  ---------------------------
INFO      [2025-01-22 01:10:01] 193  NOTES
INFO      [2025-01-22 01:10:01] 193  - Async process archiving supported, using CliMulti.
INFO      [2025-01-22 01:10:01] 193  - Reports for today will be processed at most every 3600 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO      [2025-01-22 01:10:01] 193  - Maximum 12 websites will be processed.
INFO      [2025-01-22 01:10:01] 193  - Archiving was last executed without error 2 hours 21 min ago.
INFO      [2025-01-22 01:10:01] 193  - Continuing ongoing archiving run by pulling from shared idSite queue.
INFO      [2025-01-22 01:10:01] 193  Will skip segments archiving for today unless they were created recently
INFO      [2025-01-22 01:10:01] 193  ---------------------------
INFO      [2025-01-22 01:10:01] 193  START
INFO      [2025-01-22 01:10:01] 193  Starting Matomo reports archiving...
INFO      [2025-01-22 01:10:01] 193  12 out of 12 archivers running currently

What I observed while it ran:

  • it created, together with the other concurrent archivers, about 8 million rows in the archive_invalidation table
  • for over 10 hours it ran SQL queries like the following. The queries were identical except for the name check, where the values had the format 'doneXXX.Actions%', "XXX" was different for each query. Each of these queries took between 30 and 50 seconds, and it ran one after the other, so I assume this is what caused it to be stuck.
SELECT idsite,
       date1,
       date2,
       period,
       name,
       report,
       Count(*) AS `count`
FROM   `piwik_archive_invalidations`
WHERE  idsite IN ( 1, 5, 7, 11,
                   15, 17, 21, 23,
                   25, 27, 31, 35,
                   37, 39, 41, 43,
                   45, 47, 49, 51,
                   53, 55, 57, 59,
                   61, 65, 67, 69,
                   71, 73, 79, 81,
                   83, 85, 87, 89,
                   91, 93, 95, 97,
                   99, 101, 103, 105,
                   107, 109, 111, 113,
                   115, 117, 119, 123,
                   125, 127, 129, 131,
                   133, 137, 139, 141,
                   143, 145, 147, 149,
                   151, 153, 159, 161,
                   163, 165, 167, 169,
                   171, 173, 175, 177,
                   179, 181, 183, 185,
                   189, 191, 193, 195,
                   197, 199, 201, 203,
                   205, 207, 209, 211,
                   213, 215, 217, 219,
                   221, 223, 225, 227,
                   229, 231, 233, 235,
                   237, 239, 241, 243,
                   245, 247, 253, 255,
                   257, 259, 261, 263,
                   265, 267, 269, 271,
                   273, 275, 277, 279,
                   281, 283, 285, 287,
                   289, 291, 293, 295,
                   297, 299, 303, 305,
                   307, 311, 313, 315,
                   319, 321, 323, 325,
                   327, 329, 331, 333,
                   335, 337, 339, 343,
                   347, 349, 351, 353,
                   355, 357, 359, 361,
                   363, 365, 367, 369,
                   371, 375, 377, 379,
                   381, 383, 385, 387,
                   389, 391, 393, 397,
                   399, 401, 403, 405,
                   407, 409, 411, 413,
                   415, 417, 419, 421,
                   425, 427, 429, 431,
                   433, 435, 437, 439,
                   441, 443, 445, 447,
                   451, 452, 453, 454,
                   455, 456, 457, 458,
                   459, 460, 461, 462,
                   463, 464, 465, 466,
                   467, 468, 469, 470,
                   471, 472, 473, 474, 475 )
       AND status = 0
       AND ( ( period = 1
               AND date1 = '2025-01-01 00:00:00'
               AND date2 = '2025-01-01 00:00:00' )
              OR ( period = 3
                   AND date1 = '2025-01-01 00:00:00'
                   AND date2 = '2025-01-31 00:00:00' )
              OR ( period = 4
                   AND date1 = '2025-01-01 00:00:00'
                   AND date2 = '2025-12-31 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-02 00:00:00'
                   AND date2 = '2025-01-02 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-03 00:00:00'
                   AND date2 = '2025-01-03 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-04 00:00:00'
                   AND date2 = '2025-01-04 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-05 00:00:00'
                   AND date2 = '2025-01-05 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-06 00:00:00'
                   AND date2 = '2025-01-06 00:00:00' )
              OR ( period = 2
                   AND date1 = '2025-01-06 00:00:00'
                   AND date2 = '2025-01-12 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-07 00:00:00'
                   AND date2 = '2025-01-07 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-08 00:00:00'
                   AND date2 = '2025-01-08 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-09 00:00:00'
                   AND date2 = '2025-01-09 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-10 00:00:00'
                   AND date2 = '2025-01-10 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-11 00:00:00'
                   AND date2 = '2025-01-11 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-12 00:00:00'
                   AND date2 = '2025-01-12 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-13 00:00:00'
                   AND date2 = '2025-01-13 00:00:00' )
              OR ( period = 2
                   AND date1 = '2025-01-13 00:00:00'
                   AND date2 = '2025-01-19 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-14 00:00:00'
                   AND date2 = '2025-01-14 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-15 00:00:00'
                   AND date2 = '2025-01-15 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-16 00:00:00'
                   AND date2 = '2025-01-16 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-17 00:00:00'
                   AND date2 = '2025-01-17 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-18 00:00:00'
                   AND date2 = '2025-01-18 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-19 00:00:00'
                   AND date2 = '2025-01-19 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-20 00:00:00'
                   AND date2 = '2025-01-20 00:00:00' )
              OR ( period = 2
                   AND date1 = '2025-01-20 00:00:00'
                   AND date2 = '2025-01-26 00:00:00' )
              OR ( period = 1
                   AND date1 = '2025-01-21 00:00:00'
                   AND date2 = '2025-01-21 00:00:00' ) )
       AND name LIKE 'done4c50e8f391b7ce2dfcf68bd13e15acc4.Actions%'
GROUP  BY idsite,
          date1,
          date2,
          period,
          name
ORDER  BY NULL; 

@sgiehl
Copy link
Member Author

sgiehl commented Jan 22, 2025

Ok. Thanks that helps a lot. So it's caused by the query to look for the next archive to process rather then the queries to build the archives itself.

@sgiehl
Copy link
Member Author

sgiehl commented Jan 22, 2025

@MichaelRoosz Just one more question: Are you using the roll up plugin and maybe have a rollup defined that includes all the sites that were included in the idsite IN () ?

@MichaelRoosz
Copy link
Contributor

@sgiehl The Rollup plugin is activated, but so far we did not create any rollup site.

What I think is causing the "idsite IN ()" is:

The Migration from this PR calls:

$invalidator->scheduleReArchiving('all', 'Actions', 'Actions_hits', Date::factory($startOfCurrentMonth));

so $idSites equals 'all', and then when the archiver runs, code execution is like this:

ArchiveInvalidator::applyScheduledReArchiving()
-> ArchiveInvalidator::reArchiveReport() 
   ...  

   if ($idSites === 'all') {
       $idSites = $this->getAllSitesId();
   }
   ...
-> DataAccess::Model::updateArchiveAsInvalidated(... $idSites, ...)
-> DataAccess::Model::getExistingInvalidations($idSites, ...)

The SQL Query from my logs like the one getExistingInvalidations() generates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review
Development

Successfully merging this pull request may close these issues.

Add hits to all websites dashboard metrics
3 participants