Faster archiving for non-days periods by only storing one datatable and blob row at a time in memory #20512

Merged: sgiehl merged 1 commit into 4.x-dev from mm20332 on Apr 13, 2023

@tsteur tsteur commented Mar 23, 2023

refs #20332. This was merged in 5.x-dev. It would be a huge help for us on the Cloud to have this merged into 4.x-dev as well until 5.0 is released. From what I can see there are no breaking changes? In that case it would be great to also apply this to 4.x.

  • proof of concept one blob table row at a time method of aggregating week, month, year, range data
  • sort blobs by subtable ID when chunk is being read
  • simplify code w/ generators
  • make sure single blob query is ordered by name correctly
  • REGEXP_SUBSTR() is only available in mysql 8 :/
  • fix a couple test failures
  • by default when aggregating tables in ArchiveProcessor sort by visits if the table contains visits
  • try fixing random test failures
  • debug ci only error
  • undo debugging change
  • fixing some system tests
  • refactor ArchiveSelector code for more code reuse
  • add some code documentation
  • remove DataCollection::forEachBlobExpanded() since it is no longer used + a couple other small changes
  • try debugging ci only random failure
  • remove previous debugging code
  • more debugging
  • more ci debugging
  • trigger build again and try to get more information for random failure
  • fix convoluted sql replacement for REGEXP_SUBSTRING
  • fix idsubtable extraction, need to check if extracted value is an empty string and order it before everything else if so
  • add log in case blob table order is incorrect
  • add tests for subtable extraction sql
  • remove unused import
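
The commit list above describes the core idea: read blob rows in a fixed order (root table first, then subtables by numeric ID), hand them over one at a time via generators, and merge each into the running total for the period. The following is a minimal Python sketch of that idea, not the actual Matomo code. The blob naming scheme (`<record>` for the root table, `<record>_<id>` for a subtable) and every function name here are assumptions made for illustration, and the in-memory `sorted()` call only stands in for an ordered database query.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One stored blob row: (blob name, {label: visits}). Hypothetical shape.
BlobRow = Tuple[str, Dict[str, int]]


def subtable_sort_key(blob_name: str, record: str) -> Tuple[int, int]:
    """Sort key: root table first, then subtables by numeric ID.

    Assumes a blob is named "<record>" (root) or "<record>_<id>" (subtable).
    """
    suffix = blob_name[len(record):].lstrip("_")
    if suffix == "":
        # An empty extracted ID means the root table; order it before everything else.
        return (0, 0)
    return (1, int(suffix))


def read_blobs(stored: List[BlobRow], record: str) -> Iterator[BlobRow]:
    """Stand-in for an ordered database cursor: yields one blob row at a time."""
    for row in sorted(stored, key=lambda r: subtable_sort_key(r[0], record)):
        yield row


def aggregate(streams: Iterable[Iterator[BlobRow]]) -> Dict[str, Dict[str, int]]:
    """Merge rows from several sub-periods, consuming one source row at a time."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for stream in streams:
        for name, table in stream:
            for label, visits in table.items():
                totals[name][label] += visits
    return {name: dict(rows) for name, rows in totals.items()}


def sort_by_visits(table: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order the rows of an aggregated table by visits, highest first."""
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `aggregate([read_blobs(day1, "Referrers"), read_blobs(day2, "Referrers")])` with two lists of day rows would sum the per-label visits for each blob name. The numeric sort key matters because a plain string sort would place `Referrers_10` before `Referrers_2`.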

Review

@tsteur added the "Needs Review" label Mar 23, 2023
@tsteur added this to the 4.14.x milestone Mar 23, 2023
@sgiehl changed the base branch from 5.x-dev to 4.x-dev March 23, 2023 08:30
@github-actions

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions bot added the "Stale" label Mar 31, 2023
…ow in memory at a time (#20332)

Co-authored-by: Stefan Giehl <stefan@matomo.org>

@sgiehl sgiehl left a comment

Should be fine to apply this to 4.x as well. Will wait for the tests to pass before merging.

@sgiehl merged commit 7578c0f into 4.x-dev Apr 13, 2023
@sgiehl deleted the mm20332 branch April 13, 2023 07:20
@bx80 changed the title from "When aggregating non-day periods, only store one datatable and blob row in memory at a time" to "Faster archiving for non-days periods by only storing one datatable and blob row at a time in memory" Apr 18, 2023
diosmosis added a commit to matomo-org/plugin-GoogleAnalyticsImporter that referenced this pull request Apr 18, 2023
@sgiehl modified the milestones: 4.14.x, 4.14.2 Apr 18, 2023