Skip to content

update resources faster with package_update #8360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Sep 4, 2024

Conversation

wardi
Copy link
Contributor

@wardi wardi commented Jul 22, 2024

Fixes #5713

Proposed fixes:

  1. selectively skip validation and dictization on resources if both are true:
    • dataset metadata is unchanged
    • skipped resource metadata is unchanged
  2. skip updates completely if no metadata was changed
  3. compare new and existing resource model objects before commit to avoid commit processing time
  4. pass original_package in context from actions that call package_show with for_update=True to package_update to allow low-cost validated metadata comparisons
100-resource dataset action old new improvement
resource_create .721 .419 70%
resource_update 1.05 .445 130%
resource_delete .709 .478 48%
package_resource_reorder .697 .479 45%
package_patch .660 .621 6% *
resource_patch .746 .413 80%
package_revise (updating resource) .787 .267 190%

* validation and dictization applied for all resources when any dataset field changes, we only save commit time

Note

This change will affect custom validation rules that access resource metadata from dataset or other resource validators:

  • only the changed resources are passed to validation
  • flattened data the validators receive won't include skipped resources

This change does not affect validation of resource fields that depend on package fields because the package fields are always available in flattened data

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

@wardi wardi marked this pull request as draft July 22, 2024 15:13
@wardi wardi marked this pull request as ready for review August 12, 2024 16:03
Copy link
Member

@amercader amercader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I finally understand this (mostly, see comment 🙂 )
The numbers are clear and these changes are worth merging but we should be mindful that we are adding more and more logic to package_update so need to make sure is maintainable going forward.

Co-authored-by: Adrià Mercader <amercadero@gmail.com>
@amercader amercader merged commit ce0450d into master Sep 4, 2024
7 of 9 checks passed
@amercader amercader deleted the 5713-package-update-faster branch September 4, 2024 15:29
wardi added a commit that referenced this pull request Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Very high render times on resource addition and updates
2 participants