Cache Dask arrays created from `NetCDFDataProxy`s to speed up loading files with multiple variables #6252
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #6252 +/- ##
==========================================
+ Coverage 89.85% 89.88% +0.02%
==========================================
Files 88 88
Lines 23401 23430 +29
Branches 4357 4361 +4
==========================================
+ Hits 21028 21059 +31
+ Misses 1646 1644 -2
Partials 727 727
⏱️ Performance Benchmark Report: 953e8f9
The benchmarks showing changes aren't really the ones I'd expect.
I added a benchmark in bfbd625 that should show the improvement.
💯 It will be great to see this come together!
⏱️ Performance Benchmark Report: ad1e4f1
Thanks @bouweandela, looks really good to me for the most part!
Only one suggestion, and I'm happy for you to oppose that. Other than that happy for this to be merged.
Happy with this, thank you!
* upstream/main: (98 commits)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6335)
  SPEC 0: drop py310 and support py313 (SciTools#6195)
  Better benchmarking Python version handling (SciTools#6329)
  Move loading and combine code into their own submodules. (SciTools#6321)
  Bump scitools/workflows from 2025.02.1 to 2025.02.2 (SciTools#6327)
  replaced reference from build to python build (SciTools#6324)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6315)
  Cache Dask arrays created from `NetCDFDataProxy`s to speed up loading files with multiple variables (SciTools#6252)
  Bump scitools/workflows from 2025.02.0 to 2025.02.1 (SciTools#6313)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6310)
  Bump scitools/workflows from 2025.01.5 to 2025.02.0 (SciTools#6306)
  Updated environment lockfiles (SciTools#6301)
  Improve speed of loading small NetCDF files (SciTools#6229)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6298)
  Use cube chunks for weights in aggregations with smart weights (SciTools#6288)
  Updated environment lockfiles (SciTools#6296)
  Bump scitools/workflows from 2025.01.4 to 2025.01.5 (SciTools#6300)
  Bump scitools/workflows from 2025.01.3 to 2025.01.4 (SciTools#6295)
  Lazy rectilinear interpolator (SciTools#6084)
  Revert "Fix broken link. (SciTools#6246)" (SciTools#6297)
  ...
… files with multiple variables (SciTools#6252)
* Cache Dask arrays to speed up loading files with multiple variables
* Add benchmark for files with many cubes
* Add whatsnew
* Add test
* Add license header
* Use a global to set the cache size
* Update whatsnew
🚀 Pull Request
Description
Another idea to speed up loading NetCDF files with many variables. This caches the last 100 Dask arrays created from `NetCDFDataProxy`s so that shared coordinates can be re-used. Since copying a Dask array is much faster than creating a new one, this gives a speedup.