Skip to content

Conversation

pp-mo
Copy link
Member

@pp-mo pp-mo commented Mar 27, 2023

Rebased version of #5024

To echo comment on #5024 :

some such feature will be important to the proposed solutions for #4994
But we should remain open to changing how this is implemented.

NOTE: now targets main branch, instead of previous feature-branch FEATURE_xarray_readwrite.
This is because we are no longer planning to host the majority of the 'bridge' code in the Iris repo, but in a separate 'ncdata' repo : see #4994

Todo:

@pp-mo pp-mo force-pushed the loadsave_ncfiles_2 branch from f503e9d to 6459591 Compare March 27, 2023 09:41
@codecov
Copy link

codecov bot commented Mar 27, 2023

Codecov Report

Patch coverage: 81.81% and project coverage change: +0.02 🎉

Comparison is base (b035094) 89.32% compared to head (ce94993) 89.34%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5214      +/-   ##
==========================================
+ Coverage   89.32%   89.34%   +0.02%     
==========================================
  Files          89       89              
  Lines       22392    22413      +21     
  Branches     5375     5380       +5     
==========================================
+ Hits        20002    20026      +24     
+ Misses       1640     1639       -1     
+ Partials      750      748       -2     
Impacted Files Coverage Δ
lib/iris/fileformats/netcdf/saver.py 89.02% <67.85%> (+0.02%) ⬆️
lib/iris/fileformats/netcdf/_thread_safe_nc.py 83.55% <70.00%> (+1.03%) ⬆️
lib/iris/__init__.py 88.88% <83.33%> (+0.36%) ⬆️
lib/iris/fileformats/__init__.py 84.37% <100.00%> (-0.48%) ⬇️
lib/iris/fileformats/cf.py 85.65% <100.00%> (+0.11%) ⬆️
lib/iris/fileformats/netcdf/loader.py 80.44% <100.00%> (+0.07%) ⬆️
lib/iris/io/__init__.py 83.21% <100.00%> (+2.05%) ⬆️
lib/iris/io/format_picker.py 83.72% <100.00%> (+2.26%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pp-mo
Copy link
Member Author

pp-mo commented May 10, 2023

Update

Rebased onto latest main
still WIP

@pp-mo pp-mo force-pushed the loadsave_ncfiles_2 branch from b7ac505 to 81dfd5b Compare May 10, 2023 16:24
@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Update

I discovered that this feature now has a problem with output to a already-open dataset (one owned by the caller) whencompute=True (which is the default!) :

  • since we added support for distributed schedulers in Lazy netcdf saves #5191, lazy streaming requires the original file to be closed -- because the dask workers doing the streaming may now be separate processes, not always threads, which means they must each (re-)open the dataset for writing -- which they can't if the original caller still has it open
  • this simply isn't compatible with saving to an open dataset, because iris.save doesn't have the right to close the dataset (since I don't think netCDF4 supports "re-opening" a dataset, and closing will invalidate any netCDF4 objects held by user).

So that means that, in most cases, you can't just iris.save(cubes, my_dataset).
You must instead saves = iris.save(cubes, my_dataset, compute=False); my_dataset.close(); saves.compute()

Following offline discussion with @bjlittle @trexfeathers

  • We can either

    • forbid this usage, or
    • error out except in those specific cases which happen to be OK -- e.g. when dataset is not a real file-based dataset (as intended for 4994 ), or if all data is real.
  • we decided that the usecase is so specialist that it will probably only be used for Xarray bridge #4994 support. Hence we should ban compute=True operation saving to a provided dataset, since the feature is still useable (just less convenient in a rare usecase) and it is the simplest solution, both to implement and document.

@pp-mo pp-mo marked this pull request as ready for review May 15, 2023 15:12
@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Ping @ESadek-MO
Where you going to review this (:crossed_fingers: :pray: ) ?
I think it is at last ready to go !

@pp-mo
Copy link
Member Author

pp-mo commented May 15, 2023

Ping @ESadek-MO Where you going to review this (crossed_fingers pray ) ? I think it is at last ready to go !

P.S. #5212 will be a follow-up, which should complete essential support for #4994.
But that should be a lot simpler than this.

Copy link
Contributor

@ESadek-MO ESadek-MO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pp-mo, looks good, just a couple soft suggestions!

@pp-mo
Copy link
Member Author

pp-mo commented May 17, 2023

Thanks @ESadek-MO
I think I addressed all those points now -- please re-review + see what you think.

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts on the changes to _thread_safe_nc.py:

@pp-mo
Copy link
Member Author

pp-mo commented May 19, 2023

Thanks @trexfeathers
I think I've responded to all of those, now.
Please re-review !

Copy link
Contributor

@ESadek-MO ESadek-MO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pp-mo, all looks good now!
Thanks for the review assist @trexfeathers!

@ESadek-MO ESadek-MO merged commit 04dadb2 into SciTools:main May 19, 2023
@pp-mo
Copy link
Member Author

pp-mo commented May 19, 2023

Thanks so much, @ESadek-MO @trexfeathers !

tkknight added a commit to tkknight/iris that referenced this pull request May 21, 2023
* upstream/main:
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
tkknight added a commit to tkknight/iris that referenced this pull request Nov 9, 2023
* upstream/main:
  Updated environment lockfiles (SciTools#5334)
  Iris Docs: Dark theme ready (SciTools#5299)
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
  update release_do_nothing script (SciTools#5311)
  Restore latest Whats New files.
  Updated environment lockfiles (SciTools#5308)
  release_do_nothing script updates from v3.6.0rc0 (SciTools#5306)
  Whats new updates for v3.6.0rc0 (SciTools#5300)
  Bump scitools/workflows from 2023.04.3 to 2023.05.0 (SciTools#5303)
@pp-mo pp-mo deleted the loadsave_ncfiles_2 branch October 23, 2024 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

3 participants