-
Notifications
You must be signed in to change notification settings - Fork 26
Description
Describe the Enhancement
This is a general issue with no due date to document suggested changes and/or improvements to how Python Embedding works within MET, with the primary goal of improving the user experience and ability to communicate and teach Python Embedding to users.
Time Estimate
TBD
Sub-Issues
Consider breaking the enhancement down into sub-issues.
- Revisit the Python embedding strings
PYTHON_NUMPY
andPYTHON_XARRAY
. I think we can generalize these more to be something likePYTHON_DATAPLANE
orPYTHON_GRID
andPYTHON_POINT
. Then, forPYTHON_DATAPLANE
(or_GRID
), have MET deal with deciphering whether it is an Xarray object or a NumPy N-D array. The point data is easier, and really only comes in one way. The largest confusion in my opinion is thatPYTHON_NUMPY
is used for point data currently, which has nothing to do with NUMPY whatsoever. Alternatively, we could addPYTHON_POINT
and maintain the two dataplane instances the way they are. - If --enable_python is used when building, have make_test check if the version of python specified can successfully load all the required packages from a list of required packages that is maintained somewhere. (See Enhance make_test to test Python embedding functionality Enhance make_test to test Python embedding functionality #2788)
- Add to Appendix F or the contributors guide much more detailed information about running with and without MET_PYTHON_EXE set. Include wire diagrams to illustrate. Also document the data structures that are populated in the 3 different types of python embedding (grid, point, pairs).
- This section in the METplus Wrappers docs: https://metplus.readthedocs.io/en/latest/Users_Guide/overview.html#metplus-components-python-requirements, is confusing because it seems that "cfgrib" is a required Python package for METplus Wrappers, but then later in chapter 4 it does not list "cfgrib", and says "Minimum Requirements To run most of the METplus wrappers, the following packages are required: dateutil (2.8)". I think there could be some documentation work in the METplus Wrappers also.
- When attempting to generalize where Python Embedding should be put for METplus Wrappers configuration, I was able to determine the following. For gridded data, the *_INPUT_TEMPLATE conf item is set to just
PYTHON_NUMPY
orPYTHON_XARRAY
, and then the Python script and any arguments is included in the *_VAR<n>_NAME conf item. However, for point data, the *_INPUT_TEMPLATE conf item contains both thePYTHON_NUMPY
string, and another "=" followed by the Python script and any script arguments. It would be nice if these worked similarly, where for point data we could set the *_INPUT_TEMPLATE to justPYTHON_NUMPY
, and then use the *_VAR<n>_NAME conf items for the Python script and any arguments to it, which would make configuring METplus wrappers for point and gridded data similar with respect to where elements of Python Embedding should be put in the configuration file. After talking to John HG, it sounds like it might not be feasible to change the way MET works -or- to change the wrappers to include both items in a single conf for gridded data like obs data due to how the MET tools can be called for gridded data. Some of our discussion:
Q: Hi John. Is there a good reason why point data works a little differently than gridded data for Python Embedding for METplus Wrappers? It looks like for point data, the *_INPUT_TEMPLATE conf item is used to specify the entire string "PYTHON_NUMPY=/my/script.py arg1 arg2", whereas for gridded data, the *_INPUT_TEMPLATE conf item simply only requires "PYTHON_NUMPY" (or _XARRAY), and then the _VAR_NAME conf entry requires the "/my/script.py arg1 arg2" part of the Python Embedding elements.
A: For gridded data, you typically define the script to be run along with arguments in the "field.name" config setting. There is no "field.name" setting for obs data really... since we supply all the point obs (consisting of all the types) in a single spot. So we replace the obs file name with the full python command to retrieve those obs from python.
- How can we expose the value of
MET_PYTHON_BIN_EXE
to the user? I was able to run this command:
seneca:~$ /d1/projects/MET/MET_regression/develop/NB20230123/MET-develop/bin/plot_data_plane PYTHON_NUMPY fcst.ps 'name="test.py";' -v 12
which showed the value. However, it might be nice to make a symbolic link in /usr/local/met-11.0.0/bin to the python3.8 exe that was used, OR have a utility script in /usr/local/met-11.0.0/bin the user can run to echo the value of the Python install for them. This way the user can directly instantiate the same Python MET will use, and also verify whether the Python packages they need are available there or whether they will need to handle installing their ownMET_PYTHON_EXE
to use. - Add more documentation or examples of how to use
point_stat
with Python Embedding. David Fillmore was using Python Embedding for both fcst and obs data, with the fcst being gridded data and the obs being point data. This led to constructing a command for MET and configuring METplus for both "point" data and "gridded" data, which work differently. Maybe more documentation, but maybe this is another data point for changing how it works for gridded/point to be more similar? - There are some Python scripts located on this DTC webpage: https://dtcenter.org/community-code/model-evaluation-tools-met/sample-analysis-scripts. Is this the best place for them? Or should this be included in the Appendix F, and linked to from there? From John HG:
- MET repo/RTD (https://met.readthedocs.io/en/latest/)
- METplus repo/RTD (https://metplus.readthedocs.io/en/latest/)
- METplus-Training repo/RTD (https://metplus-training.readthedocs.io/en/latest/)
I've been looking for more opportunities to expand METplus-Training... so perhaps that's a good landing spot.
- Should we document what other elements are allowed in the
plot_data_plane
field_string argument? name = "TMP"; level = "P500"; convert(x) = K_to_C(x); censor_thresh = lt0; censor_val = -9999; set_attr_name = "Temperature"; set_attr_level = "500mb"; set_attr_valid = 20210708_12;. See here for the best resource: https://met.readthedocs.io/en/latest/Users_Guide/config_options.html#fcst. - Added 2/10/23: On this line:
status = system(command.text()); - Added while testing Bugfix: Fix the MET vx_pointdata_python library to handle MET_PYTHON_EXE for python embedding of point observations #2428 on 2/17/23: We should consider making the Python requirements for METplus the same as the Python requirements for building MET with support for Python embedding. For using MET with Python embedding, it appears based on Appendix F the modules
sys, os, argparse, importlib, numpy, and netCDF4
are required. Some of those are stdlib, but numpy (and maybe netCDF4) are not. (see Clarify MET Compile Time Python requirements #2490) - Other open issues: Create interface for translating Xarray object attributes to MET attributes #1470, Enhance MPR Python embedding to support all MET STAT line types #2539
- Can we re-order the debugging of the Initializing/using/running for the Python Embedding messages?
- It seems like PYTHON_XARRAY is not needed? When I have an Xarray DataArray object in memory named "met_data", and use PYTHON_NUMPY, it works just fine. Why is this?
- Data ordered intuitively inside a hypercube read in by Python need to be flipped before sending to MET because MET populates the 2D dataplane from upper left to lower left rather than lower left first. This seems dangerous since it is not intuitive.
- Should we even have many examples of using MET with Python Embedding directly? Like on the DTC webpage with sample scripts and data, those are using MET directly but don't we want folks to use METplus wrappers?
- Enhance MET Python embedding to report an error if
MET_PYTHON_EXE
is set but does not exist, and warn that it is using the compile time instance. Related, ifMET_PYTHON_EXE
is set to an empty value (nul), it should just proceed as if it is not set and attempt to use the compile time Python as usual. - Decide how to support missing data values in MET for cases of compile time Python and MET_PYTHON_EXE. Met tools expect a missing data value/FillValue of -9999.0. Any other value will be treated as valid data. See here: Bugfix: Fix the fill value setting used in the write_tmp_dataplane internal Python embedding script #2525 for more info.
- The MPR Python Embedding example assumes users are reading an MPR file generated by another MET tool, and uses Pandas
read_csv()
to read it. It specifies the type of data asdtype='str'
, thus all columns of the MPR file are cast as strings prior to sending to MET. If a user curates their own MPR data in Python (for example, massaging JSON data into the MPR line type format), they may have columns that are not string but rightfully numeric in their type (e.g. QC, FCST, OBS). Should we allow mixed data types in this case? At a minimum, perhaps we return a more informative error message thanbad object type
. - Added 5/8/2024 during Feature #2781 Convert MET NetCDF point obs to Pandas DataFrame #2877 review: add documentation of the MET python module as needed. This is mostly intended for internal use though, so it is debatable how useful documentation would be for users. It might be more for developers.
- Added 9/26/2024: In the dictionary of grid attributes used for Python embedding for dataplanes, the
type
attribute appears to use CamelCase (e.g.LatLon
) as opposed to all lower case (e.g.latlon
). We use all lower case for grid specification strings, so why is it different for Python embedding? We should make it the same for both, or allowing case-insensitive options for Python embedding to preserve backwards compatibility. See Python embedding to read geotiff data (sentinel 3 products) METplus#2702.
Completed Items
- Is there a way to know where the Python installation that MET was compiled with? This would help in situations where a user wanted to know if the Python installation had any particular packages installed in it to know what might be available (or not) for Python Embedding. I think this will be handled via the
MET_PYTHON_BIN_EXE
work. @georgemccabe is there an issue for that somewhere? This can be found in the list of environment variables set at MET compile time, and is also printed to the user as a debug statement when using Python embedding. The issue where this was added was Refine Python runtime environment #2388, and the debug statements were added later. - Fix "dash-dash" formatting in appendix F (RST problem) Changed in Feature #2285 read_point_data #2509
- Differentiate Python stdlib packages from optional packages and remove stdlib packages from requirements in Appendix F (see Clarify MET Compile Time Python requirements #2490). Changed in Feature #2285 read_point_data #2509
- From @DanielAdriaansen on 2/3/23, enhance MET to automatically call
met_point_obs.py
at runtime when needed rather than expecting the user to figure out if/when it actually needs to be called. Changed in Feature #2285 read_point_data #2509 - Added 2/10/23: From Discussion python embedding; conda env METplus#2031, revise documentation to strongly suggest that users use the absolute path to their Python embedding scripts. Using relative paths or just simply the script name will produce unexpected results. Changed in Feature #2285 read_point_data #2509. Appendix F now has bolded "full path" references where appropriate.
- 2/17/2023: We need to clarify how to use
convert_point_obs
insidemet_point_obs
. It seems that if a user is NOT settingMET_PYTHON_EXE
, then MET will only find this after 5ad920b, but if before that commit then they need to setPYTHONPATH
. Similarly, if they are using Python embedding, they will have to setPYTHONPATH
prior to Bugfix: Fix the MET vx_pointdata_python library to handle MET_PYTHON_EXE for python embedding of point observations #2428 being merged. Is this right? Changed in Feature #2285 read_point_data #2509. The solution here is just to encourage users to update to a version after this PR, which removes the need forconvert_point_obs
. - From @georgemccabe on 2/3/23, in https://github.com/dtcenter/MET/tree/main_v11.0/scripts/python consider moving
met_point_obs.py
into some other directory to automatically be included in the user's python path at runtime. But we don't necessarily want to include EVERYTHING that lives in the scripts/python directory because that MIGHT cause unexpected behavior depending on the names of scripts the user chooses. Inventory the existing location of python scripts and data files in the MET repo and revise, as needed. Changed in Feature #2285 read_point_data #2509. Since a user typically won't need this anymore (i.e. to callconvert_point_data
), then this probably won't be an issue but I also think the Python reorg in Feature #2285 read_point_data #2509 also makes it such that all the Python code a user would need/want to use in their Python embedding script will be found by MET at runtime. If running standalone the user may need to add some of these scripts/python/met directories to their PYTHONPATH. - Appendix F does not have a list of required PyEmbed packages but the METplus docs does (see here: https://metplus.readthedocs.io/en/latest/Users_Guide/installation.html#python-package-requirements) (see Clarify MET Compile Time Python requirements #2490). Changed in Feature #2285 read_point_data #2509. There is still more work to do on the METplus side, see Documentation: Review and update Requirements METplus#1978, and also the PR for Enhancement: Update use cases to use new Python directory structure in MET METplus#2115.
- Update Appendix F to include what is reported if MET was compiled without Python? Changed in Feature #2285 read_point_data #2509.
Relevant Deadlines
NONE
Funding Source
NONE
Assignee
- Select engineer(s) or no engineer required
- Select scientist(s) or no scientist required
Labels
- Select component(s)
- Select priority
- Select requestor(s)
Projects and Milestone
- Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
- Select Milestone as the next official version or Future Versions
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
- Complete the issue definition above, including the Time Estimate and Funding Source.
- Fork this repository or create a branch of develop.
Branch name:feature_<Issue Number>_<Description>
- Complete the development and test your changes.
- Add/update log messages for easier debugging.
- Add/update unit tests.
- Add/update documentation.
- Push local changes to GitHub.
- Submit a pull request to merge into develop.
Pull request:feature <Issue Number> <Description>
- Define the pull request metadata, as permissions allow.
Select: Reviewer(s) and Development issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version - Iterate until the reviewer(s) accept and merge your changes.
- Delete your fork or branch.
- Close this issue.