Description
This issue was originally called:
Update Ensemble-Stat to better handle python-embedding failures and entire fields of missing data.
However, I updated it to more clearly state the actual fix.
Describe the Enhancement
This issue arose when John O was setting up a METplus use case for NRL that uses python-embedding to call Ensemble-Stat. Data for all 7 ensemble members live within the same variable in the same NetCDF file. The python-embedding script is called 7 times to pull values for each of the 7 members.
While the python script runs without error, 3 of the 7 members contain a full field of missing data values. In Ensemble-Stat, all 7 members appear to be "valid" so the ens.ens_thresh threshold is satisfied. However, no grid point contains 7 valid ensemble values, meaning that no ensemble statistics are computed.
The underlying rule here is that, after accounting for missing input files, there can be 0 bad data values at any grid point. That means you can't have 7 valid ensemble values for the first grid point and then only 4 for the next, because then we can't group the ensemble ranks together into a RANK histogram. It's OK for entire files to be missing, but not OK to have missing data values within the fields.
In John O's case, all 7 calls to the python-embedding script run without error. It's just that 3 of the 7 calls produce fields consisting entirely of missing data values. So no grid point contains 7 valid ensemble member values, and every grid point is discarded due to missing ensemble values.
This task is to update the logic of Ensemble-Stat in 2 ways:
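To make the failure mode concrete, here is a minimal sketch in plain Python, not MET code. It assumes a missing-data flag of -9999 and a toy 4-point grid; the function name `count_complete_points` is hypothetical. It counts the grid points at which every member has a valid value, which is the condition described above.

```python
BAD_DATA = -9999.0  # assumed missing-data flag for this sketch


def count_complete_points(members, bad=BAD_DATA):
    """Count grid points where every ensemble member has a valid value."""
    n_points = len(members[0])
    return sum(
        1
        for i in range(n_points)
        if all(field[i] != bad for field in members)
    )


# 7 members on a toy 4-point grid; 3 of them are entirely missing data.
valid = [1.0, 2.0, 3.0, 4.0]
missing = [BAD_DATA] * 4
members = [valid] * 4 + [missing] * 3

# With all 7 members counted as "valid", no grid point is complete.
n_complete_all = count_complete_points(members)

# If the 3 all-missing members were treated as missing files instead,
# every grid point would be complete for the remaining 4 members.
n_complete_valid_only = count_complete_points(members[:4])
```

In this toy case the first count is 0 and the second is 4, which mirrors why no ensemble statistics are computed in the run described above.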
(1) Currently, if the python-embedding script returns bad status, the entire Ensemble-Stat run exits. Update the python-embedding logic to allow for runtime failures without the tool exiting. Then treat a python-embedding failure as if it were a missing input file... and count that against ens.ens_thresh.
(2) When reading ensemble data, check to see if all data is bad data. If so, also treat that as if it were a missing input file. Question... should this check only be applied for python-embedding, or for all input file types?
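The two changes above could be sketched roughly as follows. This is illustrative Python rather than the actual Ensemble-Stat implementation, and it makes several assumptions: the missing-data flag is -9999, `ens_thresh` is read as the minimum ratio of usable members to expected members, and the names `is_all_bad` and `select_usable_members` are hypothetical. Each member is represented by a callable that either returns a field or raises.

```python
BAD_DATA = -9999.0  # assumed missing-data flag for this sketch


def is_all_bad(field, bad=BAD_DATA):
    """Return True if every value in the field is the missing-data flag."""
    return all(value == bad for value in field)


def select_usable_members(readers, ens_thresh, bad=BAD_DATA):
    """Read each member once and keep only the usable fields.

    A reader that raises (change 1) or returns an all-bad field (change 2)
    is treated like a missing input file and counted against ens_thresh.
    """
    if not readers:
        raise ValueError("no ensemble members requested")

    usable = []
    for read in readers:
        try:
            field = read()
        except Exception:
            continue  # change (1): runtime failure counts as a missing file
        if field is None or is_all_bad(field, bad):
            continue  # change (2): all-bad field counts as a missing file
        usable.append(field)

    ratio = len(usable) / len(readers)
    if ratio < ens_thresh:
        raise RuntimeError(
            f"only {len(usable)} of {len(readers)} members usable "
            f"(ratio {ratio:.2f} < ens_thresh {ens_thresh})"
        )
    return usable


def failing_reader():
    raise RuntimeError("python-embedding script failed")


# 4 good members, 2 all-bad members, 1 failing script.
readers = (
    [lambda: [1.0, 2.0, 3.0]] * 4
    + [lambda: [BAD_DATA] * 3] * 2
    + [failing_reader]
)
usable = select_usable_members(readers, ens_thresh=0.5)
```

With a threshold of 0.5 the 4 usable members are returned; with a threshold of 1.0 the same inputs raise, because 3 of the 7 members count as missing.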
Some issues to consider:
- We would like to NOT have to execute the python embedding scripts multiple times because that's slow!
- For python-embedding, we can actually have multiple fields requested. How do we handle the case when the python-embedding script runs fine for the first field of the first file, but then fails for the second field of the first file?
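One possible way to address both considerations is to cache the outcome of each python-embedding call per member and field, so that the validity check and the later processing reuse a single execution. The sketch below is plain Python under stated assumptions: `EmbeddingCache`, `fake_runner`, and the command strings are all hypothetical, and a failure is recorded as `None` for that one field only.

```python
class EmbeddingCache:
    """Run each embedding command at most once and remember the outcome."""

    def __init__(self, runner):
        self._runner = runner  # callable that executes one command
        self._results = {}     # command -> field, or None on failure
        self.calls = 0         # number of real executions, for illustration

    def get(self, command):
        if command not in self._results:
            self.calls += 1
            try:
                self._results[command] = self._runner(command)
            except Exception:
                # Record the failure for this field only; other fields
                # from the same file are still attempted independently.
                self._results[command] = None
        return self._results[command]


def fake_runner(command):
    """Stand-in for a script call: succeeds for field1, fails for field2."""
    if command.endswith("field2"):
        raise RuntimeError("script failed for " + command)
    return [1.0, 2.0, 3.0]


cache = EmbeddingCache(fake_runner)

# First pass: check validity. Second pass: use the data. One execution each.
first = cache.get("read_member.py file.nc field1")
again = cache.get("read_member.py file.nc field1")
failed = cache.get("read_member.py file.nc field2")
failed_again = cache.get("read_member.py file.nc field2")
```

Here the script runs twice in total, once per field, even though each result is requested twice. Whether a failure on the second field should invalidate the whole member or only that field remains the open question raised above.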
Find data for this in eyewall:/d1/projects/nrl_aerosol/mp_work
Time Estimate
2 days.
Sub-Issues
Consider breaking the enhancement down into sub-issues.
- Add a checkbox for each sub-issue here.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
Define the source of funding and account keys here or state NONE.
Define the Metadata
Assignee
- Select engineer(s) or no engineer required
- Select scientist(s) or no scientist required
Labels
- Select component(s)
- Select priority
- Select requestor(s)
Projects and Milestone
- Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
- Select milestone to next major version milestone or "Future Versions"
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
- Complete the issue definition above, including the Time Estimate and Funding Source.
- Fork this repository or create a branch of develop.
Branch name: feature_<Issue Number>_<Description>
- Complete the development and test your changes.
- Add/update log messages for easier debugging.
- Add/update unit tests.
- Add/update documentation.
- Push local changes to GitHub.
- Submit a pull request to merge into develop.
Pull request: feature <Issue Number> <Description>
- Define the pull request metadata, as permissions allow.
Select: Reviewer(s), Project(s), Milestone, and Linked issues
- Iterate until the reviewer(s) accept and merge your changes.
- Delete your fork or branch.
- Close this issue.