Treat gridded fields of entirely missing data as missing files and fix python embedding to call common data processing code. #1494

@JohnHalleyGotway

This issue was originally called:
Update Ensemble-Stat to better handle python-embedding failures and entire fields of missing data.
However, I updated it to more clearly state the actual fix.

Describe the Enhancement

This issue arose when John O was setting up a METplus use case for NRL that uses python-embedding to call Ensemble-Stat. Data for all 7 ensemble members live within the same variable in the same NetCDF file. The python-embedding script is called 7 times to pull values for each of the 7 members.

While the python script runs without error, 3 of the 7 members contain a full field of missing data values. In Ensemble-Stat, all 7 members appear to be "valid" so the ens.ens_thresh threshold is satisfied. However, no grid point contains 7 valid ensemble values, meaning that no ensemble statistics are computed.

The underlying rule here is that, after accounting for missing input files, no grid point may contain any bad data values. You can't have 7 valid ensemble values at the first grid point and then only 4 at the next, because then we can't group the ensemble ranks together into a RANK histogram. It's OK for entire files to be missing, but it's not OK to have missing data values within the fields.

In John O's case, all 7 calls to the python-embedding script run without error. It's just that 3 of the 7 calls return fields that consist entirely of missing data values. So no grid point contains 7 valid ensemble member values, and every grid point is discarded due to missing ensemble values.
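
To make the failure mode concrete, here is a minimal sketch in plain Python. It is not MET code: the `BAD_DATA` marker, the 4-point grid, and the `count_complete_points` helper are all assumptions made for illustration. It shows that 3 all-missing members out of 7 leave no grid point with a full set of values, while dropping those 3 members makes every point usable.

```python
BAD_DATA = -9999.0  # assumed missing-data marker for this sketch


def count_complete_points(members, bad=BAD_DATA):
    """Count grid points at which every member has a valid value."""
    n_points = len(members[0])
    return sum(
        1
        for i in range(n_points)
        if all(field[i] != bad for field in members)
    )


# 7 members on a tiny 4-point grid; 3 members are entirely missing.
valid_member = [1.0, 2.0, 3.0, 4.0]
missing_member = [BAD_DATA] * 4
members = [valid_member] * 4 + [missing_member] * 3

# All 7 members kept: no grid point has 7 valid values.
complete_all = count_complete_points(members)

# Only the 4 valid members kept: every grid point is usable.
complete_valid_only = count_complete_points(members[:4])
```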

This task is to update the logic of Ensemble-Stat in 2 ways:

(1) Currently, if the python-embedding script returns bad status, the entire Ensemble-Stat run exits. Update the python-embedding logic to allow for runtime failures without the tool exiting. Then treat a python-embedding failure as if it were a missing input file... and count that against ens.ens_thresh.

(2) When reading ensemble data, check to see if all data is bad data. If so, also treat that as if it were a missing input file. Question... should this check only be applied for python-embedding, or for all input file types?
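
The two changes above could be sketched roughly as follows. This is an illustration in Python, not the Ensemble-Stat implementation: `collect_members`, `is_all_bad`, the `readers` callables, and the `BAD_DATA` value are hypothetical names, and the check assumes ens.ens_thresh is the required ratio of valid members to expected members.

```python
BAD_DATA = -9999.0  # assumed missing-data marker for this sketch


def is_all_bad(field, bad=BAD_DATA):
    """Return True when every value in the field is the missing-data marker."""
    return all(value == bad for value in field)


def collect_members(readers, ens_thresh, bad=BAD_DATA):
    """Read each member, treating failures and all-missing fields as missing files.

    readers    -- hypothetical zero-argument callables, one per member, that
                  return a list of floats or raise on failure
    ens_thresh -- assumed required ratio of valid members to expected members
    """
    if not readers:
        raise ValueError("no ensemble members requested")

    fields = []
    n_missing = 0
    for read in readers:
        try:
            field = read()
        except Exception:
            # (1) A failed read counts as a missing input file.
            n_missing += 1
            continue
        if is_all_bad(field, bad):
            # (2) A field of entirely bad data also counts as missing.
            n_missing += 1
            continue
        fields.append(field)

    ratio = len(fields) / len(readers)
    if ratio < ens_thresh:
        raise RuntimeError(
            f"valid member ratio {ratio:.2f} is below ens_thresh {ens_thresh:.2f}"
        )
    return fields, n_missing


def failing_reader():
    raise OSError("simulated python-embedding failure")


# 4 good members, 2 all-missing members, 1 failed read.
readers = (
    [lambda: [1.0, 2.0, 3.0]] * 4
    + [lambda: [BAD_DATA] * 3] * 2
    + [failing_reader]
)
fields, n_missing = collect_members(readers, ens_thresh=0.5)
```

With these inputs, 4 of the 7 members survive, so a threshold of 0.5 passes while a threshold of 1.0 would raise. Whether the all-missing check should apply to every input type or only to python-embedding is the open question in (2); this sketch applies it uniformly.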

Some issues to consider:

  • We would like to NOT have to execute the python embedding scripts multiple times because that's slow!
  • For python-embedding, we can actually have multiple fields requested. How do we handle the case when the python-embedding script runs fine for the first field of the first file, but then fails for the second field of the first file?
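
One possible way to address both points is to remember the outcome of each request, including failures, so that no script runs twice and a failure for one field does not affect another. The sketch below is only an assumption about how that might look: `EmbeddingCache`, the `runner` callable, and the command and field strings are hypothetical and do not reflect how MET invokes python-embedding.

```python
class EmbeddingCache:
    """Remember the outcome of each (command, field) request."""

    def __init__(self, runner):
        self._runner = runner   # hypothetical callable(command, field_name)
        self._results = {}      # (command, field_name) -> field, or None on failure
        self.run_count = 0      # number of times the runner was actually called

    def get(self, command, field_name):
        key = (command, field_name)
        if key not in self._results:
            self.run_count += 1
            try:
                self._results[key] = self._runner(command, field_name)
            except Exception:
                # Record the failure so the script is not retried later.
                self._results[key] = None
        return self._results[key]


def fake_runner(command, field_name):
    """Stand-in for a script that works for one field and fails for another."""
    if field_name == "field2":
        raise RuntimeError("simulated script failure")
    return [1.0, 2.0]


cache = EmbeddingCache(fake_runner)
first = cache.get("read_member.py member1", "field1")   # runs the script
second = cache.get("read_member.py member1", "field2")  # fails, stored as None
again = cache.get("read_member.py member1", "field1")   # served from the cache
```

Here a stored `None` marks that member as missing for that field only, which is one answer to the second bullet; treating the whole file as missing after any failure would be an equally valid policy.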

Find data for this in eyewall:/d1/projects/nrl_aerosol/mp_work

Time Estimate

2 days.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
  • Select milestone to next major version milestone or "Future Versions"

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s), Project(s), Milestone, and Linked issues
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
