
Refine configuration options for defining bins in the verification of probabilistic forecasts #2280

@JohnHalleyGotway


Describe the Enhancement

This issue arose via GitHub discussion dtcenter/METplus#1742. While a probability forecast can have any value between 0 and 1, probability forecasts are very often not actually continuous. Instead, the values often cluster at multiples of 1/n, where n is the number of ensemble members. This issue is to refine the configuration options in support of verifying these non-continuous, ensemble-derived probability forecasts.

Verification of probabilistic forecasts is performed in MET's Point-Stat, Grid-Stat, Stat-Analysis, and Series-Analysis tools. In general, the probability logic is enabled by the presence of the fcst.prob dictionary in the tool's configuration file. It is either defined as a boolean (true/false) or provides additional information for how to select the probability data from the input GRIB1/2 file.

Prior to computing probabilistic statistics, MET first places all the probability matched pairs into an Nx2 contingency table. N is the number of probability bins between 0 and 1, and the 2 indicates whether or not the event actually occurred in the observation. The cat_thresh configuration option specified for the probabilistic input data defines the bins that should be applied in the verification step. cat_thresh can be defined using two conventions:

  1. Explicitly specify thresholds spanning [0, 1], all with the same threshold type:
cat_thresh = [ >=0, >=0.25, >=0.5, >=0.75, >=1.0 ];
  2. Use threshold equality logic as a shorthand (the example below is equivalent to the one above):
cat_thresh = [ ==0.25 ];
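To make the binning concrete, here is a minimal Python sketch (not MET's actual implementation) of how the ==0.25 shorthand could expand into bin edges and how matched pairs could be counted into the Nx2 table. The function names are hypothetical, and the sketch assumes ">=" thresholds, so each bin includes its lower edge and the last bin also includes 1.0.

```python
from bisect import bisect_right

def expand_equal_width(width):
    """Expand the '==width' shorthand into bin edges spanning [0, 1]."""
    n_bins = round(1.0 / width)
    # Divide by the bin count rather than accumulating 'width' to avoid drift.
    return [i / n_bins for i in range(n_bins + 1)]

def bin_index(prob, edges):
    """Return the bin holding prob, assuming '>=' thresholds (lower edge inclusive)."""
    n_bins = len(edges) - 1
    # Assumption: the last bin is closed on the right so prob == 1.0 is counted.
    return min(bisect_right(edges, prob) - 1, n_bins - 1)

def contingency_table(probs, obs, edges):
    """Build the Nx2 table: one row per bin, columns are [event, non-event] counts."""
    table = [[0, 0] for _ in range(len(edges) - 1)]
    for prob, event in zip(probs, obs):
        table[bin_index(prob, edges)][0 if event else 1] += 1
    return table

edges = expand_equal_width(0.25)
print(edges)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(contingency_table([0.1, 0.25, 0.6, 0.9, 1.0], [0, 1, 0, 1, 1], edges))
# [[0, 1], [1, 0], [0, 1], [2, 0]]
```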

Both options define the same 4 probability bins. When computing the Brier Score, MET uses the center point of each bin as the probability value for ALL points falling in that bin, as seen on this line of code. For example, any probability value from 0 up to, but not including, 0.25 would be evaluated as 0.125 (the center point of that bin) in the computation of the Brier Score.
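The effect of evaluating every forecast at its bin's center point can be illustrated with a small Python sketch. It is a simplified stand-in for the contingency-table computation, with hypothetical function names, and it compares a Brier Score computed from the raw ensemble fractions against one computed from the bin midpoints.

```python
def bin_midpoints(edges):
    """Center point of each bin defined by consecutive edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

def binned_value(prob, edges):
    """Probability actually used in the score: the midpoint of prob's bin."""
    mids = bin_midpoints(edges)
    for i, upper in enumerate(edges[1:]):
        if prob < upper:  # bins include their lower edge, not their upper edge
            return mids[i]
    return mids[-1]       # prob == 1.0 falls in the last bin

def brier_score(probs, obs):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
print(bin_midpoints(edges))  # [0.125, 0.375, 0.625, 0.875]

# A 6-member ensemble forecast of 1/6 is evaluated as 0.125, not 0.1667.
raw = [1 / 6, 3 / 6, 5 / 6]
obs = [0, 1, 1]
binned = [binned_value(p, edges) for p in raw]
print(binned)
print(brier_score(raw, obs), brier_score(binned, obs))
```

The two scores differ because none of the midpoints coincides with the forecast values the ensemble actually produced.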

The task here is to refine these options to more easily evaluate probability forecasts whose values are actually multiples of 1/n, where n is the number of ensemble members. For example, when n = 6, we'd want the values used in the Brier Score computation to be 0, 0.167, 0.333, 0.5, 0.667, 0.833, and 1.0, corresponding to ensemble values of 0/6, 1/6, 2/6, 3/6, 4/6, 5/6, and 6/6. Defining probability bins whose center values exactly match these is confusing and tedious, so a configuration option should be added to make it easy and convenient.
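As a rough illustration of why this is tedious by hand, the Python sketch below lists the ensemble values for n = 6 and builds the explicit threshold list whose interior bins are centered on them. The helper names are hypothetical, and clipping the end bins to [0, 1] is an assumption made here, not something this issue specifies.

```python
from fractions import Fraction

def ensemble_probabilities(n_members):
    """All probabilities an n-member ensemble can produce: 0/n, 1/n, ..., n/n."""
    return [Fraction(k, n_members) for k in range(n_members + 1)]

def centered_edges(n_members):
    """Edges halfway between neighboring k/n values, clipped to [0, 1]."""
    inner = [Fraction(2 * k - 1, 2 * n_members) for k in range(1, n_members + 1)]
    return [Fraction(0)] + inner + [Fraction(1)]

def as_cat_thresh(edges, digits=4):
    """Format edges as an explicit '>=' threshold list in MET config syntax."""
    parts = [f">={float(e):.{digits}f}" for e in edges]
    return "cat_thresh = [ " + ", ".join(parts) + " ];"

print([round(float(v), 3) for v in ensemble_probabilities(6)])
# [0.0, 0.167, 0.333, 0.5, 0.667, 0.833, 1.0]
print(as_cat_thresh(centered_edges(6)))
```

Under the midpoint rule described above, the interior bins of this list are centered on 1/6 through 5/6, but the clipped end bins have midpoints 1/24 and 23/24 rather than 0 and 1, so an explicit threshold list can only approximate the desired values.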

One option for supporting this logic is to add a 3rd variation for defining probability bins, in addition to the 2 listed above: cat_thresh = [ ==N ]; where N is some integer > 1. As of MET version 10.1.0, this setting results in a runtime error:

ERROR  : string_to_prob_thresh() -> threshold value (6) must be between 0 and 1.

We could repurpose this by letting N > 1 define the number of ensemble members, n, for which we want probability bins of width 1/n centered on the multiples of 1/n. The advantage of this approach is that we would not need to add or document new config options. Probability thresholds are actually defined in multiple config file contexts (cat_thresh, cov_thresh, prob_cat_thresh, prob_genesis_thresh), and this approach would support all of them. The downside is that a new config option name with additional documentation might make the logic clearer.
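One way the repurposed ==N logic might behave is sketched below in Python. This is only an illustration of the proposal, not MET code: the function name is hypothetical, the existing ==width shorthand is assumed to use a width that divides 1 evenly, and the treatment of the end bins (clipped to [0, 1] but still evaluated as exactly 0 and 1) is an assumption that the final design would need to settle.

```python
from fractions import Fraction

def interpret_equality_threshold(value):
    """Return (edges, bin_values) for a hypothetical '==value' setting."""
    frac = Fraction(str(value))
    if 0 < frac < 1:
        # Existing shorthand: equal-width bins evaluated at their midpoints.
        n_bins = round(1 / frac)
        edges = [Fraction(i, n_bins) for i in range(n_bins + 1)]
        values = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    elif frac > 1 and frac.denominator == 1:
        # Proposed: N ensemble members, bins of width 1/N centered on k/N.
        n = int(frac)
        inner = [Fraction(2 * k - 1, 2 * n) for k in range(1, n + 1)]
        edges = [Fraction(0)] + inner + [Fraction(1)]
        # Assumption: each bin is evaluated as k/N, including the half-width
        # end bins, which are evaluated as exactly 0 and 1.
        values = [Fraction(k, n) for k in range(n + 1)]
    else:
        raise ValueError(f"unsupported probability threshold: =={value}")
    return edges, values

edges, values = interpret_equality_threshold(6)
print([str(e) for e in edges])   # 8 edges defining 7 bins
print([str(v) for v in values])  # 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1
```

Because the same parsing path would serve every context that accepts a probability threshold, a single change of this kind could apply to all of the config entries listed above.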

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
