Skip to content

Sort mask.sid station lists to check their contents more efficiently. #1950

@JohnHalleyGotway

Description

@JohnHalleyGotway

Describe the Enhancement

The MetOffice often defines masking regions using the mask.sid option. That defines a list of station ID names to be included in the verification region. Those station name lists are stored in a StringArray object which in turn stores them as a vector of strings. For each point observation, the PairDataPoint class checks to see if the station names appears in the specified list. However, the StringArray::has() function searches the list linearly for a match which is O(n). This task is to optimize that search to be O(log(n)):

  • Add "bool Sorted" as a member of the StringArray class (mimic logic of NumArray).
  • Should be false by default and reset to false by all "add" fuctions.
  • Add "StringArray::sort_array()" member function (naming consistent with existing NumArray::sort_array())... which sets Sorted = true after sorting. See StackOverflow for quick/easy sorting example:
std::sort( serverList.begin() , serverList.end() )
  • Enhance 2 "StringArray::has()" member function variations to check if the StringArray has already been sorted, and, if so, search more efficiently. See StackOverflow for quick/easy searching example:
std::lower_bound(serverList.begin() , serverList.end() , valuetoFind) to find first matching
  • Also look at NumArray::has() and TimeArray::has()/index() to see if similar enhancements could be made for sorted arrrays.
  • In config_util.cc, update the parse_sid_mask() function so that just prior to returning, call StringArray::sort(). Storing the station ID list in a sorted order should always be a good idea.
  • Lastly, devise a test before/after these changes to quantify how much time is saved using this strategy. Note that this does not need to be added as a new unit test... just to quantify the improvement this work provides.

Time Estimate

2 days (?)

Sub-Issues

Consider breaking the enhancement down into sub-issues.
No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2799991

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: @sethlinden
  • Select scientist(s) or no scientist required: @robdarvell (would be good for advice on testing)

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.

Metadata

Metadata

Type

No type

Projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions