Reimplement the NumArray class based on an STL template. #1899

@JohnHalleyGotway

Describe the Enhancement

This issue arose during development for MET #1875. For that issue, we found and fixed a bug that caused Stat-Analysis to consume far too much memory. However, testing for that issue revealed an opportunity for additional optimization of memory usage, as described in this comment.

When running jobs over many, many combinations of station id, variable, level, and lead times using the -by option several times (over 30,000 jobs!), the Stat-Analysis tool consumes a great deal of memory. For each case, Stat-Analysis creates several NumArray objects to store arrays of values (e.g. fcst, obs, climo mean, stdev, and so on). However, in MET version 10.0.0, the default allocation increment for NumArray is 1000. So any array with length less than 1000 actually consumes the same amount of memory as an array of length 1000.
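To make the cost of the allocation increment concrete, here is a minimal sketch of the rounding behavior described above. It is not MET code: the helper names `allocated_slots` and `allocated_bytes` are hypothetical, and it assumes that storage grows in whole multiples of the increment, that an empty array reserves nothing, and that each element is a `double`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the MET source: number of element
// slots reserved when storage grows in whole multiples of `increment`.
// Assumption: an empty array reserves no slots.
std::size_t allocated_slots(std::size_t n_used, std::size_t increment) {
    assert(increment > 0);
    // Round n_used up to the next multiple of increment.
    std::size_t chunks = (n_used + increment - 1) / increment;
    return chunks * increment;
}

// Bytes reserved for the values, assuming each element is a double.
std::size_t allocated_bytes(std::size_t n_used, std::size_t increment) {
    return allocated_slots(n_used, increment) * sizeof(double);
}
```

Under these assumptions, an array holding 80 values reserves 1000 slots with an increment of 1000 but only 100 slots with an increment of 100, so most of the memory reserved by the larger increment is never used.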

For the testing described in that comment, each case processes fewer than 100 time points. For that testing, we changed the default NumArray allocation increment from 1000 down to 100, which led to approximately a 75% reduction in memory usage for these two tests.

While running tens of thousands of jobs in Stat-Analysis is not done routinely, it is a good stress test and reveals an opportunity for optimization. Recommend that we reimplement the NumArray class using an STL template (like array or vector) rather than managing chunks of memory ourselves. Hopefully that'll slim down our memory usage.
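As a rough illustration of the recommendation above, here is a minimal sketch of an array class backed by `std::vector<double>`. The class name `NumArraySketch` and all of its member names are hypothetical and are not taken from the MET source; the real NumArray interface would need to be preserved by any actual reimplementation. Note that `std::array` has a size fixed at compile time, so `std::vector` is the closer fit for arrays that grow as values are added.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NumArray; member names are illustrative only.
class NumArraySketch {
public:
    // Append one value; the vector manages its own storage growth.
    void add(double v) { values_.push_back(v); }

    // Number of values currently stored.
    std::size_t n() const { return values_.size(); }

    // Read access with a debug-time bounds check.
    double operator[](std::size_t i) const {
        assert(i < values_.size());
        return values_[i];
    }

    // Optional hint when the caller knows the final length in advance.
    void reserve(std::size_t n) { values_.reserve(n); }

    // Ask the vector to release unused capacity (a non-binding request).
    void shrink() { values_.shrink_to_fit(); }

    // Number of slots currently reserved; always at least n().
    std::size_t capacity() const { return values_.capacity(); }

    // Remove all values.
    void clear() { values_.clear(); }

private:
    std::vector<double> values_;
};
```

With this approach there is no fixed allocation increment to tune. The vector's growth policy is implementation-defined, so its capacity can still exceed the number of stored values; calling `reserve` with a known length, or `shrink` once an array is fully populated, are two ways to keep that overhead small.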

When these changes are available on a feature branch for testing, recommend that we:

  • Configure DockerHub to build that feature branch.
  • Coordinate with @lindsayrblank to rerun tests on that feature branch.

Time Estimate

3 days.

Sub-Issues

Consider breaking the enhancement down into sub-issues.
No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Split 2793541 and 2702691

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: John HG
  • Select scientist(s) or no scientist required: No scientist needed but coordinate with @lindsayrblank for testing.

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
