Internal: Add support for creating multiple input datasets for use case categories #1694

@georgemccabe

Description

Currently the automated tests are set up so that each model_applications category corresponds to an input data set containing all of the data required to run every use case in that category. The s2s input data set has grown so large that, although it does not exceed the maximum allowable size for the Docker data volume that stores it for the tests, the use case test groups that use this data run out of disk space when they write output from the use cases.

The use case groups that fail also use the Conda environments required for METplotpy and METcalcpy, which are very large due to their many Python package dependencies. This also contributes to the total disk usage in the test environment. The newly created metplotpy environment for #1566 is much larger than the existing environment, so it may cause disk space issues when that work is completed.

Size of current conda environments:

du -sh /usr/local/envs/*
897M    /usr/local/envs/metplotpy
185M    /usr/local/envs/metplus_base

Size of conda environments using Python 3.8.6 and updated package requirements:

du -sh /usr/local/envs/*
2.2G    /usr/local/envs/metplotpy.v5
168M    /usr/local/envs/metplus_base.v5

We may need to reconsider new requirements of use cases and how to group them in the tests, including:

  • Size of input data
  • Size of output data generated
  • Size of conda environment required to run
  • Others?
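The criteria above could be combined into a simple disk-budget check when deciding how to group use cases. The sketch below is purely illustrative: the budget, the helper names, and the example sizes (based loosely on the 2.2G metplotpy.v5 environment noted above) are all assumptions, not part of the existing test logic.

```python
# Hypothetical sketch: estimate the total disk footprint of a use case test
# group. DISK_BUDGET_MB and all example sizes below are made-up numbers.
DISK_BUDGET_MB = 14 * 1024  # assumed disk available to a test runner

def group_footprint_mb(input_mb, output_mb, env_mb):
    """Sum the disk space a test group consumes: input data volume,
    output written by the use cases, and the conda environment."""
    return input_mb + output_mb + env_mb

def fits_budget(input_mb, output_mb, env_mb, budget_mb=DISK_BUDGET_MB):
    """Return True if the group fits within the assumed disk budget."""
    return group_footprint_mb(input_mb, output_mb, env_mb) <= budget_mb

# A large s2s input set plus the 2.2G metplotpy.v5 environment may not fit
print(fits_budget(input_mb=9000, output_mb=4000, env_mb=2200))  # prints False
```

A check like this could be run when regrouping use cases, flagging any proposed group whose combined input, output, and environment sizes exceed the runner's disk.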

Describe the Enhancement

  • Come up with a good naming convention for the additional input data sets. Currently they are named after the category, e.g. s2s. We will need another data set, such as s2s_2.
  • Update the automated test logic to support multiple input data sets for a given category
  • Update the Contributor's Guide Add Use Cases chapter with the updated process for adding new data
  • Update User's Guide to describe how to find input data for use cases since they will not just correspond to the model_applications sub-directory name anymore
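One way the updated test logic could resolve multiple data sets per category, assuming the "<category>_<n>" naming convention suggested above (e.g. s2s_2), is sketched here. The function name and volume list are hypothetical, not existing METplus code.

```python
# Hypothetical sketch: collect every input data volume belonging to a use
# case category, assuming extra volumes are named "<category>_<n>".
def get_data_volumes(category, available_volumes):
    """Return all volumes for a category: the base name plus any
    numbered variants like s2s_2, while skipping unrelated categories
    that merely share a prefix (e.g. s2s_mid_lat)."""
    volumes = []
    for name in available_volumes:
        if name == category:
            volumes.append(name)
        elif name.startswith(category + "_") and name[len(category) + 1:].isdigit():
            volumes.append(name)
    return sorted(volumes)

print(get_data_volumes("s2s", ["s2s", "s2s_2", "s2s_mid_lat", "medium_range"]))
# prints ['s2s', 's2s_2']
```

Requiring a purely numeric suffix keeps the matching unambiguous if other category names start with the same string.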

Time Estimate

1-3 days

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

ASAP

Funding Source

2702691 2792541

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Add any new Python packages to the METplus Components Python Requirements table.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
