Description
Describe the Enhancement
This issue arose from the dtcenter/METplus#2364 discussion. @johnlwagner has encountered severe issues running Stat-Analysis jobs on WCOSS2. @dkokron traced the performance problem back to MET's writing of temporary files. When he reconfigured with tmp_dir = "/tmp";, a large Stat-Analysis job completed in minutes rather than hanging for several hours. He found that MET was writing, reading, and deleting thousands of tiny temporary ASCII files. Writing many of these tiny files to a project space on the Lustre file system used by WCOSS2 is extremely slow, whereas writing to the local /tmp directory relieves this bottleneck.
However, in 2021, NCO directed NOAA staff explicitly not to write to the local /tmp directory on the compute nodes, to keep the output for each job well organized and to prevent an accumulation of stale data in the local /tmp directory.
For this issue, do the following:
- Survey the use of temporary files in the MET codebase by searching for occurrences of MET_TMP_DIR and get_tmp_dir().
- Consider providing a workaround to prevent the need for writing a temp file.
Pay close attention to the MetConfig::read_string(const char * s) function. As described in this GitHub Discussion comment, avoiding writing a temp file there may go a long way to addressing this issue.
@johnlwagner notes that since this has to do with the processing of threshold strings, it may well be related to the perplexing behavior described in the dtcenter/METplus#1506 discussion.
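One way to picture the workaround for MetConfig::read_string(const char * s) is to parse from an in-memory stream instead of writing the string to disk and reading it back. The following is only a minimal sketch under stated assumptions, not MET's implementation: parse_config(), parse_config_string(), and the simplified "key = value;" grammar are hypothetical stand-ins, and MET's real parser may not accept a stream without further changes.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace from a string.
static std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical stand-in for a config parser. It reads "key = value;"
// statements from ANY input stream, so the caller decides whether the
// bytes come from a file or from memory.
std::map<std::string, std::string> parse_config(std::istream &in) {
    std::map<std::string, std::string> entries;
    std::string statement;
    while (std::getline(in, statement, ';')) {
        // Split on the first '=' only, so values such as ">=5.0" survive.
        const auto eq = statement.find('=');
        if (eq == std::string::npos) continue;  // skip blank or malformed text
        entries[trim(statement.substr(0, eq))] = trim(statement.substr(eq + 1));
    }
    return entries;
}

// Hypothetical string entry point: wraps the string in an in-memory
// stream, so no temporary file is created, read, or deleted.
std::map<std::string, std::string> parse_config_string(const std::string &s) {
    std::istringstream in(s);
    return parse_config(in);
}
```

With this shape, a call such as parse_config_string("fcst_thresh = >=5.0;") never touches the file system, so the cost of parsing thousands of short threshold strings would no longer depend on the latency of the directory that tmp_dir points to.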
Time Estimate
2 days
Sub-Issues
Consider breaking the enhancement down into sub-issues.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
Define the source of funding and account keys here or state NONE.
Define the Metadata
Assignee
- Select engineer(s) or no engineer required
- Select scientist(s) or no scientist required
Labels
- Review default alert labels
- Select component(s)
- Select priority
- Select requestor(s)
Milestone and Projects
- Select Milestone as the next official version or Backlog of Development Ideas
- For the next official version, select the MET-X.Y.Z Development project
Define Related Issue(s)
Consider the impact to the other METplus components.
- METplus, MET, METdataio, METviewer, METexpress, METcalcpy, METplotpy
No direct impacts right now. However, we have been discussing some related functionality in METplus to enhance its logic for cleaning up any temp files written by failed MET jobs.
- MET: Eliminate the use of temporary files in the vx_config library #2691 would remove temp files from the vx_config library.
- MET: Revise the use of temporary files in Stat-Analysis #2698 would revise temp files in stat_analysis.
Enhancement Checklist
See the METplus Workflow for details.
- Complete the issue definition above, including the Time Estimate and Funding Source.
- Fork this repository or create a branch of develop.
Branch name: feature_<Issue Number>_<Description>
- Complete the development and test your changes.
- Add/update log messages for easier debugging.
- Add/update unit tests.
- Add/update documentation.
- Push local changes to GitHub.
- Submit a pull request to merge into develop.
Pull request: feature <Issue Number> <Description>
- Define the pull request metadata, as permissions allow.
Select: Reviewer(s) and Development issue
Select: Milestone as the next official version
Select: MET-X.Y.Z Development project for development toward the next official release
- Iterate until the reviewer(s) accept and merge your changes.
- Delete your fork or branch.
- Close this issue.