Feature #2781 Convert MET NetCDF point obs to Pandas DataFrame #2877

georgemccabe · 2024-05-02T21:19:11Z

Expected Differences

Do these changes introduce new tools, command line arguments, or configuration file options? [No]
Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

Pull Request Testing

Describe testing already performed for these changes:

On seneca, I created a test script that runs plot_point_obs twice: once passing an input file directly into MET, and once passing that same file through a Python embedding script that converts the data to a Pandas DataFrame and then passes it to MET.

Test directory: /d1/projects/METplus/METplus_Data/development/met_2781

To run the script:

cd /d1/projects/METplus/METplus_Data/development/met_2781
./run_test.sh

Or just compare the output files output/raw_subset.png and output/pyembed_subset.png, which should contain the same plot.
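As a rough way to automate that comparison, the sketch below checks whether two files are byte-for-byte identical using the standard library. The helper name plots_match is hypothetical and not part of the test script. Byte equality is a stricter check than "contains the same plot", so a mismatch here would not by itself prove the plots differ visually.

```python
import filecmp


def plots_match(raw_path, pyembed_path):
    """Return True if the two files have identical contents.

    Hypothetical helper for illustration; not part of run_test.sh.
    """
    # shallow=False forces a comparison of file contents rather than
    # relying only on os.stat() metadata such as size and mtime.
    return filecmp.cmp(raw_path, pyembed_path, shallow=False)


if __name__ == "__main__":
    # Paths are relative to the test directory named above.
    same = plots_match("output/raw_subset.png", "output/pyembed_subset.png")
    print("identical" if same else "different")
```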

I also added a unit test to demonstrate the new Python Embedding example and confirmed that it runs successfully both on seneca and in GHA.

Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

@DanielAdriaansen : confirm that the new logic works with your test data
@hsoh-u : confirm that the new python logic is in the correct location and matches the format/standards of the rest of the python logic

Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]

Could consider adding an example of using the new script somewhere, but I am not sure where that would live.

Do these changes include sufficient testing updates? [Yes]
Will this PR result in changes to the MET test suite? [Yes]

There will be 1 additional output file generated:

dir1: /data/output/met_test_truth contains 1144 files
dir2: /data/output/met_test_output contains 1145 files

ERROR: folder /data/output/met_test_truth missing 1 files
python/ndas.20120409.t12z.prepbufr.tm00.nr_met_nc_to_pandas.ps

Will this PR result in changes to existing METplus Use Cases? [No]

If yes, create a new Update Truth METplus issue to describe them.
Do these changes introduce new SonarQube findings? [No]
Please complete this pull request review by 5/7/2024.

Pull Request Checklist

See the METplus Workflow for details.

Review the source issue metadata (required labels, projects, and milestone).
Complete the PR definition above.
Ensure the PR title matches the feature or bugfix branch name.
Define the PR metadata, as permissions allow.
Select: Reviewer(s) and Development issue
Select: Milestone as the version that will include these changes
Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
Close the linked issue and delete your feature or bugfix branch from GitHub.

hsoh-u · 2024-05-03T20:25:21Z

scripts/python/met/point_nc.py

+         return False
+
+      try:
+          dataset = nc.Dataset(nc_filename, 'r')


Minor: one more space than other lines

hsoh-u

I approve the changes. There is no duplicated code, and the new API produces a pandas DataFrame.

DanielAdriaansen · 2024-05-06T21:58:12Z

I was able to test this on seneca. It takes 20s to read in this file:

DEBUG 1: Reading point observation file: PYTHON_NUMPY=pyembed_pandas_testing.py
           typ       sid              vld        lat         lon   elv   var     lvl         hgt qc         obs
0       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   HGT  1000.0  -151.19162  2 -151.000000
1       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0  SPFH   977.0    18.02284  2    0.000329
2       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   TMP   977.0    18.02284  2  249.449997
3       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   HGT   977.0    18.02284  2   18.000000
4       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0  SPFH   976.0 -9999.00000  2    0.000342
...        ...       ...              ...        ...         ...   ...   ...     ...         ... ..         ...
934843  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   500.0 -9999.00000  0    6.900000
934844  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  UGRD   400.0 -9999.00000  0    7.900000
934845  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   400.0 -9999.00000  0    6.100000
934846  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  UGRD   300.0 -9999.00000  0    4.700000
934847  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   300.0 -9999.00000  0    4.500000

[934848 rows x 11 columns]

That is close to 1M observations. Running PB2NC with more selective config options could help speed this up.

I worked with the DataFrame a bit in Python and didn't observe any trouble.
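For readers who want to try something similar, here is a minimal sketch of that kind of exploration. It rebuilds a four-row DataFrame from rows shown in the printed output above rather than reading a real file. The column dtypes (for example, sid and qc as strings) are assumptions, and treating -9999 as the missing-data value follows the usual MET convention.

```python
import pandas as pd

# Column names taken from the printed DataFrame above.
columns = ["typ", "sid", "vld", "lat", "lon", "elv", "var", "lvl", "hgt", "qc", "obs"]

# A few rows copied from the output above; dtypes are assumed.
rows = [
    ["ADPUPA", "89571", "20200824_113000", -68.580002, 77.970001, 18.0, "HGT", 1000.0, -151.19162, "2", -151.0],
    ["ADPUPA", "89571", "20200824_113000", -68.580002, 77.970001, 18.0, "SPFH", 977.0, 18.02284, "2", 0.000329],
    ["ADPUPA", "89571", "20200824_113000", -68.580002, 77.970001, 18.0, "TMP", 977.0, 18.02284, "2", 249.449997],
    ["ADPUPA", "89571", "20200824_113000", -68.580002, 77.970001, 18.0, "SPFH", 976.0, -9999.0, "2", 0.000342],
]
df = pd.DataFrame(rows, columns=columns)

# Keep only the temperature observations.
tmp_obs = df[df["var"] == "TMP"]

# Drop rows whose height is the missing-data value.
valid_hgt = df[df["hgt"] != -9999.0]

# Count observations per variable name.
counts = df.groupby("var").size().to_dict()
```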

I wonder if we need any documentation of this? Maybe in Appendix F? @JohnHalleyGotway thoughts?

DanielAdriaansen · 2024-05-06T22:05:32Z

I guess we have this section, which is empty:
https://met.readthedocs.io/en/develop/Users_Guide/appendixF.html#met-python-package

I thought maybe convert_point_data() was documented there, but it is not. So maybe for now it's OK to leave this undocumented.

Maybe I will add a "to-do" item on #2414 to document the "MET Python Module".

georgemccabe · 2024-05-08T15:46:37Z

@hsoh-u, I talked with @DanielAdriaansen about these changes. Based on his feedback, I added an init function to nc_point_obs that takes an input file path, so it can be initialized without calling read_data(). I also changed the read_data() function to raise an exception instead of returning a boolean to indicate success.
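The pattern described here can be sketched roughly as follows. This is not the nc_point_obs code: PointObsReader is a hypothetical stand-in, it parses JSON instead of NetCDF so the example needs only the standard library, and reading the file from the constructor is an assumption about how the init function behaves.

```python
import json


class PointObsReader:
    """Hypothetical stand-in for a point observation reader (not the MET class)."""

    def __init__(self, input_path=None):
        self.input_path = input_path
        self.records = None
        # Assumed behavior: a path given at construction is read immediately,
        # so callers do not need a separate read_data() call.
        if input_path is not None:
            self.read_data(input_path)

    def read_data(self, input_path):
        """Load records from input_path, raising TypeError if it cannot be read."""
        try:
            with open(input_path, "r", encoding="utf-8") as handle:
                self.records = json.load(handle)
        except (OSError, ValueError) as err:
            # Raise instead of returning False so a failure cannot be
            # silently ignored by the caller.
            raise TypeError(f"Could not read input file: {input_path}") from err
        self.input_path = input_path
```

With an exception rather than a boolean return value, a caller that forgets to check the result fails at the read step instead of continuing with empty data.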

hsoh-u

I approve the changes.

georgemccabe added 12 commits April 24, 2024 09:27

Per #2781, added function to convert MET NetCDF point observation dat…

c722bdc

…a to pandas so it can be read and modified in a python embedding script. Added example python embedding script

ignore python cache files

6ed91b3

fixed function call

f6cf1c0

reduce cognitive complexity to satisfy SonarQube and add boolean retu…

b08518a

…rn value to catch if function fails to read data

clean up script and add comments

62bc920

replace call to object function that doesn't exist, handle exception …

284aba7

…when file passed to script cannot be read by the NetCDF library

Merge branch 'develop' into feature_2781_met_nc_obs_to_pandas

7fd191c

rename example script

ae1791b

Merge branch 'develop' into feature_2781_met_nc_obs_to_pandas

482d9c5

add new example script to makefiles

2d974bf

fix logic to build pandas DataFrame to properly get header informatio…

b71160f

…n from observation header IDs

Per #2781, add unit test to demonstrate python embedding script that …

2c9d220

…reads MET NetCDF point observation file and converts it to a pandas DataFrame

georgemccabe added this to the MET 12.0.0 milestone May 2, 2024

georgemccabe requested review from hsoh-u and DanielAdriaansen May 2, 2024 21:19

georgemccabe linked an issue May 2, 2024 that may be closed by this pull request

Add new Python functionality to convert MET netcdf observation data to a Pandas DataFrame #2781

Closed

22 tasks

hsoh-u reviewed May 3, 2024

hsoh-u previously approved these changes May 6, 2024

georgemccabe added 2 commits May 8, 2024 15:28

Merge branch 'develop' into feature_2781_met_nc_obs_to_pandas

67703e3

Per #2781, added init function for nc_point_obs to take an input file…

51db09f

…name. Also raise TypeError exception from nc_point_obs.read_data() if input file cannot be read

georgemccabe dismissed hsoh-u’s stale review via 51db09f May 8, 2024 15:44

georgemccabe requested a review from hsoh-u May 8, 2024 15:44

hsoh-u previously approved these changes May 8, 2024

call parent class init function to properly initialize nc_point_obs

7e9cef9

georgemccabe dismissed hsoh-u’s stale review via 7e9cef9 May 8, 2024 17:04

hsoh-u self-requested a review May 8, 2024 18:05

DanielAdriaansen mentioned this pull request May 8, 2024

Suggested Improvements to Python Embedding across MET #2414

Open

50 tasks
