Skip to content

Conversation

georgemccabe
Copy link
Collaborator

@georgemccabe georgemccabe commented May 2, 2024

Expected Differences

  • Do these changes introduce new tools, command line arguments, or configuration file options? [No]

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

Pull Request Testing

  • Describe testing already performed for these changes:

On seneca, I created a test script that runs plot_point_obs both passing an input file directly into MET and passing that same file through a python embedding script that converts the data to a Pandas DataFrame, then passes it to MET.

Test directory: /d1/projects/METplus/METplus_Data/development/met_2781

To run the script:

cd /d1/projects/METplus/METplus_Data/development/met_2781
./run_test.sh

or just compare the output files output/raw_subset.png and output/pyembed_subset.png which should contain the same plot.

I also added a unit test to demonstrate the new Python Embedding example and confirmed that it runs successfully both on seneca and in GHA.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

@DanielAdriaansen : confirm that the new logic works with your test data
@hsoh-u : confirm that the new python logic is in the correct location and matches the format/standards of the rest of the python logic

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]

Could consider adding an example of using the new script somewhere, but I am not sure where that would live.

  • Do these changes include sufficient testing updates? [Yes]

  • Will this PR result in changes to the MET test suite? [Yes]

There will be 1 additional output file generated:

dir1: /data/output/met_test_truth contains 1144 files
dir2: /data/output/met_test_output contains 1145 files

ERROR: folder /data/output/met_test_truth missing 1 files
python/ndas.20120409.t12z.prepbufr.tm00.nr_met_nc_to_pandas.ps

  • Will this PR result in changes to existing METplus Use Cases? [No]

    If yes, create a new Update Truth METplus issue to describe them.

  • Do these changes introduce new SonarQube findings? [No]

  • Please complete this pull request review by 5/7/2024.

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the version that will include these changes
    Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

return False

try:
dataset = nc.Dataset(nc_filename, 'r')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: one more space than other lines

hsoh-u
hsoh-u previously approved these changes May 6, 2024
Copy link
Collaborator

@hsoh-u hsoh-u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the changes. No duplicated codes and the new API produces PANDA data frame.

@DanielAdriaansen
Copy link
Contributor

I was able to test this on seneca. It takes 20s to read in this file:

DEBUG 1: Reading point observation file: PYTHON_NUMPY=pyembed_pandas_testing.py
           typ       sid              vld        lat         lon   elv   var     lvl         hgt qc         obs
0       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   HGT  1000.0  -151.19162  2 -151.000000
1       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0  SPFH   977.0    18.02284  2    0.000329
2       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   TMP   977.0    18.02284  2  249.449997
3       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0   HGT   977.0    18.02284  2   18.000000
4       ADPUPA     89571  20200824_113000 -68.580002   77.970001  18.0  SPFH   976.0 -9999.00000  2    0.000342
...        ...       ...              ...        ...         ...   ...   ...     ...         ... ..         ...
934843  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   500.0 -9999.00000  0    6.900000
934844  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  UGRD   400.0 -9999.00000  0    7.900000
934845  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   400.0 -9999.00000  0    6.100000
934846  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  UGRD   300.0 -9999.00000  0    4.700000
934847  SYNDAT  MA030044  20200824_120000  26.250000  127.500000    --  VGRD   300.0 -9999.00000  0    4.500000

[934848 rows x 11 columns]

which is close to 1M observations. Better running of PB2NC via config options could help speed this up.

I worked with the DataFrame a bit in Python and didn't observe any trouble.

I wonder if we need any documentation of this? Maybe in Appendix F? @JohnHalleyGotway thoughts?

@DanielAdriaansen
Copy link
Contributor

I guess we have this section, which is empty:
https://met.readthedocs.io/en/develop/Users_Guide/appendixF.html#met-python-package

I thought maybe convert_point_data() was documented there, but it is not. So maybe for now it's OK to leave this undocumented.

Maybe I will add a "to-do" item here: on #2414 to document the "MET Python Module".

…name. Also raise TypeError exception from nc_point_obs.read_data() if input file cannot be read
@georgemccabe
Copy link
Collaborator Author

@hsoh-u , I talked with @DanielAdriaansen about these changes. Based on his feedback, I added an init function to nc_point_obs to take an input file path so it can be initialized without calling read_data(). I also changed the read_data() function to raise an exception instead of return a boolean for success.

hsoh-u
hsoh-u previously approved these changes May 8, 2024
Copy link
Collaborator

@hsoh-u hsoh-u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

Add new Python functionality to convert MET netcdf observation data to a Pandas DataFrame
3 participants