Skip to content

Conversation

srivarra
Copy link
Contributor

@srivarra srivarra commented Jan 29, 2024

If you haven't already, please read through our contributing guidelines before opening your PR

What is the purpose of this PR?

Fixes several issues plauging CI.

  1. Fixes the example dataset not being found across macOS and Windows GitHub Actions Runners.
  2. Adds the EZ Segmentation dataset to the the ./github/scripts/get_example_dataset.py download script.
  3. Updated outdated GitHub Actions Workflows

How did you implement your changes

Example Dataset

Simplified the CI download script.

conftest.py directly access the GITHUB_WORKSPACE environment variable which is set in CI, thus making the path OS agnostic.

Modified the example_dataset.ExampleDataset class to:

  • Explicitly set the cache_dir to the HuggingFace default ~/.cache/huggingface/datasets if None is provided.
  • Converted the dataset_paths from DatasetDict to a regular dictionary in order to make the example datasets to be OS agnostic, and it's easier to access the paths.

For example:

'/Users/user/.cache/huggingface/datasets/downloads/extracted/<hash>'
'pathlib.path(self.dataset_cache) / downloads/extracted/<hash>/<feature_name>'

CI

  • Added GITHUB_WORKSPACE as an environment variable in CI.
  • Updated the macOS runner from macos-latest (where latest is 12) to macos-13
  • Updated the following actions:
    • actions/checkout: v3 $\rightarrow$ v4
    • docker/setup-qemu-action: v2 $\rightarrow$ v3
    • docker/setup-buildx-action: v2 $\rightarrow$ v3
    • docker/login-action: v2 $\rightarrow$ v3
    • docker/metadata-action: v4 $\rightarrow$ v5
    • docker/build-push-action: v4 $\rightarrow$ v5
    • pypa/cibuildwheel: v2.13 $\rightarrow$ v2.16
    • actions/upload-artifact: v3 $\rightarrow$ v4
    • actions/setup-python: v3 $\rightarrow$ v5
    • actions/download-artifact: v3 $\rightarrow$ v4
    • pypa/gh-action-pypi-publish: v1.6 $\rightarrow$ v1.8
  • actions/upload-artifact@v4 artifacts are immutable, so the artifact upload process has been adjusted for coverage (test) and distributions (build).
    • For coverage, each matrix configuration (runner + python version) uploads a coverage file: coverage-<python version>-<runner>, (e.x. coverage-3.9-windows-latest.lcov, artifact has the same name).
    • Similar for distributions.
  • Adjusted Branch Protection rules for main, where macos-13 status checks for test and build are required. Removed macos-latest status checks.

Dependencies

Updated the following:

  • alpineer: 0.1.10 $\rightarrow$ 0.1.12
  • Cython: 0.29.Z $\rightarrow$ >3.Y.Z (for both buld-system and dependencies)
  • pyFlowSOM: 0.1.15 $\rightarrow$ 0.1.16
  • scikit-image: <=0.19.3 $\rightarrow$ <0.19.3 (0.19.3 causes issues)

The valid datset configs are gathered from the HuggingFace repo itself now, and .github/scripts/get_example_dataset.py is simplified.

Pixel Clsutering

Adds natsort calls throughout the Pixie pipeline to avoid issues with channel ordering.
Removed the parameter channels in pixel_som_clustering.py::cluster_pixels as it is an unused parameter. Reflected the change in Notebook 2.

Misc

Adjusted the runtime Protocol definition for ClusterClassTemplate to be syntactically correct.
Removed a few cibuildwheel flags that are not needed.

Remaining issues

CI / Dependencies

  • macOS 14 Sonoma Runner are available:
    • Runs exclusivly on M1 silicon, therefore we do not need to use macOS-12/13's emulated arm64 build process for cibuildwheel.
    • Requires us to drop Python 3.9 support.
  • Python 3.12 Support: We should add Python 3.12 to the suite of supported versions.
    • This requires us to:
      1. Update to later versions of numpy to make use of these runners, however this requires us to drop the spatial-lda requirement as it is not compatible with numpy versions 1.24 and newer.
      2. Update scikit-image also require the more recent numpy versions, therefore we would need to update scikit-image to 0.20 or later. Requires us to re-address Bad filtering in fiber segmentation #1055.
  • NEP 29 - Recent Future NumPy Deprecations
    1. By April 05 2024 support for Python 3.9 for numpy 1.23+ is deprecated.
    2. By of June 22 2024 support for numpy is limited to 1.24+. (spatial-lda and scikit-image hinder this one)
  • PyPI has 2FA, need to set that up so we can make releases. We should figure out how it works and get it set up, as we might not be able to upload new releases see here.

Dataset

There have been several improvements to the general dataset workflow with the HuggingFace API. We should consider seeing what new features exist to make the maintenance of it easier for us.

@srivarra srivarra self-assigned this Jan 29, 2024
@srivarra srivarra linked an issue Jan 29, 2024 that may be closed by this pull request
@srivarra srivarra added the bug Something isn't working label Jan 29, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@srivarra srivarra changed the title Example Dataset valid datasets fix General CI Updates Feb 21, 2024
@srivarra srivarra linked an issue Feb 21, 2024 that may be closed by this pull request
@srivarra srivarra marked this pull request as ready for review February 21, 2024 23:03
Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's slay this beast for good.

@camisowers
Copy link
Contributor

Why does updating to alpineer v0.1.12 not cause issues the same way it did in toffy? angelolab/alpineer#43

@srivarra
Copy link
Contributor Author

@camisowers I'm unable to replicate that issue occurring in Toffy. For example I ran the following on the example data:

io_utils.list_files(dir_name = os.path.join(base_dir, "image_data", "fov0"), substrs=".tiff")

And the I got the expected output:

Output
['CD14.tiff',
 'H3K27me3.tiff',
 'HLADR.tiff',
 'Ki67.tiff',
 'Collagen1.tiff',
 'CD45.tiff',
 'GLUT1.tiff',
 'CK17.tiff',
 'CD68.tiff',
 'CD163.tiff',
 'Fibronectin.tiff',
 'Vim.tiff',
 'CD8.tiff',
 'CD4.tiff',
 'H3K9ac.tiff',
 'ECAD.tiff',
 'SMA.tiff',
 'CD31.tiff',
 'IDO.tiff',
 'CD20.tiff',
 'PD1.tiff',
 'CD3.tiff']

Is there a specific combination of arguments which causes trouble?

Copy link
Contributor

@camisowers camisowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tested it and using substs=['.tiff'] in the alpineer v0.1.12 list_files() function doesn't seem to cause an issue! (Not sure why the toffy functions are failing.)

PR looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Order column pixel_pysom differs from feather file Example Dataset is causing tests to fail
3 participants