
CI: adding a GPU-enabled CI job #21740

@rgommers

We've touched on this in a few places: in a community meeting, on a PR I can't find right now, and in in-person discussions. So here is a new issue with a concrete proposal.

We have experimental support for CuPy, PyTorch and JAX - all of which have CUDA (and in some cases ROCm and/or Intel XPUs) support, which we are not exercising in CI. As a result, we get issues like gh-21486 where someone finds locally that the test suite has regressed for a GPU-enabled array library. This is obviously not ideal, and we'd like a low-overhead and low-cost way to test PRs that a reviewer thinks may impact GPU support in CI.

Scikit-learn has such a CI job, see the GHA workflow file for it and @betatim's blog post describing the design.

The other thing we needed for it was a credit card from NumFOCUS that we can attach to this repo (with a limit on spending level, as described in the blog post) - we have that as well now. So we're all set in principle to add this.

Suggested logistics and design:

  • The SciPy core team should sign off on the spending limit (just like we did for Cirrus CI usage). I'd suggest either $50/month or $100/month as the limit, and adjust if needed once we have some experience with it.
  • The label-based triggering that scikit-learn has, where adding the label to a PR does a single run and then the label gets removed again so new commits don't auto-run the job again, seems nice - so let's copy it.
  • We can start with building SciPy in the GPU-enabled CI runner; if it becomes expensive we can move the SciPy wheel build to a separate (CPU-only, free) job, upload the wheel to the GHA cache, and then trigger a second job based on the first one to run the CUDA tests.
  • The one change from the scikit-learn setup we should make is to use pixi rather than conda-lock. The latter makes sense for scikit-learn since they were already using it, but pixi will work better for us and we've mostly already got the setup worked out in https://github.com/rgommers/pixi-dev-scipystack. I don't want to propose adding that setup in the root of the repo, but only the single environment + lock file that we need to run this GPU job.
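
To make the label-based triggering concrete, here is a minimal Python sketch of the decision logic, not the actual workflow. It assumes the job reacts to a GitHub `pull_request` event payload in which `action` is `"labeled"` and `label.name` holds the label that was just added. The label name `GPU CI` and both function names are hypothetical; nothing here is taken from the scikit-learn workflow file.

```python
from typing import Any

# Hypothetical label name; this issue does not specify one.
TRIGGER_LABEL = "GPU CI"


def should_run_gpu_job(event: dict[str, Any], trigger_label: str = TRIGGER_LABEL) -> bool:
    """Return True only when the trigger label was just added to a PR."""
    # Only a "labeled" event starts a run; other actions, such as
    # "synchronize" (new commits pushed), must not re-run the job.
    if event.get("action") != "labeled":
        return False
    # A "labeled" event carries the label that was added.
    return event.get("label", {}).get("name") == trigger_label


def labels_after_run(labels: list[str], trigger_label: str = TRIGGER_LABEL) -> list[str]:
    """Labels left on the PR once the single run has been started."""
    # Dropping the trigger label means a reviewer has to re-add it
    # explicitly to request another GPU run.
    return [name for name in labels if name != trigger_label]
```

With this logic each GPU run costs one deliberate reviewer action, which should keep spending predictable under the monthly limit.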

Thoughts?


Labels:

  • CI: Items related to the CI tools such as CircleCI, GitHub Actions or Azure
  • array types: Items related to array API support and input array validation (see gh-18286)
  • enhancement: A new feature or improvement
