We've touched on this in a few places: in a community meeting, on a PR I can't find right now, and in in-person discussions. So here is a new issue with a concrete proposal.
We have experimental support for CuPy, PyTorch and JAX - all of which support CUDA (and in some cases ROCm and/or Intel XPUs), none of which we exercise in CI. As a result, we get issues like gh-21486 where someone finds locally that the test suite has regressed for a GPU-enabled array library. This is obviously not ideal, and we'd like a low-overhead and low-cost way to run GPU tests in CI on PRs that a reviewer thinks may impact GPU support.
Scikit-learn has such a CI job, see the GHA workflow file for it and @betatim's blog post describing the design.
The other thing we needed for it was a credit card from NumFOCUS that we can attach to this repo (with a limit on spending level, as described in the blog post) - we have that as well now. So we're all set in principle to add this.
Suggested logistics and design:
- The SciPy core team should sign off on the spending limit (just like we did for Cirrus CI usage). I'd suggest either $50/month or $100/month as the limit, and adjust if needed once we have some experience with it.
- The label-based triggering that scikit-learn has, where adding the label to a PR does a single run and then the label gets removed again so new commits don't auto-run the job again, seems nice - so let's copy it.
- We can start with building SciPy in the GPU-enabled CI runner; if it becomes expensive we can move the SciPy wheel build to a separate (CPU-only, free) job, upload the wheel to the GHA cache, and then trigger a second job based on the first one to run the CUDA tests.
- The one change from the scikit-learn setup we should make is to use `pixi` rather than `conda-lock`. The latter makes sense for scikit-learn since they were already using it, but `pixi` will work better for us and we've mostly already got the setup worked out in https://github.com/rgommers/pixi-dev-scipystack. I don't want to propose adding that setup in the root of the repo, but only the single environment + lock file that we need to run this GPU job.
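To make the label-based triggering concrete, here is a rough sketch of the decision logic in Python. It is only an illustration: the real thing would live in the GHA workflow file, the label name `CUDA CI` is a placeholder I made up, and the two function names are hypothetical. The only thing taken from GitHub itself is the shape of the `pull_request` event payload (`action` plus `label.name` for a `labeled` event).

```python
# Hypothetical label name; the real one would be picked when the workflow is added.
GPU_LABEL = "CUDA CI"


def should_run_gpu_job(event: dict) -> bool:
    """Return True only for a pull_request 'labeled' event that adds GPU_LABEL."""
    if event.get("action") != "labeled":
        # New commits arrive as 'synchronize' events and must not re-run the job.
        return False
    return event.get("label", {}).get("name") == GPU_LABEL


def labels_after_run(labels: list[str]) -> list[str]:
    """Labels left on the PR once the job has been started: the trigger is removed."""
    return [name for name in labels if name != GPU_LABEL]


if __name__ == "__main__":
    labeled = {"action": "labeled", "label": {"name": GPU_LABEL}}
    pushed = {"action": "synchronize"}
    print(should_run_gpu_job(labeled), should_run_gpu_job(pushed))
    print(labels_after_run([GPU_LABEL, "array types"]))
```

The point of removing the label after a single run is that every GPU run is an explicit reviewer decision, which keeps spending predictable under whatever monthly limit we agree on.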
Thoughts?