CI: add a GPU CI job #22275
Conversation
Closes scipy issue 21740.
Note that the job itself won't run on this PR before merging. Here's a CI log that's less than an hour old.

How about merge and iterate then? Unless you want to fix the regressions before merging.

I do want the job to pass before merging, otherwise it will make CI on all PRs red. Help is very welcome - most issues are recent regressions for PRs that weren't tested on GPU. I won't be able to do more before Friday at least. It may be a matter of adding some skips.

#22279 fixes all torch and jax.numpy failures I see locally.

All tests pass after gh-22279, thanks again @ev-br and @lucascolley. Test suite runtime is reasonable at the moment. If we enable much more functionality on GPU then we may end up splitting it, but for now it's easier to have it in a single job. Still optimizing cache strategies a bit, then this should be ready.
Okay, ready for final review. |
See https://github.com/scipy/scipy/actions/workflows/gpu-ci.yml for more logs if you want to see what the changes do. |
let's give this a go and follow up if anything goes red - thanks Ralf!
- Draft for now because not all tests pass yet. These are recent regressions in `main`, and I'd prefer for those to be fixed in another PR first.
- `ccache` builds are very fast, and the setup time is dominated by the ~3 GB worth of packages that are needed for GPU installs of PyTorch, JAX and CuPy in a single conda environment.
- The `pixi.toml` file lives inside `.github/workflows/` for now, because I don't (yet) want to cross the bridge of enabling Pixi for all SciPy contributors (there's still too much logistics to work out there for how to do lock file updates).
- Test failures can be reproduced locally by `cd .github/workflows` and then running the same `pixi run test-cuda ...` command as the failing CI job step.

Closes gh-21740
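For context, a GPU job of the shape described above could look roughly like the sketch below. This is an illustrative reconstruction, not the actual `gpu-ci.yml` from this PR: the runner label, the `setup-pixi` action version, and the trigger configuration are assumptions; only the `pixi.toml` location and the `pixi run test-cuda` task come from the discussion above.

```yaml
# Hypothetical sketch of a GPU CI workflow; the real file lives at
# .github/workflows/gpu-ci.yml in the SciPy repository.
name: GPU CI

on:
  pull_request:
    branches: [main]

jobs:
  gpu-tests:
    # Assumes a runner (self-hosted or cloud-provided) with an NVIDIA GPU.
    runs-on: gpu-runner
    steps:
      - uses: actions/checkout@v4
      - uses: prefix-dev/setup-pixi@v0.8.1
        with:
          # The pixi.toml is kept inside .github/workflows/ for now,
          # so Pixi is not enabled for the repository at large.
          manifest-path: .github/workflows/pixi.toml
      - name: Run GPU tests
        working-directory: .github/workflows
        run: pixi run test-cuda
```

Keeping the manifest under `.github/workflows/` means contributors reproduce CI failures by changing into that directory first, exactly as the bullet list above describes.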