CI: add a GPU CI job #22275

rgommers · 2025-01-07T20:16:11Z

Draft for now, because not all tests pass yet. These are recent regressions in `main`, and I'd prefer for those to be fixed in another PR first.

- Runs tests for all array API-enabled functionality with PyTorch, JAX and CuPy in a single CI job (for now).
- Builds on the GPU runner, because with ccache, builds are very fast and the setup time is dominated by the ~3 GB of packages needed for GPU installs of PyTorch, JAX and CuPy in a single conda environment.
- Uses Pixi to ensure we get a maintainable and robust environment.
  - Keeps `pixi.toml` inside `.github/workflows/` for now, because I don't (yet) want to cross the bridge of enabling Pixi for all SciPy contributors (there's still too much logistics to work out there, e.g. how to do lock file updates).
  - Failures reproduce quite well locally. It should be a matter of `cd .github/workflows` and then running the same `pixi run test-cuda ...` command as the failing CI job step.
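Because the job depends on a single environment holding GPU builds of PyTorch, JAX and CuPy, a quick local sanity check that all three libraries actually see the GPU can save a confusing test run. The helper below is a hypothetical sketch: `gpu_backend_status` is not part of SciPy or of this PR's `pixi.toml`, and it assumes CUDA builds of each library. A library that is missing or cannot initialise its GPU backend is simply reported as unavailable.

```python
import importlib


def gpu_backend_status() -> dict[str, bool]:
    """Hypothetical helper (not part of SciPy): report whether PyTorch,
    JAX and CuPy can each see a CUDA GPU.

    A library that is not installed, or that fails to initialise its
    GPU backend, is reported as False instead of raising.
    """
    status: dict[str, bool] = {}

    # PyTorch: is_available() is True only for a CUDA build of torch
    # that can reach a working driver and device.
    try:
        torch = importlib.import_module("torch")
        status["torch"] = bool(torch.cuda.is_available())
    except Exception:
        status["torch"] = False

    # JAX: the default backend is "gpu" when jaxlib has CUDA support and
    # a device was found; otherwise JAX falls back to "cpu".
    try:
        jax = importlib.import_module("jax")
        status["jax"] = jax.default_backend() == "gpu"
    except Exception:
        status["jax"] = False

    # CuPy: querying the device count raises if no driver/device exists.
    try:
        cupy = importlib.import_module("cupy")
        status["cupy"] = cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        status["cupy"] = False

    return status


if __name__ == "__main__":
    for name, ok in gpu_backend_status().items():
        print(f"{name:6s} GPU available: {ok}")
```

Catching broad exceptions is deliberate here: on a machine without a GPU, or with only CPU wheels installed, the check should report `False` rather than abort.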

Closes gh-21740


rgommers · 2025-01-07T20:19:06Z

Note that the job itself won't run on this PR before merging. Here's a CI log that's less than an hour old.

.github/workflows/gpu-ci.yml

ev-br · 2025-01-08T09:00:16Z

> Note that the job itself won't run on this PR before merging.

How about we merge and iterate then? Unless you want to fix the regressions before merging.

rgommers · 2025-01-08T09:14:11Z

> How about we merge and iterate then? Unless you want to fix the regressions before merging.

I do want the job to pass before merging; otherwise it will make CI on all PRs red.

Help is very welcome: most issues are recent regressions from PRs that weren't tested on GPU. I won't be able to do more before Friday at least. It may just take some skips in `signal` and `stats`, at least to fix PyTorch on GPU.

ev-br · 2025-01-08T10:34:32Z

#22279 fixes all torch and jax.numpy failures I see locally.

rgommers · 2025-01-08T21:30:42Z

All tests pass after gh-22279, thanks again @ev-br and @lucascolley.

Test suite runtime is reasonable at the moment:

If we enable much more functionality on GPU then we may end up splitting it, but for now it's easier to have it in a single job.

Still optimizing cache strategies a bit, then this should be ready.

rgommers · 2025-01-08T22:01:06Z

Caching performance after the changes I'm about to push; that's about as good as it's going to get:

rgommers · 2025-01-08T22:12:56Z

Okay, ready for final review.

rgommers · 2025-01-08T22:13:36Z

See https://github.com/scipy/scipy/actions/workflows/gpu-ci.yml for more logs if you want to see what the changes do.

.github/workflows/gpu-ci.yml

lucascolley

Let's give this a go and follow up if anything goes red. Thanks Ralf!

CI: add a GPU CI job

8fb4385

Closes scipy issue 21740.

rgommers added the `enhancement`, `CI` and `array types` labels Jan 7, 2025

rgommers requested review from larsoner and andyfaff as code owners January 7, 2025 20:16

lucascolley reviewed Jan 7, 2025

.github/workflows/gpu-ci.yml

rgommers marked this pull request as draft January 8, 2025 09:12

lucascolley marked this pull request as ready for review January 8, 2025 21:13

rgommers added 2 commits January 8, 2025 23:04

CI: optimize caching strategies for Pixi and Ccache

b67f91e

CI: add documentation inside gpu-ci.yml

61628a0

lucascolley reviewed Jan 8, 2025

.github/workflows/gpu-ci.yml (outdated)

Update gpu-ci.yml [skip circle]

d8784ca

lucascolley approved these changes Jan 8, 2025

lucascolley merged commit 185233e into scipy:main Jan 8, 2025
37 of 38 checks passed

j-bowhay added this to the 1.16.0 milestone Jan 8, 2025

lucascolley mentioned this pull request Feb 24, 2025

DOC: Interactive notebooks in stats tutorials are not working in version 1.15 docs. #22241

Open

btjanaka mentioned this pull request Jul 15, 2025

[FEATURE REQUEST] Adopt Array API standard to support torch, jax, etc. icaros-usc/pyribs#570

Closed
