Update compatibility matrix files #2400

michaeldwan · 2025-06-09T20:54:27Z

Generate updated compatibility matrix files to handle the recent changes in #2376 and #2099, which default to Python 3.13.

This required two supporting fixes:

- nvidia/cuda no longer includes the CuDNN version in image tags, so the version can no longer be parsed from the tag alone. It now requires fetching the image config and parsing the version out of ENV.
- The version helper needed to be loosened to accept, but ignore, more than 3 version components.
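To make the second fix concrete, here is a minimal sketch of a loosened parser. It is not the code in pkg/util/version; the names `looseVersion` and `parseLoose` are hypothetical, and the sketch assumes plain numeric components with no prerelease suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// looseVersion is a hypothetical stand-in for the helper's Version type.
type looseVersion struct {
	Major, Minor, Patch int
}

// parseLoose reads up to three dot-separated numeric components and
// ignores any that follow, so "9.10.2.21" parses as 9.10.2.
// Ignored components are not validated at all.
func parseLoose(s string) (looseVersion, error) {
	var v looseVersion
	fields := []*int{&v.Major, &v.Minor, &v.Patch}
	for i, part := range strings.Split(s, ".") {
		if i >= len(fields) {
			break // extra components are accepted but ignored
		}
		n, err := strconv.Atoi(part)
		if err != nil {
			return looseVersion{}, fmt.Errorf("invalid component %q in %q: %w", part, s, err)
		}
		if n < 0 {
			return looseVersion{}, fmt.Errorf("negative component %q in %q", part, s)
		}
		*fields[i] = n
	}
	return v, nil
}

func main() {
	v, err := parseLoose("9.10.2.21")
	fmt.Println(v, err)
}
```

Missing components default to zero in this sketch, so "12.8" and "12.8.0" parse to the same value.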

@8W9aG do we need to do anything to base images for this?

Also required updating compatgen with `manylinux_2`

Generate updated compatibility matrix files. This required fixing the helper that fetched images from nvidia/cuda. Image tags no longer include the CuDNN version in the tag, which requires fetching the image config and parsing the version out of ENV.
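The ENV-parsing step can be sketched roughly as follows, assuming the image config exposes its environment as a list of `KEY=VALUE` strings. The helper name `lookupEnv`, the variable name `NV_CUDNN_VERSION`, and the sample values are assumptions for illustration; fetching the config from the registry is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupEnv scans image-config style "KEY=VALUE" entries and returns
// the value stored under key, if any.
func lookupEnv(env []string, key string) (string, bool) {
	for _, entry := range env {
		k, v, ok := strings.Cut(entry, "=")
		if ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Illustrative entries only, not copied from a real image config.
	env := []string{
		"PATH=/usr/local/nvidia/bin:/usr/bin",
		"NV_CUDNN_VERSION=9.7.0.66-1",
	}
	// NV_CUDNN_VERSION is an assumed variable name.
	if v, ok := lookupEnv(env, "NV_CUDNN_VERSION"); ok {
		fmt.Println("cudnn version:", v)
	} else {
		fmt.Println("no cudnn version in ENV")
	}
}
```

A value with four or more components is where a version helper that tolerates extra components becomes useful.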

markphelps · 2025-07-08T15:21:26Z

pkg/util/version/version.go

@@ -14,12 +14,12 @@ type Version struct {
 }

 func NewVersion(s string) (version *Version, err error) {
+	// TODO[md]: handle prerelease versions (0.1.2-rc1) so they aren't appended to the previous component
+	// todo[md]: tbh just switch to hashicorp/go-version or github.com/Masterminds/semver/v3


💯 I've used semver/v3 in the past, it's pretty nice

I always go to that one too, but I found the other day that it doesn't support "invalid" semver input like Ubuntu's "22.04" with leading zeros. I was hoping to use the same version code in cog and the new base image generator code 😒
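For context on why "22.04" counts as invalid: SemVer 2.0.0 forbids leading zeros in numeric identifiers. The sketch below checks only that one rule using the standard library; it does not reproduce any library's behavior, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isStrictNumericID reports whether s is digits only and has no leading
// zero (the single digit "0" is allowed), per the SemVer 2.0.0 rule for
// numeric identifiers.
func isStrictNumericID(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return s == "0" || s[0] != '0'
}

// passesLeadingZeroRule applies the check to every dot-separated
// component. It deliberately ignores the other SemVer rules, such as
// requiring exactly three components.
func passesLeadingZeroRule(version string) bool {
	for _, part := range strings.Split(version, ".") {
		if !isStrictNumericID(part) {
			return false
		}
	}
	return true
}

func main() {
	for _, v := range []string{"1.2.3", "22.04"} {
		fmt.Println(v, passesLeadingZeroRule(v))
	}
}
```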

markphelps

one question, but overall lgtm

markphelps · 2025-07-08T15:22:29Z

tools/compatgen/internal/cuda.go

-	if len(parts) != 4 {
-		return nil, fmt.Errorf("Tag must be in the format <cudaVersion>-cudnn<cudnnVersion>-{devel,runtime}-ubuntu<ubuntuVersion>. Invalid tag: %s", tag)
+func parseCUDABaseImage(ctx context.Context, tag string) (*config.CUDABaseImage, error) {
+	fmt.Println("parsing", tag)


debug printlns? do we want to keep these?

markphelps · 2025-07-08T15:24:29Z

tools/compatgen/internal/cuda.go

+	images := make([]config.CUDABaseImage, len(tags))
+	eg, egctx := errgroup.WithContext(context.TODO())
+	// set a concurrency limit to avoid throttling by the docker hub api (since these are authenticated requests)
+	eg.SetLimit(1)


why use error group at all then if we are running them serially?

natural evolution of intermittent issues and sloppy code. fixing :)

michaeldwan · 2025-07-08T15:46:04Z

@markphelps I removed the unnecessary errgroup and excessive print statements. I left one for each image since the process takes a few minutes and it's nice to see some output to know it's not hanging
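A plain sequential loop of the kind described here might look like the sketch below. It is not the code in tools/compatgen; `baseImage`, `parseTags`, `fakeParse`, and `demoParse` are hypothetical names, and the stub parser stands in for the real registry lookup.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// baseImage is a hypothetical stand-in for config.CUDABaseImage.
type baseImage struct{ Tag string }

type parseFunc func(ctx context.Context, tag string) (baseImage, error)

// parseTags handles tags one at a time, printing a progress line per
// image and stopping at the first error or context cancellation.
func parseTags(ctx context.Context, tags []string, parse parseFunc) ([]baseImage, error) {
	images := make([]baseImage, 0, len(tags))
	for _, tag := range tags {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		fmt.Println("parsing", tag) // one line per image shows the run is not hanging
		img, err := parse(ctx, tag)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", tag, err)
		}
		images = append(images, img)
	}
	return images, nil
}

// fakeParse is a stub that fails only for the tag "bad".
func fakeParse(_ context.Context, tag string) (baseImage, error) {
	if tag == "bad" {
		return baseImage{}, errors.New("unparseable tag")
	}
	return baseImage{Tag: tag}, nil
}

// demoParse wires the loop to the stub with a background context.
func demoParse(tags []string) ([]baseImage, error) {
	return parseTags(context.Background(), tags, fakeParse)
}

func main() {
	images, err := demoParse([]string{"tag-a", "tag-b"})
	fmt.Println(len(images), err)
}
```

With a concurrency limit of 1 an errgroup gives the same ordering as this loop, so the loop is the simpler way to express it.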

markphelps

lgtm!

michaeldwan marked this pull request as ready for review June 9, 2025 21:11

michaeldwan requested a review from a team June 9, 2025 21:12

This was referenced Jun 9, 2025

Support Torch 2.7.0, 2.6.0; cuda 12.9, 12.8, 12.6.3 #2320

Open

Use Python 3.13 by default #2099

Draft

michaeldwan requested a review from andreasjansson June 10, 2025 00:11

michaeldwan force-pushed the md/fix-compatgen branch from 75ad8f3 to ac77130 Compare July 3, 2025 23:10

andreasjansson and others added 4 commits July 3, 2025 17:15

Support Torch 2.7.0, 2.6.0; cuda 12.9, 12.8, 12.6.3

ebaa1cb

Also required updating compatgen with `manylinux_2`

update compat matrix files

63d2897

Generate updated compatibility matrix files. This required fixing the helper that fetched images from nvidia/cuda. Image tags no longer include the CuDNN version in the tag, which requires fetching the image config and parsing the version out of ENV.

use auth to avoid docker hub rate limits

de2a62e

regenerate compat matrix files

dc9197c

michaeldwan force-pushed the md/fix-compatgen branch from ac77130 to dc9197c Compare July 3, 2025 23:21

michaeldwan added 2 commits July 7, 2025 10:27

Merge branch 'main' into md/fix-compatgen

aa3731d

Update torch_compatibility_matrix.json

e4531fe

michaeldwan requested a review from markphelps July 8, 2025 15:16

markphelps reviewed Jul 8, 2025

markphelps approved these changes Jul 8, 2025

michaeldwan added 2 commits July 8, 2025 09:44

fix nits

6b3ddba

regenerate

7bf472b

markphelps approved these changes Jul 8, 2025

michaeldwan mentioned this pull request Jul 8, 2025

Remove python 3.12 from older torch versions #2455

Merged

Merge branch 'main' into md/fix-compatgen

eab374b
