Skip to content

Conversation

michaeldwan
Copy link
Member

@michaeldwan michaeldwan commented Jun 9, 2025

Generate updated compatibility matrix files to handle recent changes in #2376 and #2099 which default to python 3.13

This required 2 supporting fixes:

  1. nvidia/cuda no longer includes the CuDNN version in image tags, which requires fetching the image config and parsing the version out of ENV rather than parsing only the tag
  2. The version helper needed to be loosed to accept but ignore more than 3 version components

@8W9aG do we need to do anything to base images for this?

@michaeldwan michaeldwan marked this pull request as ready for review June 9, 2025 21:11
@michaeldwan michaeldwan requested a review from a team June 9, 2025 21:12
andreasjansson and others added 4 commits July 3, 2025 17:15
Also required updating compatgen with `manylinux_2`
Generate updated compatibility matrix files

This required fixing the helper that fetched images from nvidia/cuda. Image tags no longer include the CuDNN version in the tag, which requires fetching the image config and parsing the version out of ENV.
@michaeldwan michaeldwan requested a review from markphelps July 8, 2025 15:16
@@ -14,12 +14,12 @@ type Version struct {
}

func NewVersion(s string) (version *Version, err error) {
// TODO[md]: handle prerelease versions (0.1.2-rc1) so they aren't appended to the previous component
// todo[md]: tbh just switch to hashicorp/go-version or github.com/Masterminds/semver/v3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 ive used semver/v3 in the past, its pretty nice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always go to that one too, but I found it the other day it doesn't support "invalid" semver input like ubuntu's "22.04" with leading zeros. I was hoping to use the same version code in cog and the new base image generator code 😒

Copy link
Contributor

@markphelps markphelps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question, but overall lgtm

if len(parts) != 4 {
return nil, fmt.Errorf("Tag must be in the format <cudaVersion>-cudnn<cudnnVersion>-{devel,runtime}-ubuntu<ubuntuVersion>. Invalid tag: %s", tag)
func parseCUDABaseImage(ctx context.Context, tag string) (*config.CUDABaseImage, error) {
fmt.Println("parsing", tag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug printlns ? / do we want to keep these ?

images := make([]config.CUDABaseImage, len(tags))
eg, egctx := errgroup.WithContext(context.TODO())
// set a concurrency limit to avoid throttling by the docker hub api (since these are authenticated requests)
eg.SetLimit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use error group at all then if we are running them serially?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

natural evolution of intermittent issues and sloppy code. fixing :)

@michaeldwan
Copy link
Member Author

@markphelps I removed the unnecessary errgroup and excessive print statements. I left one for each image since the process takes a few minutes and it's nice to see some output to know it's not hanging

Copy link
Contributor

@markphelps markphelps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants