
Enable MD5 checksums on Google Cloud Storage #3018

@stanhu

In Docker Distribution v2.7.1, we've seen many instances where writing a manifest produced a 0-byte file on Google Cloud Storage, yet no error was reported.

One thing that would help is to send an MD5 checksum with each upload (in the GCS driver's PutContent), so that GCS rejects any object whose received content does not match.

Right now the vendored Google Cloud SDK is so old (2015) that it doesn't fully support sending MD5 checksums on uploads. However, by backporting changes from the latest SDK, the following patch makes this work:

diff --git a/registry/storage/driver/gcs/gcs.go b/registry/storage/driver/gcs/gcs.go
index 86dc87f1..ed46088e 100644
--- a/registry/storage/driver/gcs/gcs.go
+++ b/registry/storage/driver/gcs/gcs.go
@@ -17,6 +17,7 @@ package gcs
 import (
 	"bytes"
 	"context"
+	"crypto/md5"
 	"encoding/json"
 	"fmt"
 	"io"
@@ -275,6 +276,9 @@ func (d *driver) PutContent(context context.Context, path string, contents []byt
 	return retry(func() error {
 		wc := storage.NewWriter(d.context(context), d.bucket, d.pathToKey(path))
 		wc.ContentType = "application/octet-stream"
+		h := md5.New()
+		h.Write(contents)
+		wc.MD5 = h.Sum(nil)
 		return putContentsClose(wc, contents)
 	})
 }
diff --git a/vendor/google.golang.org/cloud/storage/types.go b/vendor/google.golang.org/cloud/storage/types.go
index 060deb6a..99c023c6 100644
--- a/vendor/google.golang.org/cloud/storage/types.go
+++ b/vendor/google.golang.org/cloud/storage/types.go
@@ -113,6 +113,11 @@ type ObjectAttrs struct {
 	// Optional. If nil or empty, existing ACL rules are preserved.
 	ACL []ACLRule
 
+	// MD5 is the MD5 hash of the object's content. This field is read-only,
+	// except when used from a Writer. If set on a Writer, the uploaded
+	// data is rejected if its MD5 hash does not match this field.
+	MD5 []byte
+
 	// Metadata represents user-provided metadata, in key/value pairs.
 	// It can be nil if the current metadata values needs to preserved.
 	Metadata map[string]string
@@ -364,8 +369,14 @@ func (w *Writer) open() {
 	w.opened = true
 
 	go func() {
+		rawObj := attrs.toRawObject(w.bucket)
+
+		if w.MD5 != nil {
+			rawObj.Md5Hash = base64.StdEncoding.EncodeToString(w.MD5)
+		}
+
 		resp, err := rawService(w.ctx).Objects.Insert(
-			w.bucket, attrs.toRawObject(w.bucket)).Media(w.r).Projection("full").Context(w.ctx).Do()
+			w.bucket, rawObj).Media(w.r).Projection("full").Context(w.ctx).Do()
 		w.err = err
 		if err == nil {
 			w.obj = newObject(resp)

To make this work without modifying the vendored Google files, we'd need a more recent version of the Google Cloud SDK. However, the latest SDK changed the interface. For one, the package moved from google.golang.org/cloud to cloud.google.com/go. In addition, we'd have to make changes like the following (this is not entirely correct, but you get the picture):

@@ -251,7 +251,11 @@ func (d *driver) GetContent(context context.Context, path string) ([]byte, error
        var rc io.ReadCloser
        err := retry(func() error {
                var err error
-               rc, err = storage.NewReader(gcsContext, d.bucket, name)
+               client, err := storage.NewClient(context)
+               if err != nil {
+                       return err
+               }
+               rc, err = client.Bucket(d.bucket).Object(name).NewReader(gcsContext)
                return err
        })
        if err == storage.ErrObjectNotExist {
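The snippet above calls storage.NewClient inside the retry closure, so every attempt builds a fresh client, whereas the newer SDK's documentation recommends reusing a Client. One possible shape, sketched here as an assumption rather than the SDK's prescribed usage, is to create the client once and retry only the read. objectClient is a hypothetical interface standing in for *storage.Client (a real implementation would call client.Bucket(bucket).Object(name).NewReader(ctx) as in the snippet), and retry is a simplified, hypothetical stand-in for the driver's own helper.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// objectClient is a hypothetical stand-in for the newer SDK's client:
// it is created once by the caller and reused across attempts.
type objectClient interface {
	NewReader(bucket, name string) (io.ReadCloser, error)
}

// retry is a simplified, hypothetical helper: it calls f up to attempts
// times and returns nil on the first success, else the last error.
func retry(attempts int, f func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = f(); err == nil {
			return nil
		}
	}
	return err
}

// getContent retries only the read; the client is passed in, so no new
// client is built per attempt.
func getContent(c objectClient, bucket, name string) ([]byte, error) {
	var rc io.ReadCloser
	err := retry(3, func() error {
		var err error
		rc, err = c.NewReader(bucket, name)
		return err
	})
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// flakyClient fails its first call, then serves a fixed body.
type flakyClient struct{ calls int }

func (f *flakyClient) NewReader(bucket, name string) (io.ReadCloser, error) {
	f.calls++
	if f.calls == 1 {
		return nil, errors.New("transient error")
	}
	return io.NopCloser(strings.NewReader("manifest")), nil
}

func main() {
	c := &flakyClient{}
	b, err := getContent(c, "bucket", "path/to/manifest")
	fmt.Println(string(b), err, c.calls) // manifest <nil> 2
}
```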
