Conversation

hsanjuan

This adds an IPFS storage driver which stores registry objects on IPFS and
makes them available there. In other words, it enables a swarm of Docker
registries to share an underlying, content-addressed storage layer out of the
box.

Running `docker push ipfs-registry1/my-container` can be followed by `docker
pull ipfs-registry2/my-container`. `ipfs-registry2` will transparently fetch
the necessary objects from `ipfs-registry1` (or from any other providers in
the network).

The driver runs an embedded IPFS-Lite daemon and is therefore ready to use
without additional external requirements (a separate ipfs daemon does NOT need
to run alongside it).


It implements the filesystem by storing the `[path -> CID]` mappings in a
replicated database using go-ds-crdt. Thus, when a registry instance commits
an object to the storage layer, every other registry instance in the cluster
becomes aware of the new path/CID mapping (as long as the pubsub messages
arrive) and can fetch the objects from the network when needed. Any registry
instance that wants to participate in the same cluster must set the same value
for the `pubsubtopic` parameter, to which all registries subscribe.

The IPFS peer used for adding/getting/announcing is embedded in the driver
using IPFS-Lite, as are the libp2p host, DHT, pubsub and blockstore/datastore
facilities. Therefore, it is not necessary to run an additional IPFS daemon,
nor is there any dependency between the regular IPFS daemon and this driver.

Each registry generates a private key for the libp2p host during the first
run, which is then persisted to the datastore; it can also be provided
manually via parameters. Instances may configure custom bootstrappers and an
optional private network pre-shared key, in which case the registry instances
will form an isolated IPFS private network. By default, instances bootstrap to
the public bootstrappers and make content available on the public IPFS
network. In that case, using the default pubsubtopic is disallowed to avoid
accidents.

For reference, until driver documentation is written, it can be configured
like this (all keys are optional):

```
ipfs:
  # all registry instances should share pubsubtopic and protectorkey
  pubsubtopic: "ipfs-registry"
  # protectorkey is a 32-byte hex-encoded key. Change it!
  # not setting the protectorkey will run the registry on the public IPFS network.
  protectorkey: "e373e1babfa619e1544c88e0fd9e9f987d8f46443245274d21c6027f1477f45d"
  # comma-separated list of listen multiaddresses
  listenaddrs: /ip4/0.0.0.0/tcp/4619
  loglevel: info
  datastorepath: /var/lib/registry
  # The comma-separated multiaddresses of other ipfs-registries.
  # Needed especially when using a protectorkey.
  bootstrappers: "/ip4/x.x.x.x/tcp/4619/p2p/12D3KooWBfXWVX7nNUA1Sm64qEdCxGWQMM41FNE8fUXymRQQ7EZ4"
  # Private key will be autogenerated if not set
  privatekey: "base64-encoded ipfs private key"
```

The driver will print INFO log messages on startup detailing the peer
multiaddresses and other information.

@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

```
$ git clone -b "feat/ipfs-storage-driver" git@github.com:hsanjuan/distribution.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f
```

Amending updates the existing PR. You DO NOT need to open a new one.

@hsanjuan
Author

hsanjuan commented Apr 20, 2019

> Please sign your commits following these rules:

The commit IS signed. Edit: cryptographically signed. I have now added the sign-off text.

I have not found a way to easily vendor all the new dependencies. Is there a way to auto-update the vendor.conf file? I tried vndr and trash, but they don't seem to pick up the dependencies from the new module. They also make a bunch of changes to existing vendored deps, which means they no longer match vendor.conf. In any case, doing it manually would take longer than it took to write this driver. Is there a chance that #2768 can land anytime soon?

Signed-off-by: Hector Sanjuan <code@hector.link>
@moritonal

Hi @hsanjuan, I was going to ask. How does this handle who owns a name? Is it a first-come first-serve for container names?

@hsanjuan
Author

> Hi @hsanjuan, I was going to ask. How does this handle who owns a name? Is it a first-come first-serve for container names?

I don't think docker/distribution handles namespace ownership? If it does, it is invisible to the storage driver. This driver makes multiple registries appear as if they were reading and writing to the same filesystem, so what one writes is readily available to the others.

@moritonal

To explain, this was a big issue I faced when I made something similar (although, if I'm honest, your implementation is better functionally). Basically, what happens if two people push an image called `ipfs-registry1/busybox`?

If ipfs-registry1 pushes it first, does it now own that repo, or can others push new tags? How, when I pull `ipfs-registry2/busybox`, do I know I'm getting the right one and not something malicious? I imagine this revolves around the protectorkey and privatekey, but an explanation of those would be helpful for people who don't know things like IPFS Cluster.

@hsanjuan
Author

I think you are mixing up something like DockerHub (which has users and permissions around namespaces, and may or may not run docker/distribution underneath) with docker/distribution itself, which is a registry implementation and includes no user-auth or namespace-owning features (that I'm aware of).

If you plugged the S3 storage driver into docker/distribution, you could launch multiple instances of it to push/pull images to/from S3. There is no notion of owned namespaces; any registry instance configured to use the right S3 bucket can write anywhere. The same happens here: if your registries are configured to use the right IPFS private network (with the right protector key), they will use IPFS as a storage backend just like they would use S3.

IPFS Cluster is not involved here at all. `protectorkey` is an encryption key for the libp2p network that your ipfs nodes join; it is equivalent to the `swarm.key` file in the IPFS Private Networks feature. `privatekey` is the private key that each registry-embedded ipfs node has, equivalent to the `PrivateKey` field in the IPFS configuration.

If you leave the `protectorkey` empty, you are joining the public IPFS network. It is as if everyone pushed/pulled their images from a shared public S3 bucket: anyone could overwrite what anyone else does.

The role of the storage driver is not to assign and control who writes what to which namespace; that is something probably built around the registry API at a much higher level. In that sense, a distributed docker/distribution built on IPFS is different from a distributed DockerHub, which would need distributed identity management for a start.

You got around that by adopting conventions for the image path that include a specially configured DNS domain, which is pretty nice, but I'm not sure the storage driver is the right abstraction layer to handle this (I guess you need to look at the paths and figure out whether you can/need to resolve the domain into a CID). The notion of a namespace is not something the storage driver API is aware of.

So with your driver, `docker pull localhost:5000/some.domain/image/name/etc:tag` contacts the peer behind some.domain to figure out the hash for the path. My driver instead checks whether anyone in the network can provide the hash associated with /some.domain/image/name/etc:tag, without needing to specify an owner. In your driver you get namespace control, but the peer behind some.domain needs to be up for the lookup to work or for the image to be updated. In my driver, as long as someone can provide the images, it does not matter who. Also, anyone can write them.

They are essentially different approaches: I use a permissionless database coupled with pubsub to store and distribute `<path> -> <cid>` mappings, while you use DNS and authoritative peers for that. I get an out-of-the-box distributed registry (mostly useful for private networks); you get something more like a distributed DockerHub with DNS-based auth.

I wonder if it would be possible to merge both approaches by validating that insertions into my database are signed by the peers behind the domain in the path.

@MarkusTeufelberger

Any updates on this?

@hsanjuan
Author

hsanjuan commented Apr 3, 2020

@MarkusTeufelberger Last time I checked (months ago), I realized Docker was not interested in taking these contributions. They are only interested in things (usually bug fixes) that they can backport to their enterprise/for-money distribution.

In the meantime, some other approaches have popped up (https://github.com/miguelmota/ipdr was one).

I see that this project has finally switched to go.mod, so I may update this PR in a few days to make it usable with the latest deps, etc.

Base automatically changed from master to main January 27, 2021 15:51
@Jamstah
Collaborator

Jamstah commented Sep 18, 2023

Currently not accepting new storage drivers. If you are maintaining a fork for IPFS support, you could consider adding it here: https://github.com/distribution/distribution/blob/main/docs/storage-drivers/index.md#driver-contribution

To join in discussion around storage drivers, see #3988.

@Jamstah Jamstah closed this Sep 18, 2023