Storage Driver: Add an IPFS driver #2906
Conversation
Please sign your commits following these rules:

```
$ git clone -b "feat/ipfs-storage-driver" git@github.com:hsanjuan/distribution.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f
```

Amending updates the existing PR. You DO NOT need to open a new one.
The commit IS signed. Edit: cryptographically signed. I have added the text sign-off now. I have not found a way to easily vendor all the new dependencies. Is there a way to auto-update this?
This adds an IPFS storage driver which stores and makes the registry objects available on IPFS. Said otherwise, it enables a swarm of docker registries to share an underlying, content-addressed storage layer out of the box. Running `docker push ipfs-registry1/my-container` can be followed by `docker pull ipfs-registry2/my-container`. `ipfs-registry2` will transparently fetch the necessary objects from `ipfs-registry1` (or from any other providers in the network).

The driver runs an embedded IPFS-Lite daemon and is therefore ready to use without additional external requirements (the ipfs daemon does NOT need to run alongside).

---

It implements the filesystem by storing the [paths -> ipfs_object_identifier] mappings in a replicated database using go-ds-crdt. Thus, when a registry instance commits an object to the storage layer, every other registry instance in the cluster becomes aware of the new path/CID mapping (as long as the pubsub messages arrive) and is able to fetch the objects from the network when needed. Any registry instance willing to participate in the same cluster needs to set the same value for the "pubsubtopic" parameter, to which all registries subscribe.

The IPFS peer used for adding/getting/announcing is embedded in the driver using IPFS-Lite, as are the libp2p host, DHT, pubsub and blockstore/datastore facilities. Therefore, it is not required to run an additional IPFS daemon, nor is there any dependency between the regular IPFS daemon and this one.

Each registry generates a private key for the libp2p host during the first run, which is then persisted to the datastore, though it can always be manually provided via parameters. Instances may include custom bootstrappers and an optional private-network pre-shared key. In this case, the registry instances will form an isolated IPFS private network. By default, instances will bootstrap to the public bootstrappers and content is made available on the public IPFS network.
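To illustrate the core idea, here is a stdlib-only sketch (not the driver's actual code) of what the replicated path -> CID mapping does. In the real driver the map lives in a go-ds-crdt replicated datastore and writes are announced over libp2p pubsub; in this sketch a plain map and a broadcast callback stand in for both, and all names (`pathStore`, the example paths and CID) are hypothetical.

```go
// Hypothetical sketch: a key/value "filesystem" mapping registry paths to
// content identifiers (CIDs), with writes broadcast to other instances.
package main

import (
	"fmt"
	"sync"
)

// pathStore maps registry paths (e.g. "/docker/registry/v2/blobs/...")
// to CIDs. Every Put invokes broadcast, standing in for a pubsub announce.
type pathStore struct {
	mu        sync.Mutex
	paths     map[string]string
	broadcast func(path, cid string)
}

func newPathStore(broadcast func(path, cid string)) *pathStore {
	return &pathStore{paths: make(map[string]string), broadcast: broadcast}
}

// Put records a path -> CID mapping locally and announces it.
func (s *pathStore) Put(path, cid string) {
	s.mu.Lock()
	s.paths[path] = cid
	s.mu.Unlock()
	s.broadcast(path, cid)
}

// Get resolves a path to the CID that should be fetched from the network.
func (s *pathStore) Get(path string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cid, ok := s.paths[path]
	return cid, ok
}

func main() {
	// Two "registries": writes on A are replayed into B, mimicking
	// CRDT replication via pubsub messages.
	var regB *pathStore
	regA := newPathStore(func(path, cid string) { regB.Put(path, cid) })
	regB = newPathStore(func(path, cid string) {}) // B announces to no one here

	// A push on registry A...
	regA.Put("/docker/registry/v2/blobs/sha256/ab/abc123/data", "bafyexamplecid")

	// ...becomes resolvable on registry B without B having stored the blob.
	cid, ok := regB.Get("/docker/registry/v2/blobs/sha256/ab/abc123/data")
	fmt.Println(ok, cid)
}
```

In the actual driver, resolving the CID on registry B would be followed by fetching the blocks from whichever peers provide them.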
In this case we disallow using the default pubsubtopic to avoid accidents.

For reference, until driver documentation is created, it can be configured this way (all keys are optional):

```
ipfs:
  # all registry instances should share pubsubtopic and protectorkey
  pubsubtopic: "ipfs-registry"
  # protectorkey is a 32-byte hex-encoded key. Change it!
  # not setting the protectorkey will run the registry on the public IPFS network.
  protectorkey: "e373e1babfa619e1544c88e0fd9e9f987d8f46443245274d21c6027f1477f45d"
  # comma-separated list of listen multiaddresses
  listenaddrs: /ip4/0.0.0.0/tcp/4619
  loglevel: info
  datastorepath: /var/lib/registry
  # The comma-separated multiaddresses of different ipfs-registries.
  # Needed especially when using a protectorkey.
  bootstrappers: "/ip4/x.x.x.x/tcp/4619/p2p/12D3KooWBfXWVX7nNUA1Sm64qEdCxGWQMM41FNE8fUXymRQQ7EZ4"
  # Private key will be autogenerated if not set
  privatekey: "base64-encoded ipfs private key"
```

The driver will print INFO log messages on startup detailing the peer multiaddresses and other information.

Signed-off-by: Hector Sanjuan <code@hector.link>
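Since the protectorkey is described as a 32-byte hex-encoded key, one way to generate a fresh value (assuming `openssl` is available; any source of 32 random bytes would do) is:

```shell
# Generate a 32-byte key as 64 hex characters, suitable for the
# "protectorkey" field. All registry instances that should form one
# private network must be configured with the same value.
openssl rand -hex 32
```

The example key in the config above should never be reused, since anyone reading the thread could join a network protected by it.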
The branch was force-pushed from 18e537b to 4502049.
Hi @hsanjuan, I was going to ask: how does this handle who owns a name? Is it first-come, first-served for container names?
I don't think docker/distribution handles namespace ownership? If it does, then this is invisible to the storage driver. This driver makes multiple registries appear like they are reading and writing to the same filesystem, so what one writes is readily available in the others.
To explain, this was a big issue I faced when I made something similar (although, if I'm honest, your implementation is better functionally). Basically, what happens if two people push an image called If
I think you are mixing up something like DockerHub (which has users and permissions around namespaces, and may or may not run docker/distribution underneath) with docker/distribution, which is a registry implementation and includes no user-auth or namespace-owning features (that I'm aware of).

If you plugged the S3 storage driver into docker/distribution, you could launch multiple instances of it to push/pull images to/from S3. You don't have the notion of owned namespaces as long as those registry instances are configured to use the right S3 bucket. The same happens here. If your registries are configured to use the right IPFS private network (with the right protector key), they will use IPFS as a storage backend like they would use S3. IPFS Cluster is not involved here at all. If you leave the

The role of the storage driver is not to assign and control who writes what to which namespace; that is something probably built around the registry API at a much higher level. In that sense, a distributed docker/distribution made with IPFS is different from a distributed DockerHub, which would need distributed identity management for a start. You got around that by making conventions about the image path including a specially configured DNS domain, which is pretty nice, but I'm not sure the storage driver is the right abstraction layer to handle this (I guess you need to look at the paths and figure out if you can/need to resolve the domain into a CID). The notion of namespace is not something the storage driver API is aware of. So with your driver

They are essentially different approaches (I use a permissionless database coupled with pubsub to store and distribute

I wonder if it would be possible to merge both approaches by validating that insertions in my database are signed by the peers behind the domain in the path..
Any updates on this? |
@MarkusTeufelberger Last time I checked (months ago) I realized Docker was not interested in taking these contributions. They are only interested in taking things (usually bug fixes) they can backport to their enterprise/for-money distribution. In the meantime, some other approaches have popped up (https://github.com/miguelmota/ipdr was one). I see that this project has finally switched to go.mod, so I may upgrade this PR in a few days to make it usable with the latest deps, etc.
We are currently not accepting new storage drivers. If you are maintaining a fork for IPFS support, you could consider adding it here: https://github.com/distribution/distribution/blob/main/docs/storage-drivers/index.md#driver-contribution

To join the discussion around storage drivers, see #3988.