aluzzardi
Member

Forced IDs must be in the exact same format as the ones generated
automatically and be unique (as in, not already in use).

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>

@aluzzardi
Member Author

/cc @vieux @ibuildthecloud

@vieux
Contributor

vieux commented Dec 30, 2014

LGTM

@shykes
Contributor

shykes commented Dec 30, 2014

That's a pretty big change with potentially lots of implications. Without a compelling reason to make the change I am compelled to say no. What's the rationale?

@jessfraz
Contributor

If we implement this, it won't be long before someone opens a PR for --id on docker create...
But setting that aside (because we can't predict the future):
how can we trust users to generate unique IDs themselves without messing things up? If we implement this, people will use it.

@shykes
Contributor

shykes commented Dec 30, 2014

I agree with Jess. However, I suspect Andrea and Victor have a reasonable
requirement in mind. Let's talk about that requirement, then we can take
another look at the best solution.

Reply to this email directly or view it on GitHub
#9854 (comment).

@ewindisch
Contributor

It seems the worst that will happen is that users will get "Container ID is already in use" errors.

ValidateID is somewhat incomplete for container IDs, however. The GenerateRandomID function also excludes all-numeric IDs.

If these IDs are supposed to be considered global, we might want to consider that the all-numeric exclusion removes a significant chunk of otherwise valid IDs before we even consider the birthday problem.
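To make the gap between the two functions concrete, here is a minimal Go sketch of a stricter check. It rests on assumptions that are not stated in this thread: that a full ID is 64 lowercase hex characters, that the short ID is its first 12 characters, and that the all-numeric exclusion applies to that short form. The name isAcceptableForcedID is hypothetical and is not Docker's ValidateID.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Assumption: a full container ID is 64 lowercase hexadecimal characters.
var fullIDPattern = regexp.MustCompile(`^[a-f0-9]{64}$`)

// Assumption: the short ID shown to users is the first 12 characters.
const shortIDLen = 12

// isAcceptableForcedID is a hypothetical check, not Docker's ValidateID.
// It accepts an ID only if it matches the assumed generated format and its
// short form is not made up purely of decimal digits.
func isAcceptableForcedID(id string) bool {
	if !fullIDPattern.MatchString(id) {
		return false
	}
	// A short ID consisting only of digits parses as a base-10 integer.
	if _, err := strconv.ParseInt(id[:shortIDLen], 10, 64); err == nil {
		return false
	}
	return true
}

func main() {
	mixed := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	numeric := "1234567890123456123456789012345612345678901234561234567890123456"

	fmt.Println(isAcceptableForcedID(mixed))   // true: short form contains "a" and "b"
	fmt.Println(isAcceptableForcedID(numeric)) // false: short form is all digits
	fmt.Println(isAcceptableForcedID("xyz"))   // false: wrong length and alphabet
}
```

A format-only check would accept the second ID, which is the incompleteness described above.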

@shykes
Contributor

shykes commented Dec 30, 2014

Eric, no need to dive too far into possible problems (and remediation)
until we establish a solid reason to even consider it.

@vieux
Contributor

vieux commented Dec 30, 2014

Our use case is Swarm: we want a container to have the same ID whether you go through Swarm or directly to a machine. Also, when we reschedule a container, the ID stays the same, without us having to maintain a mapping.

@ibuildthecloud also seems very interested in this feature for his project(s).

@shykes
Contributor

shykes commented Dec 30, 2014

We already discussed this and I already gave you my opinion: a "virtual container" in the swarm is not the same thing as the actual container mapped to it at any given time. They are fundamentally different objects, and they should have different IDs. Otherwise we expose ourselves to all sorts of edge cases and naming conflicts.

There will always be a mapping somewhere. It's not a good idea to sweep it under the rug. And I really want to keep a solid foundation of globally unique objects with an immutable ID. So I'm not supportive of this change, sorry.

@ibuildthecloud
Contributor

@shykes This PR doesn't have to be the solution, but I'll explain the issue we need solved. Here's the basic problem (@shykes, I know you've heard this before, but for everyone else...). Imagine you have a higher level system managing your containers (swarm being the perfect example). So I create a virtual container A in swarm and when I start the container on a host it is container Y. Then the host dies and I start virtual container A on a different host as container Z. I need to know that Y and Z are containers for the virtual A. So a mapping needs to be kept somewhere. There are two issues with keeping this mapping outside of Docker, the first technical, the second a usability concern.

The first issue is idempotency. Imagine I start virtual container A on the host and it creates container Y, but right after the container starts, my code/agent/whatever dies before it can record Y (or there is a network partition, etc.). Then I want to clean up or redo the action. If I look at the host and see container Y running there, I have no way to know whether that container is really A or a container manually started by someone else. To make the start-container operation idempotent, the easiest way (and the sanest that I know of) is to set the virtual container ID "A" on the container during create in some fashion (Rancher currently abuses the container name, which is not a good solution). That way, if something fails, you can later list the containers and see that the newly created Y was in fact for virtual container A.

The second issue is usability: if virtual container A is running on a host and a user logs in and does a ps, how do they know that container X is virtual container A? It would be nice if the user could easily correlate the two.

So there are a couple solutions here.

  1. Allow the caller to set the container ID. This is by far the simplest approach, but I understand the "don't f*#$ with my IDs" argument.
  2. Add another specific field like UUID. This is how libvirt and kvm do it. KVM has an argument called -uuid that it doesn't really care about but is there just so that an external system can assign the ID for tracking.
  3. Add arbitrary metadata to containers. I can then set something like docker run -m VIRTUAL_CONTAINER_ID=A.

One important requirement is that if I set the virtual container ID, I need to be able to efficiently look up the container by that ID. I can't do a docker ps and then an inspect on each one. This is also why setting the container ID is the simplest (but maybe most dangerous) solution.
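The idempotency argument and the lookup requirement can be sketched together in Go. This is an illustrative in-memory model under stated assumptions, not Docker's API: Host, EnsureContainer, and the VIRTUAL_CONTAINER_ID key are all hypothetical names. The point is only that tagging the container at create time, plus an index on that tag, lets a retry find the earlier result instead of creating a second container.

```go
package main

import "fmt"

// virtualIDKey is a hypothetical metadata key set by the managing system.
const virtualIDKey = "VIRTUAL_CONTAINER_ID"

// Container is a simplified stand-in for a container record on one host.
type Container struct {
	ID       string            // generated by the engine, never set by the caller
	Metadata map[string]string // caller-supplied key/value pairs
}

// Host is a hypothetical in-memory model of a single Docker host.
type Host struct {
	containers map[string]*Container // engine ID -> container
	byVirtual  map[string]string     // virtual ID -> engine ID (lookup index)
	nextID     int
}

func NewHost() *Host {
	return &Host{
		containers: make(map[string]*Container),
		byVirtual:  make(map[string]string),
	}
}

// EnsureContainer returns the container tagged with virtualID, creating it
// only if none exists yet. The boolean reports whether a create happened.
func (h *Host) EnsureContainer(virtualID string) (*Container, bool) {
	if id, ok := h.byVirtual[virtualID]; ok {
		return h.containers[id], false
	}
	h.nextID++
	c := &Container{
		ID:       fmt.Sprintf("engine-%04d", h.nextID),
		Metadata: map[string]string{virtualIDKey: virtualID},
	}
	h.containers[c.ID] = c
	h.byVirtual[virtualID] = c.ID
	return c, true
}

func main() {
	h := NewHost()

	first, created := h.EnsureContainer("A")
	fmt.Println(first.ID, created) // engine-0001 true

	// The manager crashed before recording the engine ID and now retries.
	again, created := h.EnsureContainer("A")
	fmt.Println(again.ID, created) // engine-0001 false
}
```

The byVirtual map stands in for whatever index the engine would need so that the lookup does not require inspecting every container.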

@shykes
Contributor

shykes commented Dec 31, 2014

I think the sanest approach is option 3: arbitrary annotations, namespaced by calling application. It is much more future-proof than allowing the external caller to set the actual ID.

@tianon
Member

tianon commented Dec 31, 2014

In option 1, what would happen if I clustered two machines together with
the same IDs on several unrelated containers? :)

@ibuildthecloud
Contributor

@tianon @shykes Personally, I like option 3. Options 1 and 2 might just be easier to implement, and they don't require long UI discussions :)

@ibuildthecloud
Contributor

If we go with 3, the important thing I'm trying to point out is that there's a real requirement for this. It's hard to sanely track Docker containers from an external system like Swarm or Rancher (unless you assume the management system is the sole owner of the box, but that is no fun...).

@shykes
Contributor

shykes commented Dec 31, 2014

@ibuildthecloud I completely agree, option 3 is something we need for many reasons. We simply need a straightforward way to annotate every object in Docker, with simple namespacing so that different callers can annotate the same object without conflicts.
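As a rough illustration of what "simple namespacing" could look like, here is a small Go sketch. The Annotations type, its methods, and the "namespace.name" key layout are assumptions made for this example, not an existing Docker API; "com.example.scheduler" is a made-up second caller.

```go
package main

import "fmt"

// Annotations is a hypothetical per-object metadata store. Keys are stored
// as "<namespace>.<name>" so that callers using different namespaces cannot
// overwrite each other's values.
type Annotations map[string]string

// Set records a value under the caller's namespace.
func (a Annotations) Set(namespace, name, value string) {
	a[namespace+"."+name] = value
}

// Get reads a value back from the caller's namespace.
func (a Annotations) Get(namespace, name string) (string, bool) {
	v, ok := a[namespace+"."+name]
	return v, ok
}

func main() {
	a := Annotations{}

	// Two independent callers both want a key called "id".
	a.Set("io.docker.swarm", "id", "42")
	a.Set("com.example.scheduler", "id", "gpu-job-7")

	swarmID, _ := a.Get("io.docker.swarm", "id")
	schedID, _ := a.Get("com.example.scheduler", "id")
	fmt.Println(swarmID, schedID) // 42 gpu-job-7
}
```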

@phemmer
Contributor

phemmer commented Dec 31, 2014

Throwing in my vote for option 3 as well. 1 & 2 just feel wrong. I can think of several situations where they could cause issues.
Option 3 is actually the first thing that popped into my mind reading ibuildthecloud's use case, and I think it has numerous other benefits as well. One use case is that we're considering writing our own scheduler for swarm, and this arbitrary metadata would help the scheduler determine where to launch the container (for things like "this container needs a host with a GPU").

@bfirsh
Contributor

bfirsh commented Dec 31, 2014

👍 Arbitrary metadata. We desperately need that anyway.

Here's a proposal, though I don't like the word "annotate". I might put together my own proposal that aggregates the dozens of issues that already exist about this.

@thaJeztah
Member

FWIW, +1 on option 3. Metadata keeps coming up in various issues.

@bfirsh I collected some issues the other day here: #9841 (comment). It might save you some time collecting.

@aluzzardi
Member Author

@shykes

This is the same concept as "Virtual IDs" except that they are physically mapped back to the container.

After using Virtual IDs ourselves without mapping, we found out that the user experience is really bad. Users will always end up accessing single nodes for various reasons, and the mapping just makes it a major hassle to do so.

This change does not alter the principle of globally unique objects with an immutable ID.

@ibuildthecloud
Contributor

@aluzzardi Most people seem to be on board with the idea of option 3. Do you think that is doable? So Swarm would have container id 42 and you would essentially docker run --label io.docker.swarm=id:42 ... which would create container xyz. So the user would then do a docker ps and see that container xyz has the swarm id of 42.

@ibuildthecloud
Contributor

Sorry, I couldn't help myself and created yet another metadata PR: #9882. If you think this is not helpful, I'll close it. We just need to move forward....

@aluzzardi
Member Author

@ibuildthecloud It would technically work, but usability-wise it would be inconvenient.

For instance, running a docker ps on a node would yield "garbage" IDs (as in, not directly usable).

@tianon
Member

tianon commented Jan 5, 2015

i.e., something like docker ps --filter swarm=some-swarm-id? Or docker ps --filter swarm-id=some-specific-swarm-container-id?

@thaJeztah
Member

@tianon I think Andrea means that the ID that is shown in docker ps is not the id that swarm uses to identify the container, which makes it confusing.

So something like docker ps --show-labels=swarm-id to make docker ps output a custom column containing the value of the swarm-id label for each container.
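The two ideas floated here, filtering by a label and showing a label as an extra column, can be sketched in Go. This is a toy model, not the docker ps implementation: the Container type, filterByLabel, printWithLabel, and the "swarm-id" label key are hypothetical, and the flags mentioned above were only proposals at this point in the discussion.

```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
)

// Container is a simplified stand-in for one row of a container listing.
type Container struct {
	ID     string
	Labels map[string]string
}

// filterByLabel keeps only the containers whose label key equals value.
func filterByLabel(cs []Container, key, value string) []Container {
	var out []Container
	for _, c := range cs {
		if v, ok := c.Labels[key]; ok && v == value {
			out = append(out, c)
		}
	}
	return out
}

// printWithLabel prints the listing with one extra column holding the value
// of the given label (empty when the container does not carry it).
func printWithLabel(cs []Container, key string) {
	w := tabwriter.NewWriter(os.Stdout, 0, 8, 2, ' ', 0)
	fmt.Fprintf(w, "CONTAINER ID\t%s\n", key)
	for _, c := range cs {
		fmt.Fprintf(w, "%s\t%s\n", c.ID, c.Labels[key])
	}
	w.Flush()
}

func main() {
	cs := []Container{
		{ID: "xyz", Labels: map[string]string{"swarm-id": "42"}},
		{ID: "abc", Labels: map[string]string{}}, // started by hand, no label
	}

	// Custom column: every container, with its swarm-id if it has one.
	printWithLabel(cs, "swarm-id")

	// Filter: only the container that Swarm knows as 42.
	printWithLabel(filterByLabel(cs, "swarm-id", "42"), "swarm-id")
}
```

With the extra column, a user on the node can correlate the engine ID with the ID the clustering layer uses without a separate inspect per container.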

@aluzzardi
Member Author

@thaJeztah Indeed! I'm simply worried about the user experience and not ending up with a "monster" implementation to support this.

Although I agree that listing labels in docker ps might be useful for other use cases and may be a workable solution.

Whether it's Swarm or other clustering solutions, users will always end up SSH'ing to nodes directly (very common for small deployments, but it will also happen on larger deployments for debugging purposes). I want to make sure we fully support that use case and don't end up with a central black box.

@jessfraz
Contributor

jessfraz commented Jan 6, 2015

So in favor of cleaning house and pushing everything to the edge, we are going to close this in favor of #9882, because the discussion has led to that being a better route.

@jessfraz jessfraz closed this Jan 6, 2015
@aluzzardi aluzzardi deleted the api-specify-id branch May 7, 2015 06:42