
Conversation

mrjana
Contributor

@mrjana mrjana commented Sep 9, 2016

Swarm-scope network connected containers with autostart enabled had a
dependency problem: the cluster has to be initialized before they can be
autostarted. With the current container restart code running before cluster
init, these containers were not getting autostarted properly. Added a fix to
delay the start of any container that has at least one swarm-scope endpoint
until after the cluster is initialized.

Signed-off-by: Jana Radhakrishnan mrjana@docker.com
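
To make the described behavior concrete, here is a hypothetical, heavily simplified Go sketch (illustrative names only, not the actual moby/moby code): containers attached to at least one swarm-scope network are collected and started only after cluster init, while local-only containers start immediately.

package main

import "fmt"

// Hypothetical, simplified types; not the real daemon structs.
type Container struct {
    ID       string
    Restart  bool              // autostart, e.g. --restart=always
    Networks map[string]string // network name -> scope ("local" or "swarm")
}

// hasSwarmScopeEndpoint reports whether the container is attached to at
// least one swarm-scope network.
func hasSwarmScopeEndpoint(c *Container) bool {
    for _, scope := range c.Networks {
        if scope == "swarm" {
            return true
        }
    }
    return false
}

// restoreContainers starts local-only containers right away and returns the
// swarm-attached ones so the caller can start them after cluster init.
func restoreContainers(all []*Container) (deferred []*Container) {
    for _, c := range all {
        if !c.Restart {
            continue
        }
        if hasSwarmScopeEndpoint(c) {
            deferred = append(deferred, c)
            continue
        }
        fmt.Println("starting", c.ID)
    }
    return deferred
}

func main() {
    containers := []*Container{
        {ID: "web", Restart: true, Networks: map[string]string{"mynet": "swarm"}},
        {ID: "db", Restart: true, Networks: map[string]string{"bridge": "local"}},
    }
    deferred := restoreContainers(containers)
    // ...cluster init happens here...
    for _, c := range deferred {
        fmt.Println("starting after cluster init:", c.ID)
    }
}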

@mavenugo
Contributor

@mrjana I think there is a timing issue between the overlay cleanup and container restart. When I have multiple containers with --restart=always, they fail to come up with the error: ERRO[0019] subnet sandbox join failed for "10.0.0.0/24": overlay subnet 10.0.0.0/24 has conflicts in the host while running in host mode

@mavenugo
Contributor

@mrjana I stand corrected; the above issue was not due to this PR. I had some stale bridges in my setup that resulted in this conflicting subnet on kernels < 3.16 (host mode). With those stale bridges removed and using moby/libnetwork#1442, this works fine.

@mavenugo
Contributor

LGTM

@vikstrous
Contributor

LGTM

@cpuguy83
Member

Side note: I noticed that when you disconnect a container from a swarm network you get an error in the daemon logs: ERRO[0099] task unavailable method=(*Dispatcher).processUpdates module=dispatcher task.id=b6fv33i39ahpjjex083752804

@cpuguy83
Member

Can you add a test case for swarm daemon restart w/ attached container + autorestart?

@mrjana
Contributor Author

mrjana commented Sep 13, 2016

Side note: I noticed that when you disconnect a container from a swarm network you get an error in the daemon logs: ERRO[0099] task unavailable method=(*Dispatcher).processUpdates module=dispatcher task.id=b6fv33i39ahpjjex083752804

That's because the task is removed in the manager before the dispatcher processes the update from the agent reporting that the task has completed.

@mrjana
Contributor Author

mrjana commented Sep 13, 2016

Will add a test case

Swarm-scope network connected containers with autostart enabled had a
dependency problem: the cluster has to be initialized before they can be
autostarted. With the current container restart code running before cluster
init, these containers were not getting autostarted properly. Added a fix to
delay the start of any container that has at least one swarm-scope endpoint
until after the cluster is initialized.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
@mrjana
Contributor Author

mrjana commented Sep 13, 2016

@cpuguy83 Added a test case


out, err = d.Cmd("ps", "-q")
c.Assert(err, checker.IsNil)
c.Assert(strings.TrimSpace(out), checker.Not(checker.Equals), "")
Contributor


I understand it doesn't make a big difference, but it would be great if you could get the container ID prior to restart and then compare it here; that makes it more correct.
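
For illustration, a minimal sketch of that comparison, reusing the d.Cmd and checker helpers from the snippet above; idBefore/idAfter are made-up names and the daemon restart in between is only indicated by a comment.

idBefore, err := d.Cmd("ps", "-q")
c.Assert(err, checker.IsNil)
c.Assert(strings.TrimSpace(idBefore), checker.Not(checker.Equals), "")

// ...restart the daemon here...

idAfter, err := d.Cmd("ps", "-q")
c.Assert(err, checker.IsNil)
c.Assert(strings.TrimSpace(idAfter), checker.Equals, strings.TrimSpace(idBefore))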

Contributor Author


There is only one container running on this daemon. So I don't think that is necessary.

Contributor


ok

@cpuguy83
Member

LGTM

@mavenugo
Contributor

Re-LGTM.

win2lin failure is unrelated. merging.

@mavenugo mavenugo merged commit 1d76ab4 into moby:master Sep 14, 2016
@thaJeztah thaJeztah added this to the 1.13.0 milestone Sep 20, 2016