Retry AttachNetwork when it fails to find network #27466

mrjana · 2016-10-17T21:40:22Z

When attaching an unmanaged container to a swarm-scoped network, the
attach may succeed and the network lookup that follows can still fail,
because some other container that was using the network went down and
removed it in the meantime. So if the network is not found, detach and
reattach to re-download the network from the manager.

Fixes #26588

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>

xiaods · 2016-10-18T01:48:43Z

daemon/container_operations.go

-		config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses)
-		if err != nil {
-			return nil, nil, err
+	for {


It's a retry loop

@mrjana discussing this; should we have a limit on this loop? And a delay?

@thaJeztah The retry should only happen when the last container on a network is going away or disconnecting within the same short time window in which a new container is trying to connect to that network. So even though in theory this might loop indefinitely, in practice it should never retry more than once. We should also probably not add a delay: the chance of landing in another such time window likely increases with more delay, so the best course is to retry immediately rather than wait.

mrjana · 2016-10-24T23:53:34Z

ping @mavenugo

mavenugo · 2016-10-25T13:04:46Z

LGTM

thaJeztah · 2016-10-26T21:05:23Z

daemon/container_operations.go

-		config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses)
-		if err != nil {
-			return nil, nil, err
+	for {


@mrjana discussing this; should we have a limit on this loop? And a delay?

mrjana · 2016-11-02T04:47:03Z

@thaJeztah Added a retry limit.

dongluochen · 2016-11-03T19:15:10Z

SGTM (not a maintainer)

mrjana · 2016-11-03T20:30:46Z

ping @thaJeztah

thaJeztah · 2016-11-04T17:11:25Z

daemon/container_operations.go

+			// the process of attaching.
+			if config != nil {
+				if _, ok := err.(libnetwork.ErrNoSuchNetwork); ok {
+					if retryCount >= 5 {


Any reason for doing this, instead of for i := 0; i < 5; i++ { (or for retries := 0; retries < 5; retries++ {)?

(On the outer for loop)

same question

ping @mrjana

Because I need to return a specific error when this fails by exceeding the number of retries.

Ah, hm, I'd have expected the error to be after the for loop (i.e., if no network was found after 5 attempts, generate, and return an error).

It's just a nit, so let's keep it
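
For readers following this thread, here is a minimal, self-contained sketch of the loop shape under discussion: a bare `for` with an explicit counter, so that the "retries exceeded" case can return its own error from inside the loop. It is not the code from this PR. The `cluster` interface, `flakyCluster`, `errNoSuchNetwork` (a stand-in for `libnetwork.ErrNoSuchNetwork`), `errRetriesExceeded`, and `attachWithRetry` are all hypothetical names made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchNetwork is a hypothetical stand-in for libnetwork.ErrNoSuchNetwork.
type errNoSuchNetwork string

func (e errNoSuchNetwork) Error() string {
	return fmt.Sprintf("network %s not found", string(e))
}

// errRetriesExceeded is the specific error returned once the retry limit is hit.
var errRetriesExceeded = errors.New("could not find network after successful attachment")

// cluster is a hypothetical, simplified view of the attach/detach/find calls.
type cluster interface {
	Attach(network, containerID string) error
	Detach(network, containerID string) error
	Find(network string) (string, error)
}

// attachWithRetry attaches, then looks the network up. If the lookup reports
// "no such network", it detaches and immediately tries again (no delay), up to
// maxRetries extra attempts. Any other error is returned unchanged.
func attachWithRetry(c cluster, network, containerID string, maxRetries int) (string, error) {
	retryCount := 0
	for {
		if err := c.Attach(network, containerID); err != nil {
			return "", err
		}
		id, err := c.Find(network)
		if err == nil {
			return id, nil
		}
		// Undo the attachment before deciding whether to retry.
		if derr := c.Detach(network, containerID); derr != nil {
			return "", derr
		}
		var notFound errNoSuchNetwork
		if !errors.As(err, &notFound) {
			return "", err
		}
		if retryCount >= maxRetries {
			return "", errRetriesExceeded
		}
		retryCount++
	}
}

// flakyCluster is a test double whose Find fails a fixed number of times.
type flakyCluster struct {
	failures int // remaining Find calls that report "not found"
	attaches int // number of Attach calls seen
	detaches int // number of Detach calls seen
}

func (f *flakyCluster) Attach(network, containerID string) error { f.attaches++; return nil }
func (f *flakyCluster) Detach(network, containerID string) error { f.detaches++; return nil }
func (f *flakyCluster) Find(network string) (string, error) {
	if f.failures > 0 {
		f.failures--
		return "", errNoSuchNetwork(network)
	}
	return network, nil
}

func main() {
	c := &flakyCluster{failures: 1}
	id, err := attachWithRetry(c, "overlay0", "c1", 5)
	fmt.Println(id, err, c.attaches)
}
```

The alternative raised in the review, `for retries := 0; retries < 5; retries++ { ... }` followed by a single error return after the loop, would behave the same way; the difference is only where the limit error is constructed.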

thaJeztah

LGTM

GordonTheTurtle added the status/0-triage label Oct 17, 2016

xiaods reviewed Oct 18, 2016

mrjana force-pushed the net branch from 019d814 to 98d950d Compare October 18, 2016 04:36

thaJeztah added status/2-code-review and removed status/0-triage labels Oct 18, 2016

mavenugo added this to the 1.13.0 milestone Oct 25, 2016

mavenugo approved these changes Oct 25, 2016

mrjana force-pushed the net branch from 98d950d to 0e68a9c Compare October 25, 2016 18:01

thaJeztah requested changes Oct 26, 2016

thaJeztah mentioned this pull request Oct 28, 2016

Overlay network X not found | Swarm 1.12.2 #27513

Closed

mrjana force-pushed the net branch from 10f2276 to a017f32 Compare October 31, 2016 22:07

mrjana force-pushed the net branch from a017f32 to 849e345 Compare November 2, 2016 04:46

thaJeztah reviewed Nov 4, 2016

thaJeztah approved these changes Nov 8, 2016

thaJeztah merged commit 9a61bd0 into moby:master Nov 8, 2016

thaJeztah mentioned this pull request Dec 9, 2016

When a service is created with a network attached to it, it's not found by the other nodes #29273

Closed

thaJeztah mentioned this pull request Jun 27, 2019

[epic] flaky tests #37306

Open

thaJeztah mentioned this pull request Sep 17, 2023

daemon: Improve NetworkingConfig & EndpointSettings validation #46183

Merged
