Skip to content

Conversation

mrjana
Copy link
Contributor

@mrjana mrjana commented Oct 17, 2016

When trying to attach to swarm scope network for an unmanaged container
sometimes even if attaching to network succeeds, we may not find the
network because some other container which was using the network went
down and removed the network. So if it is not found, try to detach and
reattach to re-download the network from the manager.

Fixes #26588

Signed-off-by: Jana Radhakrishnan mrjana@docker.com

config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses)
if err != nil {
return nil, nil, err
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a retry loop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrjana discussing this; should we have a limit on this loop? And a delay?

Copy link
Contributor Author

@mrjana mrjana Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah The retry should only happen when the last container on a network is going away or disconnecting while at some short time window a new container is trying to connect to the same network. So even though in theory this might loop indefinitely in practice it should never retry more than once. Also we should probably not add a delay since the chance of getting into the another such time window probably increases with more delay so the best possible course is to retry immediately rather than waiting.

@mrjana
Copy link
Contributor Author

mrjana commented Oct 24, 2016

ping @mavenugo

@mavenugo
Copy link
Contributor

LGTM

@mavenugo mavenugo added this to the 1.13.0 milestone Oct 25, 2016
config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses)
if err != nil {
return nil, nil, err
for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrjana discussing this; should we have a limit on this loop? And a delay?

When trying to attach to swarm scope network for an unmanaged container
sometimes even if attaching to network succeeds, we may not find the
network because some other container which was using the network went
down and removed the network. So if it is not found, try to detach and
reattach to re-download the network from the manager.

Fixes moby#26588

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
@mrjana
Copy link
Contributor Author

mrjana commented Nov 2, 2016

@thaJeztah Added a retry limit.

@dongluochen
Copy link
Contributor

SGTM (not a maintainer)

@mrjana
Copy link
Contributor Author

mrjana commented Nov 3, 2016

ping @thaJeztah

// the process of attaching.
if config != nil {
if _, ok := err.(libnetwork.ErrNoSuchNetwork); ok {
if retryCount >= 5 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for doing this, instead of for i := 0; i < 5; i++ { (or for retries := 0; retries < 5; retries++ {)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(On the outer for loop)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @mrjana

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need to return a specific error if this failed because of exceeding number of retries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hm, I'd have expected the error to be after the for loop (i.e., if no network was found after 5 attempts, generate, and return an error).

It's just a nit, so let's keep it

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants