-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Retry AttachNetwork when it fails to find network #27466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses) | ||
if err != nil { | ||
return nil, nil, err | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a retry loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrjana discussing this; should we have a limit on this loop? And a delay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thaJeztah The retry should only happen when the last container on a network is going away or disconnecting while at some short time window a new container is trying to connect to the same network. So even though in theory this might loop indefinitely in practice it should never retry more than once. Also we should probably not add a delay since the chance of getting into the another such time window probably increases with more delay so the best possible course is to retry immediately rather than waiting.
ping @mavenugo |
LGTM |
config, err = daemon.clusterProvider.AttachNetwork(idOrName, container.ID, addresses) | ||
if err != nil { | ||
return nil, nil, err | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrjana discussing this; should we have a limit on this loop? And a delay?
When trying to attach to swarm scope network for an unmanaged container sometimes even if attaching to network succeeds, we may not find the network because some other container which was using the network went down and removed the network. So if it is not found, try to detach and reattach to re-download the network from the manager. Fixes moby#26588 Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
@thaJeztah Added a retry limit. |
SGTM (not a maintainer) |
ping @thaJeztah |
// the process of attaching. | ||
if config != nil { | ||
if _, ok := err.(libnetwork.ErrNoSuchNetwork); ok { | ||
if retryCount >= 5 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for doing this, instead of for i := 0; i < 5; i++ {
(or for retries := 0; retries < 5; retries++ {
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(On the outer for
loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @mrjana
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need to return a specific error if this failed because of exceeding number of retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hm, I'd have expected the error to be after the for loop (i.e., if no network was found after 5 attempts, generate, and return an error).
It's just a nit, so let's keep it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When trying to attach to swarm scope network for an unmanaged container
sometimes even if attaching to network succeeds, we may not find the
network because some other container which was using the network went
down and removed the network. So if it is not found, try to detach and
reattach to re-download the network from the manager.
Fixes #26588
Signed-off-by: Jana Radhakrishnan mrjana@docker.com