-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Keep the refuseUid in a better way, #22156 #23617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test FAILed. |
What's the consequence of the bug fix? It seems the bug lead to other nodes quarantining a just restarted node so that it could impact e.g. rolling upgrades. Would this now be fixed? |
var uidConfirmed: Boolean = uid.isDefined && (uid != refuseUid) | ||
|
||
if (uid.isDefined && (uid == refuseUid)) | ||
throw new HopelessAssociation(localAddress, remoteAddress, uid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, did not realise we have that type. Makes sense
@jrudolph The bug was more about that it could talk to (accept Ack message) from a node that was already quarantined. If it would have been restarted the new incarnation would have a different uid and then it is ok to talk to that address (and sys msg buffers would not be out of sync). |
Mmh, the original ticket said
So, I wonder if that scenario is actually fixed? Or, was the description inaccurate in the first place? Or, did the error message fixed here just correlate with another problem we haven't yet understood? AFAIR from our other internal reports about this issue there were real functional problems observed where one or more nodes could not talk to one that was recently restarted. |
@jrudolph ah, I didn't read the original issue much, I focused on the scenario that we had a reproducer for, that I described here: #22156 (comment) We don't have much to go on for the original description so I don't think we will be able to reproduce that if it really involved restart. I suggest that we rename the ticket, and create a a new one if that original scenario is reported again. ok? |
Test PASSed. |
Yes, makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (let's see if the original issue will be reported again and under which circumstances)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, not 100% sure I understand the consequences, but from the description the changes make sense to me.
* The scenario described in the issue can cause the quarantine marker to be lost when creating a new endpoint for that address. Then when later creating another endpoint from an inbound connection the uid is considered confirmed and Ack message is accepted, triggering the unexpected seq number issue. * The refuseUid was kept in the endpoint policy markers, but that is just very complicated and as illustrated by this issue not always safe. * Instead, keep the refuseUid separately so it's not lost when registering new endpoint. * The purpose of WasGated was only to try to keep the refuseUid (as far as I know), and that is not needed any longer. mima filter
edef0aa
to
d660f89
Compare
good, I have rebased and will merge after validation |
Test PASSed. |
* completed port of Properly_quarantine_stashed_inbound_connections * porting akka/akka#23617 * modified AssociationHandle to allow explicit DEBUG logging of disassociation reasons * final pass of EndpointManager updates from akka/akka#23617 * fixed issue with EndpointRegistry quarantine management * updated the reliable delivery supervisor * fixed bug when handling inbound associations * cleaning up AkkaProtocolState actors * porting over OutboundConnection timeout spec * ported InboundTimeout spec * approved new Akka.Remote API additions
* completed port of Properly_quarantine_stashed_inbound_connections * porting akka/akka#23617 * modified AssociationHandle to allow explicit DEBUG logging of disassociation reasons * final pass of EndpointManager updates from akka/akka#23617 * fixed issue with EndpointRegistry quarantine management * updated the reliable delivery supervisor * fixed bug when handling inbound associations * cleaning up AkkaProtocolState actors * porting over OutboundConnection timeout spec * ported InboundTimeout spec * approved new Akka.Remote API additions
be lost when creating a new endpoint for that address. Then when later
creating another endpoint from an inbound connection the uid is considered
confirmed and Ack message is accepted, triggering the unexpected seq number
issue.
complicated and as illustrated by this issue not always safe.
new endpoint.
and that is not needed any longer.
Refs #22156