Skip to content

Conversation

bison
Copy link
Contributor

@bison bison commented Apr 8, 2021

The MemberRoll controller was changed in MAISTRA-2197 to not send the
initial update with just the system namespace to all listeners. This
can be a problem when the user hasn't created an SMMR resource yet,
because any listeners registered before the caches sync may never get
an update informing them to watch the system namespace.

This attempts to work around the issue by having all listeners
individually wait on the caches to sync, then seed just the system
namespace if no SMMR resource is found. Unfortunately, it's possible
for this to race a user creating an SMMR, in which case the state may
be inconsistent until the next update or full sync.

@bison
Copy link
Contributor Author

bison commented Apr 8, 2021

I am not 100% convinced we should do this for the 2.0 branch. The issue on 2.1 is that changes in istiod mean this bug prevents things from starting at all in some cases. In my testing on the latest 2.0 builds, that is not the case, regardless of whether you've created the initial SMMR or not.

I do think this has the potential to introduce a race between listeners seeing the cache has synced and looking for the SMMR, and one being created. I think it's possible, but maybe unlikely, that we then send add events out-of-order with this initial "seed" event. If that's the case, is it worth introducing that if we haven't seen this cause any actual issues on this branch?

Copy link

@knrc knrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race can be handled in a similar way to how @jwendell handled things in #312, could you please make similar changes to this code?

Copy link
Contributor

@dgn dgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding a mutex around the cacheWarmed variable will be sufficient here. IMHO we should not fix a problem nobody has (ie 'no SMMR present causes strange behaviour') if it adds complexity - if a customer runs into an issue here we can still tackle it then

@knrc
Copy link

knrc commented Apr 12, 2021

I think just adding a mutex around the cacheWarmed variable will be sufficient here. IMHO we should not fix a problem nobody has (ie 'no SMMR present causes strange behaviour') if it adds complexity - if a customer runs into an issue here we can still tackle it then

We know the problem exists, and while we may not currently be sure it has any impact it is much better to fix it now and not have it impact any customers.

The MemberRoll controller was changed in MAISTRA-2197 to not send the
initial update with just the system namespace to all listeners.  This
can be a problem when the user hasn't created an SMMR resource yet,
because any listeners registered before the caches sync may never get
an update informing them to watch the system namespace.

This attempts to work around the issue by having all listeners
individually wait on the caches to sync, then seed just the system
namespace if no SMMR resource is found.
@bison
Copy link
Contributor Author

bison commented Apr 12, 2021

Updated to use a channel in a similar way to #312 to be sure we don't process events from the informer until we've sent the initial update when necessary.

@dgn
Copy link
Contributor

dgn commented Apr 12, 2021

We know the problem exists, and while we may not currently be sure it has any impact it is much better to fix it now and not have it impact any customers.

Alright, that's fine with me, too

@maistra-bot maistra-bot merged commit 5ff6940 into maistra:maistra-2.0 Apr 12, 2021
@bison bison deleted the MAISTRA-2234 branch April 13, 2021 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants