-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ipcache/metadata: flatten metadata once on read #40274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
df92537
to
46d3f4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few nits in first pass; but this feels rather non-trivial for how late in the cycle we are and I don't feel confident in being the only @cilium/ipcache eyes on it; so if someone else form the team could take a look 🙏
You also mention a race; could you spell that out a bit more explicitly for me so I can retrace it and understand how it's fixed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction of this PR, I had considered something like this previously recently so nice to see it happening now. I have a few comments on top of David's review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One locking question, the rest are nits.
Main question I'm wondering about beyond this PR is just whether to hold the v1.18.0-rc.0 until we merge this or not. This could provide more time for users to try out this new codepath prior to the v1.18.0 final release. Thoughts?
@joestringer Good question. The answer isn't clear. How many -rc releases were you hoping to cut? I would definitely like to either see this in the release, or revert the entire FQDN pre-allocation code. The race here is a ticking time bomb. I think merging this will be cleaner than reverting all of the FQDN changes. So, I suppose I have a weak vote for "delay rc.0 for a day or two". I'll have an updated version addressing feedback in a few hours. |
@squeed 👍 Makes sense to me yep. We're planning two RCs, two weeks apart, with the full release in about a month's time. The full milestones list is pinned in the discussions tab. |
46d3f4b
to
dc550a8
Compare
I updated the description a bit, but the race was thus:
|
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest version LGTM. One minor nit about not adding a TODO
comment.
I dug through the users of the Lock and I agree that the RWMutex seems unnecessary and overoptimizing given the current ipcache usage patterns. Switching to a Mutex looks unlikely to cause a regression as far as I can tell.
dc550a8
to
673ab63
Compare
Whoops. Changed the TODO to a short note. CI was green with zero flakes :-). Let's see if it stays that way. |
This wraps the prefixInfo type, which was previously a map, in a struct. Future commits will add more members. Adjust some of the functions to be pointer receivers (since the preivous type, a map, was implicitly a pointer). Signed-off-by: Casey Callendrello <cdc@isovalent.com>
The ipcache aggregated metadata for a given prefix by resource. When determining the ultimate metadata, it loops through all the resource-level metadata to determine the value for each property. This changes the code to flatten all the metadata once. It serves two purposes: 1. It fixes a race condition, where we performed metadata reads without holding the ipcache metadata lock. Since the values were updated in-place, this caused races. Specifically, if a resource was deleted for a prefix while injection was happening, this caused nil panics. 2. It prevents needless prefix updates when multiple resources all provide the same information. This is particularly prevalent for CIDR labels. As a further optimization, if a given metadata update would not change the flattened metadata, then we short-cut the prefix update and save an ipcache iteration. Fixes: cilium#40134 Fixes: 495164a Signed-off-by: Casey Callendrello <cdc@isovalent.com>
673ab63
to
83db0d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM now!
/test |
CI is green :). Just a review needed for logfields changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack on behalf of @cilium/cli for the logfields change. Didn't review the rest in-depth.
I'll let @bimmlerd give this another pass, then this should be good. |
The ipcache aggregates metadata for a given prefix by resource. When
determining the ultimate metadata, it loops through all the
resource-level metadata to determine the value for each property.
This changes the code to flatten all the metadata once. It serves two
purposes:
without holding the ipcache metadata lock. Since the values were
updated in-place, this caused races. Specifically, if a resource was
deleted for a prefix while injection was happening, this caused
nil panics.
provide the same information. This is particularly prevalent for CIDR
labels.
As a further optimization, if a given metadata update would not change
the flattened metadata, then we short-cut the prefix update and save an
ipcache iteration.
This includes another commit which fixes an awkward usage of the metadata data that interacted poorly with this change.
Fixes: #40134
Fixes: 495164a