-
Notifications
You must be signed in to change notification settings - Fork 274
Add more robust CRD conversion patching #5300
Add more robust CRD conversion patching #5300
Conversation
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #5300 +/- ##
==========================================
- Coverage 69.53% 69.52% -0.02%
==========================================
Files 197 197
Lines 16070 16071 +1
==========================================
- Hits 11174 11173 -1
- Misses 4839 4841 +2
Partials 57 57
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Can we also set rollout strategy on the bootstrap deployment to replace? |
b05e007
to
a01aaf5
Compare
I'm not a huge fan of that change honestly. Replace effectively guarantees downtime in the gap between the old replicaset going down and the new one becoming Ready |
The only thing the bootstrap pod does after it starts up (after it creates/updates CRD's on startup), is the reconciler. The bootstrap pod should really be a job anyways. IMO. Not blocking on this though Technically it has a webhook for:
|
@steeling it's those webhooks I'm worried about, especially during upgrades. Any client or operator of an OSM resource that's using an older CRD version will end up triggering a conversion request to OSM bootstrap which, for a time, wouldn't have any replicas able to serve traffic |
I'm not sure a couple seconds of downtime on a small subset of operations is necessarily a bad thing during an upgrade, given the issues we've had here. Adding further complexity to the code, and banking on allowing an old reconciler to flip/flop back to prior states is undesirable IMO |
If anyone builds on top of OSM, we've now introduced errors/downtime in that upgrade path. Downtime during upgrade can be ok if all parties are aware and plan for it. To me, making that kind of change in a patch release is unexpected and error prone |
Hey @keithmattix, could you update the PR description to describe the osm version label approach? Thanks! |
Good catch @jaellio! Changed it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -243,7 +242,7 @@ func applyOrUpdateCRDs(crdClient *apiclient.ApiextensionsV1Client) { | |||
log.Fatal().Err(err).Msgf("Error decoding CRD file %s", file) | |||
} | |||
|
|||
crd.Labels[constants.ReconcileLabel] = strconv.FormatBool(enableReconciler) | |||
crd.Labels[constants.ReconcileLabel] = osmVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe conditionally add the label based on enableReconciler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense in general, but I'm wary of changing more behavior than I need to. The label wouldn't actually be used until the reconciler is turned on, so I think it's safe to keep this logic as is
Watch CRDs over a span of 15s to make sure that the CRD conversion patch is not overwritten by another reconciler Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
4e6da88
to
b3b0f9a
Compare
Change the reoncile label to refer to the version of OSM that is in charge of reconciling the CRDs. This ensures that on upgrade, the new version is the only version managing the CRDs
Description:
Fixes the upgrade bug from pre-v1.x.3 patches
Testing done:
E2E tests
Affected area:
Please answer the following questions with yes/no.
Does this change contain code from or inspired by another project? no
Is this a breaking change? no
Has documentation corresponding to this change been updated in the osm-docs repo (if applicable)? N/A