
Conversation

@cPu1 (Contributor) commented Oct 14, 2021

Description

This changeset enables Windows IPAM when a Windows nodegroup is created via `create cluster` or `create nodegroup`, and adds a `--delete` flag to `eksctl utils install-vpc-controllers` to allow deleting an existing installation of the VPC resource controller from worker nodes.

Closes #2401

Logs from testing:

$ eksctl utils install-vpc-controllers --delete --cluster <cluster>
2021-10-14 16:21:47 [ℹ]  using region us-west-2
2021-10-14 16:21:50 [ℹ]  (plan) would have deleted "kube-system:MutatingWebhookConfiguration.admissionregistration.k8s.io/vpc-admission-webhook-cfg"
2021-10-14 16:21:51 [ℹ]  (plan) would have deleted "CertificateSigningRequest.certificates.k8s.io/vpc-admission-webhook.kube-system"
2021-10-14 16:21:51 [ℹ]  (plan) would have deleted "kube-system:Deployment.apps/vpc-admission-webhook"
2021-10-14 16:21:51 [ℹ]  (plan) would have deleted "ClusterRole.rbac.authorization.k8s.io/vpc-resource-controller"
2021-10-14 16:21:52 [ℹ]  (plan) would have deleted "ClusterRoleBinding.rbac.authorization.k8s.io/vpc-resource-controller"
2021-10-14 16:21:52 [ℹ]  (plan) would have deleted "kube-system:Deployment.apps/vpc-resource-controller"
2021-10-14 16:21:52 [ℹ]  (plan) would have deleted "kube-system:Secret/vpc-admission-webhook-certs"
2021-10-14 16:21:52 [!]  no changes were applied, run again with '--approve' to apply the changes
$ eksctl utils install-vpc-controllers --delete --cluster <cluster> --approve
2021-10-14 16:23:18 [ℹ]  using region us-west-2
2021-10-14 16:23:22 [ℹ]  deleted "kube-system:MutatingWebhookConfiguration.admissionregistration.k8s.io/vpc-admission-webhook-cfg"
2021-10-14 16:23:23 [ℹ]  deleted "CertificateSigningRequest.certificates.k8s.io/vpc-admission-webhook.kube-system"
2021-10-14 16:23:28 [ℹ]  deleted "kube-system:Deployment.apps/vpc-admission-webhook"
2021-10-14 16:23:29 [ℹ]  deleted "ClusterRole.rbac.authorization.k8s.io/vpc-resource-controller"
2021-10-14 16:23:30 [ℹ]  deleted "ClusterRoleBinding.rbac.authorization.k8s.io/vpc-resource-controller"
2021-10-14 16:23:40 [ℹ]  deleted "kube-system:Deployment.apps/vpc-resource-controller"
2021-10-14 16:23:41 [ℹ]  deleted "kube-system:Secret/vpc-admission-webhook-certs"

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@cPu1 force-pushed the vpc-controller-delete branch from 016e8cc to 384a7ae on October 14, 2021 11:30
@cPu1 added the kind/feature (New feature or request) label on Oct 14, 2021
@@ -13,6 +13,8 @@ nodeGroups:
amiFamily: WindowsServer2019FullContainer
minSize: 2
maxSize: 3

managedNodeGroups:
Contributor:

:D

Comment on lines 37 to 93
func (w *IPAM) Enable(ctx context.Context) error {
	configMaps := w.Clientset.CoreV1().ConfigMaps(metav1.NamespaceSystem)
	vpcCNIConfig, err := configMaps.Get(ctx, vpcCNIName, metav1.GetOptions{})
	if err != nil {
		if !apierrors.IsNotFound(err) {
			return errors.Wrap(err, "error getting resource")
		}
		return createConfigMap(ctx, configMaps)
	}

	if val, ok := vpcCNIConfig.Data[windowsIPAMField]; ok && val == "true" {
		logger.Info("Windows IPAM is already enabled")
		return nil
	}

	patch, err := createPatch(vpcCNIConfig)
	if err != nil {
		return errors.Wrap(err, "error creating merge patch")
	}

	_, err = configMaps.Patch(ctx, vpcCNIName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		return errors.Wrap(err, "failed to patch resource")
	}
	return nil
}

func createPatch(cm *corev1.ConfigMap) ([]byte, error) {
	oldData, err := json.Marshal(cm)
	if err != nil {
		return nil, err
	}
	cm.Data[windowsIPAMField] = "true"
	modifiedData, err := json.Marshal(cm)
	if err != nil {
		return nil, err
	}
	return jsonpatch.CreateMergePatch(oldData, modifiedData)
}

func createConfigMap(ctx context.Context, configMaps corev1client.ConfigMapInterface) error {
	cm := &corev1.ConfigMap{
		TypeMeta: metav1.TypeMeta{
			Kind:       "ConfigMap",
			APIVersion: "v1",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      vpcCNIName,
			Namespace: vpcCNINamespace,
		},
		Data: map[string]string{
			windowsIPAMField: "true",
		},
	}
	_, err := configMaps.Create(ctx, cm, metav1.CreateOptions{})
	return err
}
Contributor:

unit tests?

Contributor:

I was about to say the same.

Contributor Author:

I have added a test suite.

Clientset kubernetes.Interface
}

// Enable enables Windows IPAM
Contributor:

Suggested change
// Enable enables Windows IPAM
// Enable enables Windows IPAM in the VPC CNI configmap

vpcCNIConfig, err := configMaps.Get(ctx, vpcCNIName, metav1.GetOptions{})
if err != nil {
if !apierrors.IsNotFound(err) {
return errors.Wrap(err, "error getting resource")
Contributor:

Suggested change
return errors.Wrap(err, "error getting resource")
return errors.Wrapf(err, "error getting %q configmap", vpcCNIName)

@Skarlso (Contributor) commented Oct 14, 2021:

There are more fmt.Errorf results than errors.Wrapf. Which one do we use under what circumstance? It would be nice to be uniform. :)

Contributor:

agreed, I prefer fmt.Errorf as it has %w for wrapping errors

Contributor Author:

Most calls to fmt.Errorf are used without the %w verb, so they are equivalent to errors.Errorf. For consistency, I prefer using errors.Wrapf.


_, err = configMaps.Patch(ctx, vpcCNIName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
if err != nil {
return errors.Wrap(err, "failed to patch resource")
Contributor:

Suggested change
return errors.Wrap(err, "failed to patch resource")
return errors.Wrapf(err, "failed to patch %q configmap", vpcCNIName)

}

// Enable enables Windows IPAM
func (w *IPAM) Enable(ctx context.Context) error {
Contributor:

what happens if someone enables this functionality while also having the legacy windows VPC controller installed? Do we need to warn them?

Contributor Author:

We are not providing a dedicated command to enable this functionality, but rather enabling IPAM behind the scenes when a Windows nodegroup is created as part of create cluster or create nodegroup. We log a warning in create cluster and utils install-vpc-controllers.

Contributor:

what about existing clusters with windows nodegroups?

Contributor Author:

We don't query the cluster for existing Windows nodegroups, but an improvement would be to log the same warning if the cluster has Windows nodegroups and VPC controller installed.

Comment on lines 4 to 22
"context"
"encoding/json"

"github.com/kris-nova/logger"

"k8s.io/apimachinery/pkg/types"

jsonpatch "github.com/evanphx/json-patch/v5"

corev1client "k8s.io/client-go/kubernetes/typed/core/v1"

corev1 "k8s.io/api/core/v1"

apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"github.com/pkg/errors"

"k8s.io/client-go/kubernetes"
Contributor:

This is a nit, but would you mind ordering these imports please?

Contributor:

Suggested change
"context"
"encoding/json"
"github.com/kris-nova/logger"
"k8s.io/apimachinery/pkg/types"
jsonpatch "github.com/evanphx/json-patch/v5"
corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"github.com/pkg/errors"
"k8s.io/client-go/kubernetes"
import (
"context"
"encoding/json"
jsonpatch "github.com/evanphx/json-patch/v5"
"github.com/kris-nova/logger"
"github.com/pkg/errors"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
)

Contributor Author:

Sounds good. I usually rely on goimports to do the sorting.

Contributor:

Huh, I do the same. This the result of it. 🤔 I don't think I'm setting any funky settings.

Comment on lines +64 to +75
func createPatch(cm *corev1.ConfigMap) ([]byte, error) {
	oldData, err := json.Marshal(cm)
	if err != nil {
		return nil, err
	}
	cm.Data[windowsIPAMField] = "true"
	modifiedData, err := json.Marshal(cm)
	if err != nil {
		return nil, err
	}
	return jsonpatch.CreateMergePatch(oldData, modifiedData)
}
@Skarlso (Contributor) commented Oct 14, 2021:

I'm not following this logic. You already have the original object. You don't need jsonpatch here.

Just use this:

newCM := cm.DeepCopy()
newCM.Data[windowsIPAMField] = "true"
data, err := json.Marshal(newCM)
if err != nil {
    return nil, err
}

if _, err := configMaps.Patch(ctx, vpcCNIName, types.StrategicMergePatchType, data, metav1.PatchOptions{}); err != nil {
	return errors.Wrapf(err, "failed to patch %q configmap", vpcCNIName)
}

And done. Unless I'm mis-reading something :)

Contributor Author:

We do need a JSON patch here to avoid overwriting any changes made by other clients between get configmap and patch configmap.

Contributor:

Oh wait. But then shouldn't you be defining a JSONPatchType instead of strategic merge?

Contributor Author:

The StrategicMergePatchType constant is actually the application/strategic-merge-patch+json media type, which also handles JSON patch.

Contributor:

ah, okay. Cool then :) You win.:D
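To illustrate the point settled in this thread: CreateMergePatch diffs the original and modified objects so the request body carries only the changed field, leaving keys modified by other clients in the meantime untouched. A stdlib-only sketch of that idea for flat ConfigMap data (mergePatch is an illustrative helper, not the evanphx/json-patch API; enable-windows-ipam is an assumed field name, and key deletions, which RFC 7386 expresses as null, are omitted for brevity):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch computes an RFC 7386-style merge patch for flat string maps:
// only keys whose values were added or changed appear in the patch body.
func mergePatch(oldData, newData map[string]string) ([]byte, error) {
	changed := map[string]string{}
	for k, v := range newData {
		if oldData[k] != v {
			changed[k] = v
		}
	}
	return json.Marshal(map[string]map[string]string{"data": changed})
}

func main() {
	current := map[string]string{"other-setting": "kept"}
	desired := map[string]string{"other-setting": "kept", "enable-windows-ipam": "true"}
	patch, err := mergePatch(current, desired)
	if err != nil {
		panic(err)
	}
	// Prints {"data":{"enable-windows-ipam":"true"}} — "other-setting" is
	// absent from the patch, so a concurrent change to it is not clobbered.
	fmt.Println(string(patch))
}
```

Because the unchanged key never appears in the patch body, the server-side merge cannot overwrite edits other clients made to it between the get and the patch.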

@@ -1,10 +1,15 @@
# Windows Worker Nodes

From version 1.14, Amazon EKS supports [Windows Nodes][eks-user-guide] that allow running Windows containers.
In addition to having Windows nodes, a Linux node in the cluster is required to run the VPC resource controller and CoreDNS, as Microsoft doesn't support host-networking mode yet. Thus, a Windows EKS cluster will be a mixed-mode cluster containing Windows nodes and at least one Linux node.
In addition to having Windows nodes, a Linux node in the cluster is required to run CoreDNS, as Microsoft doesn't support host-networking mode yet. Thus, a Windows EKS cluster will be a mixed-mode cluster containing Windows nodes and at least one Linux node.
Contributor:

Suggested change
In addition to having Windows nodes, a Linux node in the cluster is required to run CoreDNS, as Microsoft doesn't support host-networking mode yet. Thus, a Windows EKS cluster will be a mixed-mode cluster containing Windows nodes and at least one Linux node.
In addition to having Windows nodes, a Linux node in the cluster is required to run CoreDNS, as Microsoft doesn't support host-networking mode yet. Thus, a Windows EKS cluster will be a mixed-node cluster containing Windows nodes and at least one Linux node.

I think that's supposed to be mixed-node, right? I can see that it was mixed-mode previously as well, but I think that's a typo.

Contributor Author:

I'm unsure because mixed-node also seems unclear to me. WDYT about rephrasing it to "Thus, a Windows EKS cluster will be a mixture of Windows nodes and at least one Linux node"?

Contributor:

Yep, that sounds a lot better. Thanks! :)

You no longer need to install the VPC resource controller on Linux worker nodes to run Windows workloads in EKS clusters.
You can enable Windows IP address management on the EKS control plane via a ConfigMap setting (see https://todo.com for details).
eksctl will automatically patch the ConfigMap to enable Windows IP address management when a Windows nodegroup is created.
For existing clusters, you can enable it manually and run `eksctl utils install-vpc-controllers` with the `--delete` flag
Contributor:

Suggested change
For existing clusters, you can enable it manually and run `eksctl utils install-vpc-controllers` with the `--delete` flag
For existing clusters, you can enable it manually by running `eksctl utils install-vpc-controllers` with the `--delete` flag

Contributor Author:

eksctl utils install-vpc-controllers --delete does not enable Windows IPAM. In this part, we are asking users to enable IPAM manually by following the linked documentation and then delete VPC controller which is no longer required.

Contributor:

Ah, so, enabling it manually is a step they have to take? Can you separate the two sentences then please? Because reading that made me think I'm enabling it by running that command. :)

Contributor:

Just put a , after manually I guess.

Contributor Author:

Ah, so, enabling it manually is a step they have to take?

Correct.

Can you separate the two sentences then please?

Sure, makes sense.
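Putting the two manual steps from this thread together, the migration for an existing cluster might look like the following sketch. The ConfigMap name (amazon-vpc-cni) and field name (enable-windows-ipam) are assumptions based on the VPC CNI documentation, not something confirmed in this PR, and the cluster name is a placeholder:

```shell
# Step 1 (manual): enable Windows IPAM on the EKS control plane by patching
# the VPC CNI ConfigMap. Uses a merge patch, so other keys are untouched.
kubectl patch configmap amazon-vpc-cni -n kube-system \
  --type merge -p '{"data":{"enable-windows-ipam":"true"}}'

# Step 2: delete the now-redundant VPC resource controller from worker nodes
# using the new --delete flag (add --approve after reviewing the plan output).
eksctl utils install-vpc-controllers --cluster my-cluster --delete --approve
```

Running the `eksctl` command without `--approve` first prints the plan output shown in the PR description, which is a safe way to review what would be deleted.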

@cPu1 force-pushed the vpc-controller-delete branch from 2357367 to a6da10a on October 14, 2021 13:54
@aclevername (Contributor) left a comment:

lgtm!

@Skarlso (Contributor) left a comment:

Yep, from me too

Labels
kind/feature New feature or request
Development

Successfully merging this pull request may close these issues.

Deprecate installation of VPC resource controller
3 participants