
Error "cluster has nodes with different OS images" when upgrading to 1.17.0-pre.3 #36320

@truongnht

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

Equal to or higher than v1.16.4 and lower than v1.17.0

What happened?

Environment: Cilium 1.17.0-pre.3
cilium-cli: built locally from the 1.17.0-pre.3 source

While trying out the new 1.17.0-pre.3, I noticed that the old CLI is not compatible with the new Cilium version, so I built a new CLI and used that. Our EKS cluster runs nodes with two different OSes:

  • AL2 (Amazon Linux 2), used on the Fargate nodes that host Karpenter
  • Bottlerocket, configured on the nodes provisioned by Karpenter

Our Cilium values file sets a nodeSelector for the cilium agent, operator, and envoy workloads, and this approach worked well in earlier versions. It breaks in 1.17.0-pre.3 (but not in 1.17.0-pre.2) with the error below:

cilium upgrade cilium cilium/cilium --namespace kube-system --version "$cilium_version" \
  --values cilium-values.yaml \
  --set cluster.id=1 \
  --set "cluster.name=${cluster_name}" \
  --set "eni.iamRole=${cilium_role_arn}" \
  --set "serviceAccounts.operator.annotations.eks\.amazonaws\.com/role-arn=${cilium_role_arn}" \
  --set "k8sServiceHost=${cluster_api_endpoint}"
🔮 Auto-detected Kubernetes kind: EKS
ℹ️  Using Cilium version 1.17.0-pre.3
ℹ️  Using cluster name "truong-test-cluster"
🔮 Auto-detected kube-proxy has been installed
❌ Could not detect AWS node image family, defaulted to fallback value: Custom

Error: Unable to upgrade Cilium: cluster has nodes with different OS images: 'AmazonLinux2' and 'Bottlerocket'

I suspect this is related to this feature.
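To see which nodes produce the mismatch, the comparison that the error message implies can be reproduced locally. The script below is a minimal sketch, NOT cilium-cli's implementation: it assumes the CLI compares an image family derived from each node's osImage across all nodes, and the helper names (image_family, check_image_families) and the osImage-to-family mapping are hypothetical, chosen only to mirror the names in the error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of cilium-cli: maps a node's osImage string
# to a family name like those in the error message. The mapping is an
# assumption for illustration only.
image_family() {
  case "$1" in
    "Amazon Linux 2") echo "AmazonLinux2" ;;
    "Amazon Linux 2023"*) echo "AmazonLinux2023" ;;
    Bottlerocket*) echo "Bottlerocket" ;;
    *) echo "Custom" ;;
  esac
}

# Reads "<node-name> <osImage...>" lines on stdin. Prints "OK: <family>" and
# succeeds if every node maps to the same family; otherwise prints
# "MIXED: <families>" (comma-separated, sorted) and returns 1.
check_image_families() {
  local name image families
  families=$(
    # "read -r name image" puts the first field in name and the rest of the
    # line (the osImage, which may contain spaces) in image.
    while read -r name image; do
      [ -n "$name" ] || continue
      image_family "$image"
    done | sort -u
  )
  if [ "$(printf '%s\n' "$families" | grep -c .)" -le 1 ]; then
    echo "OK: ${families:-no nodes}"
  else
    echo "MIXED: $(printf '%s\n' "$families" | paste -sd, -)"
    return 1
  fi
}
```

Feeding it every node (kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.osImage | check_image_families) should report MIXED on a cluster like ours, while adding -l with the same labels as the values-file nodeSelector (the exact label is cluster-specific) should report only Bottlerocket. If that holds, it would suggest the CLI's check does not take the configured nodeSelector into account.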

How can we reproduce the issue?

The steps are the same as under "What happened?" above: on an EKS cluster with AL2 (Fargate) nodes and Bottlerocket nodes, and with a nodeSelector set for the Cilium workloads in the values file, run cilium upgrade with a cilium-cli built from the 1.17.0-pre.3 source. The upgrade fails with:

Error: Unable to upgrade Cilium: cluster has nodes with different OS images: 'AmazonLinux2' and 'Bottlerocket'

Cilium Version

1.17.0-pre.3

Kernel Version

N/A

Kubernetes Version

1.31

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

  • area/cli: Impacts the command line interface of any command in the repository.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests