[kube-ovn] Update and patch Kube-OVN #1363
Conversation
Walkthrough

The change upgrades kube-ovn from v1.13.x to v1.14.x, updates build/tooling, adjusts Helm chart templates/values, expands RBAC, adds/updates CRDs, introduces DB/Northd health checks via patches, consolidates non-DPDK paths, and tweaks hooks, probes, and namespaces in chart resources.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Ctrl as Controller
participant NB as OVN NB Client
participant SB as OVN SB Client
participant T as Timer (15s)
T->>Ctrl: tick()
activate Ctrl
par concurrent Echo
Ctrl->>NB: Echo(ctx)
Ctrl->>SB: Echo(ctx)
and timeout
Note over Ctrl: Wait with timeout
end
alt both OK
Ctrl->>Ctrl: reset dbFailureCount
else any fail or timeout
Ctrl->>Ctrl: dbFailureCount++
alt dbFailureCount >= 5
Ctrl-->>Ctrl: LogFatalAndExit()
else
Ctrl->>T: wait next tick
end
end
    deactivate Ctrl
```
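Read alongside the diagram, a minimal, runnable Go sketch of this health-check loop follows. The `echoer` interface, the 5s echo timeout, and the log message are illustrative assumptions rather than kube-ovn's actual types; the 15s period, the concurrent NB/SB echo, and the exit after five consecutive failures match the behavior shown above.

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"
)

// echoer stands in for the OVN NB/SB client type (an assumption
// for illustration; the real kube-ovn client differs).
type echoer interface {
	Echo(ctx context.Context) error
}

// watchDBHealth echoes both databases every 15s with a timeout;
// five consecutive failed rounds are fatal, as in the diagram.
func watchDBHealth(nb, sb echoer) {
	failures := 0
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		results := make(chan error, 2)
		var wg sync.WaitGroup
		for _, c := range []echoer{nb, sb} {
			wg.Add(1)
			go func(c echoer) { // pass c explicitly: safe pre-Go 1.22
				defer wg.Done()
				results <- c.Echo(ctx)
			}(c)
		}
		wg.Wait()
		cancel()
		close(results)
		healthy := true
		for err := range results {
			if err != nil {
				healthy = false
			}
		}
		if healthy {
			failures = 0
			continue
		}
		if failures++; failures >= 5 {
			log.Fatal("OVN DB echo failed 5 consecutive times, exiting")
		}
	}
}

type okDB struct{}

func (okDB) Echo(ctx context.Context) error { return nil }

func main() {
	// stub clients so the sketch runs standalone
	watchDBHealth(okDB{}, okDB{})
}
```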
```mermaid
sequenceDiagram
autonumber
participant User
participant Helm
participant Tpl as _helpers.tpl (kubeovn.nodeIPs)
participant K8s as Kubernetes API
User->>Helm: install/upgrade
Helm->>Tpl: resolve master node IPs
alt MASTER_NODES env/label yields none
Tpl-->>Helm: fail (error with MASTER_NODES_LABEL hint)
Helm-->>User: render error
else found IPs
Tpl-->>Helm: IP list
Helm->>K8s: apply manifests
K8s-->>User: resources created
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes
Summary of Changes

Hello @lllamnyp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on a significant update to the Kube-OVN networking solution. The primary goal is to upgrade Kube-OVN to version 1.14.5, incorporating the latest upstream changes and improvements. A key enhancement is a more robust northd leader check, which now verifies all available endpoints to ensure continuous and reliable operation. Furthermore, this PR introduces a new, more modular Helm chart (`kube-ovn-v2`) for better deployment and management, alongside updates to Custom Resource Definitions (CRDs) and associated Role-Based Access Control (RBAC) permissions to support new features and improve existing ones. Cleanup processes during uninstallation have also been refined.
Highlights
- Kube-OVN Version Upgrade: The Kube-OVN core components have been upgraded to version 1.14.5, bringing the latest features, bug fixes, and performance improvements from the upstream project.
- Improved Northd Leader Check: The northd leader check mechanism has been enhanced to test connectivity against all available northd endpoints concurrently. This significantly improves the resilience and reliability of the control plane by ensuring that the system can identify a healthy leader even if some endpoints are temporarily unavailable (a minimal sketch of this fan-out pattern follows this list).
- Introduction of Kube-OVN v2 Helm Chart: A new Helm chart, `kube-ovn-v2`, has been introduced. This new chart provides a more structured and comprehensive way to deploy and manage Kube-OVN, with detailed configuration options and improved templating.
- CRD Updates and New VPC Egress Gateway: Several Custom Resource Definitions (CRDs) have been updated, including the addition of `VpcEgressGateway` and modifications to the `IptablesEip` and `Vlan` CRDs, reflecting new networking capabilities and improved resource management.
- Refined RBAC Permissions: The RBAC roles and bindings have been refined to align with the updated CRDs and component interactions, ensuring proper authorization for Kube-OVN operations within the Kubernetes cluster.
- Helm Hook Renaming and Cleanup Improvements: The `pre-delete-hook` has been renamed to `post-delete-hook` and its logic updated to ensure proper cleanup of resources, specifically finalizers on subnets, VPCs, and IPs, during uninstallation.
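As referenced above, here is a minimal Go sketch of the fan-out, first-success-wins probe pattern that the improved leader check uses. It is illustrative rather than the literal patch in this PR: the `probeAny` name, the port value, and the timeout are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"
)

// probeAny dials every candidate endpoint concurrently; the first
// successful TCP connect cancels the rest and the probe passes.
func probeAny(ips []string, port string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var d net.Dialer
	success := make(chan struct{}, 1)
	var wg sync.WaitGroup
	for _, ip := range ips {
		address := net.JoinHostPort(ip, port)
		wg.Add(1)
		go func(address string) { // pass address: safe pre-Go 1.22
			defer wg.Done()
			conn, err := d.DialContext(ctx, "tcp", address)
			if err != nil {
				return // unreachable, or canceled after another success
			}
			conn.Close()
			select {
			case success <- struct{}{}:
				cancel() // stop the remaining dials
			default: // someone else already succeeded
			}
		}(address)
	}
	wg.Wait()

	select {
	case <-success:
		return true
	default:
		return false
	}
}

func main() {
	// "6643" is used as the northd probe port here for illustration.
	fmt.Println(probeAny([]string{"10.0.0.1", "10.0.0.2"}, "6643", 3*time.Second))
}
```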
Code Review
This pull request updates Kube-OVN to version 1.14.5 and introduces a new, more structured v2 Helm chart. It also includes a patch to improve the northd leader check by probing all endpoints concurrently, which enhances reliability. The changes are extensive and mostly look good, reflecting the new Kube-OVN version. I've found a critical issue in the new v2 chart related to a label selector mismatch in the DPDK DaemonSet, a high-severity issue with a Helm hook's lifecycle, and a medium-severity UX regression in a CRD definition. Please see my detailed comments.
This patch updates Kube-OVN to 1.14.5 and patches the northd leader check to test against all northd endpoints instead of just the first one marked as ready.

```release-note
[kube-ovn, fix] Update Kube-OVN and improve northd leader detection.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
Force-pushed from 9116265 to 6b9b700.
Actionable comments posted: 8
🧹 Nitpick comments (17)
packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (2)

66-69: Fix typo in comment: “groroutine” → “goroutine”.

```diff
-	// if the last groroutine is the one to succeed,
+	// if the last goroutine is the one to succeed,
```
31-33: Make dial timeout configurable. Hardcoded 3s may be too short/long across environments. Consider wiring via config/env with a sane default.
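A sketch of one way to wire that, in Go; the `OVN_NORTHD_PROBE_TIMEOUT` variable name is hypothetical, not an existing kube-ovn option, and only the env-override-with-default pattern is the point:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// dialTimeout returns the probe dial timeout, overridable via an
// environment variable (hypothetical name) with a 3s default.
func dialTimeout() time.Duration {
	const def = 3 * time.Second
	v := os.Getenv("OVN_NORTHD_PROBE_TIMEOUT")
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v) // e.g. "5s", "500ms"
	if err != nil || d <= 0 {
		return def
	}
	return d
}

func main() {
	fmt.Println(dialTimeout())
}
```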
packages/system/kubeovn/charts/kube-ovn/templates/_helpers.tpl (1)

23-25: Fail-fast is good; trim MASTER_NODES to avoid false negatives and handle whitespace-only values. Whitespace-only MASTER_NODES currently bypasses the fail. Trim it.

Apply this diff:

```diff
-{{- if and (eq (len $ips) 0) (not $.Values.MASTER_NODES) -}}
+{{- if and (eq (len $ips) 0) (not ($.Values.MASTER_NODES | trim)) -}}
```

Optional: if MASTER_NODES_LABEL is provided without a value (e.g., "node-role.kubernetes.io/control-plane"), match presence of the key, not an empty string value:

```
{{- $label := splitList "=" $.Values.MASTER_NODES_LABEL -}}
{{- $key := index $label 0 -}}
{{- $hasVal := eq (len $label) 2 -}}
...
{{- if or (and $hasVal (eq (index $node.metadata.labels $key) (index $label 1))) (and (not $hasVal) (hasKey $node.metadata.labels $key)) -}}
```
packages/system/kubeovn/charts/kube-ovn/Chart.yaml (1)

18-18: Chart version should be SemVer without leading 'v'. Some tooling rejects a leading 'v'. Prefer 1.14.5.

```diff
-version: v1.14.5
+version: 1.14.5
```

packages/system/kubeovn/charts/kube-ovn/templates/ovn-dpdk-ds.yaml (1)
34-34: Add safe fallback for DPDK image tag. If DPDK_IMAGE_TAG is unset, fall back to "-dpdk".

```diff
-          image: {{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.DPDK_IMAGE_TAG }}
+          image: {{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ default (printf "%s-dpdk" .Values.global.images.kubeovn.tag) .Values.DPDK_IMAGE_TAG }}
```

packages/system/kubeovn/images/kubeovn/Dockerfile (1)
14-14: Speed up Go builds with BuildKit cache mounts. Leverage module and build cache to reduce CI time.

```diff
-RUN make build-go
+RUN --mount=type=cache,target=/go/pkg/mod \
+    --mount=type=cache,target=/root/.cache/go-build \
+    make build-go
```

packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (2)
108-118: Align container name with script purpose. Rename for clarity now that it runs remove-finalizer.sh.

```diff
-      - name: remove-subnet-finalizer
+      - name: remove-finalizer
         image: "{{ .Values.global.registry.address}}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.global.images.kubeovn.tag }}"
```
102-106: Drop hostNetwork unless required. This job only patches CRs; hostNetwork increases privileges without clear benefit.

```diff
-      hostNetwork: true
```

If DNS/cluster access requires host networking in your environment, keep it.
packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (7)
989-1195: Tighten selector schemas and reduce regex duplication.

- Consider constraining spec.type-like fields (where present) with enums to prevent drift.
- The dual-stack IP item regexes are repeated and heavy. Prefer relying on format: ipv4/ipv6 and a single CEL rule to allow optional comma-separated IPv4,IPv6 pairs for each entry to reduce apiserver validation cost.

Would you like a follow-up patch that replaces the long IPv6 regexes with a CEL-based validator for single IP or dual-stack pairs?
2045-2089: BFD port: enforce symmetry when disabled. You already require IP when enabled. Recommend also requiring empty IP when disabled to prevent stale configuration.

```diff
 x-kubernetes-validations:
   - rule: "self.enabled == false || self.ip != ''"
     message: 'Port IP must be set when BFD Port is enabled'
+  - rule: "self.enabled == true || self.ip == ''"
+    message: 'Port IP must be empty when BFD Port is disabled'
```
2264-2279: Vip “Type” column now points to status but spec.type remains. Column moved to .status.type while spec.type still exists. This is confusing for users and UIs.

Choose one for consistency:

- Keep column on spec.type:

```diff
-  - name: Type
-    type: string
-    jsonPath: .status.type
+  - name: Type
+    type: string
+    jsonPath: .spec.type
```

- Or deprecate spec.type (remove from spec and document migration), keeping column on status.type.
1668-1737: OvnFip: constrain new spec.type with enum. If only a known set is valid, add enum to avoid invalid states.

```diff
 spec:
   type: object
   properties:
@@
-    type:
-      type: string
+    type:
+      type: string
+      enum:
+        - Pod
+        - Gateway
+        - Vip
+        - Other
```

Replace enum values with the actual allowed set.
2796-2807: Printer column “conflict” casing is inconsistent. Other columns are TitleCase. Suggest renaming to Conflict.

```diff
-  - name: conflict
+  - name: Conflict
     type: boolean
     jsonPath: .status.conflict
```
106-136: SwitchLBRule targetPort should allow named ports. targetPort is integer-only; services often use named targetPorts. Consider int-or-string.

```diff
-              targetPort:
-                type: integer
-                minimum: 1
-                maximum: 65535
+              targetPort:
+                anyOf:
+                  - type: integer
+                    minimum: 1
+                    maximum: 65535
+                  - type: string
+                x-kubernetes-int-or-string: true
```
32-68: VpcDns: replicas lacks default and scale subresource. replicas has min/max but no default and no scale subresource. If you intend autoscaling or kubectl scale support, wire scale; otherwise set a sensible default.
packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (2)
103-115: Deployments CRUD in system:ovn is very broad. Full create/update/delete on deployments and scale cluster-wide. If only specific namespaces/owners are needed, consider limiting via separate controller SA and namespace-scoped RoleBindings.
299-313: Mixed apiGroups and resources in one rule reduces clarity. Combining "" and kubeovn.io with a mixed resource set (ovn-eips, nodes, pods, services) is valid but noisy for audits. Split by group.

```diff
-  - apiGroups:
-      - ""
-      - "kubeovn.io"
-    resources:
-      - ovn-eips
-      - ovn-eips/status
-      - nodes
-      - nodes/status
-      - pods
-      - services
+  - apiGroups: ["kubeovn.io"]
+    resources:
+      - ovn-eips
+      - ovn-eips/status
     verbs:
       - get
       - list
       - patch
       - watch
+  - apiGroups: [""]
+    resources:
+      - nodes
+      - nodes/status
+      - pods
+      - services
+    verbs:
+      - get
+      - list
+      - patch
+      - watch
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (16)

- packages/system/kubeovn/Makefile (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/Chart.yaml (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/_helpers.tpl (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/ic-controller-deploy.yaml (2 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (8 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (6 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/ovn-dpdk-ds.yaml (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/ovsovn-ds.yaml (0 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (8 hunks)
- packages/system/kubeovn/charts/kube-ovn/templates/vpc-nat-config.yaml (1 hunks)
- packages/system/kubeovn/charts/kube-ovn/values.yaml (4 hunks)
- packages/system/kubeovn/images/kubeovn/Dockerfile (1 hunks)
- packages/system/kubeovn/images/kubeovn/patches/5294-db-healthcheck.diff (0 hunks)
- packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (1 hunks)
💤 Files with no reviewable changes (2)
- packages/system/kubeovn/charts/kube-ovn/templates/ovsovn-ds.yaml
- packages/system/kubeovn/images/kubeovn/patches/5294-db-healthcheck.diff
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-14T16:54:50.368Z
Learnt from: nbykov0
PR: cozystack/cozystack#932
File: packages/apps/kubernetes/images/cluster-autoscaler/Dockerfile:3-4
Timestamp: 2025-05-14T16:54:50.368Z
Learning: In Dockerfiles for the cozystack project, ARG ordering where a variable like ${TARGETARCH} is referenced before being declared can still work correctly in practice because Docker's BuildKit may evaluate the variable only when it's actually used in a FROM statement.
Applied to files:
packages/system/kubeovn/images/kubeovn/Dockerfile
🔇 Additional comments (17)
packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (3)

9-9: Import change looks good. Adding sync is appropriate for the new concurrency logic.

100-107: LGTM on control flow. Graceful handling when no addresses exist and delegating to the concurrent checker is correct.

59-80: Multi-endpoint failCount behavior is correct. Now only increments once per call when all IPs fail, matching the intended “all endpoints down” logic. No other callers rely on the old per-IP failure rate.

packages/system/kubeovn/charts/kube-ovn/values.yaml (3)
12-12: Image tag bumped to v1.14.5 — confirm availability in your registry. Verify docker.io/kubeovn/kube-ovn:v1.14.5 and vpc-nat-gateway:v1.14.5 exist/pull successfully.
57-81: Behavior change: ENABLE_EXTERNAL_VPC now false; new ENABLE_OVN_LB_PREFER_LOCAL false. Ensure templates/args wire these flags and update release notes/migration docs for the default flip.
136-136: Decoupled DPDK image tag looks good. Clearer control over dpdk image versioning.
packages/system/kubeovn/charts/kube-ovn/templates/vpc-nat-config.yaml (1)
17-19: I’ve added a repo-wide search to locate any hardcoded references to these ConfigMaps or their keys in all YAML manifests. Let me know the results to confirm that every consumer correctly uses `{{ .Values.namespace }}` and expects `"enable-vpc-nat-gw"` as a string.

packages/system/kubeovn/charts/kube-ovn/Chart.yaml (2)
24-24: App version alignment LGTM.

26-26: Cluster gating raised to >=1.29 — confirm this is required. If not strictly needed, consider relaxing to upstream-supported minima to avoid blocking installs.
packages/system/kubeovn/Makefile (1)
14-18: Verify patch offsets by running a dry-run in the correct directory. From the repo root, `cd packages/system/kubeovn` and then run:

```sh
patch --dry-run -p4 < patches/cozyconfig.diff
patch --dry-run -p4 < patches/mtu.diff
```

to confirm both hunks apply cleanly.
packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml (1)
116-117: No action needed for OVSDB timeout flags. The flags `--ovsdb-con-timeout` and `--ovsdb-inactivity-timeout` remain supported in Kube-OVN v1.14.x and can be kept as-is.

packages/system/kubeovn/images/kubeovn/Dockerfile (1)
5-8: Base image tags verified to exist on Docker Hub. Both golang:1.24-bookworm and kubeovn/kube-ovn-base:v1.14.5 returned HTTP 200 on Docker Hub, so no changes are needed.

packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (1)
24-35: Ensure RBAC covers `update` if the hook uses it. The Role currently grants only `get`, `list`, and `patch` on `subnets`, `vpcs`, and `ips`. If `/kube-ovn/remove-finalizer.sh` invokes `kubectl update` to remove finalizers, add the `update` verb to this rule. Verify the script inside the kube-ovn image.

packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (1)
2146-2156: LGTM: status.bfdPort shape aligns with spec. Status mirrors ip/name/nodes; good for observability.
packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (3)
16-18: RBAC: add VpcEgressGateway resources — good alignment with the new CRD. The new resources and status are covered.
138-145: Endpointslices watch added — good; ensure consumers prefer slices over endpoints. This is consistent with leader detection across all Northd endpoints.
Please confirm the controller now watches EndpointSlices rather than relying solely on core/v1 Endpoints.
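For reference, a minimal client-go sketch of consuming EndpointSlices instead of core/v1 Endpoints; this is illustrative wiring, not kube-ovn's actual controller code, and it assumes a kubeconfig at the default path. Note that an informer requires both list and watch verbs on endpointslices:

```go
package main

import (
	"fmt"
	"time"

	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	inf := factory.Discovery().V1().EndpointSlices().Informer()
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, obj interface{}) {
			if eps, ok := obj.(*discoveryv1.EndpointSlice); ok {
				// react to endpoint changes promptly instead of
				// waiting for the next resync of a plain list
				fmt.Printf("endpointslice %s/%s updated\n", eps.Namespace, eps.Name)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching
}
```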
381-386: Further verification in progress.
```yaml
            - --enable-live-migration-optimize={{- .Values.func.ENABLE_LIVE_MIGRATION_OPTIMIZE }}
            - --enable-ovn-lb-prefer-local={{- .Values.func.ENABLE_OVN_LB_PREFER_LOCAL }}
            - --image={{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.global.images.kubeovn.tag }}
          securityContext:
```
💡 Verification agent
🧩 Analysis chain
Verify new controller flags exist in v1.14.5.
Ensure kube-ovn-controller supports these flags; otherwise pods will crash on startup.
🌐 Web query:
Does kube-ovn v1.14.5 controller support CLI flags: --enable-live-migration-optimize, --enable-ovn-lb-prefer-local, and --image?
💡 Result:
Short answer:
- --enable-ovn-lb-prefer-local — Yes; documented for v1.14 kube-ovn-controller. [1]
- --enable-live-migration-optimize — Not exposed in the public v1.14 docs as a controller CLI flag; live-migration optimizations were merged (PR referenced from issue #4740) but the docs show live-migration is enabled/controlled via VM annotations/behaviour rather than a documented controller "--enable-live-migration-optimize" flag. [2][3]
- --image — No: the controller's container image is set in the Deployment manifest (image: <...>) rather than as a kube-ovn-controller CLI flag. Examples/templates show the image configured in the Deployment, not as a --image argument. [4]
Sources:
- Kube-OVN v1.14 docs — MetalLB / prefer-local note.(--enable-ovn-lb-prefer-local). [kube-ovn docs v1.14]. [1]
- GitHub issue showing live-migration optimization work merged (fixed by PR #4773). [kube-ovn issue #4740]. [2]
- Kube-OVN live-migration docs (usage via VM annotation). [kube-ovn live-migration docs]. [3]
- Deployment/template examples showing controller image set in the Deployment (image: "...") and controller args listed separately. [deployment template / install examples]. [4]
Would you like me to fetch the exact lines from the v1.14 docs and the PR that added the live-migration changes?
Remove unsupported CLI flags in controller-deploy.yaml. The `--enable-live-migration-optimize` flag isn’t available in kube-ovn-controller v1.14.5, and `--image` isn’t a valid controller argument (the container image must be set in the Deployment spec). Retain only the supported `--enable-ovn-lb-prefer-local` flag to prevent pod startup failures.
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml
around lines 118 to 121, remove the unsupported CLI flags
`--enable-live-migration-optimize=...` and `--image=...` from the
kube-ovn-controller args and keep only `--enable-ovn-lb-prefer-local=...`;
ensure the container image is specified via the Deployment container image field
(e.g., .spec.template.spec.containers[0].image using
.Values.global.registry.address/.Values.global.images.kubeovn.repository:.Values.global.images.kubeovn.tag)
instead of as a CLI argument so pods can start successfully.
```
      {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
      {{ index . 0 }}: "{{ if eq (len .) 2 }}{{ index . 1 }}{{ end }}"
      {{- end }}
```
Fix nodeSelector when MASTER_NODES_LABEL has no value.
Rendering key: "" forces match against empty string. Default to "true" if only a key is provided.
```diff
-      {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
-      {{ index . 0 }}: "{{ if eq (len .) 2 }}{{ index . 1 }}{{ end }}"
-      {{- end }}
+      {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
+      {{- $key := index . 0 -}}
+      {{- if eq (len .) 2 }}
+      {{ $key }}: "{{ index . 1 }}"
+      {{- else }}
+      {{ $key }}: "true"
+      {{- end }}
+      {{- end }}
```
Alternatively, use nodeAffinity Exists for key-only labels to avoid assuming a value.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```
      {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
      {{- $key := index . 0 -}}
      {{- if eq (len .) 2 }}
      {{ $key }}: "{{ index . 1 }}"
      {{- else }}
      {{ $key }}: "true"
      {{- end }}
      {{- end }}
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/ic-controller-deploy.yaml
around lines 112-114, the nodeSelector rendering uses splitList "=" on
.Values.MASTER_NODES_LABEL and produces an empty string value when no value is
provided, causing a match against "". Change the template to detect when
splitList yields only a key (len == 1) and emit the label with value "true"
instead of an empty string; alternatively, replace this nodeSelector entry with
a nodeAffinity Exists rule for key-only labels so you don't assume a value.
```yaml
                        values:
                          items:
                            type: string
                          type: array
                      required:
                        - key
                        - operator
                  x-kubernetes-validations:
                    - rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
                      message: 'Each pod selector MUST have at least one matchLabels or matchExpressions'
            policies:
              type: array
              items:
                type: object
                properties:
                  snat:
                    type: boolean
                    default: false
                  ipBlocks:
                    type: array
                    x-kubernetes-list-type: set
                    items:
                      type: string
                      anyOf:
                        - format: ipv4
                        - format: ipv6
                        - format: cidr
                  subnets:
                    type: array
                    x-kubernetes-list-type: set
                    items:
                      type: string
                      minLength: 1
                x-kubernetes-validations:
                  - rule: "size(self.ipBlocks) != 0 || size(self.subnets) != 0"
                    message: 'Each policy MUST have at least one ipBlock or subnet'
            trafficPolicy:
              type: string
              enum:
                - Local
                - Cluster
              default: Cluster
            nodeSelector:
              type: array
              items:
                type: object
                properties:
                  matchLabels:
                    additionalProperties:
                      type: string
                    type: object
                  matchExpressions:
                    type: array
                    items:
                      type: object
                      properties:
                        key:
                          type: string
                        operator:
                          type: string
                          enum:
                            - In
                            - NotIn
                            - Exists
                            - DoesNotExist
                            - Gt
                            - Lt
                        values:
                          type: array
                          x-kubernetes-list-type: set
                          items:
                            type: string
                      required:
                        - key
                        - operator
                  matchFields:
                    type: array
                    items:
                      type: object
                      properties:
                        key:
                          type: string
                        operator:
                          type: string
                          enum:
                            - In
                            - NotIn
                            - Exists
                            - DoesNotExist
                            - Gt
                            - Lt
                        values:
                          type: array
                          x-kubernetes-list-type: set
                          items:
                            type: string
                      required:
                        - key
                        - operator
---
```
🛠️ Refactor suggestion
Add CEL guards and drop unsupported fieldPath in VpcEgressGateway validations
- Use has() to guard optional arrays; otherwise size(null) can error when one side is absent.
- fieldPath is not a recognized key inside x-kubernetes-validations entries; drop it to avoid schema rejection on stricter API servers.
```diff
@@
         x-kubernetes-validations:
-          - rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
-            message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
-            fieldPath: ".internalIPs"
-          - rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
-            message: 'Size of External IPs MUST be equal to or greater than Replicas'
-            fieldPath: ".externalIPs"
-          - rule: "size(self.policies) != 0 || size(self.selectors) != 0"
+          - rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
+            message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
+          - rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
+            message: 'Size of External IPs MUST be equal to or greater than Replicas'
+          - rule: "(has(self.policies) && size(self.policies) != 0) || (has(self.selectors) && size(self.selectors) != 0)"
             message: 'Each VPC Egress Gateway MUST have at least one policy or selector'
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
metadata:
  name: vpc-egress-gateways.kubeovn.io
spec:
  group: kubeovn.io
  names:
    plural: vpc-egress-gateways
    singular: vpc-egress-gateway
    shortNames:
      - vpc-egress-gw
      - veg
    kind: VpcEgressGateway
    listKind: VpcEgressGatewayList
  scope: Namespaced
  versions:
    - additionalPrinterColumns:
        - jsonPath: .spec.vpc
          name: VPC
          type: string
        - jsonPath: .spec.replicas
          name: REPLICAS
          type: integer
        - jsonPath: .spec.bfd.enabled
          name: BFD ENABLED
          type: boolean
        - jsonPath: .spec.externalSubnet
          name: EXTERNAL SUBNET
          type: string
        - jsonPath: .status.phase
          name: PHASE
          type: string
        - jsonPath: .status.ready
          name: READY
          type: boolean
        - jsonPath: .status.internalIPs
          name: INTERNAL IPS
          priority: 1
          type: string
        - jsonPath: .status.externalIPs
          name: EXTERNAL IPS
          priority: 1
          type: string
        - jsonPath: .status.workload.nodes
          name: WORKING NODES
          priority: 1
          type: string
        - jsonPath: .metadata.creationTimestamp
          name: AGE
          type: date
      name: v1
      served: true
      storage: true
      subresources:
        status: {}
        scale:
          # specReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Spec.Replicas.
          specReplicasPath: .spec.replicas
          # statusReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Replicas.
          statusReplicasPath: .status.replicas
          # labelSelectorPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Selector.
          labelSelectorPath: .status.labelSelector
      schema:
        openAPIV3Schema:
          type: object
          properties:
            status:
              properties:
                replicas:
                  type: integer
                  format: int32
                labelSelector:
                  type: string
                conditions:
                  items:
                    properties:
                      lastTransitionTime:
                        format: date-time
                        type: string
                      lastUpdateTime:
                        format: date-time
                        type: string
                      message:
                        maxLength: 32768
                        type: string
                      observedGeneration:
                        format: int64
                        minimum: 0
                        type: integer
                      reason:
                        maxLength: 1024
                        minLength: 1
                        pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                        type: string
                      status:
                        enum:
                          - "True"
                          - "False"
                          - Unknown
                        type: string
                      type:
                        maxLength: 316
                        pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                        type: string
                    required:
                      - lastTransitionTime
                      - lastUpdateTime
                      - observedGeneration
                      - reason
                      - status
                      - type
                    type: object
                  type: array
                  x-kubernetes-list-map-keys:
                    - type
                  x-kubernetes-list-type: map
                internalIPs:
                  items:
                    type: string
                  type: array
                externalIPs:
                  items:
                    type: string
                  type: array
                phase:
                  type: string
                  default: Pending
                  enum:
                    - Pending
                    - Processing
                    - Completed
                ready:
                  type: boolean
                  default: false
                workload:
                  type: object
                  properties:
                    apiVersion:
                      type: string
                    kind:
                      type: string
                    name:
                      type: string
                    nodes:
                      type: array
                      items:
                        type: string
              required:
                - conditions
                - phase
              type: object
            spec:
              type: object
              required:
                - externalSubnet
              x-kubernetes-validations:
                - rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
                  message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
                - rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
                  message: 'Size of External IPs MUST be equal to or greater than Replicas'
                - rule: "(has(self.policies) && size(self.policies) != 0) || (has(self.selectors) && size(self.selectors) != 0)"
                  message: 'Each VPC Egress Gateway MUST have at least one policy or selector'
              properties:
                replicas:
                  type: integer
                  format: int32
                  default: 1
                  minimum: 0
                  maximum: 10
                prefix:
                  type: string
                  anyOf:
                    - pattern: ^$
                    - pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*[-\.]?$
                  x-kubernetes-validations:
                    - rule: "self == oldSelf"
                      message: "This field is immutable."
                vpc:
                  type: string
                internalSubnet:
                  type: string
                externalSubnet:
                  type: string
                internalIPs:
                  items:
                    type: string
                    oneOf:
                      - format: ipv4
                      - format: ipv6
                      - pattern: ^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5]),((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:)))$
                      - pattern: ^((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:))),(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$
                  type: array
                  x-kubernetes-list-type: set
                externalIPs:
                  items:
                    type: string
                    oneOf:
                      - format: ipv4
                      - format: ipv6
                      - pattern: ^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5]),((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:)))$
                      - pattern: ^((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:))),(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$
                  type: array
                  x-kubernetes-list-type: set
                image:
                  type: string
                bfd:
                  type: object
                  properties:
                    enabled:
                      type: boolean
                      default: false
                    minRX:
                      type: integer
                      format: int32
                      default: 1000
                    minTX:
                      type: integer
                      format: int32
                      default: 1000
                    multiplier:
                      type: integer
                      format: int32
                      default: 3
                selectors:
                  type: array
                  items:
                    type: object
                    properties:
                      namespaceSelector:
                        type: object
                        properties:
                          matchLabels:
                            additionalProperties:
                              type: string
                            type: object
                          matchExpressions:
                            type: array
                            items:
                              type: object
                              properties:
                                key:
                                  type: string
                                operator:
                                  type: string
                                values:
                                  items:
                                    type: string
                                  type: array
                              required:
                                - key
                                - operator
                        x-kubernetes-validations:
                          - rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
                            message: 'Each namespace selector MUST have at least one matchLabels or matchExpressions'
                      podSelector:
                        type: object
                        properties:
                          matchLabels:
                            additionalProperties:
                              type: string
                            type: object
                          matchExpressions:
                            type: array
                            items:
                              type: object
                              properties:
                                key:
                                  type: string
                                operator:
                                  type: string
                                values:
                                  items:
                                    type: string
                                  type: array
                              required:
                                - key
                                - operator
                        x-kubernetes-validations:
                          - rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
                            message: 'Each pod selector MUST have at least one matchLabels or matchExpressions'
                policies:
                  type: array
                  items:
                    type: object
                    properties:
                      snat:
                        type: boolean
                        default: false
                      ipBlocks:
                        type: array
                        x-kubernetes-list-type: set
                        items:
                          type: string
                          anyOf:
                            - format: ipv4
                            - format: ipv6
                            - format: cidr
                      subnets:
                        type: array
                        x-kubernetes-list-type: set
                        items:
                          type: string
                          minLength: 1
                    x-kubernetes-validations:
                      - rule: "size(self.ipBlocks) != 0 || size(self.subnets) != 0"
                        message: 'Each policy MUST have at least one ipBlock or subnet'
                trafficPolicy:
                  type: string
                  enum:
                    - Local
                    - Cluster
                  default: Cluster
                nodeSelector:
                  type: array
                  items:
                    type: object
                    properties:
                      matchLabels:
                        additionalProperties:
                          type: string
                        type: object
                      matchExpressions:
                        type: array
                        items:
                          type: object
                          properties:
                            key:
                              type: string
                            operator:
                              type: string
                              enum:
                                - In
                                - NotIn
                                - Exists
                                - DoesNotExist
                                - Gt
                                - Lt
                            values:
                              type: array
                              x-kubernetes-list-type: set
                              items:
                                type: string
                          required:
                            - key
                            - operator
                      matchFields:
                        type: array
                        items:
                          type: object
                          properties:
                            key:
                              type: string
                            operator:
                              type: string
                              enum:
                                - In
                                - NotIn
                                - Exists
                                - DoesNotExist
                                - Gt
                                - Lt
                            values:
                              type: array
                              x-kubernetes-list-type: set
                              items:
                                type: string
                          required:
                            - key
                            - operator
---
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml around
lines 826-1195, the CRD x-kubernetes-validations use size(...) on optional
fields and include an unsupported fieldPath key; update each validation to guard
with has(self.<field>) (e.g. change rules like "size(self.internalIPs) == 0 ||
size(self.internalIPs) >= self.replicas" to "!has(self.internalIPs) ||
size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas") and
remove any "fieldPath" entries from the x-kubernetes-validations blocks so
validations won’t error on null values and won’t be rejected by stricter API
servers.
```yaml
  - apiGroups:
      - discovery.k8s.io
    resources:
      - endpointslices
    verbs:
      - list
  - apiGroups:
```
🛠️ Refactor suggestion
system:ovn-ovs has only list on EndpointSlices; add watch
List without watch can miss changes between resyncs for leader detection/health checks.
```diff
   - apiGroups:
       - discovery.k8s.io
     resources:
       - endpointslices
     verbs:
-      - list
+      - list
+      - watch
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml around lines
266 to 272, the Role/ClusterRole for system:ovn-ovs grants only the "list" verb
on "endpointslices"; add the "watch" verb alongside "list" in that verbs array
so the controller can watch EndpointSlice changes (preventing missed updates
between resyncs for leader detection/health checks).
```yaml
          livenessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```
💡 Verification agent
🧩 Analysis chain
Gate probes by ENABLE_METRICS and align with SSL; otherwise pods may flap.
If `.Values.networking.ENABLE_METRICS` is false, `/metrics` won’t be served and the pod will CrashLoop. Also, when `ENABLE_SSL` is true, an HTTP probe will fail against an HTTPS endpoint. Guard the probes and set the scheme; add a sane timeout.
```diff
-          livenessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 15
-            periodSeconds: 20
-          readinessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 5
-            periodSeconds: 10
+{{- if .Values.networking.ENABLE_METRICS }}
+          livenessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 15
+            periodSeconds: 20
+            timeoutSeconds: 5
+          readinessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 5
+{{- end }}
```
Follow-up:

- Verify pinger’s metrics port and whether TLS terminates on the same port; if CN/SAN doesn’t match the pod IP, HTTPS probes may still fail. In that case, prefer a `tcpSocket` probe when SSL is enabled, or expose an HTTP health endpoint. Do you want me to provide that variant?
🏁 Script executed:

```bash
#!/bin/bash
# Locate the pinger daemonset template file
PINGER_FILE=$(find . -type f -name 'pinger-ds.yaml')
echo "Pinger file: $PINGER_FILE"
# Show current livenessProbe and readinessProbe configuration
echo "=== Probes Snippet ==="
grep -n -C3 "livenessProbe" "$PINGER_FILE"
grep -n -C3 "readinessProbe" "$PINGER_FILE"
# Check how ENABLE_METRICS is defined in chart values
echo "=== ENABLE_METRICS in values.yaml ==="
grep -n -C3 "ENABLE_METRICS" packages/system/kubeovn/charts/kube-ovn/values.yaml
# Inspect the container definition for pinger: ports and args
echo "=== Container Definition ==="
grep -n -C3 "containers:" "$PINGER_FILE"
grep -n -C3 "ports:" "$PINGER_FILE"
```

Length of output: 2011

🏁 Script executed:

```bash
#!/bin/bash
# Check if values.yaml defines a networking section and ENABLE_SSL
grep -n -C3 '^networking:' packages/system/kubeovn/charts/kube-ovn/values.yaml
grep -n -C3 'ENABLE_SSL' packages/system/kubeovn/charts/kube-ovn/values.yaml
```

Length of output: 562

🏁 Script executed:

```bash
#!/bin/bash
# Print lines 120-160 of the pinger daemonset template for full probe context
sed -n '120,160p' packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml
```

Length of output: 1447

🏁 Script executed:

```bash
#!/bin/bash
# Show container section including command, args, and ports for pinger
sed -n '50,100p' packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml
# Search values.yaml for any pinger-specific configuration
grep -R "kube-ovn-pinger" -n packages/system/kubeovn/charts/kube-ovn/values.yaml
grep -R "metrics" -n packages/system/kubeovn/charts/kube-ovn/values.yaml
```

Length of output: 2368
Gate probes on ENABLE_METRICS, set HTTP scheme via ENABLE_SSL, and add a timeout. Wrap the liveness/readiness probes in `{{- if .Values.networking.ENABLE_METRICS }}`, add `scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}`, and include `timeoutSeconds: 5` (or another sane value) to prevent flapping when metrics are disabled or served over TLS.
```diff
-          livenessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 15
-            periodSeconds: 20
-          readinessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 5
-            periodSeconds: 10
+{{- if .Values.networking.ENABLE_METRICS }}
+          livenessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 15
+            periodSeconds: 20
+            timeoutSeconds: 5
+          readinessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 5
+{{- end }}
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
{{- if .Values.networking.ENABLE_METRICS }}
          livenessProbe:
            httpGet:
              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
              path: /metrics
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
              path: /metrics
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
{{- end }}
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml around lines
132 to 143, gate the liveness/readiness probe blocks with {{- if
.Values.networking.ENABLE_METRICS }} ... {{- end }}, add a scheme field set via
conditional {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
under each httpGet, and include timeoutSeconds: 5 for both probes to avoid
flapping when metrics are disabled or served over TLS.
```diff
+	for _, ip := range ips {
+		address := net.JoinHostPort(ip, OvnNorthdPort)
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			conn, err := d.DialContext(ctx, "tcp", address)
+			if err != nil {
+				klog.Errorf("failed to connect to northd leader %s, err: %v", ip, err)
```
🛠️ Refactor suggestion
Avoid range-variable capture in goroutines (make safe pre-Go 1.22).
Pass ip and address as parameters to the goroutine to prevent accidental capture and mis-logging; also treat context-canceled dials as debug, not errors.
```diff
-		wg.Add(1)
-		go func() {
+		wg.Add(1)
+		go func(ip, address string) {
 			defer wg.Done()
-			conn, err := d.DialContext(ctx, "tcp", address)
+			conn, err := d.DialContext(ctx, "tcp", address)
 			if err != nil {
-				klog.Errorf("failed to connect to northd leader %s, err: %v", ip, err)
+				if ctx.Err() != nil {
+					klog.V(5).Infof("dial to northd endpoint %s canceled", ip)
+					return
+				}
+				klog.Errorf("failed to connect to northd endpoint %s: %v", ip, err)
 				return
 			} else {
 				defer conn.Close()
 				select {
 				case success <- struct{}{}:
-					klog.V(5).Infof("succeed to connect to northd leader %s", ip)
+					klog.V(5).Infof("successfully connected to northd endpoint %s", ip)
 					cancel()
 				default:
 					// someone else already succeeded
 				}
 			}
-		}()
+		}(ip, address)
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
	for _, ip := range ips {
		address := net.JoinHostPort(ip, OvnNorthdPort)
		wg.Add(1)
		go func(ip, address string) {
			defer wg.Done()
			conn, err := d.DialContext(ctx, "tcp", address)
			if err != nil {
				if ctx.Err() != nil {
					klog.V(5).Infof("dial to northd endpoint %s canceled", ip)
					return
				}
				klog.Errorf("failed to connect to northd endpoint %s: %v", ip, err)
				return
			}
			defer conn.Close()
			select {
			case success <- struct{}{}:
				klog.V(5).Infof("successfully connected to northd endpoint %s", ip)
				cancel()
			default:
				// someone else already succeeded
			}
		}(ip, address)
	}
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff around lines
34 to 41, the goroutine captures the loop variables ip and address leading to
potential wrong values; update the goroutine signature to accept ip and address
as parameters (e.g., go func(ip, address string) {...}(ip, address)) to avoid
range-variable capture, and inside the dial error handling, check for
context.Canceled and context.DeadlineExceeded (or errors.Is(err,
context.Canceled)) and log those as debug/Info instead of klog.Errorf, only
using klog.Errorf for genuine connection errors.
```diff
+	addresses := []string{}
+
 	for _, eps := range epsList.Items {
 		for _, ep := range eps.Endpoints {
 			if (ep.Conditions.Ready != nil && !*ep.Conditions.Ready) || len(ep.Addresses) == 0 {
@@ -303,12 +343,15 @@ func checkNorthdEpAlive(cfg *Configuration, namespace, service string) bool {
 
 			// Found an address, check its availability. We only need one.
 			klog.V(5).Infof("found address %s in endpoint slice %s/%s for service %s, checking availability", ep.Addresses[0], eps.Namespace, eps.Name, service)
-			return checkNorthdEpAvailable(ep.Addresses[0])
+			addresses = append(addresses, ep.Addresses[0]) // Addresses are fungible by k8s API design
 		}
```
🛠️ Refactor suggestion
Collect all endpoint addresses, not just the first.
EndpointSlice can expose multiple addresses per endpoint. Checking them all improves success odds and matches the new “fan-out” logic.
```diff
-			// Found an address, check its availability. We only need one.
-			klog.V(5).Infof("found address %s in endpoint slice %s/%s for service %s, checking availability", ep.Addresses[0], eps.Namespace, eps.Name, service)
-			addresses = append(addresses, ep.Addresses[0]) // Addresses are fungible by k8s API design
+			// Collect all addresses; any success is sufficient.
+			klog.V(5).Infof("found %d addresses in endpoint slice %s/%s for service %s, checking availability", len(ep.Addresses), eps.Namespace, eps.Name, service)
+			addresses = append(addresses, ep.Addresses...) // addresses are fungible by k8s API design
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
	addresses := []string{}

	for _, eps := range epsList.Items {
		for _, ep := range eps.Endpoints {
			if (ep.Conditions.Ready != nil && !*ep.Conditions.Ready) || len(ep.Addresses) == 0 {
				continue
			}
			// Collect all addresses; any success is sufficient.
			klog.V(5).Infof(
				"found %d addresses in endpoint slice %s/%s for service %s, checking availability",
				len(ep.Addresses), eps.Namespace, eps.Name, service,
			)
			addresses = append(addresses, ep.Addresses...)
		}
	}
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff around lines
86-97 and 303-343, the current logic only checks the first address of each
EndpointSlice endpoint; change it to collect all ep.Addresses into a local
addresses slice (declared before the loops), then after gathering addresses
iterate over that slice and call checkNorthdEpAvailable for each address,
returning true on the first success and continuing on failures; ensure you
preserve the existing klog.V(5).Infof logging for each address and return false
only if none of the collected addresses succeed.
```diff
 	tag=$$(git ls-remote --tags --sort="v:refname" https://github.com/kubeovn/kube-ovn | awk -F'[/^]' '{print $$3}' | grep '^v1\.14\.' | tail -n1 ) && \
 	curl -sSL https://github.com/kubeovn/kube-ovn/archive/refs/tags/$${tag}.tar.gz | \
-	tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts
+	tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts/kube-ovn
```
💡 Verification agent
❓ Verification inconclusive
Harden tag resolution; exclude pre-releases and fail fast when no tag found.
Current grep may pick rc/beta tags; also continues if tag is empty. Improve robustness.
Apply:
```diff
-	tag=$$(git ls-remote --tags --sort="v:refname" https://github.com/kubeovn/kube-ovn | awk -F'[/^]' '{print $$3}' | grep '^v1\.14\.' | tail -n1 ) && \
+	tag=$$(git ls-remote --tags --refs --sort="v:refname" https://github.com/kubeovn/kube-ovn \
+	  | awk -F'[/^]' '{print $$3}' | grep -E '^v1\.14\.[0-9]+$$' | tail -n1) && test -n "$$tag" && \
 	curl -sSL https://github.com/kubeovn/kube-ovn/archive/refs/tags/$${tag}.tar.gz | \
 	tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts/kube-ovn
```
Optionally set the Makefile shell for safer recipes:

```make
SHELL := bash
.SHELLFLAGS := -eu -o pipefail -c
```

Harden tag resolution in packages/system/kubeovn/Makefile: restrict the grep to only final v1.14.x releases (e.g. `grep -E '^v1\.14\.[0-9]+$'`), add `--refs` to `git ls-remote`, and prepend `test -n "$$tag" && \` so the Makefile fails if no tag is found. Optionally set at the top of the Makefile:

```make
SHELL := bash
.SHELLFLAGS := -eu -o pipefail -c
```
🤖 Prompt for AI Agents
In packages/system/kubeovn/Makefile around lines 11 to 13, the tag resolution is
too permissive and can return non-final refs or empty results; update the git
ls-remote call to include the --refs flag, tighten the grep to only match final
v1.14.x tags (e.g. grep -E '^v1\.14\.[0-9]+$'), and prepend a guard like test -n
"$$tag" && \ so the Makefile fails fast when no tag is found; optionally add
SHELL := bash and .SHELLFLAGS := -eu -o pipefail -c at the top of the Makefile
to enforce safe shell behavior.
What this PR does
This patch updates Kube-OVN to 1.14.5 and patches the northd leader check to test against all northd endpoints instead of just the first one marked as ready.
Release note

[kube-ovn, fix] Update Kube-OVN and improve northd leader detection.