
Conversation

@lllamnyp (Member) commented Aug 28, 2025

What this PR does

This patch updates Kube-OVN to 1.14.5 and patches the northd leader check to test against all northd endpoints instead of just the first one marked as ready.

Release note

[kube-ovn, fix] Update Kube-OVN and improve northd leader detection.

Summary by CodeRabbit

  • New Features
    • Introduces VpcEgressGateway CRD for advanced egress control.
    • Adds controller flags for live-migration optimization and OVN LB prefer-local.
    • Adds liveness/readiness probes to pinger.
  • Improvements
    • Upgrades Kube-OVN to v1.14.5; requires Kubernetes 1.29+.
    • Enhanced stability with new database and northd health checks.
    • Dynamic namespace and master node selection; stricter Helm validation when no masters found.
    • Post-delete cleanup hook renamed and expanded to cover more resources.
    • Dedicated DPDK image tag support; streamlined non-DPDK DaemonSet.
    • RBAC updated to include EndpointSlice and deployment permissions.
  • Refactor
    • CRD schema updates (new fields, status changes, VIP restructuring).

@lllamnyp lllamnyp requested review from kvaps and klinch0 as code owners August 28, 2025 15:51
coderabbitai bot commented Aug 28, 2025

Walkthrough

The change upgrades kube-ovn from v1.13.x to v1.14.x, updates build/tooling, adjusts Helm chart templates/values, expands RBAC, adds/updates CRDs, introduces DB/Northd health checks via patches, consolidates non-DPDK paths, and tweaks hooks, probes, and namespaces in chart resources.

Changes

Cohort / File(s) Summary
Version bump & build
packages/system/kubeovn/Makefile, packages/system/kubeovn/charts/kube-ovn/Chart.yaml, packages/system/kubeovn/charts/kube-ovn/values.yaml, packages/system/kubeovn/images/kubeovn/Dockerfile
Select v1.14.* tag; narrow tar extraction path; bump Chart version/appVersion/kubeVersion; update image tag to v1.14.5, remove dpdkRepository; switch Go builder to 1.24; set VERSION/TAG to v1.14.5.
Helm templates (deployment and config)
.../templates/_helpers.tpl, .../templates/controller-deploy.yaml, .../templates/ic-controller-deploy.yaml, .../templates/ovn-CR.yaml, .../templates/ovn-dpdk-ds.yaml, .../templates/ovsovn-ds.yaml, .../templates/pinger-ds.yaml, .../templates/post-delete-hook.yaml, .../templates/vpc-nat-config.yaml
Add master node IP validation fail in helper; adjust controller args (remove OVSDB timeouts, add flags and explicit image); namespace templating and dynamic nodeSelector for ic-controller; RBAC updates incl. new resources and EndpointSlice; DPDK DaemonSet uses DPDK_IMAGE_TAG; remove DPDK branching from ovs/ovn DS; add pinger probes; convert pre-delete hook to post-delete with broader RBAC and script change; make ConfigMap namespace templated.
CRDs
.../templates/kube-ovn-crd.yaml
Add VpcEgressGateway CRD with validations, scale subresource, and printer columns; extend multiple existing CRDs (e.g., add spec.type, bfdPort, nodeSelector, status fields, VIPs restructuring, VLAN conflict column, subnets.enableExternalLBAddress).
Patches (source changes)
packages/system/kubeovn/images/kubeovn/patches/5294-db-healthcheck.diff, packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff
Add controller DB connectivity health check with periodic Echo probes and fail-exit logic; extend ovs client interfaces/mocks. Update Northd endpoint probing to concurrent, multi-IP checking with cancellation and timeouts (see the sketch below).
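
The core of the northd change is a fan-out probe: dial every collected endpoint concurrently, count the round as healthy on the first successful TCP connect, and cancel the remaining dials. A minimal sketch of that pattern — function name and port value are illustrative, not the patch's exact code:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"
)

// probeNorthdEndpoints dials every endpoint concurrently and reports whether
// at least one accepted a TCP connection. The first success cancels the rest.
func probeNorthdEndpoints(ips []string, port string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		d       net.Dialer
		wg      sync.WaitGroup
		success = make(chan struct{}, 1) // buffered: the winner never blocks
	)
	for _, ip := range ips {
		address := net.JoinHostPort(ip, port)
		wg.Add(1)
		go func(address string) {
			defer wg.Done()
			conn, err := d.DialContext(ctx, "tcp", address)
			if err != nil {
				return // unreachable or canceled; another dial may still win
			}
			conn.Close()
			select {
			case success <- struct{}{}:
				cancel() // stop the remaining dial attempts early
			default: // someone else already reported success
			}
		}(address)
	}
	wg.Wait()

	select {
	case <-success:
		return true
	default:
		return false
	}
}

func main() {
	// IPs and port are placeholders; the real probe reads them from EndpointSlices.
	fmt.Println("reachable:", probeNorthdEndpoints([]string{"127.0.0.1"}, "6643", 3*time.Second))
}
```

Buffering `success` with capacity 1 lets the winning goroutine report without blocking even if the result is only read after `wg.Wait` returns.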

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Ctrl as Controller
  participant NB as OVN NB Client
  participant SB as OVN SB Client
  participant T as Timer (15s)

  T->>Ctrl: tick()
  activate Ctrl
  par concurrent Echo
    Ctrl->>NB: Echo(ctx)
    Ctrl->>SB: Echo(ctx)
  and timeout
    Note over Ctrl: Wait with timeout
  end
  alt both OK
    Ctrl->>Ctrl: reset dbFailureCount
  else any fail or timeout
    Ctrl->>Ctrl: dbFailureCount++
    alt dbFailureCount >= 5
      Ctrl-->>Ctrl: LogFatalAndExit()
    else
      Ctrl->>T: wait next tick
    end
  end
  deactivate Ctrl
```
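
Read as code, the loop in the diagram above is a ticker with a consecutive-failure counter. A minimal sketch, assuming both clients expose an `Echo(ctx)` method; the 15 s interval and the threshold of 5 come from the diagram, while the 5 s echo timeout and all names are assumptions:

```go
package main

import (
	"context"
	"log"
	"time"
)

// echoer abstracts the OVN NB/SB clients' Echo probe (assumed interface).
type echoer interface {
	Echo(ctx context.Context) error
}

// runDBHealthCheck probes both databases every 15s and exits fatally after
// five consecutive failed rounds, mirroring the flow in the diagram.
func runDBHealthCheck(ctx context.Context, nb, sb echoer) {
	const (
		interval    = 15 * time.Second
		echoTimeout = 5 * time.Second // assumed; the diagram only says "timeout"
		maxFailures = 5
	)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	failures := 0
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			echoCtx, cancel := context.WithTimeout(ctx, echoTimeout)
			errs := make(chan error, 2)
			go func() { errs <- nb.Echo(echoCtx) }() // Echo NB concurrently
			go func() { errs <- sb.Echo(echoCtx) }() // Echo SB concurrently
			var firstErr error
			for i := 0; i < 2; i++ {
				if err := <-errs; err != nil && firstErr == nil {
					firstErr = err
				}
			}
			cancel()
			if firstErr == nil {
				failures = 0 // both databases answered; reset the counter
				continue
			}
			failures++
			if failures >= maxFailures {
				log.Fatalf("OVN DB unreachable for %d consecutive checks: %v", failures, firstErr)
			}
		}
	}
}

type alwaysOK struct{} // stub client for demonstration

func (alwaysOK) Echo(ctx context.Context) error { return nil }

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	runDBHealthCheck(ctx, alwaysOK{}, alwaysOK{}) // returns when ctx expires
}
```
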
```mermaid
sequenceDiagram
  autonumber
  participant User
  participant Helm
  participant Tpl as _helpers.tpl (kubeovn.nodeIPs)
  participant K8s as Kubernetes API

  User->>Helm: install/upgrade
  Helm->>Tpl: resolve master node IPs
  alt MASTER_NODES env/label yields none
    Tpl-->>Helm: fail (error with MASTER_NODES_LABEL hint)
    Helm-->>User: render error
  else found IPs
    Tpl-->>Helm: IP list
    Helm->>K8s: apply manifests
    K8s-->>User: resources created
  end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Suggested reviewers

  • kvaps
  • klinch0

Poem

I hop from v1.13 to .14 with cheer,
New CRDs sprout, RBAC draws near.
Probes peek, hooks sweep, tidy as we go,
Northd and DB—ping, pong, hello!
With charts aligned and images bright,
This rabbit stamps “Upgrade done right.” 🐇✨


@gemini-code-assist bot left a comment

Summary of Changes

Hello @lllamnyp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant update to the Kube-OVN networking solution. The primary goal is to upgrade Kube-OVN to version 1.14.5, incorporating the latest upstream changes and improvements. A key enhancement is a more robust northd leader check, which now verifies all available endpoints to ensure continuous and reliable operation. Furthermore, this PR introduces a new, more modular Helm chart (kube-ovn-v2) for better deployment and management, alongside updates to Custom Resource Definitions (CRDs) and associated Role-Based Access Control (RBAC) permissions to support new features and improve existing ones. Cleanup processes during uninstallation have also been refined.

Highlights

  • Kube-OVN Version Upgrade: The Kube-OVN core components have been upgraded to version 1.14.5, bringing the latest features, bug fixes, and performance improvements from the upstream project.
  • Improved Northd Leader Check: The northd leader check mechanism has been enhanced to test connectivity against all available northd endpoints concurrently. This significantly improves the resilience and reliability of the control plane by ensuring that the system can identify a healthy leader even if some endpoints are temporarily unavailable.
  • Introduction of Kube-OVN v2 Helm Chart: A new Helm chart, kube-ovn-v2, has been introduced. This new chart provides a more structured and comprehensive way to deploy and manage Kube-OVN, with detailed configuration options and improved templating.
  • CRD Updates and New VPC Egress Gateway: Several Custom Resource Definitions (CRDs) have been updated, including the addition of VpcEgressGateway and modifications to IptablesEip and Vlan CRDs, reflecting new networking capabilities and improved resource management.
  • Refined RBAC Permissions: The RBAC roles and bindings have been refined to align with the updated CRDs and component interactions, ensuring proper authorization for Kube-OVN operations within the Kubernetes cluster.
  • Helm Hook Renaming and Cleanup Improvements: The pre-delete-hook has been renamed to post-delete-hook and its logic updated to ensure proper cleanup of resources, specifically finalizers on subnets, VPCs, and IPs, during uninstallation.

@gemini-code-assist bot left a comment

Code Review

This pull request updates Kube-OVN to version 1.14.5 and introduces a new, more structured v2 Helm chart. It also includes a patch to improve the northd leader check by probing all endpoints concurrently, which enhances reliability. The changes are extensive and mostly look good, reflecting the new Kube-OVN version. I've found a critical issue in the new v2 chart related to a label selector mismatch in the DPDK DaemonSet, a high-severity issue with a Helm hook's lifecycle, and a medium-severity UX regression in a CRD definition. Please see my detailed comments.

This patch updates Kube-OVN to 1.14.5 and patches the northd leader
check to test against all northd endpoints instead of just the first one
marked as ready.

```release-note
[kube-ovn, fix] Update Kube-OVN and improve northd leader detection.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
@lllamnyp force-pushed the chore/update-kubeovn branch from 9116265 to 6b9b700 on August 28, 2025 16:04
@coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (17)
packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (2)

66-69: Fix typo in comment.

“groroutine” → “goroutine”.

-    // if the last groroutine is the one to succeed,
+    // if the last goroutine is the one to succeed,

31-33: Make dial timeout configurable.

Hardcoded 3s may be too short/long across environments. Consider wiring via config/env with a sane default.
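
One way to wire this, as a sketch: read the timeout from an environment variable with the current 3 s as the default. `OVN_NORTHD_DIAL_TIMEOUT` is a hypothetical variable name, not an existing kube-ovn setting:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// dialTimeout returns the northd dial timeout, overridable via an environment
// variable. OVN_NORTHD_DIAL_TIMEOUT is a hypothetical name used for
// illustration, not an existing kube-ovn setting.
func dialTimeout() time.Duration {
	const defaultTimeout = 3 * time.Second
	v := os.Getenv("OVN_NORTHD_DIAL_TIMEOUT")
	if v == "" {
		return defaultTimeout
	}
	d, err := time.ParseDuration(v) // accepts values like "5s" or "500ms"
	if err != nil || d <= 0 {
		return defaultTimeout // ignore malformed values rather than failing
	}
	return d
}

func main() {
	fmt.Println("dial timeout:", dialTimeout())
}
```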

packages/system/kubeovn/charts/kube-ovn/templates/_helpers.tpl (1)

23-25: Fail-fast is good; trim MASTER_NODES to avoid false negatives and handle whitespace-only values.

Whitespace-only MASTER_NODES currently bypasses the fail. Trim it.

Apply this diff:

-{{- if and (eq (len $ips) 0) (not $.Values.MASTER_NODES) -}}
+{{- if and (eq (len $ips) 0) (not ($.Values.MASTER_NODES | trim)) -}}

Optional: if MASTER_NODES_LABEL is provided without a value (e.g., "node-role.kubernetes.io/control-plane"), match presence of the key, not an empty string value:

{{- $label := splitList "=" $.Values.MASTER_NODES_LABEL -}}
{{- $key := index $label 0 -}}
{{- $hasVal := eq (len $label) 2 -}}
...
{{- if or (and $hasVal (eq (index $node.metadata.labels $key) (index $label 1)))
          (and (not $hasVal) (hasKey $node.metadata.labels $key)) -}}
packages/system/kubeovn/charts/kube-ovn/Chart.yaml (1)

18-18: Chart version should be SemVer without leading 'v'.

Some tooling rejects a leading 'v'. Prefer 1.14.5.

-version: v1.14.5
+version: 1.14.5
packages/system/kubeovn/charts/kube-ovn/templates/ovn-dpdk-ds.yaml (1)

34-34: Add safe fallback for DPDK image tag.

If DPDK_IMAGE_TAG is unset, fall back to "-dpdk".

-          image: {{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.DPDK_IMAGE_TAG }}
+          image: {{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ default (printf "%s-dpdk" .Values.global.images.kubeovn.tag) .Values.DPDK_IMAGE_TAG }}
packages/system/kubeovn/images/kubeovn/Dockerfile (1)

14-14: Speed up Go builds with BuildKit cache mounts.

Leverage module and build cache to reduce CI time.

-RUN make build-go
+RUN --mount=type=cache,target=/go/pkg/mod \
+    --mount=type=cache,target=/root/.cache/go-build \
+    make build-go
packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (2)

108-118: Align container name with script purpose.

Rename for clarity now that it runs remove-finalizer.sh.

-        - name: remove-subnet-finalizer
+        - name: remove-finalizer
           image: "{{ .Values.global.registry.address}}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.global.images.kubeovn.tag }}"

102-106: Drop hostNetwork unless required.

This job only patches CRs; hostNetwork increases privileges without clear benefit.

-      hostNetwork: true

If DNS/cluster access requires host networking in your environment, keep it.

packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (7)

989-1195: Tighten selector schemas and reduce regex duplication

  • Consider constraining spec.type-like fields (where present) with enums to prevent drift.
  • The dual-stack IP item regexes are repeated and heavy. Prefer relying on format: ipv4/ipv6 and a single CEL rule to allow optional comma-separated IPv4,IPv6 pairs for each entry to reduce apiserver validation cost.

Would you like a follow-up patch that replaces the long IPv6 regexes with a CEL-based validator for single IP or dual-stack pairs?


2045-2089: BFD port: enforce symmetry when disabled

You already require IP when enabled. Recommend also requiring empty IP when disabled to prevent stale configuration.

       x-kubernetes-validations:
         - rule: "self.enabled == false || self.ip != ''"
           message: 'Port IP must be set when BFD Port is enabled'
+        - rule: "self.enabled == true || self.ip == ''"
+          message: 'Port IP must be empty when BFD Port is disabled'

2264-2279: Vip “Type” column now points to status but spec.type remains

Column moved to .status.type while spec.type still exists. This is confusing for users and UIs.

Choose one for consistency:

  • Keep column on spec.type:
-      - name: Type
-        type: string
-        jsonPath: .status.type
+      - name: Type
+        type: string
+        jsonPath: .spec.type
  • Or deprecate spec.type (remove from spec and document migration), keeping column on status.type.

1668-1737: OvnFip: constrain new spec.type with enum

If only a known set is valid, add enum to avoid invalid states.

             spec:
               type: object
               properties:
@@
-                type:
-                  type: string
+                type:
+                  type: string
+                  enum:
+                    - Pod
+                    - Gateway
+                    - Vip
+                    - Other

Replace enum values with the actual allowed set.


2796-2807: Printer column “conflict” casing is inconsistent

Other columns are TitleCase. Suggest renaming to Conflict.

-      - name: conflict
+      - name: Conflict
         type: boolean
         jsonPath: .status.conflict

106-136: SwitchLBRule targetPort should allow named ports

targetPort is integer-only; services often use named targetPorts. Consider int-or-string.

-                      targetPort:
-                        type: integer
-                        minimum: 1
-                        maximum: 65535
+                      targetPort:
+                        anyOf:
+                          - type: integer
+                            minimum: 1
+                            maximum: 65535
+                          - type: string
+                        x-kubernetes-int-or-string: true

32-68: VpcDns: replicas lacks default and scale subresource

replicas has min/max but no default and no scale subresource. If you intend autoscaling or kubectl scale support, wire scale; otherwise set a sensible default.

packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (2)

103-115: Deployments CRUD in system:ovn is very broad

Full create/update/delete on deployments and scale cluster-wide. If only specific namespaces/owners are needed, consider limiting via separate controller SA and namespace-scoped RoleBindings.


299-313: Mixed apiGroups and resources in one rule reduces clarity

Combining "" and kubeovn.io with a mixed resource set (ovn-eips, nodes, pods, services) is valid but noisy for audits. Split by group.

-  - apiGroups:
-      - ""
-      - "kubeovn.io"
-    resources:
-      - ovn-eips
-      - ovn-eips/status
-      - nodes
-      - nodes/status
-      - pods
-      - services
+  - apiGroups: ["kubeovn.io"]
+    resources:
+      - ovn-eips
+      - ovn-eips/status
     verbs:
       - get
       - list
       - patch
       - watch
+  - apiGroups: [""]
+    resources:
+      - nodes
+      - nodes/status
+      - pods
+      - services
+    verbs:
+      - get
+      - list
+      - patch
+      - watch
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 89c80a8 and 6b9b700.

📒 Files selected for processing (16)
  • packages/system/kubeovn/Makefile (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/Chart.yaml (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/_helpers.tpl (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/ic-controller-deploy.yaml (2 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (8 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (6 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/ovn-dpdk-ds.yaml (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/ovsovn-ds.yaml (0 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (8 hunks)
  • packages/system/kubeovn/charts/kube-ovn/templates/vpc-nat-config.yaml (1 hunks)
  • packages/system/kubeovn/charts/kube-ovn/values.yaml (4 hunks)
  • packages/system/kubeovn/images/kubeovn/Dockerfile (1 hunks)
  • packages/system/kubeovn/images/kubeovn/patches/5294-db-healthcheck.diff (0 hunks)
  • packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (1 hunks)
💤 Files with no reviewable changes (2)
  • packages/system/kubeovn/charts/kube-ovn/templates/ovsovn-ds.yaml
  • packages/system/kubeovn/images/kubeovn/patches/5294-db-healthcheck.diff
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-14T16:54:50.368Z
Learnt from: nbykov0
PR: cozystack/cozystack#932
File: packages/apps/kubernetes/images/cluster-autoscaler/Dockerfile:3-4
Timestamp: 2025-05-14T16:54:50.368Z
Learning: In Dockerfiles for the cozystack project, ARG ordering where a variable like ${TARGETARCH} is referenced before being declared can still work correctly in practice because Docker's BuildKit may evaluate the variable only when it's actually used in a FROM statement.

Applied to files:

  • packages/system/kubeovn/images/kubeovn/Dockerfile
🔇 Additional comments (17)
packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff (3)

9-9: Import change looks good.

Adding sync is appropriate for the new concurrency logic.


100-107: LGTM on control flow.

Graceful handling when no addresses exist and delegating to the concurrent checker is correct.


59-80: Multi-endpoint failCount behavior is correct
Now only increments once per call when all IPs fail, matching the intended “all endpoints down” logic. No other callers rely on the old per-IP failure rate.
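
The counting semantics reduce to: reset on any success, one increment per all-fail round. A sequential sketch of just that logic (the real patch dials concurrently; all names are illustrative):

```go
package main

import "fmt"

var failCount int

// checkRound probes each endpoint and returns true on the first success.
// failCount changes at most once per round: reset on success, +1 if all fail.
func checkRound(probe func(ip string) bool, ips []string) bool {
	for _, ip := range ips {
		if probe(ip) {
			failCount = 0 // any reachable endpoint makes the round a success
			return true
		}
	}
	failCount++ // one increment for the whole round, however many IPs failed
	return false
}

func main() {
	alwaysDown := func(string) bool { return false }
	ips := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	checkRound(alwaysDown, ips)
	checkRound(alwaysDown, ips)
	fmt.Println("failCount after two all-fail rounds:", failCount) // 2, not 6
}
```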

packages/system/kubeovn/charts/kube-ovn/values.yaml (3)

12-12: Image tag bumped to v1.14.5 — confirm availability in your registry.

Verify docker.io/kubeovn/kube-ovn:v1.14.5 and vpc-nat-gateway:v1.14.5 exist/pull successfully.


57-81: Behavior change: ENABLE_EXTERNAL_VPC now false; new ENABLE_OVN_LB_PREFER_LOCAL false.

Ensure templates/args wire these flags and update release notes/migration docs for the default flip.


136-136: Decoupled DPDK image tag looks good.

Clearer control over dpdk image versioning.

packages/system/kubeovn/charts/kube-ovn/templates/vpc-nat-config.yaml (1)

17-19: A repo-wide search should locate any hardcoded references to these ConfigMaps or their keys across YAML manifests; use the results to confirm that every consumer correctly uses {{ .Values.namespace }} and expects "enable-vpc-nat-gw" as a string.

packages/system/kubeovn/charts/kube-ovn/Chart.yaml (2)

24-24: App version alignment LGTM.


26-26: Cluster gating raised to >=1.29 — confirm this is required.

If not strictly needed, consider relaxing to upstream-supported minima to avoid blocking installs.

packages/system/kubeovn/Makefile (1)

14-18: Verify patch offsets by running dry-run in the correct directory
From the repo root, cd packages/system/kubeovn and then run:

patch --dry-run -p4 < patches/cozyconfig.diff  
patch --dry-run -p4 < patches/mtu.diff  

to confirm both hunks apply cleanly.

packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml (1)

116-117: No action needed for OVSDB timeout flags.

The flags --ovsdb-con-timeout and --ovsdb-inactivity-timeout remain supported in Kube-OVN v1.14.x and can be kept as-is.

packages/system/kubeovn/images/kubeovn/Dockerfile (1)

5-8: Base image tags verified exist on Docker Hub.
Both golang:1.24-bookworm and kubeovn/kube-ovn-base:v1.14.5 returned HTTP 200 on Docker Hub, so no changes are needed.

packages/system/kubeovn/charts/kube-ovn/templates/post-delete-hook.yaml (1)

24-35: Ensure RBAC covers update if the hook uses it
The Role currently grants only get, list, and patch on subnets, vpcs, and ips. If /kube-ovn/remove-finalizer.sh invokes kubectl update to remove finalizers, add the update verb to this rule. Verify the script inside the kube-ovn image.

packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml (1)

2146-2156: LGTM: status.bfdPort shape aligns with spec

Status mirrors ip/name/nodes; good for observability.

packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml (3)

16-18: RBAC: add VpcEgressGateway resources — good alignment with new CRD

The new resources and status are covered.


138-145: Endpointslices watch added — good; ensure consumers prefer slices over endpoints

This is consistent with leader detection across all Northd endpoints.

Please confirm the controller now watches EndpointSlices rather than relying solely on core/v1 Endpoints.
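
For reference, a sketch of how a consumer can gather addresses from EndpointSlices with client-go instead of core/v1 Endpoints; the function name, parameters, and readiness filter are assumptions, not the controller's actual code:

```go
package probe

import (
	"context"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// northdAddresses lists EndpointSlices for a service via the discovery.k8s.io
// API and collects ready addresses. Function name and parameters are assumed.
func northdAddresses(ctx context.Context, client kubernetes.Interface, namespace, service string) ([]string, error) {
	slices, err := client.DiscoveryV1().EndpointSlices(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: discoveryv1.LabelServiceName + "=" + service, // kubernetes.io/service-name
	})
	if err != nil {
		return nil, err
	}
	var addresses []string
	for _, eps := range slices.Items {
		for _, ep := range eps.Endpoints {
			if ep.Conditions.Ready != nil && !*ep.Conditions.Ready {
				continue // skip endpoints explicitly marked not ready
			}
			addresses = append(addresses, ep.Addresses...)
		}
	}
	return addresses, nil
}
```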


381-386: Further verification in progress.

Comment on lines +118 to 121
            - --enable-live-migration-optimize={{- .Values.func.ENABLE_LIVE_MIGRATION_OPTIMIZE }}
            - --enable-ovn-lb-prefer-local={{- .Values.func.ENABLE_OVN_LB_PREFER_LOCAL }}
            - --image={{ .Values.global.registry.address }}/{{ .Values.global.images.kubeovn.repository }}:{{ .Values.global.images.kubeovn.tag }}
          securityContext:

💡 Verification agent

🧩 Analysis chain

Verify new controller flags exist in v1.14.5.

Ensure kube-ovn-controller supports these flags; otherwise pods will crash on startup.


🌐 Web query:

Does kube-ovn v1.14.5 controller support CLI flags: --enable-live-migration-optimize, --enable-ovn-lb-prefer-local, and --image?

💡 Result:

Short answer:

  • --enable-ovn-lb-prefer-local — Yes; documented for v1.14 kube-ovn-controller. [1]
  • --enable-live-migration-optimize — Not exposed in the public v1.14 docs as a controller CLI flag; live-migration optimizations were merged (PR referenced from issue #4740) but the docs show live-migration is enabled/controlled via VM annotations/behaviour rather than a documented controller "--enable-live-migration-optimize" flag. [2][3]
  • --image — No: the controller's container image is set in the Deployment manifest (image: <...>) rather than as a kube-ovn-controller CLI flag. Examples/templates show the image configured in the Deployment, not as a --image argument. [4]

Sources:

  1. Kube-OVN v1.14 docs — MetalLB / prefer-local note.(--enable-ovn-lb-prefer-local). [kube-ovn docs v1.14]. [1]
  2. GitHub issue showing live-migration optimization work merged (fixed by PR #4773). [kube-ovn issue #4740]. [2]
  3. Kube-OVN live-migration docs (usage via VM annotation). [kube-ovn live-migration docs]. [3]
  4. Deployment/template examples showing controller image set in the Deployment (image: "...") and controller args listed separately. [deployment template / install examples]. [4]

Would you like me to fetch the exact lines from the v1.14 docs and the PR that added the live-migration changes?


Remove unsupported CLI flags in controller-deploy.yaml
The --enable-live-migration-optimize flag isn’t available in kube-ovn-controller v1.14.5, and --image isn’t a valid controller argument (the container image must be set in the Deployment spec). Retain only the supported --enable-ovn-lb-prefer-local flag to prevent pod startup failures.

🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/controller-deploy.yaml
around lines 118 to 121, remove the unsupported CLI flags
`--enable-live-migration-optimize=...` and `--image=...` from the
kube-ovn-controller args and keep only `--enable-ovn-lb-prefer-local=...`;
ensure the container image is specified via the Deployment container image field
(e.g., .spec.template.spec.containers[0].image using
.Values.global.registry.address/.Values.global.images.kubeovn.repository:.Values.global.images.kubeovn.tag)
instead of as a CLI argument so pods can start successfully.

Comment on lines +112 to +114
        {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
        {{ index . 0 }}: "{{ if eq (len .) 2 }}{{ index . 1 }}{{ end }}"
        {{- end }}

⚠️ Potential issue

Fix nodeSelector when MASTER_NODES_LABEL has no value.

Rendering key: "" forces match against empty string. Default to "true" if only a key is provided.

-        {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
-        {{ index . 0 }}: "{{ if eq (len .) 2 }}{{ index . 1 }}{{ end }}"
-        {{- end }}
+        {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
+        {{- $key := index . 0 -}}
+        {{- if eq (len .) 2 }}
+        {{ $key }}: "{{ index . 1 }}"
+        {{- else }}
+        {{ $key }}: "true"
+        {{- end }}
+        {{- end }}

Alternatively, use nodeAffinity Exists for key-only labels to avoid assuming a value.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
        {{ index . 0 }}: "{{ if eq (len .) 2 }}{{ index . 1 }}{{ end }}"
        {{- end }}
        {{- with splitList "=" .Values.MASTER_NODES_LABEL }}
        {{- $key := index . 0 -}}
        {{- if eq (len .) 2 }}
        {{ $key }}: "{{ index . 1 }}"
        {{- else }}
        {{ $key }}: "true"
        {{- end }}
        {{- end }}
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/ic-controller-deploy.yaml
around lines 112-114, the nodeSelector rendering uses splitList "=" on
.Values.MASTER_NODES_LABEL and produces an empty string value when no value is
provided, causing a match against "". Change the template to detect when
splitList yields only a key (len == 1) and emit the label with value "true"
instead of an empty string; alternatively, replace this nodeSelector entry with
a nodeAffinity Exists rule for key-only labels so you don't assume a value.

Comment on lines +826 to +1195
values:
items:
type: string
type: array
required:
- key
- operator
x-kubernetes-validations:
- rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
message: 'Each pod selector MUST have at least one matchLabels or matchExpressions'
policies:
type: array
items:
type: object
properties:
snat:
type: boolean
default: false
ipBlocks:
type: array
x-kubernetes-list-type: set
items:
type: string
anyOf:
- format: ipv4
- format: ipv6
- format: cidr
subnets:
type: array
x-kubernetes-list-type: set
items:
type: string
minLength: 1
x-kubernetes-validations:
- rule: "size(self.ipBlocks) != 0 || size(self.subnets) != 0"
message: 'Each policy MUST have at least one ipBlock or subnet'
trafficPolicy:
type: string
enum:
- Local
- Cluster
default: Cluster
nodeSelector:
type: array
items:
type: object
properties:
matchLabels:
additionalProperties:
type: string
type: object
matchExpressions:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
enum:
- In
- NotIn
- Exists
- DoesNotExist
- Gt
- Lt
values:
type: array
x-kubernetes-list-type: set
items:
type: string
required:
- key
- operator
matchFields:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
enum:
- In
- NotIn
- Exists
- DoesNotExist
- Gt
- Lt
values:
type: array
x-kubernetes-list-type: set
items:
type: string
required:
- key
- operator
---

🛠️ Refactor suggestion

Add CEL guards and drop unsupported fieldPath in VpcEgressGateway validations

  • Use has() to guard optional arrays; otherwise size(null) can error when one side is absent.
  • fieldPath is not a recognized key inside x-kubernetes-validations entries; drop it to avoid schema rejection on stricter API servers.
@@
       x-kubernetes-validations:
-        - rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
-          message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
-          fieldPath: ".internalIPs"
-        - rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
-          message: 'Size of External IPs MUST be equal to or greater than Replicas'
-          fieldPath: ".externalIPs"
-        - rule: "size(self.policies) != 0 || size(self.selectors) != 0"
+        - rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
+          message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
+        - rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
+          message: 'Size of External IPs MUST be equal to or greater than Replicas'
+        - rule: "(has(self.policies) && size(self.policies) != 0) || (has(self.selectors) && size(self.selectors) != 0)"
           message: 'Each VPC Egress Gateway MUST have at least one policy or selector'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
metadata:
name: vpc-egress-gateways.kubeovn.io
spec:
group: kubeovn.io
names:
plural: vpc-egress-gateways
singular: vpc-egress-gateway
shortNames:
- vpc-egress-gw
- veg
kind: VpcEgressGateway
listKind: VpcEgressGatewayList
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .spec.vpc
name: VPC
type: string
- jsonPath: .spec.replicas
name: REPLICAS
type: integer
- jsonPath: .spec.bfd.enabled
name: BFD ENABLED
type: boolean
- jsonPath: .spec.externalSubnet
name: EXTERNAL SUBNET
type: string
- jsonPath: .status.phase
name: PHASE
type: string
- jsonPath: .status.ready
name: READY
type: boolean
- jsonPath: .status.internalIPs
name: INTERNAL IPS
priority: 1
type: string
- jsonPath: .status.externalIPs
name: EXTERNAL IPS
priority: 1
type: string
- jsonPath: .status.workload.nodes
name: WORKING NODES
priority: 1
type: string
- jsonPath: .metadata.creationTimestamp
name: AGE
type: date
name: v1
served: true
storage: true
subresources:
status: {}
scale:
# specReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Spec.Replicas.
specReplicasPath: .spec.replicas
# statusReplicasPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Replicas.
statusReplicasPath: .status.replicas
# labelSelectorPath defines the JSONPath inside of a custom resource that corresponds to Scale.Status.Selector.
labelSelectorPath: .status.labelSelector
schema:
openAPIV3Schema:
type: object
properties:
status:
properties:
replicas:
type: integer
format: int32
labelSelector:
type: string
conditions:
items:
properties:
lastTransitionTime:
format: date-time
type: string
lastUpdateTime:
format: date-time
type: string
message:
maxLength: 32768
type: string
observedGeneration:
format: int64
minimum: 0
type: integer
reason:
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
enum:
- "True"
- "False"
- Unknown
type: string
type:
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- lastUpdateTime
- observedGeneration
- reason
- status
- type
type: object
type: array
x-kubernetes-list-map-keys:
- type
x-kubernetes-list-type: map
internalIPs:
items:
type: string
type: array
externalIPs:
items:
type: string
type: array
phase:
type: string
default: Pending
enum:
- Pending
- Processing
- Completed
ready:
type: boolean
default: false
workload:
type: object
properties:
apiVersion:
type: string
kind:
type: string
name:
type: string
nodes:
type: array
items:
type: string
required:
- conditions
- phase
type: object
spec:
type: object
required:
- externalSubnet
x-kubernetes-validations:
- rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
fieldPath: ".internalIPs"
- rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
message: 'Size of External IPs MUST be equal to or greater than Replicas'
fieldPath: ".externalIPs"
- rule: "size(self.policies) != 0 || size(self.selectors) != 0"
message: 'Each VPC Egress Gateway MUST have at least one policy or selector'
properties:
replicas:
type: integer
format: int32
default: 1
minimum: 0
maximum: 10
prefix:
type: string
anyOf:
- pattern: ^$
- pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*[-\.]?$
x-kubernetes-validations:
- rule: "self == oldSelf"
message: "This field is immutable."
vpc:
type: string
internalSubnet:
type: string
externalSubnet:
type: string
internalIPs:
items:
type: string
oneOf:
- format: ipv4
- format: ipv6
- pattern: ^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5]),((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:)))$
- pattern: ^((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:))),(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$
type: array
x-kubernetes-list-type: set
externalIPs:
items:
type: string
oneOf:
- format: ipv4
- format: ipv6
- pattern: ^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5]),((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:)))$
- pattern: ^((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|:))),(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$
type: array
x-kubernetes-list-type: set
image:
type: string
bfd:
type: object
properties:
enabled:
type: boolean
default: false
minRX:
type: integer
format: int32
default: 1000
minTX:
type: integer
format: int32
default: 1000
multiplier:
type: integer
format: int32
default: 3
selectors:
type: array
items:
type: object
properties:
namespaceSelector:
type: object
properties:
matchLabels:
additionalProperties:
type: string
type: object
matchExpressions:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
values:
items:
type: string
type: array
required:
- key
- operator
x-kubernetes-validations:
- rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
message: 'Each namespace selector MUST have at least one matchLabels or matchExpressions'
podSelector:
type: object
properties:
matchLabels:
additionalProperties:
type: string
type: object
matchExpressions:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
values:
items:
type: string
type: array
required:
- key
- operator
x-kubernetes-validations:
- rule: "size(self.matchLabels) != 0 || size(self.matchExpressions) != 0"
message: 'Each pod selector MUST have at least one matchLabels or matchExpressions'
policies:
type: array
items:
type: object
properties:
snat:
type: boolean
default: false
ipBlocks:
type: array
x-kubernetes-list-type: set
items:
type: string
anyOf:
- format: ipv4
- format: ipv6
- format: cidr
subnets:
type: array
x-kubernetes-list-type: set
items:
type: string
minLength: 1
x-kubernetes-validations:
- rule: "size(self.ipBlocks) != 0 || size(self.subnets) != 0"
message: 'Each policy MUST have at least one ipBlock or subnet'
trafficPolicy:
type: string
enum:
- Local
- Cluster
default: Cluster
nodeSelector:
type: array
items:
type: object
properties:
matchLabels:
additionalProperties:
type: string
type: object
matchExpressions:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
enum:
- In
- NotIn
- Exists
- DoesNotExist
- Gt
- Lt
values:
type: array
x-kubernetes-list-type: set
items:
type: string
required:
- key
- operator
matchFields:
type: array
items:
type: object
properties:
key:
type: string
operator:
type: string
enum:
- In
- NotIn
- Exists
- DoesNotExist
- Gt
- Lt
values:
type: array
x-kubernetes-list-type: set
items:
type: string
required:
- key
- operator
---
spec:
type: object
required:
- externalSubnet
x-kubernetes-validations:
- rule: "!has(self.internalIPs) || size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas"
message: 'Size of Internal IPs MUST be equal to or greater than Replicas'
- rule: "!has(self.externalIPs) || size(self.externalIPs) == 0 || size(self.externalIPs) >= self.replicas"
message: 'Size of External IPs MUST be equal to or greater than Replicas'
- rule: "(has(self.policies) && size(self.policies) != 0) || (has(self.selectors) && size(self.selectors) != 0)"
message: 'Each VPC Egress Gateway MUST have at least one policy or selector'
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/kube-ovn-crd.yaml around
lines 826-1195, the CRD x-kubernetes-validations use size(...) on optional
fields and include an unsupported fieldPath key; update each validation to guard
with has(self.<field>) (e.g. change rules like "size(self.internalIPs) == 0 ||
size(self.internalIPs) >= self.replicas" to "!has(self.internalIPs) ||
size(self.internalIPs) == 0 || size(self.internalIPs) >= self.replicas") and
remove any "fieldPath" entries from the x-kubernetes-validations blocks so
validations won’t error on null values and won’t be rejected by stricter API
servers.

Comment on lines +266 to 272
  - apiGroups:
      - discovery.k8s.io
    resources:
      - endpointslices
    verbs:
      - list
  - apiGroups:

🛠️ Refactor suggestion

system:ovn-ovs has only list on EndpointSlices; add watch

List without watch can miss changes between resyncs for leader detection/health checks.

   - apiGroups:
       - discovery.k8s.io
     resources:
       - endpointslices
     verbs:
-      - list
+      - list
+      - watch
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/ovn-CR.yaml around lines
266 to 272, the Role/ClusterRole for system:ovn-ovs grants only the "list" verb
on "endpointslices"; add the "watch" verb alongside "list" in that verbs array
so the controller can watch EndpointSlice changes (preventing missed updates
between resyncs for leader detection/health checks).

Comment on lines +132 to +143
          livenessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10

💡 Verification agent

🧩 Analysis chain

Gate probes by ENABLE_METRICS and align with SSL; otherwise pods may flap.

If .Values.networking.ENABLE_METRICS is false, /metrics won’t be served and the pod will CrashLoop. Also, when ENABLE_SSL is true, an HTTP probe will fail against an HTTPS endpoint. Guard the probes and set the scheme; add a sane timeout.

-          livenessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 15
-            periodSeconds: 20
-          readinessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 5
-            periodSeconds: 10
+{{- if .Values.networking.ENABLE_METRICS }}
+          livenessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 15
+            periodSeconds: 20
+            timeoutSeconds: 5
+          readinessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 5
+{{- end }}

Follow-up:

  • Verify pinger’s metrics port and whether TLS terminates on the same port; if CN/SAN doesn’t match the pod IP, HTTPS probes may still fail. In that case, prefer a tcpSocket probe when SSL is enabled, or expose an HTTP health endpoint. Do you want me to provide that variant?

🏁 Script executed:

#!/bin/bash
# Locate the pinger daemonset template file
PINGER_FILE=$(find . -type f -name 'pinger-ds.yaml')
echo "Pinger file: $PINGER_FILE"

# Show current livenessProbe and readinessProbe configuration
echo "=== Probes Snippet ==="
grep -n -C3 "livenessProbe" "$PINGER_FILE"
grep -n -C3 "readinessProbe" "$PINGER_FILE"

# Check how ENABLE_METRICS is defined in chart values
echo "=== ENABLE_METRICS in values.yaml ==="
grep -n -C3 "ENABLE_METRICS" packages/system/kubeovn/charts/kube-ovn/values.yaml

# Inspect the container definition for pinger: ports and args
echo "=== Container Definition ==="
grep -n -C3 "containers:" "$PINGER_FILE"
grep -n -C3 "ports:" "$PINGER_FILE"

Length of output: 2011


🏁 Script executed:

#!/bin/bash
# Check if values.yaml defines a networking section and ENABLE_SSL
grep -n -C3 '^networking:' packages/system/kubeovn/charts/kube-ovn/values.yaml
grep -n -C3 'ENABLE_SSL' packages/system/kubeovn/charts/kube-ovn/values.yaml

Length of output: 562


🏁 Script executed:

#!/bin/bash
# Print lines 120-160 of the pinger daemonset template for full probe context
sed -n '120,160p' packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml

Length of output: 1447


🏁 Script executed:

#!/bin/bash
# Show container section including command, args, and ports for pinger
sed -n '50,100p' packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml

# Search values.yaml for any pinger-specific configuration
grep -R "kube-ovn-pinger" -n packages/system/kubeovn/charts/kube-ovn/values.yaml
grep -R "metrics" -n packages/system/kubeovn/charts/kube-ovn/values.yaml

Length of output: 2368


Gate probes on ENABLE_METRICS, set HTTP scheme via ENABLE_SSL, and add a timeout
Wrap the liveness/readiness probes in {{- if .Values.networking.ENABLE_METRICS }}, add scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}, and include timeoutSeconds: 5 (or another sane value) to prevent flapping when metrics are disabled or served over TLS.

-          livenessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 15
-            periodSeconds: 20
-          readinessProbe:
-            httpGet:
-              path: /metrics
-              port: 8080
-            initialDelaySeconds: 5
-            periodSeconds: 10
+{{- if .Values.networking.ENABLE_METRICS }}
+          livenessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 15
+            periodSeconds: 20
+            timeoutSeconds: 5
+          readinessProbe:
+            httpGet:
+              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
+              path: /metrics
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 5
+{{- end }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
          livenessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /metrics
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
{{- if .Values.networking.ENABLE_METRICS }}
          livenessProbe:
            httpGet:
              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
              path: /metrics
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              scheme: {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
              path: /metrics
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
{{- end }}
🤖 Prompt for AI Agents
In packages/system/kubeovn/charts/kube-ovn/templates/pinger-ds.yaml around lines
132 to 143, gate the liveness/readiness probe blocks with {{- if
.Values.networking.ENABLE_METRICS }} ... {{- end }}, add a scheme field set via
conditional {{ if .Values.networking.ENABLE_SSL }}HTTPS{{ else }}HTTP{{ end }}
under each httpGet, and include timeoutSeconds: 5 for both probes to avoid
flapping when metrics are disabled or served over TLS.

Comment on lines +34 to +41
+	for _, ip := range ips {
+		address := net.JoinHostPort(ip, OvnNorthdPort)
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			conn, err := d.DialContext(ctx, "tcp", address)
+			if err != nil {
+				klog.Errorf("failed to connect to northd leader %s, err: %v", ip, err)

🛠️ Refactor suggestion

Avoid range-variable capture in goroutines (make safe pre-Go 1.22).

Pass ip and address as parameters to the goroutine to prevent accidental capture and mis-logging; also treat context-canceled dials as debug, not errors.

-    wg.Add(1)
-    go func() {
+    wg.Add(1)
+    go func(ip, address string) {
       defer wg.Done()
-      conn, err := d.DialContext(ctx, "tcp", address)
+      conn, err := d.DialContext(ctx, "tcp", address)
       if err != nil {
-        klog.Errorf("failed to connect to northd leader %s, err: %v", ip, err)
+        if ctx.Err() != nil {
+          klog.V(5).Infof("dial to northd endpoint %s canceled", ip)
+          return
+        }
+        klog.Errorf("failed to connect to northd endpoint %s: %v", ip, err)
         return
       } else {
         defer conn.Close()
         select {
         case success <- struct{}{}:
-          klog.V(5).Infof("succeed to connect to northd leader %s", ip)
+          klog.V(5).Infof("successfully connected to northd endpoint %s", ip)
           cancel()
         default:
           // someone else already succeeded
         }
       }
-    }()
+    }(ip, address)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+	for _, ip := range ips {
+		address := net.JoinHostPort(ip, OvnNorthdPort)
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			conn, err := d.DialContext(ctx, "tcp", address)
+			if err != nil {
+				klog.Errorf("failed to connect to northd leader %s, err: %v", ip, err)
	for _, ip := range ips {
		address := net.JoinHostPort(ip, OvnNorthdPort)
		wg.Add(1)
		go func(ip, address string) {
			defer wg.Done()
			conn, err := d.DialContext(ctx, "tcp", address)
			if err != nil {
				if ctx.Err() != nil {
					klog.V(5).Infof("dial to northd endpoint %s canceled", ip)
					return
				}
				klog.Errorf("failed to connect to northd endpoint %s: %v", ip, err)
				return
			}
			defer conn.Close()
			select {
			case success <- struct{}{}:
				klog.V(5).Infof("successfully connected to northd endpoint %s", ip)
				cancel()
			default:
				// someone else already succeeded
			}
		}(ip, address)
	}
🤖 Prompt for AI Agents
In packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff around lines
34 to 41, the goroutine captures the loop variables ip and address leading to
potential wrong values; update the goroutine signature to accept ip and address
as parameters (e.g., go func(ip, address string) {...}(ip, address)) to avoid
range-variable capture, and inside the dial error handling, check for
context.Canceled and context.DeadlineExceeded (or errors.Is(err,
context.Canceled)) and log those as debug/Info instead of klog.Errorf, only
using klog.Errorf for genuine connection errors.
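
As a standalone illustration of the pitfall (separate from the patch), this toy program shows why passing the loop variable as a parameter matters on Go versions before 1.22; the IP values are placeholders:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	ips := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	var wg sync.WaitGroup
	for _, ip := range ips {
		wg.Add(1)
		// Before Go 1.22 every iteration shared one `ip` variable, so a
		// closure like `go func() { fmt.Println(ip) }()` could print the
		// final element three times. Passing it as a parameter is safe on
		// any Go version:
		go func(ip string) {
			defer wg.Done()
			fmt.Println("probing", ip)
		}(ip)
	}
	wg.Wait()
}
```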

Comment on lines +86 to +97
+	addresses := []string{}
+
 	for _, eps := range epsList.Items {
 		for _, ep := range eps.Endpoints {
 			if (ep.Conditions.Ready != nil && !*ep.Conditions.Ready) || len(ep.Addresses) == 0 {
@@ -303,12 +343,15 @@ func checkNorthdEpAlive(cfg *Configuration, namespace, service string) bool {

 			// Found an address, check its availability. We only need one.
 			klog.V(5).Infof("found address %s in endpoint slice %s/%s for service %s, checking availability", ep.Addresses[0], eps.Namespace, eps.Name, service)
-			return checkNorthdEpAvailable(ep.Addresses[0])
+			addresses = append(addresses, ep.Addresses[0]) // Addresses are fungible by k8s API design
 		}

🛠️ Refactor suggestion

Collect all endpoint addresses, not just the first.

EndpointSlice can expose multiple addresses per endpoint. Checking them all improves success odds and matches the new “fan-out” logic.

-      // Found an address, check its availability. We only need one.
-      klog.V(5).Infof("found address %s in endpoint slice %s/%s for service %s, checking availability", ep.Addresses[0], eps.Namespace, eps.Name, service)
-      addresses = append(addresses, ep.Addresses[0]) // Addresses are fungible by k8s API design
+      // Collect all addresses; any success is sufficient.
+      klog.V(5).Infof("found %d addresses in endpoint slice %s/%s for service %s, checking availability", len(ep.Addresses), eps.Namespace, eps.Name, service)
+      addresses = append(addresses, ep.Addresses...) // addresses are fungible by k8s API design
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+ addresses := []string{}
+
for _, eps := range epsList.Items {
for _, ep := range eps.Endpoints {
if (ep.Conditions.Ready != nil && !*ep.Conditions.Ready) || len(ep.Addresses) == 0 {
@@ -303,12 +343,15 @@ func checkNorthdEpAlive(cfg *Configuration, namespace, service string) bool {
// Found an address, check its availability. We only need one.
klog.V(5).Infof("found address %s in endpoint slice %s/%s for service %s, checking availability", ep.Addresses[0], eps.Namespace, eps.Name, service)
- return checkNorthdEpAvailable(ep.Addresses[0])
+ addresses = append(addresses, ep.Addresses[0]) // Addresses are fungible by k8s API design
}
addresses := []string{}
for _, eps := range epsList.Items {
for _, ep := range eps.Endpoints {
if (ep.Conditions.Ready != nil && !*ep.Conditions.Ready) || len(ep.Addresses) == 0 {
continue
}
// Collect all addresses; any success is sufficient.
klog.V(5).Infof(
"found %d addresses in endpoint slice %s/%s for service %s, checking availability",
len(ep.Addresses), eps.Namespace, eps.Name, service,
)
addresses = append(addresses, ep.Addresses...)
}
}
🤖 Prompt for AI Agents
In packages/system/kubeovn/images/kubeovn/patches/northd_probe.diff around lines
86-97 and 303-343, the current logic only checks the first address of each
EndpointSlice endpoint; change it to collect all ep.Addresses into a local
addresses slice (declared before the loops), then after gathering addresses
iterate over that slice and call checkNorthdEpAvailable for each address,
returning true on the first success and continuing on failures; ensure you
preserve the existing klog.V(5).Infof logging for each address and return false
only if none of the collected addresses succeed.

Comment on lines +11 to +13
tag=$$(git ls-remote --tags --sort="v:refname" https://github.com/kubeovn/kube-ovn | awk -F'[/^]' '{print $$3}' | grep '^v1\.14\.' | tail -n1 ) && \
curl -sSL https://github.com/kubeovn/kube-ovn/archive/refs/tags/$${tag}.tar.gz | \
tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts
tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts/kube-ovn

💡 Verification agent

❓ Verification inconclusive

Harden tag resolution; exclude pre-releases and fail fast when no tag found.

Current grep may pick rc/beta tags; also continues if tag is empty. Improve robustness.

Apply:

-	tag=$$(git ls-remote --tags --sort="v:refname" https://github.com/kubeovn/kube-ovn | awk -F'[/^]' '{print $$3}' | grep '^v1\.14\.' | tail -n1 ) && \
+	tag=$$(git ls-remote --tags --refs --sort="v:refname" https://github.com/kubeovn/kube-ovn \
+	  | awk -F'[/^]' '{print $$3}' | grep -E '^v1\.14\.[0-9]+$$' | tail -n1) && test -n "$$tag" && \
 	curl -sSL https://github.com/kubeovn/kube-ovn/archive/refs/tags/$${tag}.tar.gz | \
 	tar xzvf - --strip 1 kube-ovn-$${tag#*v}/charts/kube-ovn

Optionally set the Makefile shell for safer recipes:

SHELL := bash
.SHELLFLAGS := -eu -o pipefail -c

Harden tag resolution in packages/system/kubeovn/Makefile
Restrict the grep to only final v1.14.x releases (e.g. grep -E '^v1\.14\.[0-9]+$'), add --refs to git ls-remote, and prepend test -n "$$tag" && \ so the Makefile fails if no tag is found. Optionally set at the top of the Makefile:

SHELL := bash
.SHELLFLAGS := -eu -o pipefail -c
🤖 Prompt for AI Agents
In packages/system/kubeovn/Makefile around lines 11 to 13, the tag resolution is
too permissive and can return non-final refs or empty results; update the git
ls-remote call to include the --refs flag, tighten the grep to only match final
v1.14.x tags (e.g. grep -E '^v1\.14\.[0-9]+$'), and prepend a guard like test -n
"$$tag" && \ so the Makefile fails fast when no tag is found; optionally add
SHELL := bash and .SHELLFLAGS := -eu -o pipefail -c at the top of the Makefile
to enforce safe shell behavior.

@lllamnyp lllamnyp enabled auto-merge August 28, 2025 16:46
@lllamnyp lllamnyp merged commit 6684117 into main Aug 28, 2025
19 checks passed
@lllamnyp lllamnyp deleted the chore/update-kubeovn branch August 28, 2025 17:22