Improve uptime during agent upgrades (with L7 policy) #13194

@lrouquette

Proposal / RFE

It is known that agent upgrades disrupt connectivity when L7 policy is in place. I'm wondering whether there's an opportunity to improve this, perhaps by leveraging the pre-flight check process and the SO_REUSEPORT socket option.
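The socket behavior this idea relies on can be checked with the minimal Python sketch below. It assumes Linux, and it uses loopback and kernel-chosen ports in place of Cilium's real proxy ports. The helper names (`listener`, `second_bind_works`, `spread`) are hypothetical, and this is plain socket code, not Cilium or Envoy code. It shows that a second listener can bind an already-used port only when both sockets set SO_REUSEPORT, and that the kernel then spreads new connections across both listeners.

```python
import errno
import select
import socket

HOST = "127.0.0.1"  # assumption: loopback stands in for the proxy's listen address


def listener(port: int, reuseport: bool) -> socket.socket:
    """Open a TCP listener; SO_REUSEPORT must be set before bind() to share a port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((HOST, port))  # port 0 lets the kernel pick a free port
        s.listen(16)
    except OSError:
        s.close()
        raise
    return s


def second_bind_works(reuseport: bool) -> bool:
    """Try to open a second listener on the first listener's port."""
    first = listener(0, reuseport)
    port = first.getsockname()[1]
    try:
        listener(port, reuseport).close()
        return True
    except OSError as e:
        if e.errno != errno.EADDRINUSE:
            raise
        return False
    finally:
        first.close()


def spread(n: int = 50) -> dict[str, int]:
    """Count which of two SO_REUSEPORT listeners receives each of n connections."""
    agent = listener(0, reuseport=True)
    port = agent.getsockname()[1]
    preflight = listener(port, reuseport=True)
    names = {agent: "agent", preflight: "pre-flight"}
    counts = {"agent": 0, "pre-flight": 0}
    try:
        for _ in range(n):
            with socket.create_connection((HOST, port), timeout=1):
                # The kernel queues each new connection on exactly one listener.
                ready, _, _ = select.select([agent, preflight], [], [], 1.0)
                for lsock in ready:
                    conn, _ = lsock.accept()
                    conn.close()
                    counts[names[lsock]] += 1
    finally:
        agent.close()
        preflight.close()
    return counts


if __name__ == "__main__":
    print("without SO_REUSEPORT:", second_bind_works(False))  # False (EADDRINUSE)
    print("with SO_REUSEPORT:", second_bind_works(True))  # True
    print(spread())  # split varies: the kernel hashes each connection's addresses and ports
```

For the DNS side, the proxy would use UDP sockets, where SO_REUSEPORT groups likewise spread incoming datagrams across the member sockets.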

Today, the pre-flight check runs a dummy agent. Perhaps that agent could take over handling the proxied traffic (HTTP, DNS, etc.) by listening on the same port(s) as the "real" cilium-agent. An upgrade process could then look like this:

  1. run the pre-flight check (and leave it running)
  2. upgrade/restart the cilium-agent
  3. uninstall the pre-flight check
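The three steps above can be simulated with plain sockets to see whether the port ever stops accepting connections. This is a minimal sketch under stated assumptions: Linux, loopback, a kernel-chosen port, and hypothetical helpers (`listen_shared`, `probe`, `upgrade`) standing in for the agent's and the pre-flight's proxy listeners. It does not model Cilium's datapath redirection, Envoy, or the real pre-flight DaemonSet.

```python
import select
import socket

HOST = "127.0.0.1"  # assumption: loopback stands in for the proxy's listen address


def listen_shared(port: int) -> socket.socket:
    """Open a TCP listener that can share `port` with other SO_REUSEPORT sockets."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR avoids EADDRINUSE from TIME_WAIT leftovers when rebinding.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((HOST, port))
    s.listen(16)
    return s


def probe(listeners: list[socket.socket], port: int) -> bool:
    """Return True if a new client connection reaches one of the live listeners."""
    try:
        client = socket.create_connection((HOST, port), timeout=1)
    except OSError:
        return False  # e.g. ECONNREFUSED: nothing listening on the port
    with client:
        ready, _, _ = select.select(listeners, [], [], 1.0)
        accepted = [lsock.accept()[0] for lsock in ready]
    for conn in accepted:
        conn.close()
    return bool(accepted)


def upgrade() -> list[tuple[str, bool]]:
    """Walk the three proposed steps, probing the port after each change."""
    agent = listen_shared(0)  # stand-in for the running agent's proxy listener
    port = agent.getsockname()[1]
    steps = [("agent running", probe([agent], port))]

    preflight = listen_shared(port)  # step 1: pre-flight joins the port and stays up
    steps.append(("pre-flight started", probe([agent, preflight], port)))

    agent.close()  # step 2a: the agent goes down for the restart
    steps.append(("agent down", probe([preflight], port)))
    agent = listen_shared(port)  # step 2b: the upgraded agent rebinds the same port
    steps.append(("agent back up", probe([agent, preflight], port)))

    preflight.close()  # step 3: pre-flight is uninstalled
    steps.append(("pre-flight removed", probe([agent], port)))
    agent.close()
    return steps


if __name__ == "__main__":
    for name, ok in upgrade():
        print(f"{name:<20} {'ok' if ok else 'REFUSED'}")
```

The sketch only covers new connections. Connections still waiting in a closed listener's accept queue are reset by default rather than handed to the surviving socket (Linux 5.14 added the `net.ipv4.tcp_migrate_req` sysctl to migrate them), connections already established with the old process still break when it exits, and whether Cilium's redirect to the proxy port would pick a socket from the group in the same way is not shown here.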

I haven't really looked at what this would mean from a code perspective (cilium-agent, Envoy, etc.), but in concept it seems like it could work and eliminate the downtime altogether, so I thought I'd raise it here and get some feedback.

Metadata

Labels

  - area/agent: Cilium agent related.
  - area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
  - kind/cfp: Cilium Feature Proposal.
  - kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  - kind/feature: This introduces new functionality.
  - pinned: These issues are not marked stale by our issue bot.
  - roadmap: This functionality is planned for a future release of Cilium.
