Skip to content

Conversation

jrife
Copy link
Contributor

@jrife jrife commented Sep 30, 2024

Add known issue documenting problems using endpoint routes with netkit (#35060, #34042, #33875).

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 30, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Sep 30, 2024
@jrife jrife marked this pull request as ready for review September 30, 2024 19:42
@jrife jrife requested review from a team as code owners September 30, 2024 19:42
@jrife jrife requested review from gentoo-root and a user September 30, 2024 19:42
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a little bit that this feature is brand new and subject to considerable shifts, and the docs will get easily out of date. I'm actually surprised that the feature isn't marked as beta somewhere here in the docs as it hasn't been as thoroughly tested as other functionality.

Here's my counterproposal: We have a netkit label, and issues with the netkit implementation get tagged with them. How about rather than linking all the individual items, we link to https://github.com/cilium/cilium/issues?q=is%3Aopen+is%3Aissue+label%3Afeature%2Fnetkit instead? This way, the link will be up-to-date with the latest status of the changes.

@jrife
Copy link
Contributor Author

jrife commented Oct 1, 2024

Here's my counterproposal: We have a netkit label, and issues with the netkit implementation get tagged with them

I have no strong objection to this, but I worry a bit that issues in that list may get closed after a fix is in place somewhere (e.g. new feature added to the kernel) making users dig for information more on issues they hit. My feeling is that it's probably worth pointing users directly to issues that require an update to their kernel or Cilium version to resolve. We could later add a "fixed in version" note here as well. WDYT?

@jrife jrife force-pushed the jrife/netkit-known-issue branch from afcb3e7 to 198a93d Compare October 1, 2024 16:52
@joestringer
Copy link
Member

I don't think the docs are a great way to track known issues unless those known issues are inherent in the design or expected to be long-term problems. We already struggle to maintain the large amount of docs that we have. We have an issue tracker, and we can even use a link that refers to closed issues as well. If it helps, we could clone issues for individual branches/versions to help track the status of those fixes as they flow through to releases.

I empathize with the goal to minimize the burden on users to find out the stability status, but we're also talking about a cutting edge feature. If there's this many issues with the feature, we should be clearly setting expectations with the users in general terms by marking it as beta.

@joestringer
Copy link
Member

joestringer commented Oct 1, 2024

My feeling is that it's probably worth pointing users directly to issues that require an update to their kernel or Cilium version to resolve.

This I do agree with, if there are specific kernel version minimums for the feature to work correctly then that makes sense to describe in the docs.

@jrife
Copy link
Contributor Author

jrife commented Oct 1, 2024

Alright, sounds fine to me then. I will link to the issue tracker instead then.

marking it as beta

Should we go ahead and make this explicit in the docs then?

This I do agree with, if there are specific kernel version minimums for the feature to work correctly then that makes sense to describe in the docs.

It looks like we already specify a minimum kernel version here, but this may need updating later on. Additionally, we could add guardrails in later Cilium releases to probe the kernel and make sure any required features are supported (speaking specifically about Daniel's proposed patch here).

@joestringer
Copy link
Member

The beta thing makes sense to me, seems like an oversight. /cc @borkmann if you had a different perspective on it.

@borkmann
Copy link
Member

borkmann commented Oct 1, 2024

The beta thing makes sense to me, seems like an oversight. /cc @borkmann if you had a different perspective on it.

yeah that's fine, maybe we can work towards fixing all known issues until 1.17 is out. :) p.s. the netkit patch will go out this week as start

@jrife
Copy link
Contributor Author

jrife commented Oct 1, 2024

if there are specific kernel version minimums for the feature to work correctly

Somewhat unrelated to this PR, but I'm generally curious if Cilium drives backports of kernel patches to various distros. Is ensuring fixes get backported to distro-specific kernels generally left up to end users and distros themselves or does Cilium make an effort to request backports? For RHEL 8 and 9 do we expect new features such as netkit to make it into their kernels given that the kernel versions are 4.18 and 5.10 respectively? It seems like RHEL in particular has backported a lot of features into their kernels that aren't necessarily reflected by the version number.

@borkmann
Copy link
Member

borkmann commented Oct 2, 2024

if there are specific kernel version minimums for the feature to work correctly

Somewhat unrelated to this PR, but I'm generally curious if Cilium drives backports of kernel patches to various distros. Is ensuring fixes get backported to distro-specific kernels generally left up to end users and distros themselves or does Cilium make an effort to request backports? For RHEL 8 and 9 do we expect new features such as netkit to make it into their kernels given that the kernel versions are 4.18 and 5.10 respectively? It seems like RHEL in particular has backported a lot of features into their kernels that aren't necessarily reflected by the version number.

In general it is up to the community. For upstream kernel fixes, we ensure that a proper Fixes tag is present such that these patches automatically get picked up by the kernel stable team for affected kernels. Occasionally, we'll also ask stable to pick up kernel patches when something potentially affects Cilium, but we don't really have much visibility in what downstream distros pull from the upstream stable. For the case of netkit, I've opened a launchpad ticket some time ago to make sure all fixes land in Ubuntu 24.04 LTS where I've been told they regularly pull in everything which comes via stable. For RHEL, their customers typically drive the asks for feature backports, it's more fuzzy there. For Amazon Linux as another example, I've seen them actively backporting some of the eBPF relevant patches to upstream stable where they eventually pull from it once this lands. tl;dr: It's certainly organic and community driven, and its best to send backports to kernel stable when needed and not there yet so that distros eventually pull these in.

@jrife jrife force-pushed the jrife/netkit-known-issue branch from 198a93d to 3879c7d Compare October 3, 2024 15:35
@jrife
Copy link
Contributor Author

jrife commented Oct 3, 2024

Alright, I went ahead and added a warning that calls this out of as a beta feature and links to the list of netkit-related issues.

@joestringer joestringer enabled auto-merge October 3, 2024 16:39
Signed-off-by: Jordan Rife <jrife@google.com>
auto-merge was automatically disabled October 3, 2024 16:55

Head branch was pushed to by a user without write access

@jrife jrife force-pushed the jrife/netkit-known-issue branch from 3879c7d to b1c622c Compare October 3, 2024 16:55
@joestringer joestringer added the release-note/misc This PR makes changes that have no direct user impact. label Oct 3, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 3, 2024
@joestringer joestringer added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 3, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 3, 2024
@joestringer joestringer enabled auto-merge October 3, 2024 16:56
@joestringer joestringer dismissed gentoo-root’s stale review October 3, 2024 16:56

Feedback is addressed

@joestringer
Copy link
Member

/test

@joestringer joestringer added this pull request to the merge queue Oct 3, 2024
Merged via the queue into cilium:main with commit a89e86b Oct 3, 2024
58 checks passed
@giorio94 giorio94 mentioned this pull request Oct 7, 2024
15 tasks
@giorio94 giorio94 added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 7, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. kind/community-contribution This was a contribution made by a community member. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants