v1.15 Backports 2024-05-16 #32568
Merged
[ upstream commit f437b70 ] Let's ensure consistent ordering by sorting the slice of remote clusters status information, as the order is otherwise undefined, given that the slice is generated by iterating over map values. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
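To illustrate the ordering issue, here is a minimal sketch, not the actual Cilium code. The `clusterStatus` type and the `sortedStatuses` helper are hypothetical names introduced for this example. Go randomizes map iteration order, so sorting the collected values by cluster name is one way to get stable output.

```go
package main

import (
	"fmt"
	"sort"
)

// clusterStatus is a hypothetical, simplified stand-in for the per-cluster
// status entry; the real type carries more fields.
type clusterStatus struct {
	Name  string
	Ready bool
}

// sortedStatuses collects the map values and sorts them by cluster name, so
// the result does not depend on Go's randomized map iteration order.
func sortedStatuses(clusters map[string]clusterStatus) []clusterStatus {
	out := make([]clusterStatus, 0, len(clusters))
	for _, st := range clusters {
		out = append(out, st)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	clusters := map[string]clusterStatus{
		"cluster-b": {Name: "cluster-b", Ready: false},
		"cluster-a": {Name: "cluster-a", Ready: true},
	}
	for _, st := range sortedStatuses(clusters) {
		fmt.Printf("%s ready=%t\n", st.Name, st.Ready)
	}
}
```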
[ upstream commit 796bb18 ] [ backporter's notes: skipped the Makefile.defs hunk, as the comment is not present. ] Introduce a new KVStoreMesh API definition, which currently exposes a /clusters path to provide information about the status of the connection to remote clusters, mimicking the data exposed by Cilium agents. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit c8389d5 ] [ backporter's notes: hit minor conflicts due to different surrounding contexts, solved accepting the combination of changes, and with trivial manual adaptations. ] Let's mimic the same logic already provided by the clustermesh subsystem of the Cilium agent, which allows retrieving key information about the connection to and data retrieval from each remote cluster. A subsequent commit is going to wire it to the /clusters API, so that it can then be accessed through a dedicated CLI. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit f7bd2b4 ] Wire the API server logic to expose the kvstoremesh API, and register the handler which returns the remote clusters status information. By default, the API is served on http://localhost:9889, although the address can be tuned through a dedicated parameter. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit e670ca6 ] Extract this logic into a separate function, so that it can be reused for the kvstoremesh-dbg command as well. Similarly, let's also slightly refactor and export the NumReadyClusters helper function. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 7cf1f29 ]

Extend the remote clusters output logic to support an additional verbosity level, to be leveraged by the kvstoremesh-dbg command. Specifically, supported verbosity levels are:

* verbose: outputs the full information for all clusters;
* brief: outputs the full information for non-ready clusters, and a brief one-line summary for ready ones;
* non-ready-only: outputs the full information for non-ready clusters, and omits the ready ones.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
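The three verbosity levels can be sketched roughly as follows. This is an illustration only: the `verbosity` constants, the `cluster` type and the `render` function are hypothetical names, and the output format is made up for the example rather than taken from the real command.

```go
package main

import (
	"fmt"
	"strings"
)

// verbosity mirrors the three levels described in the commit message.
// The type and constant names are hypothetical.
type verbosity int

const (
	verbose verbosity = iota
	brief
	nonReadyOnly
)

// cluster is a simplified stand-in for the remote cluster status.
type cluster struct {
	Name    string
	Ready   bool
	Details string
}

// render prints full information for non-ready clusters at every level;
// ready clusters get full output (verbose), a one-line summary (brief),
// or are omitted (nonReadyOnly).
func render(clusters []cluster, v verbosity) string {
	var sb strings.Builder
	for _, c := range clusters {
		switch {
		case !c.Ready || v == verbose:
			fmt.Fprintf(&sb, "%s: ready=%t\n  %s\n", c.Name, c.Ready, c.Details)
		case v == brief:
			fmt.Fprintf(&sb, "%s: ready\n", c.Name)
		}
	}
	return sb.String()
}

func main() {
	clusters := []cluster{
		{Name: "cluster-a", Ready: true, Details: "all resources synchronized"},
		{Name: "cluster-b", Ready: false, Details: "connection not established"},
	}
	fmt.Print(render(clusters, brief))
}
```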
[ upstream commit 4857bc7 ]

Extend the clustermesh-apiserver binary with a new kvstoremesh-dbg subcommand to interact with the kvstoremesh API, and specifically to query and output the status of the connection to remote clusters.

The command can be invoked through something along the lines of:

    $ kubectl exec -it -n kube-system deploy/clustermesh-apiserver -c kvstoremesh \
        -- clustermesh-apiserver kvstoremesh-dbg status

It outputs the status using the same format as the clustermesh section reported by cilium-dbg status --all-clusters. By default, the output includes a brief one-line report for ready clusters, and full information for non-ready ones. Full information for all clusters can be retrieved by specifying the --verbose flag.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit d0af3d7 ] We shouldn't import testing code into production code, as it can lead to unexpected side effects due to e.g., init functions. Let's address this by hard-coding the "PolicyEnforcement" constant, rather than importing it. This is consistent with the same usage as part of the "config" command. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit cfb3b8a ] It is intended to be used by CLI tools to retrieve the configuration files of all remote clusters in a given directory, to be used, e.g., for troubleshooting purposes. While at it, let's also replace the path package with the filepath one, which is more appropriate in this context, and would theoretically allow handling Windows paths as well. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 2d07cfc ]
[ backporter's notes: replaced cmp.Or usage, as not yet available in go 1.21. ]

Troubleshooting etcd connectivity issues, regardless of whether to the Cilium kvstore or to a remote cluster, is a complex activity, as issues can concern network connectivity, TLS certificates mismatch, authn/authz policies and so on.

As an effort to simplify this process, let's introduce a new utility responsible for performing a set of sanity checks, and outputting the result in a user-friendly way. This utility is intended to be then leveraged by dedicated CLI commands integrated with the various components.

In more detail, this utility performs the following operations:

* Asserts that the etcd configuration can be correctly parsed;
* For each endpoint:
  - Outputs the DNS resolution;
  - Asserts that the endpoint is reachable at the network level (i.e., that a TCP connection can be successfully established);
  - When https is enabled, asserts that a TLS connection can be correctly established to the endpoint (i.e., that the provided certificates are valid); the check includes both server and client (if enabled) authentication; additionally outputs TLS specific information;
  - Outputs the version of the endpoint, as returned by GET /version;
* Outputs information regarding Root CAs and client certificates, if configured; additionally checks whether the client certificate is valid according to the root CAs;
* Asserts that the etcd client can correctly establish a connection;
* Asserts that the heartbeat key can be retrieved, as a basic authorization check.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 9654576 ] Introduce two new cilium-dbg commands, namely "troubleshoot kvstore" and "troubleshoot clustermesh", responsible for running a set of sanity checks to help troubleshoot etcd connectivity issues, covering network connectivity, TLS authentication, authn/authz policies and so on. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 9156e23 ] As useful to troubleshoot kvstore and clustermesh issues. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 6fae9e7 ]

Extend the clustermesh-apiserver binary with a new clustermesh-dbg troubleshoot subcommand, responsible for running a set of sanity checks to help troubleshoot etcd connectivity issues, covering network connectivity, TLS authentication, authn/authz policies and so on.

The command can be invoked through something along the lines of:

    $ kubectl exec -it -n kube-system deploy/clustermesh-apiserver -c apiserver \
        -- clustermesh-apiserver clustermesh-dbg troubleshoot

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit f575f94 ]

Introduce a new troubleshoot subcommand to "clustermesh-apiserver kvstoremesh-dbg", responsible for running a set of sanity checks to help troubleshoot etcd connectivity issues, covering network connectivity, TLS authentication, authn/authz policies and so on.

The command can be invoked through something along the lines of:

    $ kubectl exec -it -n kube-system deploy/clustermesh-apiserver -c kvstoremesh \
        -- clustermesh-apiserver kvstoremesh-dbg troubleshoot [--include-local]

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4172c62 ] Document the usage of the newly introduced troubleshoot command to investigate connectivity issues towards the clustermesh control plane (i.e., etcd) in remote clusters. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 48b36f5 ] When KVStoreMesh is enabled, this component is responsible for connecting to the remote clusters. Document the command which can be used to inspect its status and validate whether connections are established correctly. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 189e8ba ] Add a clarification note that the manual steps presented in the guide are mostly an alternative to using the automatic tools described in the previous section. Additionally, drop the example errors from the TLS certificates step, as they are potentially misleading. Users shall leverage the troubleshoot command instead. Finally, let's fix a couple of typos. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 913e41b ] They apply only when Cilium is configured in kvstore mode, which is seldom the case these days. The lack of local information is also not clustermesh specific, and would imply other serious issues. Moreover, the given checks would not work, and lead to additional confusion when Cilium operates in CRD mode. Hence, let's just replace them with the suggestion of checking whether both Cilium agents and KVStoreMesh (if enabled) are correctly connected to all remote clusters, and the synchronization has completed. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
f210127 to 196facb
/test-backport-1.15
squeed approved these changes May 16, 2024
nathanjsweet approved these changes May 16, 2024
Labels
area/clustermesh
Relates to multi-cluster routing functionality in Cilium.
backport/1.15
This PR represents a backport for Cilium 1.15.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Once this PR is merged, a GitHub action will update the labels of these PRs: