Skip to content

Fetch, save, and upload cluster trace logs from multiple services #440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 19, 2017

Conversation

chxchx
Copy link
Contributor

@chxchx chxchx commented Jun 30, 2017

@sebastienvas @yutongz

This PR addresses the issue: Enable cloud tracing in e2e framework #241

It fetches logs using Stackdriver golang client v2 and writes to log path just like test logs.
Other nit includes renaming some private functions in istio/tests/e2e/framework/framework.go. One might expect get*() only to access/retrieve members without modification. Renaming to pop*() and remove*() makes our intention unambiguous.

@googlebot
Copy link
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@googlebot googlebot added the cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. label Jun 30, 2017
@googlebot
Copy link
Collaborator

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. labels Jun 30, 2017
@yutongz
Copy link
Contributor

yutongz commented Jul 4, 2017

You may want to run bin/linters.sh and fix warnings.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the stackdriver plugin be optional or something configurable? This e2e code needs to run
on other environments (non google cloud). Ideally, we could have a pluggable log uploader interface that can be configured per cloud provider (e.g., runtests.sh --log-provider stackdriver|foo|bar).

@@ -103,7 +103,7 @@ func (t *testCleanup) RegisterCleanable(c Cleanable) {
t.Cleanables = append(t.Cleanables, c)
}

func (t *testCleanup) getCleanable() Cleanable {
func (t *testCleanup) removeCleanable() Cleanable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find removeCleanble more confusing. I think pop would probably be best.

tests/e2e.sh Outdated
@@ -30,14 +30,18 @@ TESTS_TARGETS=($(bazel query 'tests(//tests/e2e/tests/...)'))
FAILURE_COUNT=0
SUMMARY='Tests Summary'



# query project-id and put on environment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let s not make this a default. Let s have the user set it only when testing on GCP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not set the project id here. Have someone set it.

func (t testInfo) FetchAndSaveClusterLogs() error {
ctx := context.Background()
glog.Info("Fetching cluster logs")
var PROJECT_ID string = *projectId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to redeclare projectID ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please use MixedCaps or mixedCaps.

defer f.Close()
// fetch log entries with pagination
var entries []*logging.Entry
const PAGE_SIZE = 50
Copy link
Contributor

@sebastienvas sebastienvas Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably define this const at the beginning of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// This list of log identifiers is cherry-picked based on Dev team's interest
// To access the comprehensive list, try in your shell
// $ gcloud beta logging logs list
LOG_IDS := []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to define those as a const at the beginning of the file and add a flag for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to the front in latest commit. Do you mind clarifying what it means by "add a flag" for this?

@sebastienvas
Copy link
Contributor

Can you also add information about deployments, ie list all pods, services, ingress, etc uses in test and save the yaml files somewhere to help debugging ?

@chxchx
Copy link
Contributor Author

chxchx commented Jul 6, 2017

I add another "log_provider" flag as @rshriram suggested and the features in this PR will not be triggered unless "--log_provider=stackdriver" is specified.

Also improved the util.Shell() function to not just run the command but do so in a shell so that one may take advantage of the shell script features such as pipe(|), dup(<, >), fork(&), etc. This makes fetching deployment information (pods, services, ingress, etc, per request by @sebastienvas ) much simpler.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Minor nits. Will leave it to Seb for further reviews

"istio-ingress",
"mixer",
"prometheus",
"statesd-to-prometheus",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

)

const (
tmpPrefix = "istio.e2e."
idMaxLength = 36
pageSize = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment describing what is pageSize

)

var (
logsBucketPath = flag.String("logs_bucket_path", "", "Cloud Storage Bucket path to use to store logs")
projectID = flag.String("project_id", "istio-testing", "Project ID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set this to "". This value only make sense for Google.

@@ -111,6 +134,71 @@ func (t testInfo) Update(r int) error {
return t.createStatusFile(r)
}

func (t testInfo) FetchAndSaveClusterLogs() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if projectID is not empty before starting.

Copy link
Contributor Author

@chxchx chxchx Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted e2e.sh and extract to another shell script (getProjectID.sh) the section that gets project id with gcloud command.
Here, I check if projectID is empty. If so, I call getProjectID. Users can always supersede this value with --project_id flag, but auto-reading the project id is important because the user may have multiple clusters configured and the getProjectID script always identifies the right cluster where the tests are being run, so users do not have to constantly keep tract this value.

@chxchx chxchx force-pushed the durable_logs branch 2 times, most recently from bf34b59 to 0da7e59 Compare July 7, 2017 03:37
@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

Copy link
Contributor

@sebastienvas sebastienvas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

func (t testInfo) FetchAndSaveClusterLogs() error {
// initialize projectID if user did not specify
if *projectID == "" {
pjID, err := util.Shell("getProjectID.sh")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry If I confused you. If projectID is not set, you should just skip this (ie not do anything). You don't need tests/e2e/framework/getProjectID.sh script either. You might want to set it in Jenkinsfile when you call e2e.sh in https://github.com/istio/istio/blob/master/Jenkinsfile#L51. You can use env.PROJECT inside Jenkinsfile to find the project.

Copy link
Contributor

@sebastienvas sebastienvas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look

// This list of log identifiers is cherry-picked based on Dev team's interest
// To access the comprehensive list, try in your shell
// $ gcloud beta logging logs list
logIDs = []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try to get this list dynamically using api?

@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

@chxchx
Copy link
Contributor Author

chxchx commented Jul 8, 2017

getProjectID.sh is removed and jenkinsfile updated. I tried env.PROJECT but it didn't work. Assuming jenkins only runs on istio-testing, I hard coded the string.

Much time was invested to get log ids at runtime thru stackdriver api. Turns out one need v2 api to do so, which is incompatible with what I have been using. Migration has been painful but is done.

Jenkinsfile Outdated
"--istioctl_url=${istioctlUrl}"
"--istioctl_url=${istioctlUrl} " +
"--log_provider=${logHost} " +
"--project_id=${projID}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a run when env.PROJECT did not work. There is no reason why it should not.

@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

1 similar comment
@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

@nlandolfi
Copy link
Contributor

I may be misinterpreting, but although the Prow job passed, it appears the logs failed to upload https://storage.googleapis.com/istio-prow/pull/istio_istio/440/istio-presubmit/56/build-log.txt

grep for "Failed to save logs"

Perhaps this should return a non-zero exit status? Or should a failure here not matter?

@sebastienvas
Copy link
Contributor

I don t think we want to fail the test if we cannot get the logs. On success failing to get the logs should not block a PR, it matters in failure.

@sebastienvas
Copy link
Contributor

sebastienvas commented Jul 18, 2017

@chxchx On your last commit it takes 20 minute per test to get the logs. I don t think that s acceptable. This tripples the presubmit time. I think we might want to keep the kubectl get, but disable the fetching from stackdriver until we understand why this is so slow. Have you considered creating a sink before the job start and exporting the data at the end ?

@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

@istio-testing
Copy link
Collaborator

Jenkins job istio/presubmit passed

@chxchx
Copy link
Contributor Author

chxchx commented Jul 19, 2017

PTAL
Logs fetching has been slow despite edging towards stackdriver api quota limit. To mitigate extended delay, logs are only fetched if any test fails.
Errors during log fetching now surfaces as warnings and will by no means block an otherwise successful PR.

@sebastienvas sebastienvas merged commit 7a9bc3a into istio:master Jul 19, 2017
mandarjog pushed a commit to mandarjog/istio that referenced this pull request Oct 30, 2017
istio#440)

* Add descriptor.Finder to the aspect.Manager.NewAspect interface method. Open question is how to get a descriptor finder into the adapterManager.

* Plumb descriptors through the request path by modifying the config.ChangeListener interface to provide the api handler both a config resolver and the descriptor finder for the current config.

* remove deprecated comment


Former-commit-id: 06a46bd535746de5c40376ebb55264f505d6c1c9
rshriram pushed a commit that referenced this pull request Oct 30, 2017
* Fetch, save, and upload cluster trace logs from multiple services

* Make fetching log optional and not just stackdriver

* project id flag set thru jenkinsfile

* Update jenkensfile with proper flags and pull logs in runtime

* Fetch, save, and upload cluster trace logs from multiple services

* Make fetching log optional and not just stackdriver

* project id flag set thru jenkinsfile

* Update jenkensfile with proper flags and pull logs in runtime

* Expand page size to mitigate api out of resource

* Extend page size and rearrange if branches

* Include the namespace to get deployment info

* Use multierror to append errs

* Turn on log uploads in prow

* Better practice of using multierror

* use go routine to fetch logs in paralell

* Limit the number of concurrent jobs fetching logs

* Fetch log only when test failed


Former-commit-id: 7a9bc3a
mandarjog pushed a commit that referenced this pull request Oct 31, 2017
#440)

* Add descriptor.Finder to the aspect.Manager.NewAspect interface method. Open question is how to get a descriptor finder into the adapterManager.

* Plumb descriptors through the request path by modifying the config.ChangeListener interface to provide the api handler both a config resolver and the descriptor finder for the current config.

* remove deprecated comment


Former-commit-id: b1f31767aa1bf9946cbfb0634d038e698f674231
mandarjog pushed a commit that referenced this pull request Nov 2, 2017
* Fetch, save, and upload cluster trace logs from multiple services

* Make fetching log optional and not just stackdriver

* project id flag set thru jenkinsfile

* Update jenkensfile with proper flags and pull logs in runtime

* Fetch, save, and upload cluster trace logs from multiple services

* Make fetching log optional and not just stackdriver

* project id flag set thru jenkinsfile

* Update jenkensfile with proper flags and pull logs in runtime

* Expand page size to mitigate api out of resource

* Extend page size and rearrange if branches

* Include the namespace to get deployment info

* Use multierror to append errs

* Turn on log uploads in prow

* Better practice of using multierror

* use go routine to fetch logs in paralell

* Limit the number of concurrent jobs fetching logs

* Fetch log only when test failed


Former-commit-id: 7a9bc3a
kyessenov pushed a commit to kyessenov/istio that referenced this pull request Aug 13, 2018
* Add support files to test LDS manually.

* fix indent.
howardjohn pushed a commit to howardjohn/istio that referenced this pull request Jan 12, 2020
howardjohn pushed a commit to howardjohn/istio that referenced this pull request Jan 12, 2020
* add ignore file

* fix for 18130

* generated file update
libesz pushed a commit to libesz/istio that referenced this pull request Nov 11, 2021
luksa pushed a commit to luksa/istio that referenced this pull request Feb 22, 2022
MAISTRA-2423 update federation api to v1

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2424 minor updates to federation api

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2427 configure locality info on imported services

Signed-off-by: rcernich <rcernich@redhat.com>

Cherry-pick multi-root support (istio#387)

* Update go-control-plane to v0.9.9

* Support multiple roots

Squashed commit, contains:
- MAISTRA-2325 Distribute trust bundles over SDS
- MAISTRA-2390 Push trust bundle updates through xDS (istio#357)

MAISTRA-2425 move spec.security.certificateChain to ConfigMap reference; add ability to specify ports for service and discovery (istio#392)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2426 move FederationStatus into MeshFederation (istio#393)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2513 federation API refinements

Signed-off-by: rcernich <rcernich@redhat.com>

[federation] MAISTRA-2237 Encrypt service discovery traffic (istio#411)

MAISTRA-2610 Prefix federation discovery endpoints with /v1/ (istio#422)

MAISTRA-2297 Support updates of federation resources (istio#417)

MAISTRA-2375: Do not create automatic routes for Federation Gateways

Remove a redundant call

`setHostname()` is already being called within `NameForService()`

see
https://github.com/maistra/istio/blob/21ee900cf8825711f70d88dc97afcf6862ed2626/pkg/servicemesh/federation/common/namemapping.go
lines 83, 120, 129

Remove techPreview.meshConfig from PoC example

It's set by default now.

MAISTRA-2611 Fix deletion of service exports to federated mesh (istio#421)

Fix test

MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil (istio#437)

* MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil

* Fix test

MAISTRA-2682 Fix watch mechanism in federation (istio#439)

Previously, no events were read from the watch response, because the read started with an endless loop that waited for data to be available in the decoder's buffer. This never happened, because the buffer is only written to when you call decoder.Decode(); this function was never called because the code waited for the buffer to have data.

MAISTRA-2683 Properly close incoming watch connections when shutting down (istio#440)

Log actual error returned by pollServices() (istio#441)

Previously, instead of the actual error, only the following error message was logged: "expected condition not met".

MAISTRA-2439: Prevent federation from exporting services that are not visible to the federation gateway (istio#432)

By taking into consideration the service annotation
`networking.istio.io/exportTo`.

This annotation restricts where this service is visible: https://istio.io/latest/docs/reference/config/annotations/

If a service is not reachable from the federation gateway namespace due
to this annotation, it should not be exported.

MAISTRA-2617: Do not watch all namespaces in Extensions controller (istio#425)

When using MemberRoll, we should rely on it to provide the list
of namespaces to watch. If not using it, defaults to command line
arguments.

This fixes an istiod startup error as seen in the logs:
```
github.com/maistra/xns-informer/pkg/informers/informer.go:204: Failed to watch *v1.ServiceMeshExtension: failed to list *v1.ServiceMeshExtension: servicemeshextensions.maistra.io is forbidden: User "system:serviceaccount:i1:istiod-service-account-basic" cannot list resource "servicemeshextensions" in API group "maistra.io" at the cluster scope
```
luksa pushed a commit to luksa/istio that referenced this pull request Apr 29, 2022
MAISTRA-2423 update federation api to v1

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2424 minor updates to federation api

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2427 configure locality info on imported services

Signed-off-by: rcernich <rcernich@redhat.com>

Cherry-pick multi-root support (istio#387)

* Update go-control-plane to v0.9.9

* Support multiple roots

Squashed commit, contains:
- MAISTRA-2325 Distribute trust bundles over SDS
- MAISTRA-2390 Push trust bundle updates through xDS (istio#357)

MAISTRA-2425 move spec.security.certificateChain to ConfigMap reference; add ability to specify ports for service and discovery (istio#392)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2426 move FederationStatus into MeshFederation (istio#393)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2513 federation API refinements

Signed-off-by: rcernich <rcernich@redhat.com>

[federation] MAISTRA-2237 Encrypt service discovery traffic (istio#411)

MAISTRA-2610 Prefix federation discovery endpoints with /v1/ (istio#422)

MAISTRA-2297 Support updates of federation resources (istio#417)

MAISTRA-2375: Do not create automatic routes for Federation Gateways

Remove a redundant call

`setHostname()` is already being called within `NameForService()`

see
https://github.com/maistra/istio/blob/21ee900cf8825711f70d88dc97afcf6862ed2626/pkg/servicemesh/federation/common/namemapping.go
lines 83, 120, 129

Remove techPreview.meshConfig from PoC example

It's set by default now.

MAISTRA-2611 Fix deletion of service exports to federated mesh (istio#421)

Fix test

MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil (istio#437)

* MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil

* Fix test

MAISTRA-2682 Fix watch mechanism in federation (istio#439)

Previously, no events were read from the watch response, because the read started with an endless loop that waited for data to be available in the decoder's buffer. This never happened, because the buffer is only written to when you call decoder.Decode(); this function was never called because the code waited for the buffer to have data.

MAISTRA-2683 Properly close incoming watch connections when shutting down (istio#440)

Log actual error returned by pollServices() (istio#441)

Previously, instead of the actual error, only the following error message was logged: "expected condition not met".

MAISTRA-2439: Prevent federation from exporting services that are not visible to the federation gateway (istio#432)

By taking into consideration the service annotation
`networking.istio.io/exportTo`.

This annotation restricts where this service is visible: https://istio.io/latest/docs/reference/config/annotations/

If a service is not reachable from the federation gateway namespace due
to this annotation, it should not be exported.

MAISTRA-2617: Do not watch all namespaces in Extensions controller (istio#425)

When using MemberRoll, we should rely on it to provide the list
of namespaces to watch. If not using it, defaults to command line
arguments.

This fixes an istiod startup error as seen in the logs:
```
github.com/maistra/xns-informer/pkg/informers/informer.go:204: Failed to watch *v1.ServiceMeshExtension: failed to list *v1.ServiceMeshExtension: servicemeshextensions.maistra.io is forbidden: User "system:serviceaccount:i1:istiod-service-account-basic" cannot list resource "servicemeshextensions" in API group "maistra.io" at the cluster scope
```
luksa added a commit to luksa/istio that referenced this pull request Sep 15, 2022
* [federation] Initial federation implementation

* MAISTRA-2194 Add server/client code for Federation Service Discovery v1

* MAISTRA-2195 Implement /watch endpoint

* MAISTRA-2293 add CRD and controller for federating meshes

* MAISTRA-2294 create CRD for federation ServiceExport (istio#324)

* MAISTRA-2294 update example VirtualService resources for ratings and mongodb (istio#333)

* [federation] MAISTRA-2295 create CRD for federation ServiceImport (istio#336)

Signed-off-by: rcernich <rcernich@redhat.com>

* [misc] Use objects and clients from maistra/api repo

- Remove local objects and clients
- Update Makefile

* [federation] MAISTRA-2309 create CRD for FederationStatus (istio#348)

Signed-off-by: rcernich <rcernich@redhat.com>

* [federation] Federation fixes and improvements

MAISTRA-2423 update federation api to v1

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2424 minor updates to federation api

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2427 configure locality info on imported services

Signed-off-by: rcernich <rcernich@redhat.com>

Cherry-pick multi-root support (istio#387)

* Update go-control-plane to v0.9.9

* Support multiple roots

Squashed commit, contains:
- MAISTRA-2325 Distribute trust bundles over SDS
- MAISTRA-2390 Push trust bundle updates through xDS (istio#357)

MAISTRA-2425 move spec.security.certificateChain to ConfigMap reference; add ability to specify ports for service and discovery (istio#392)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2426 move FederationStatus into MeshFederation (istio#393)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2513 federation API refinements

Signed-off-by: rcernich <rcernich@redhat.com>

[federation] MAISTRA-2237 Encrypt service discovery traffic (istio#411)

MAISTRA-2610 Prefix federation discovery endpoints with /v1/ (istio#422)

MAISTRA-2297 Support updates of federation resources (istio#417)

MAISTRA-2375: Do not create automatic routes for Federation Gateways

Remove a redundant call

`setHostname()` is already being called within `NameForService()`

see
https://github.com/maistra/istio/blob/21ee900cf8825711f70d88dc97afcf6862ed2626/pkg/servicemesh/federation/common/namemapping.go
lines 83, 120, 129

Remove techPreview.meshConfig from PoC example

It's set by default now.

MAISTRA-2611 Fix deletion of service exports to federated mesh (istio#421)

Fix test

MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil (istio#437)

* MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil

* Fix test

MAISTRA-2682 Fix watch mechanism in federation (istio#439)

Previously, no events were read from the watch response, because the read started with an endless loop that waited for data to be available in the decoder's buffer. This never happened, because the buffer is only written to when you call decoder.Decode(); this function was never called because the code waited for the buffer to have data.

MAISTRA-2683 Properly close incoming watch connections when shutting down (istio#440)

Log actual error returned by pollServices() (istio#441)

Previously, instead of the actual error, only the following error message was logged: "expected condition not met".

MAISTRA-2439: Prevent federation from exporting services that are not visible to the federation gateway (istio#432)

By taking into consideration the service annotation
`networking.istio.io/exportTo`.

This annotation restricts where this service is visible: https://istio.io/latest/docs/reference/config/annotations/

If a service is not reachable from the federation gateway namespace due
to this annotation, it should not be exported.

MAISTRA-2617: Do not watch all namespaces in Extensions controller (istio#425)

When using MemberRoll, we should rely on it to provide the list
of namespaces to watch. If not using it, defaults to command line
arguments.

This fixes an istiod startup error as seen in the logs:
```
github.com/maistra/xns-informer/pkg/informers/informer.go:204: Failed to watch *v1.ServiceMeshExtension: failed to list *v1.ServiceMeshExtension: servicemeshextensions.maistra.io is forbidden: User "system:serviceaccount:i1:istiod-service-account-basic" cannot list resource "servicemeshextensions" in API group "maistra.io" at the cluster scope
```

* Remove package export and extensions

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix creating discovery.Controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix calling nil ResourceManager

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove panicing from AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [misc] OSSM-774 Fix flaky TestStatusManager (istio#456)

This adds a little sleep to our unit tests for the StatusManager,
because without it, we're running into the issue that we're updating
a ServiceMeshPeer's status very quickly, and in some cases it might be
that the last change has not been propagated when we're generating
the patch for the next status change, which can lead to failures.

This can happen in the real world, but you would need to change a
ServiceMeshPeer's status within a few milliseconds, I doubt that it
affects users. It would also be fixed with the next status update.
For those reasons, I'm only fixing it in the test, with a Sleep()
call.

* Refactor manager_test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-1150 Fix flaky TestStatusManager unit test (istio#478)

Co-authored-by: Marko Lukša <marko.luksa@gmail.com>

* OSSM-1252 Fix federation status updates (istio#512)

* Copy federation privileges from base to istio-discovery

* Remove unnecessary ServiceMeshExtensions CRD

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add model.NetworkGatewaysHandler to federation controller to implement AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] MAISTRA-2640 Add federation integration test (istio#447)

* Fix building federation test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add package gogo from maistra-2.2 to temporarily fix TbdsGenerator

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable configuring remote cluster in federation deployment

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] OSSM-1128 Fix federation (istio#480)

* Fix SecretCacheClient

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Send initial XDS request for trust bundle from proxy

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable using EndpointSlices to fix error on getting federation-egress endpoints

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove unused serviceMeshExtensionController

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix lint errors

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Update maistra CRDs

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* fedration_cp_version_update (istio#521)

* OSSM-1529 Improve federation example script (istio#522)

* OSSM-1529 Improve federation example install.sh

Previously, the script would fall back to using nodeports when the load balancer IP wasn't set. This meant that if the provision of the load balancer took too long, the SMCPs would be misconfigured and you had to run the install script again.

With this change, the script now waits for the load balancer IP to appear. It never falls back to using node ports, because they never really worked (the nodes' hostnames typically aren't FQDN and the node ports are typically protected by firewalls).

If the user wants to expose the federation ingresses in a different way, they can now set the environment variables MESH1_ADDRESS, MESH1_DISCOVERY_PORT, and MESH2_SERVICE_PORT (likewise for MESH2) and run the script.

* Update Federation example README

* Better "Waiting for load balancer" message

* OSSM-1211 Fix federation locality failover issues (istio#561)

Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>

Signed-off-by: rcernich <rcernich@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: maistra-bot <57098434+maistra-bot@users.noreply.github.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <xuyuanlin_00@hotmail.com>
luksa pushed a commit to luksa/istio that referenced this pull request Apr 11, 2024
…stio#699)

* [federation] Introduces federation deployment (istio#585)

* [federation] Initial federation implementation

* MAISTRA-2194 Add server/client code for Federation Service Discovery v1

* MAISTRA-2195 Implement /watch endpoint

* MAISTRA-2293 add CRD and controller for federating meshes

* MAISTRA-2294 create CRD for federation ServiceExport (istio#324)

* MAISTRA-2294 update example VirtualService resources for ratings and mongodb (istio#333)

* [federation] MAISTRA-2295 create CRD for federation ServiceImport (istio#336)

Signed-off-by: rcernich <rcernich@redhat.com>

* [misc] Use objects and clients from maistra/api repo

- Remove local objects and clients
- Update Makefile

* [federation] MAISTRA-2309 create CRD for FederationStatus (istio#348)

Signed-off-by: rcernich <rcernich@redhat.com>

* [federation] Federation fixes and improvements

MAISTRA-2423 update federation api to v1

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2424 minor updates to federation api

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2427 configure locality info on imported services

Signed-off-by: rcernich <rcernich@redhat.com>

Cherry-pick multi-root support (istio#387)

* Update go-control-plane to v0.9.9

* Support multiple roots

Squashed commit, contains:
- MAISTRA-2325 Distribute trust bundles over SDS
- MAISTRA-2390 Push trust bundle updates through xDS (istio#357)

MAISTRA-2425 move spec.security.certificateChain to ConfigMap reference; add ability to specify ports for service and discovery (istio#392)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2426 move FederationStatus into MeshFederation (istio#393)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2513 federation API refinements

Signed-off-by: rcernich <rcernich@redhat.com>

[federation] MAISTRA-2237 Encrypt service discovery traffic (istio#411)

MAISTRA-2610 Prefix federation discovery endpoints with /v1/ (istio#422)

MAISTRA-2297 Support updates of federation resources (istio#417)

MAISTRA-2375: Do not create automatic routes for Federation Gateways

Remove a redundant call

`setHostname()` is already being called within `NameForService()`

see
https://github.com/maistra/istio/blob/21ee900cf8825711f70d88dc97afcf6862ed2626/pkg/servicemesh/federation/common/namemapping.go
lines 83, 120, 129

Remove techPreview.meshConfig from PoC example

It's set by default now.

MAISTRA-2611 Fix deletion of service exports to federated mesh (istio#421)

Fix test

MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil (istio#437)

* MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil

* Fix test

MAISTRA-2682 Fix watch mechanism in federation (istio#439)

Previously, no events were read from the watch response, because the read started with an endless loop that waited for data to be available in the decoder's buffer. This never happened, because the buffer is only written to when you call decoder.Decode(); this function was never called because the code waited for the buffer to have data.

MAISTRA-2683 Properly close incoming watch connections when shutting down (istio#440)

Log actual error returned by pollServices() (istio#441)

Previously, instead of the actual error, only the following error message was logged: "expected condition not met".

MAISTRA-2439: Prevent federation from exporting services that are not visible to the federation gateway (istio#432)

By taking into consideration the service annotation
`networking.istio.io/exportTo`.

This annotation restricts where this service is visible: https://istio.io/latest/docs/reference/config/annotations/

If a service is not reachable from the federation gateway namespace due
to this annotation, it should not be exported.

MAISTRA-2617: Do not watch all namespaces in Extensions controller (istio#425)

When using MemberRoll, we should rely on it to provide the list
of namespaces to watch. If not using it, defaults to command line
arguments.

This fixes an istiod startup error as seen in the logs:
```
github.com/maistra/xns-informer/pkg/informers/informer.go:204: Failed to watch *v1.ServiceMeshExtension: failed to list *v1.ServiceMeshExtension: servicemeshextensions.maistra.io is forbidden: User "system:serviceaccount:i1:istiod-service-account-basic" cannot list resource "servicemeshextensions" in API group "maistra.io" at the cluster scope
```

* Remove package export and extensions

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix creating discovery.Controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix calling nil ResourceManager

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove panicing from AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [misc] OSSM-774 Fix flaky TestStatusManager (istio#456)

This adds a little sleep to our unit tests for the StatusManager,
because without it, we're running into the issue that we're updating
a ServiceMeshPeer's status very quickly, and in some cases it might be
that the last change has not been propagated when we're generating
the patch for the next status change, which can lead to failures.

This can happen in the real world, but you would need to change a
ServiceMeshPeer's status within a few milliseconds, I doubt that it
affects users. It would also be fixed with the next status update.
For those reasons, I'm only fixing it in the test, with a Sleep()
call.

* Refactor manager_test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-1150 Fix flaky TestStatusManager unit test (istio#478)

Co-authored-by: Marko Lukša <marko.luksa@gmail.com>

* OSSM-1252 Fix federation status updates (istio#512)

* Copy federation privileges from base to istio-discovery

* Remove unnecessary ServiceMeshExtensions CRD

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add model.NetworkGatewaysHandler to federation controller to implement AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] MAISTRA-2640 Add federation integration test (istio#447)

* Fix building federation test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add package gogo from maistra-2.2 to temporarily fix TbdsGenerator

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable configuring remote cluster in federation deployment

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] OSSM-1128 Fix federation (istio#480)

* Fix SecretCacheClient

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Send initial XDS request for trust bundle from proxy

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable using EndpointSlices to fix error on getting federation-egress endpoints

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove unused serviceMeshExtensionController

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix lint errors

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Update maistra CRDs

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* fedration_cp_version_update (istio#521)

* OSSM-1529 Improve federation example script (istio#522)

* OSSM-1529 Improve federation example install.sh

Previously, the script would fall back to using nodeports when the load balancer IP wasn't set. This meant that if the provision of the load balancer took too long, the SMCPs would be misconfigured and you had to run the install script again.

With this change, the script now waits for the load balancer IP to appear. It never falls back to using node ports, because they never really worked (the nodes' hostnames typically aren't FQDN and the node ports are typically protected by firewalls).

If the user wants to expose the federation ingresses in a different way, they can now set the environment variables MESH1_ADDRESS, MESH1_DISCOVERY_PORT, and MESH2_SERVICE_PORT (likewise for MESH2) and run the script.

* Update Federation example README

* Better "Waiting for load balancer" message

* OSSM-1211 Fix federation locality failover issues (istio#561)

Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>

Signed-off-by: rcernich <rcernich@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: maistra-bot <57098434+maistra-bot@users.noreply.github.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <xuyuanlin_00@hotmail.com>

* fix: removes deprecated gogo protobuf conversion

* fix: goimport format

* fix(lint): removes unused funcs

* fix(lint): removes deprecated io/ioutil

* fix(lint): disables staticcheck for federation tests

it requires at least two clusters to make sense

* fix(lint): use anypb.UnmarshalTo instead of ptypes

* fix: no need to exclude grpcgen_test.go

it seems to be fixed in v1.39

see: grpc/grpc-go#4476

* chore(backoff): aligns backoff dependency with v4 used by upstream

* chore: reverts removed blank line - irrelevant for merge

* chore(revive): adds explanation why json:inline is skipped from linting

* OSSM-1962: Use EndpointSlices instead of Endpoints in federation controller (istio#614)

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-2049: Fix handling ServiceAccounts in federation controller (istio#627)

* Fix collecting empty or repeated ServiceAccounts in federation controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Collect ServiceAccounts in sorted order

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-1093 Shorten exported resource name (istio#653)

* Shorten exported resource name

* Fix import createResourceName + unit tests

* Rearrange unit tests + renaming function

* Gen and lint

* Rearrange unit tests + renaming function

Gen and lint

* Fix minor changes

* Error message RFC 1123

* Reorganize structs in TestStatusManager

Instead of having three arrays (events, expectedStatuses, and assertions), we now only have a single array, where each entry is a triplet containing the event, the expected status and the assertion. This allows you to see the event and its expected effects together and not have to scroll up and down, matching the indexes of the three arrays.

* OSSM-2193 Fix flaky TestStatusManager

See comment in https://issues.redhat.com/browse/OSSM-2193 to understand why this change fixes the test.

* fix: runs make gen

* chore: explains why staticcheck linter is disabled for federation_test

* OSSM-728 Configuration scripts for Federation on Z and P, and bare metal (istio#670)

* add config example scripts for IBM Systems Z and P

* update multi-arch bookinfo deployment README, remove src

* Update README.md

* these are provided in the IBM repo

* so README.md passes mdlinter

* so README.md passes mdlinter

* so README.md passes mdlinter

* Update README.md

* Move federation examples to samples/ directory

* Rename template YAMLs to .yaml.template

This makes the linter happy

Co-authored-by: cfillekes <cfilleke@redhat.com>
Co-authored-by: Cheryl Fillekes <cfillekes@ibm.com>

* chore: removes obsolute TODO

* chore: simplifies bool return expression

* chore: removes redundant kubeClient check

if initialization fails this func will not be reached anyway

* chore(pkg): moves kube ctrl under servicemesh pkg folder

* OSSM-2338: Remove env ISTIO_META_ROUTER_MODE from federation test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-2338: Remove "routerMode: sni-dnat" from federation samples

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* fix: adds operator.go customizations to kube.go

clearly with cherry-pick we lost information about the file rename and thus
the changes we made specificually for testing federation got lost

* fix(test/operator): checks if east-west gw needs to be deployed

* fix(federation): uses Unwrap to get instance of Federation registry

* fix(tests): sets istiod-less remote flag to false

that was the behaviour for maistra-2.3 and we need istiod to be present in order to have federation working

* chore: gets registry just before it is needed

* chore: explains why istiodlessremotes is needed to be set to false

* chore: removes redundant import aliases

* chore: removes name collisions

* chore: removes redundant type conversion

* fix: disables staticcheck linter for cluster req tests.

* fix(tests): reverts timeout to original (but in minutes)

* fix(tests): removes extra logging

* chore: removes unnecessary logging

* fix: uses existing CRD file references in charts

* chore: removes multicluster label

* chore: uses built-in namespace.NewOrFail instead of our impl

* chore: introduces defaultTimeout const for federation tests

* fix(lint): fixes go imports

* fix(lint): removes unused variable

* fix: naively wait 5s hoping that kind network will show up

Signed-off-by: rcernich <rcernich@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>
Co-authored-by: Jacek Ewertowski <jewertow@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: maistra-bot <57098434+maistra-bot@users.noreply.github.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <xuyuanlin_00@hotmail.com>
Co-authored-by: bmangoen <bmangoen@redhat.com>
Co-authored-by: cfillekes <cfilleke@redhat.com>
Co-authored-by: Cheryl Fillekes <cfillekes@ibm.com>

OSSM-2376: Move kube controller to the federation package (istio#718)

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

OSSM-2376: Don't start federation controllers until informers have synced (istio#717) (istio#720)

* OSSM-2376: Don't start federation-discovery-controller until kube informer has synced

Federation discovery controller fetches config map with remote CA root
cert, so if the controller started before the informer has synced, it
would fail to fetch the config map.

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Store ConfigMap informer in a field of the discovery.Controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Refactor ResourceManager and don't start federation controller until informers has synced

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Simplify Start and HasSynced functions in federation controllers

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Move kube controller to the federation package

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Federation example fixes (istio#758)

* Use default version in federation example SMCPs

* Fix paths in federation example

OSSM-3599 Federation egress-gateway gets wrong network gateway endpoints (istio#775)

* OSSM-3599 Federation egress-gateway gets wrong update of network gateway endpoints

* Deprecate GatewayEndpoints on server side

* Remove resyncNetworkGateways in unit tests

* Fix lint

* Deprecate NetworkGatewayEndpoints and fix tests

Refactor federation tests (istio#841)

* Refactor federation tests

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add more test cases

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

---------

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
dgn added a commit to dgn/istio that referenced this pull request Jun 13, 2024
…stio#844)

* [federation] Introduces federation deployment (istio#585)

* [federation] Initial federation implementation

* MAISTRA-2194 Add server/client code for Federation Service Discovery v1

* MAISTRA-2195 Implement /watch endpoint

* MAISTRA-2293 add CRD and controller for federating meshes

* MAISTRA-2294 create CRD for federation ServiceExport (istio#324)

* MAISTRA-2294 update example VirtualService resources for ratings and mongodb (istio#333)

* [federation] MAISTRA-2295 create CRD for federation ServiceImport (istio#336)

Signed-off-by: rcernich <rcernich@redhat.com>

* [misc] Use objects and clients from maistra/api repo

- Remove local objects and clients
- Update Makefile

* [federation] MAISTRA-2309 create CRD for FederationStatus (istio#348)

Signed-off-by: rcernich <rcernich@redhat.com>

* [federation] Federation fixes and improvements

MAISTRA-2423 update federation api to v1

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2424 minor updates to federation api

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2427 configure locality info on imported services

Signed-off-by: rcernich <rcernich@redhat.com>

Cherry-pick multi-root support (istio#387)

* Update go-control-plane to v0.9.9

* Support multiple roots

Squashed commit, contains:
- MAISTRA-2325 Distribute trust bundles over SDS
- MAISTRA-2390 Push trust bundle updates through xDS (istio#357)

MAISTRA-2425 move spec.security.certificateChain to ConfigMap reference; add ability to specify ports for service and discovery (istio#392)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2426 move FederationStatus into MeshFederation (istio#393)

Signed-off-by: rcernich <rcernich@redhat.com>

MAISTRA-2513 federation API refinements

Signed-off-by: rcernich <rcernich@redhat.com>

[federation] MAISTRA-2237 Encrypt service discovery traffic (istio#411)

MAISTRA-2610 Prefix federation discovery endpoints with /v1/ (istio#422)

MAISTRA-2297 Support updates of federation resources (istio#417)

MAISTRA-2375: Do not create automatic routes for Federation Gateways

Remove a redundant call

`setHostname()` is already being called within `NameForService()`

see
https://github.com/maistra/istio/blob/21ee900cf8825711f70d88dc97afcf6862ed2626/pkg/servicemesh/federation/common/namemapping.go
lines 83, 120, 129

Remove techPreview.meshConfig from PoC example

It's set by default now.

MAISTRA-2611 Fix deletion of service exports to federated mesh (istio#421)

Fix test

MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil (istio#437)

* MAISTRA-2658 Ensure ImportedServiceSet.status.importedServices is never nil

* Fix test

MAISTRA-2682 Fix watch mechanism in federation (istio#439)

Previously, no events were read from the watch response, because the read started with an endless loop that waited for data to be available in the decoder's buffer. This never happened, because the buffer is only written to when you call decoder.Decode(); this function was never called because the code waited for the buffer to have data.

MAISTRA-2683 Properly close incoming watch connections when shutting down (istio#440)

Log actual error returned by pollServices() (istio#441)

Previously, instead of the actual error, only the following error message was logged: "expected condition not met".

MAISTRA-2439: Prevent federation from exporting services that are not visible to the federation gateway (istio#432)

By taking into consideration the service annotation
`networking.istio.io/exportTo`.

This annotation restricts where this service is visible: https://istio.io/latest/docs/reference/config/annotations/

If a service is not reachable from the federation gateway namespace due
to this annotation, it should not be exported.

MAISTRA-2617: Do not watch all namespaces in Extensions controller (istio#425)

When using MemberRoll, we should rely on it to provide the list
of namespaces to watch. If not using it, defaults to command line
arguments.

This fixes an istiod startup error as seen in the logs:
```
github.com/maistra/xns-informer/pkg/informers/informer.go:204: Failed to watch *v1.ServiceMeshExtension: failed to list *v1.ServiceMeshExtension: servicemeshextensions.maistra.io is forbidden: User "system:serviceaccount:i1:istiod-service-account-basic" cannot list resource "servicemeshextensions" in API group "maistra.io" at the cluster scope
```

* Remove package export and extensions

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix creating discovery.Controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix calling nil ResourceManager

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove panicing from AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [misc] OSSM-774 Fix flaky TestStatusManager (istio#456)

This adds a little sleep to our unit tests for the StatusManager,
because without it, we're running into the issue that we're updating
a ServiceMeshPeer's status very quickly, and in some cases it might be
that the last change has not been propagated when we're generating
the patch for the next status change, which can lead to failures.

This can happen in the real world, but you would need to change a
ServiceMeshPeer's status within a few milliseconds, I doubt that it
affects users. It would also be fixed with the next status update.
For those reasons, I'm only fixing it in the test, with a Sleep()
call.

* Refactor manager_test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-1150 Fix flaky TestStatusManager unit test (istio#478)

Co-authored-by: Marko Lukša <marko.luksa@gmail.com>

* OSSM-1252 Fix federation status updates (istio#512)

* Copy federation privileges from base to istio-discovery

* Remove unnecessary ServiceMeshExtensions CRD

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add model.NetworkGatewaysHandler to federation controller to implement AppendNetworkGatewayHandler

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] MAISTRA-2640 Add federation integration test (istio#447)

* Fix building federation test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add package gogo from maistra-2.2 to temporarily fix TbdsGenerator

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable configuring remote cluster in federation deployment

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* [federation] OSSM-1128 Fix federation (istio#480)

* Fix SecretCacheClient

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Send initial XDS request for trust bundle from proxy

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Disable using EndpointSlices to fix error on getting federation-egress endpoints

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Remove unused serviceMeshExtensionController

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Fix lint errors

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Update maistra CRDs

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* fedration_cp_version_update (istio#521)

* OSSM-1529 Improve federation example script (istio#522)

* OSSM-1529 Improve federation example install.sh

Previously, the script would fall back to using nodeports when the load balancer IP wasn't set. This meant that if the provision of the load balancer took too long, the SMCPs would be misconfigured and you had to run the install script again.

With this change, the script now waits for the load balancer IP to appear. It never falls back to using node ports, because they never really worked (the nodes' hostnames typically aren't FQDN and the node ports are typically protected by firewalls).

If the user wants to expose the federation ingresses in a different way, they can now set the environment variables MESH1_ADDRESS, MESH1_DISCOVERY_PORT, and MESH2_SERVICE_PORT (likewise for MESH2) and run the script.

* Update Federation example README

* Better "Waiting for load balancer" message

* OSSM-1211 Fix federation locality failover issues (istio#561)

Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>

Signed-off-by: rcernich <rcernich@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: maistra-bot <57098434+maistra-bot@users.noreply.github.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <xuyuanlin_00@hotmail.com>

* fix: removes deprecated gogo protobuf conversion

* fix: goimport format

* fix(lint): removes unused funcs

* fix(lint): removes deprecated io/ioutil

* fix(lint): disables staticcheck for federation tests

it requires at least two clusters to make sense

* fix(lint): use anypb.UnmarshalTo instead of ptypes

* fix: no need to exclude grpcgen_test.go

it seems to be fixed in v1.39

see: grpc/grpc-go#4476

* chore(backoff): aligns backoff dependency with v4 used by upstream

* chore: reverts removed blank line - irrelevant for merge

* chore(revive): adds explanation why json:inline is skipped from linting

* OSSM-1962: Use EndpointSlices instead of Endpoints in federation controller (istio#614)

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-2049: Fix handling ServiceAccounts in federation controller (istio#627)

* Fix collecting empty or repeated ServiceAccounts in federation controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Collect ServiceAccounts in sorted order

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-1093 Shorten exported resource name (istio#653)

* Shorten exported resource name

* Fix import createResourceName + unit tests

* Rearrange unit tests + renaming function

* Gen and lint

* Rearrange unit tests + renaming function

Gen and lint

* Fix minor changes

* Error message RFC 1123

* Reorganize structs in TestStatusManager

Instead of having three arrays (events, expectedStatuses, and assertions), we now only have a single array, where each entry is a triplet containing the event, the expected status and the assertion. This allows you to see the event and its expected effects together and not have to scroll up and down, matching the indexes of the three arrays.

* OSSM-2193 Fix flaky TestStatusManager

See comment in https://issues.redhat.com/browse/OSSM-2193 to understand why this change fixes the test.

* fix: runs make gen

* chore: explains why staticcheck linter is disabled for federation_test

* OSSM-728 Configuration scripts for Federation on Z and P, and bare metal (istio#670)

* add config example scripts for IBM Systems Z and P

* update multi-arch bookinfo deployment README, remove src

* Update README.md

* these are provided in the IBM repo

* so README.md passes mdlinter

* so README.md passes mdlinter

* so README.md passes mdlinter

* Update README.md

* Move federation examples to samples/ directory

* Rename template YAMLs to .yaml.template

This makes the linter happy

Co-authored-by: cfillekes <cfilleke@redhat.com>
Co-authored-by: Cheryl Fillekes <cfillekes@ibm.com>

* chore: removes obsolute TODO

* chore: simplifies bool return expression

* chore: removes redundant kubeClient check

if initialization fails this func will not be reached anyway

* chore(pkg): moves kube ctrl under servicemesh pkg folder

* OSSM-2338: Remove env ISTIO_META_ROUTER_MODE from federation test

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* OSSM-2338: Remove "routerMode: sni-dnat" from federation samples

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* fix: adds operator.go customizations to kube.go

clearly with cherry-pick we lost information about the file rename and thus
the changes we made specificually for testing federation got lost

* fix(test/operator): checks if east-west gw needs to be deployed

* fix(federation): uses Unwrap to get instance of Federation registry

* fix(tests): sets istiod-less remote flag to false

that was the behaviour for maistra-2.3 and we need istiod to be present in order to have federation working

* chore: gets registry just before it is needed

* chore: explains why istiodlessremotes is needed to be set to false

* chore: removes redundant import aliases

* chore: removes name collisions

* chore: removes redundant type conversion

* fix: disables staticcheck linter for cluster req tests.

* fix(tests): reverts timeout to original (but in minutes)

* fix(tests): removes extra logging

* chore: removes unnecessary logging

* fix: uses existing CRD file references in charts

* chore: removes multicluster label

* chore: uses built-in namespace.NewOrFail instead of our impl

* chore: introduces defaultTimeout const for federation tests

* fix(lint): fixes go imports

* fix(lint): removes unused variable

* fix: naively wait 5s hoping that kind network will show up

Signed-off-by: rcernich <rcernich@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Yuanlin <yuanlin.xu@redhat.com>
Co-authored-by: Jacek Ewertowski <jewertow@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: maistra-bot <57098434+maistra-bot@users.noreply.github.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <xuyuanlin_00@hotmail.com>
Co-authored-by: bmangoen <bmangoen@redhat.com>
Co-authored-by: cfillekes <cfilleke@redhat.com>
Co-authored-by: Cheryl Fillekes <cfillekes@ibm.com>

OSSM-2376: Move kube controller to the federation package (istio#718)

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

OSSM-2376: Don't start federation controllers until informers have synced (istio#717) (istio#720)

* OSSM-2376: Don't start federation-discovery-controller until kube informer has synced

Federation discovery controller fetches config map with remote CA root
cert, so if the controller started before the informer has synced, it
would fail to fetch the config map.

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Store ConfigMap informer in a field of the discovery.Controller

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Refactor ResourceManager and don't start federation controller until informers has synced

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Simplify Start and HasSynced functions in federation controllers

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Move kube controller to the federation package

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Federation example fixes (istio#758)

* Use default version in federation example SMCPs

* Fix paths in federation example

OSSM-3599 Federation egress-gateway gets wrong network gateway endpoints (istio#775)

* OSSM-3599 Federation egress-gateway gets wrong update of network gateway endpoints

* Deprecate GatewayEndpoints on server side

* Remove resyncNetworkGateways in unit tests

* Fix lint

* Deprecate NetworkGatewayEndpoints and fix tests

Refactor federation tests (istio#841)

* Refactor federation tests

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

* Add more test cases

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>

Reimplement InstancesByPort method

`InstancesByPort` is used by the federation server, which was removed in the upstream istio#46329. We reimplement it to support the federation server.

---------

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Co-authored-by: Jacek Ewertowski <jewertow@redhat.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: Rob Cernich <rcernich@redhat.com>
Co-authored-by: Jonh Wendell <jonh.wendell@redhat.com>
Co-authored-by: Marko Lukša <marko.luksa@gmail.com>
Co-authored-by: Praneeth Bajjuri <pbajjuri@redhat.com>
Co-authored-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Co-authored-by: Brian Mangoenpawiro <bmangoen@redhat.com>
Co-authored-by: Cheryl Fillekes <cfillekes@ibm.com>
Signed-off-by: Yann Liu <yannliu@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants