costinm
Contributor

@costinm costinm commented Feb 15, 2018

Based on test results, I may make it customisable in a separate PR; for now I'm looking to see whether it has
any impact on CPU/memory use.

@costinm costinm requested review from ZackButcher, rshriram and a team February 15, 2018 00:31
@istio-merge-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: kyessenov

Assign the PR to them by writing /assign @kyessenov in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@@ -481,6 +489,19 @@ func (ds *DiscoveryService) ClearCacheStats(_ *restful.Request, _ *restful.Respo
// clearCache will clear all envoy caches. Called by service, instance and config handlers.
// This will impact the performance, since envoy will need to recalculate.
func (ds *DiscoveryService) clearCache() {
clearCacheMutex.Lock()
Contributor

no need for mutex, this is run on a single event queue

Contributor Author

Just feel safer with a mutex - with the other changes in multicluster and other optimizations
we may have multiple callers.

Contributor

how does this mutex work with the cache lock? there are things reading from the cache in parallel, right?

Contributor Author

This mutex only serializes the clearCache calls.
Additional mutexes or other synchronization may be needed; we'll discuss it when that work is done. For now
it's useful to stop assuming everything runs in one thread... Is there any harm in adding a mutex to clearCache?
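
To make the two questions above concrete, here is a minimal sketch, not Pilot's actual code. The `cache` type and its `get`/`clear` methods are hypothetical stand-ins; only the name `clearCacheMutex` comes from the diff. It assumes the cache carries its own read/write lock, so the outer mutex serializes concurrent clearCache callers while readers are protected separately.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the discovery cache.
type cache struct {
	mu      sync.RWMutex // protects entries against concurrent readers
	entries map[string]string
}

// get reads one entry under the read lock.
func (c *cache) get(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[k]
	return v, ok
}

// clear drops every entry under the write lock.
func (c *cache) clear() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = make(map[string]string)
}

// clearCacheMutex serializes whole clearCache calls, as in the diff.
// It does nothing for readers; that is the job of cache.mu.
var clearCacheMutex sync.Mutex

// clearCount is only touched while clearCacheMutex is held.
var clearCount int

func clearCache(c *cache) {
	clearCacheMutex.Lock()
	defer clearCacheMutex.Unlock()
	clearCount++
	c.clear()
}

func main() {
	c := &cache{entries: map[string]string{"a": "1"}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			clearCache(c) // several callers, no single event queue assumed
			c.get("a")    // readers run in parallel with the clears
		}()
	}
	wg.Wait()
	fmt.Println("clearCache calls:", clearCount)
}
```

In this sketch, an uncontended mutex costs little when there really is a single caller, and it keeps the bookkeeping in clearCache safe if a second caller appears later.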

	if time.Since(lastClearCache) < 60*time.Second {
		if !clearCacheTimerSet {
			clearCacheTimerSet = true
			time.AfterFunc(61*time.Second, func() {
Contributor

isn't this going to block the event queue?

Contributor

The danger here is unbounded growth of the incoming events. I think we need to run a secondary worker thread to refresh the cache, like @rshriram suggested. The two threads only need to share one bit for the "dirtiness" of the cache.
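
A rough sketch of the worker-plus-dirty-bit idea suggested above, under stated assumptions: the `refresher` type, its methods, and the tick interval are all hypothetical and not taken from the PR. Event handlers only set a flag, so they never block and nothing queues up; a separate goroutine checks the flag on a ticker and rebuilds at most once per tick.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// refresher is a hypothetical helper: producers mark the cache dirty,
// one worker goroutine performs the actual refresh.
type refresher struct {
	dirty   int32  // 1 when the cache needs a rebuild; accessed atomically
	refresh func() // hypothetical hook that rebuilds the cache
}

// markDirty is cheap and non-blocking, so it is safe on the event queue.
func (r *refresher) markDirty() {
	atomic.StoreInt32(&r.dirty, 1)
}

// flush runs refresh once if the dirty bit was set and reports whether it ran.
func (r *refresher) flush() bool {
	if atomic.CompareAndSwapInt32(&r.dirty, 1, 0) {
		r.refresh()
		return true
	}
	return false
}

// run is the worker loop: at most one refresh per interval, until stop closes.
func (r *refresher) run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			r.flush()
		case <-stop:
			return
		}
	}
}

func main() {
	var rebuilds int32
	r := &refresher{refresh: func() { atomic.AddInt32(&rebuilds, 1) }}

	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		r.run(10*time.Millisecond, stop)
		close(done)
	}()

	// A burst of events collapses into the single shared bit.
	for i := 0; i < 1000; i++ {
		r.markDirty()
	}
	time.Sleep(50 * time.Millisecond)
	close(stop)
	<-done

	fmt.Println("events: 1000, rebuilds:", atomic.LoadInt32(&rebuilds))
}
```

Because the bit is cleared before the rebuild starts, an event arriving during a rebuild sets it again and is picked up on the next tick.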

@costinm costinm requested a review from a team February 15, 2018 00:58
@istio-merge-robot

@costinm PR needs rebase

@istio-merge-robot istio-merge-robot added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 15, 2018
istio.VERSION Outdated
export MIXER_TAG="replace-by-actual-tag"
export PILOT_HUB="gcr.io/istio-testing"
export PILOT_TAG="replace-by-actual-tag"
export CA_HUB="costinm"
Member

please look at your PR/diff before and after submission :-)

Contributor Author

Ah, istio.VERSION again...

@istio-merge-robot istio-merge-robot removed the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 15, 2018
@kyessenov
Contributor

kyessenov commented Feb 15, 2018 via email

	clearCacheMutex.Lock()
	defer clearCacheMutex.Unlock()

	if time.Since(lastClearCache) < time.Duration(clearCacheTime)*time.Second {
Member

greater than ?

Contributor Author

If the time since the last clear is less than x seconds, we schedule the event instead of clearing right away.
So I think `<` is correct.

@istio-testing
Collaborator

istio-testing commented Feb 15, 2018

@costinm: The following test failed, say /retest to rerun them all:

Test name          Commit   Details  Rerun command
prow/e2e-smoke.sh  7849cda  link     /test e2e-smoke

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@costinm
Contributor Author

costinm commented Feb 16, 2018

ping ?

@rshriram
Member

I would rename this clear cache squash to batchCacheEvictionEvents for clarity.

@costinm
Contributor Author

costinm commented Feb 16, 2018

Agreed, will try to do it in the next PR related to this (to avoid another lgtm/approve/build cycle...)

@ldemailly
Member

the errors in https://k8s-gubernator.appspot.com/build/istio-prow/pull/istio_istio/3506/e2e-smoke/3037/ seem a bit concerning
/test e2e-smoke

@ldemailly ldemailly deleted the costin-clearcache branch February 21, 2018 10:00

7 participants