

akhilac1
Contributor

@akhilac1 akhilac1 commented Nov 24, 2022

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

Description

Issue #5481 limits the usage of Dapr for a subset of users, especially during rolling upgrades. The load balancer continues to send requests even after SIGTERM is sent to the Pod. An app written in Spring can use a pre-stop hook to delay shutdown, so that requests arriving while the load balancer identifies the pod as Terminating and removes it from the list of active pods are still served.

Dapr, however, closes its inbound channels as soon as SIGTERM is received, and therefore does not accept any further requests. As a result, many failed requests are observed in this environment.

This PR moves the closing of inbound channels to after the grace period expires, so that requests are no longer dropped immediately.

Issue reference

This PR closes #5481.


Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
@akhilac1 akhilac1 requested review from a team as code owners November 24, 2022 16:23
@akhilac1 akhilac1 marked this pull request as draft November 24, 2022 16:23
@codecov

codecov bot commented Nov 24, 2022

Codecov Report

Merging #5562 (f0ec48f) into master (64d468a) will decrease coverage by 0.01%.
The diff coverage is 75.00%.

```diff
@@            Coverage Diff             @@
##           master    #5562      +/-   ##
==========================================
- Coverage   65.08%   65.06%   -0.02%
==========================================
  Files         143      143
  Lines       15286    15295       +9
==========================================
+ Hits         9949     9952       +3
- Misses       4637     4640       +3
- Partials      700      703       +3
```
| Impacted Files | Coverage Δ |
|---|---|
| pkg/runtime/runtime.go | 67.47% <70.58%> (-0.05%) ⬇️ |
| pkg/runtime/trace.go | 100.00% <100.00%> (ø) |
| pkg/components/pubsub/pluggable.go | 64.63% <0.00%> (-3.66%) ⬇️ |


Contributor

@ItalyPaleAle ItalyPaleAle left a comment


I do not support this change, sorry.

A few months ago we went through an extensive amount of work to be able to shut down PubSub and input binding before the grace period, as this is the most correct behavior for the majority of users.

Imagine a pod that gets shut down. Both daprd and the app get the SIGTERM at the same time and should immediately begin the shutdown sequence. Usually apps terminate instantly. If Dapr were to still take messages from PubSub and input bindings, we'd try to deliver them to apps that are already shut down, and that's not polite (and could cause messages to go to the DLQ incorrectly).

We have the grace period on output components just so the app can complete its work and store the output. But we shouldn't try to bring more work to the app during the grace period.

@mukundansundar
Contributor


This makes sense. We close all Dapr APIs and Dapr-to-app communication; only the output components that are doing work are closed after the graceful shutdown period.

Though I have one question: most of the output component APIs in Dapr are synchronous, so what happens to in-flight requests if we suddenly shut down the servers?
Can we configure the servers to stop accepting new requests during the grace period but keep running until the in-flight requests are complete?

@ItalyPaleAle WDYT?
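For HTTP servers written in Go, the behavior asked about here (stop accepting new connections, but let in-flight requests finish) is what `net/http`'s `Server.Shutdown` provides. The following is a minimal, self-contained sketch, not Dapr's actual API server code; the `/slow` handler, the timings, and the `drainDemo` helper are illustrative assumptions:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"sync"
	"time"
)

// drainDemo starts an HTTP server with a slow handler, begins one request,
// and calls Shutdown while that request is still in flight. Shutdown closes
// the listener right away, so new connections are refused, but it waits for
// the active request to complete before returning.
func drainDemo() (inflightBody string, newReqErr, shutdownErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}

	started := make(chan struct{})
	var once sync.Once
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		once.Do(func() { close(started) }) // the request is now in flight
		time.Sleep(300 * time.Millisecond) // simulate outstanding work
		fmt.Fprint(w, "done")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	url := "http://" + ln.Addr().String() + "/slow"
	result := make(chan string, 1)
	go func() {
		resp, err := http.Get(url)
		if err != nil {
			result <- "error: " + err.Error()
			return
		}
		defer resp.Body.Close()
		b, _ := io.ReadAll(resp.Body)
		result <- string(b)
	}()
	<-started

	shutdownDone := make(chan error, 1)
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		shutdownDone <- srv.Shutdown(ctx)
	}()

	// Give Shutdown a moment to close the listener, then try a new request.
	time.Sleep(50 * time.Millisecond)
	if resp, err := http.Get(url); err != nil {
		newReqErr = err
	} else {
		resp.Body.Close()
	}

	return <-result, newReqErr, <-shutdownDone
}

func main() {
	body, newErr, shutErr := drainDemo()
	fmt.Println("in-flight response:", body)
	fmt.Println("new request rejected:", newErr != nil)
	fmt.Println("shutdown error:", shutErr)
}
```

For gRPC, `grpc-go`'s `Server.GracefulStop` offers comparable semantics: it stops accepting new connections and RPCs and blocks until pending RPCs finish.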

@ItalyPaleAle
Contributor

> Though I have one question: most of the output component APIs in Dapr are synchronous, so what happens to in-flight requests if we suddenly shut down the servers?
> Can we configure the servers to stop accepting new requests during the grace period but keep running until the in-flight requests are complete?

Right now, the servers are shut down at the end of the grace period.

Interesting idea about not accepting new requests, but I am not sure how I feel about it. The app should be able to invoke Dapr APIs during the graceful shutdown period if it needs to store its output somewhere IMHO.

Here's how I see the shutdown sequence working:

  1. When the pod starts getting terminated, the app should immediately stop accepting new work. This is outside of the control of Dapr, except for the part where we should not be sending more work to the app during the graceful shutdown period: that's what we do with input bindings, PubSub, and input service invocation.
  2. The app may still need some time to process in-progress work. For example, imagine that half a second before the SIGTERM, the app received a message from PubSub that takes 3 seconds to process. That's why there's the graceful shutdown period in Dapr, after all.
  3. After the message is done processing, the app may want to store its output somewhere, using Dapr.

That's why I don't think Dapr should stop accepting new work (from the app), as it may still be needed. It's the app's responsibility to complete all work within the grace period, however.

@mukundansundar
Contributor

mukundansundar commented Nov 26, 2022

> That's why I don't think Dapr should stop accepting new work (from the app), as it may still be needed. It's the app's responsibility to complete all work within the grace period, however.

But with the current sequence in the code, the Dapr APIs are closed immediately, before the grace period even starts (see the `closer.Close()` calls):

```go
log.Info("dapr shutting down.")

// Inbound work (PubSub subscriptions and input bindings) stops first.
log.Info("Stopping PubSub subscribers and input bindings")
a.stopSubscriptions()
a.stopReadingFromBindings()
a.cancel()
a.stopActor()

// The Dapr APIs are closed here, before the grace-period wait below.
log.Info("Stopping Dapr APIs")
for _, closer := range a.apiClosers {
	if err := closer.Close(); err != nil {
		log.Warnf("error closing API: %v", err)
	}
}
shutdownCtx, shutdownCancel := context.WithCancel(context.Background())
go func() {
	if a.tracerProvider != nil {
		a.tracerProvider.Shutdown(shutdownCtx)
	}
}()

// Only now does the grace period start.
log.Infof("Waiting %s to finish outstanding operations", duration)
<-time.After(duration)
```

Shouldn't the `<-time.After(duration)` wait be called before closing the Dapr APIs?

Additionally, if the Dapr APIs are still available during the graceful shutdown period, shouldn't actors (`a.stopActor()`) also be available?

We are also immediately calling the `cancel()` function of the overall context `a.ctx` that is shared across the runtime. Shouldn't that also be after the timeout?

Should the line `<-time.After(duration)` be moved after calling `a.stopReadingFromBindings()`?

@ItalyPaleAle
Contributor

Yes, I agree with you.

> Shouldn't the `<-time.After(duration)` wait be called before closing the Dapr APIs?

I think that would probably be correct.

> Additionally, if the Dapr APIs are still available during the graceful shutdown period, shouldn't actors (`a.stopActor()`) also be available?

Not 100% sure on this, but I think what you're saying makes sense

> We are also immediately calling the `cancel()` function of the overall context `a.ctx` that is shared across the runtime. Shouldn't that also be after the timeout?

Possibly, but I'm not exactly sure what the context is used for right now. But you are probably correct.

@yaron2
Member

yaron2 commented Nov 27, 2022

> Shouldn't the `<-time.After(duration)` wait be called before closing the Dapr APIs?

Yes, that is the correct behavior.

@mukundansundar
Contributor

> > Shouldn't the `<-time.After(duration)` wait be called before closing the Dapr APIs?

> Yes, that is the correct behavior.

Also @yaron2, what are your thoughts on the following two points?

> Additionally, if the Dapr APIs are still available during the graceful shutdown period, shouldn't actors (`a.stopActor()`) also be available?

> We are also immediately calling the `cancel()` function of the overall context `a.ctx` that is shared across the runtime. Shouldn't that also be after the timeout?

@akhilac1 akhilac1 marked this pull request as ready for review November 29, 2022 03:45
@yaron2
Member

yaron2 commented Nov 29, 2022

> Additionally, if the Dapr APIs are still available during the graceful shutdown period, shouldn't actors (`a.stopActor()`) also be available?

Actors are different: the actor runtime should be stopped when the signal is received, to give it the chance to disconnect from placement as soon as possible and finish the rehashing in a controlled manner. Since Dapr is the actual compute orchestrator here, I think that makes sense. Ongoing requests will be drained as part of the actor runtime behavior.

If the actor runtime doesn't guarantee that ongoing requests are drained, then the above doesn't hold and we should stop actors after the grace period has elapsed.

> We are also immediately calling the `cancel()` function of the overall context `a.ctx` that is shared across the runtime. Shouldn't that also be after the timeout?

It should probably be called after the timeout.

Contributor

@ItalyPaleAle ItalyPaleAle left a comment


@akhilac1 please make the changes as discussed above

…ndings

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
@akhilac1
Contributor Author

> @akhilac1 please make the changes as discussed above

@ItalyPaleAle @yaron2 @mukundansundar - tagging for review

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Contributor

@mukundansundar mukundansundar left a comment


Thanks for adding the UTs. They seem to be flaky on Windows; please take a look.
I have added comments.

Akhila Chetlapalle added 2 commits November 30, 2022 21:52
…hutdown for windows tests

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
@akhilac1
Contributor Author

> Thanks for adding the UTs. They seem to be flaky on Windows; please take a look.
> I have added comments.

Yes. Sending SIGTERM fails on Windows, so the assertion fails. I fixed this by calling `rt.Shutdown` when SIGTERM cannot be sent.

Akhila Chetlapalle added 2 commits December 1, 2022 18:25
Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Akhila Chetlapalle and others added 4 commits December 5, 2022 23:29
Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
…nto graceperiod_5481

Resolve merge conflict and add pubsub shutdown order check
Akhila Chetlapalle and others added 3 commits December 6, 2022 18:35
@akhilac1
Contributor Author

akhilac1 commented Dec 7, 2022

@ItalyPaleAle @mukundansundar - Pinging for attention

@mukundansundar mukundansundar dismissed their stale review December 7, 2022 09:42

changes made.

@mukundansundar mukundansundar requested a review from yaron2 December 7, 2022 09:42
Akhila Chetlapalle and others added 4 commits December 9, 2022 19:44
…e method, moving trace shutdown to after api shutdown and removing go routine

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
@mukundansundar
Contributor

/ok-to-test


@dapr-bot
Collaborator

dapr-bot commented Dec 10, 2022

Dapr E2E test

🔗 Link to Action run

Commit ref: 45a75fd

✅ Build succeeded for linux/amd64

  • Image tag: dapre2e333c6bf7a1l
  • Test image tag: dapre2e333c6bf7a1l

✅ Infrastructure deployed

| Cluster | Resource group name | Azure region |
|---|---|---|
| Linux | Dapr-E2E-dapre2e333c6bf7a1l | westus3 |
| Windows | Dapr-E2E-dapre2e333c6bf7a1w | westus3 |
| Linux/arm64 | Dapr-E2E-dapre2e333c6bf7a1la | eastus |

✅ Build succeeded for linux/arm64

  • Image tag: dapre2e333c6bf7a1la
  • Test image tag: dapre2e333c6bf7a1la

✅ Build succeeded for windows/amd64

  • Image tag: dapre2e333c6bf7a1w
  • Test image tag: dapre2e333c6bf7a1w

⚠️ Tests skipped on linux/arm64

  • Image tag: dapre2e333c6bf7a1la
  • Test image tag: dapre2e333c6bf7a1la

✅ Tests succeeded on linux/amd64

  • Image tag: dapre2e333c6bf7a1l
  • Test image tag: dapre2e333c6bf7a1l

✅ Tests succeeded on windows/amd64

  • Image tag: dapre2e333c6bf7a1w
  • Test image tag: dapre2e333c6bf7a1w

@dapr-bot
Collaborator

dapr-bot commented Dec 10, 2022

Dapr E2E test

🔗 Link to Action run

Commit ref: 45a75fd

✅ Build succeeded for linux/amd64

  • Image tag: dapre2efa46b81260l
  • Test image tag: dapre2efa46b81260l

✅ Infrastructure deployed

| Cluster | Resource group name | Azure region |
|---|---|---|
| Linux | Dapr-E2E-dapre2efa46b81260l | westus3 |
| Windows | Dapr-E2E-dapre2efa46b81260w | westus3 |
| Linux/arm64 | Dapr-E2E-dapre2efa46b81260la | eastus |

✅ Build succeeded for linux/arm64

  • Image tag: dapre2efa46b81260la
  • Test image tag: dapre2efa46b81260la

✅ Build succeeded for windows/amd64

  • Image tag: dapre2efa46b81260w
  • Test image tag: dapre2efa46b81260w

⚠️ Tests skipped on linux/arm64

  • Image tag: dapre2efa46b81260la
  • Test image tag: dapre2efa46b81260la

✅ Tests succeeded on windows/amd64

  • Image tag: dapre2efa46b81260w
  • Test image tag: dapre2efa46b81260w

✅ Tests succeeded on linux/amd64

  • Image tag: dapre2efa46b81260l
  • Test image tag: dapre2efa46b81260l

```go
}

func sendSigterm(rt *DaprRuntime) {
	rt.runtimeConfig.GracefulShutdownDuration = 5 * time.Second
```
Contributor


Can you change this to 2? Just so tests end quicker.

Contributor

@mukundansundar mukundansundar left a comment


lgtm

@artursouza artursouza merged commit 374a582 into dapr:master Dec 13, 2022
mcandeia pushed a commit to mcandeia/dapr that referenced this pull request Jan 11, 2023
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Artur Souza <artursouza.ms@outlook.com>

Initiate Dapr shutdown after expiry of grace period - Issue 5481 (dapr#5562)

* Kick off shutdown after expiry of grace period

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* update shutdown with grace period. Add tests for pubsub, actors, and bindings

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* fix linting and windows incompatibility

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* fixed tests on windows. SIGTERM sending fails on windows. So invoke shutdown for windows tests

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* review comments incorporated

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* review comments incorporated

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* removed comment

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* update branch and add pubsub order check

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* Fixed trace initiation and shutdown. Updated trace Registration interface to return Tracer

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* reverting timeout pushed in test and moving trace shutdown to separate method, moving trace shutdown to after api shutdown and removing go routine

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

* re-trigger pipeline

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>

Signed-off-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Co-authored-by: Akhila Chetlapalle <akhila@Akhilas-MacBook-Pro.local>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>
Co-authored-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Loong Dai <long.dai@intel.com>
Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>
Co-authored-by: Artur Souza <artursouza.ms@outlook.com>

Misc refactorings extracted from dapr#5170 (dapr#5609)

Changes to the resiliency.NewRunner (dapr#5645)

Remove dapr local replacement for pluggable apps (dapr#5642)

* Remove dapr local replacement for pluggable apps

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Pin v0.0.8 on k6 operator

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Set actor stress tests thresholds based on previous run (dapr#5657)

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Fix ping method invoked before Init method for pluggable components (dapr#5659)

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>
Co-authored-by: Artur Souza <artursouza.ms@outlook.com>

feature: add context to lock&pubsub API (dapr#5640)

* feature: add context to lock&pubsub API

Signed-off-by: seachen <seachen@tencent.com>

* feature: add context to lock&pubsub API

Signed-off-by: seachen <seachen@tencent.com>

* Updated pinned components-contrib

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* upgrade sigs.k8s.io/controller-runtime to v0.14.1

Signed-off-by: seachen <seachen@tencent.com>

* fixed golangci-lint

Signed-off-by: seachen <seachen@tencent.com>

Signed-off-by: seachen <seachen@tencent.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Update protoc version (dapr#5663)

Extracted from dapr#5648

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Misc test fixes (dapr#5664)

* Misc test fixes

1. Fixes some (not all) race conditions in tests for pkg/runtime
2. Improvements to test platform and the actorfeatures test to make testing locally (outside of K8s) easier
3. Some more logging in E2E test apps

Extracted from dapr#5648

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Tailscale needs a bit more resources or it can crash with OOM

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Revert change to test per review feedback.

However, this re-introduces a race condition (the test fails under `go test -race`) that will need to be fixed

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

optimize bulkpub resp processing (dapr#5498)

* refactor code: bulk pub res from component only contains failed entries

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* fixing dependency. fixing unit test.

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* fix error response in gRPC bulk publish API

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* fix pluggable comps go.mod

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* change to point to correct contrib commit

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* fix dependency for components-contrib

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* update contrib to latest commit

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* address review comments.

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

* remove new line

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

Register new Cloudflare KV state store and Queues binding (dapr#5632)

* Register new Cloudflare KV state store and Queues binding

See dapr/components-contrib#2363 for the new components

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Naming: workerskv

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Mod tidy

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>

fix error message typo (dapr#5681)

Signed-off-by: yaron2 <schneider.yaron@live.com>

Signed-off-by: yaron2 <schneider.yaron@live.com>

Replace `go.uber.org/atomic` and `github.com/pkg/errors` with standard library packages (dapr#5678)

* Replace `go.uber.org/atomic` and `github.com/pkg/errors` with standard library packages

The packages are now forbidden by a linter rule

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed E2E tests failing

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>

Do not start Dapr Watchdog runnable unless it's enabled (dapr#5689)

Currently, the Dapr Watchdog runnable is added to the manager whether the watchdog is enabled or not. This forces the Dapr Operator service to request leadership election in all cases, even if the disable-leader-election flag is set.

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

More realistic metric for actor id stress tests (dapr#5687)

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Bump github.com/fasthttp/router from 1.4.13 to 1.4.14 (dapr#5666)

Bumps [github.com/fasthttp/router](https://github.com/fasthttp/router) from 1.4.13 to 1.4.14.
- [Release notes](https://github.com/fasthttp/router/releases)
- [Commits](fasthttp/router@v1.4.13...v1.4.14)

---
updated-dependencies:
- dependency-name: github.com/fasthttp/router
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Streaming support in `InternalInvokeRequest` / `InternalInvokeResponse` (dapr#5648)

* WIP

- Updated pkg/messaging to make InvokeMethodRequest and InvokeMethodResponse replayable
- Updated protos

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* WIP: custom io.MultiReader with io.Closer

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* WIP

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* 💄

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* More WIP

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Use a pool for buffers in replayableRequest too

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Enabling replays where necessary

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* 💄

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixes in code and tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* More fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed the remaining unit tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Not yet time for CallLocalStream

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* These protos are unused for now

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* More currently-unused code

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Various fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Update protoc version

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Update protos

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Updated version here too

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Remove unused proto import

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Mini tweaks

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixes & other improvements; tests should now pass

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* More unit tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixes a possible panic

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Misc fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Changes to actors and to allow the test app to run in self-hosted

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* More fixes

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Some fixes for race conditions in unit tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* DRY

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Update protoc version

Extracted from dapr#5648

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Changed per review feedback

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Added unit test

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Changed per review feedback

Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Misc tweaks

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>
Co-authored-by: Dapr Bot <56698301+dapr-bot@users.noreply.github.com>

Fixed: replay buffer not resized (dapr#5697)

* Fixed: replay buffer not resized

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Add unit test

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Bump github.com/prometheus/common from 0.37.0 to 0.39.0 (dapr#5694)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.37.0 to 0.39.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.37.0...v0.39.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Fix call to fetch subscriptions over gRPC but not implemented. (dapr#5652)

* Fix call to fetch subscriptions over gRPC but not implemented.

Signed-off-by: Artur Souza <artursouza.ms@outlook.com>

* Guarantee non-null subscriptions when not implemented on gRPC.

Signed-off-by: Artur Souza <artursouza.ms@outlook.com>

Signed-off-by: Artur Souza <artursouza.ms@outlook.com>
Co-authored-by: Artur Souza <artursouza.ms@outlook.com>
Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>
Co-authored-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>

Add resiliency to bulk publish API (dapr#5646)

remove unsupported k8s version, update kind action (dapr#5649)

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

Signed-off-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>

Allow passing Dapr trust bundle flags via Helm charts (dapr#5470)

* allow passing Sentry issuer-related flags into the charts for each component

Signed-off-by: Marco <bardelli.marco@gmail.com>

* Update according to feedback into dapr#5470

Signed-off-by: Marco <bardelli.marco@gmail.com>

* quote added flags

Signed-off-by: Marco <bardelli.marco@gmail.com>

* improve explanation in README and remove overly generic, not strictly needed args

Signed-off-by: Marco Bardelli <bardelli.marco@gmail.com>

Signed-off-by: Marco <bardelli.marco@gmail.com>
Signed-off-by: Marco Bardelli <bardelli.marco@gmail.com>
Co-authored-by: Alessandro (Ale) Segala <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Fixes metric grouping for CPU usage graphs. (dapr#5525)

"Total CPU Usage" graph is displaying `container_cpu_usage_seconds_total`
but it is not grouping it by application and is instead using the
`pod` field as its discriminating field.

Each application is re-deployed daily and receives a new pod-id. As
this metric graph uses the `pod` id in its legend, the same
application ends up represented as a series of disconnected
metrics.

This PR fixes the metric by grouping distinct "pods" under the same
"application id" using some metric math. This in turn will allow
us to observe how a given application behaves over time.

Fixes dapr#5524

Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>

Signed-off-by: Tiago Alves Macambira <tmacam@burocrata.org>
Co-authored-by: Loong Dai <long.dai@intel.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Allow enabling preview features at build-time (dapr#5677)

* Allow enabling preview features at build-time

Added the `ENABLED_FEATURES` env var to the Makefile to define a (comma-separated) list of features that are always enabled, regardless of what's in the Configuration spec.

The `Resiliency` feature was added to the list of always-enabled features for now (replacing the previous "hack" to have it always enabled - see dapr#5523).

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Fixed unit tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Features in unit tests

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Dapr Bot <56698301+dapr-bot@users.noreply.github.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>

Fix profile port

Performance tests for pubsub http (dapr#5683)

* Add pubsub perf test for multiple message sizes and delayed requests

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Add pubsub test on dapr_test mk

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Support appID parameter

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Preallocate VUs

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Use in-memory broker for pubsub HTTP perf tests

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Set start time and use shared array

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Fix line formatting

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Using realistic thresholds for actor type stress test (dapr#5710)

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

return error on duplicated entry IDs in gRPC bulk publish (dapr#5672)

Add metrics labels regex rules (dapr#5716)

* add metrics labels regex rules

Signed-off-by: yaron2 <schneider.yaron@live.com>

* linter

Signed-off-by: yaron2 <schneider.yaron@live.com>

* update header to correct year

Signed-off-by: yaron2 <schneider.yaron@live.com>

* linter

Signed-off-by: yaron2 <schneider.yaron@live.com>

Signed-off-by: yaron2 <schneider.yaron@live.com>

Deprecation notice for gRPC service invocation API (dapr#5324)

* Deprecation notice for gRPC service invocation API

Signed-off-by: sunzhaochang <zhchsun1992@gmail.com>

* Add deprecation notices automatically when generating release notes

Signed-off-by: sunzhaochang <zhchsun1992@gmail.com>

* Update api.go

Signed-off-by: Yaron Schneider <schneider.yaron@live.com>

* Update api.go

Signed-off-by: Yaron Schneider <schneider.yaron@live.com>

Signed-off-by: sunzhaochang <zhchsun1992@gmail.com>
Signed-off-by: Yaron Schneider <schneider.yaron@live.com>
Co-authored-by: Yaron Schneider <schneider.yaron@live.com>

Resiliency Support for Bulk Subscribe (dapr#5603)

* Add filter for resiliency policy

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Delete unrequired

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Add Resiliency Support via Accumulator and misc refactorings

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Fix linting

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Incorporate review comments

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Fix filter in bulkpub_resiliency

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Add cap assertions

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Add locks

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Add locks

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* Incorporate review comments

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

* contenttype correction

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>

Signed-off-by: Deepanshu Agarwal <deepanshu.agarwal1984@gmail.com>
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
Co-authored-by: Mukundan Sundararajan <65565396+mukundansundar@users.noreply.github.com>
Co-authored-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

fix flaky test (dapr#5723)

Signed-off-by: yaron2 <schneider.yaron@live.com>

Signed-off-by: yaron2 <schneider.yaron@live.com>

fix flaky test sync issue (dapr#5728)

Signed-off-by: yaron2 <schneider.yaron@live.com>

Signed-off-by: yaron2 <schneider.yaron@live.com>

Misc refactorings and fixes to shutdown sequence (dapr#5729)

This PR contains misc refactorings extracted from the "firewall" branch, including some fixes to the shutdown sequence.

Two user-facing changes:

1. Fixed: cannot stop daprd if it's waiting for the app to come online (often happens if the app crashed while using the Dapr CLI)
2. Added: shutdown can be forced (aborting any graceful shutdown sequence) by sending a second SIGTERM/SIGINT.

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Fix pluggable component withblock usage on tests (dapr#5724)

* Fix pluggable component withblock usage on tests

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

* Add grpc server listener

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Signed-off-by: Marcos Candeia <marrcooos@gmail.com>

Make Resiliency stable (dapr#5732)

* Make Resiliency stable

Remove the "Resiliency" feature flag and all the code paths where we branched based on whether Resiliency was enabled or not

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

* Remove feature flag from Makefile

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>

Initialize metrics prior to loading resiliency

Resiliency was being loaded before the actual metric views/fields
were initialized. This caused the Resiliency init metric to be lost.
This commit moves metrics initialization earlier so it runs before Resiliency is loaded.
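Why the ordering matters can be sketched with a toy recorder that, like many metrics libraries, drops measurements for metrics that have not been registered yet. The `Recorder` type, `loadResiliency`, and the `resiliency_loaded` metric name are hypothetical; this is not Dapr's metrics code.

```go
package main

import "fmt"

// Recorder is a hypothetical metrics sink: measurements for metrics that
// are not registered yet are silently dropped.
type Recorder struct {
	counts map[string]int
}

func NewRecorder() *Recorder { return &Recorder{counts: map[string]int{}} }

// Register makes a metric available for recording.
func (r *Recorder) Register(name string) {
	if _, ok := r.counts[name]; !ok {
		r.counts[name] = 0
	}
}

// Inc increments a registered metric and reports whether it was recorded.
func (r *Recorder) Inc(name string) bool {
	if _, ok := r.counts[name]; !ok {
		return false // not registered yet: the data point is lost
	}
	r.counts[name]++
	return true
}

// loadResiliency stands in for loading resiliency policies, which emits a
// hypothetical "resiliency_loaded" count.
func loadResiliency(r *Recorder) bool { return r.Inc("resiliency_loaded") }

func main() {
	// Old order: load Resiliency first, register metrics afterwards.
	before := NewRecorder()
	fmt.Println(loadResiliency(before)) // false: metric lost
	before.Register("resiliency_loaded")

	// New order: register metrics, then load Resiliency.
	after := NewRecorder()
	after.Register("resiliency_loaded")
	fmt.Println(loadResiliency(after)) // true: metric recorded
}
```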

dapr#5711

Signed-off-by: halspang <halspang@microsoft.com>
Development

Successfully merging this pull request may close these issues.

Scale down in environment with Dapr Side Car does not work as expected - Getting 503 Response