Skip to content

Improve component init error logs and metrics #4975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Aug 30, 2022

Conversation

addjuarez
Copy link
Contributor

@addjuarez addjuarez commented Aug 4, 2022

Description

Added a componentName tag to the dapr_runtime_component_init_fail_total metric. The metric should now be produced when init failures are logged.

Also have standardized the component failure error logging for creation/init stages. All component init/create errors should now be logged.

Also removed some redundant logging of errors.

A new error type InitErrors was created and was modeled after custom etag errors. We have for initError codes and have a standardized format for error messages.
The new error message will look like :
(InitComponentFailure): initialization error occurred for statestore2 (state.redis/v1): init timeout for component statestore2 exceeded after 5s"
The new error codes.

  • InitComponetFailure : Any component failures to initialize or complete their init() method.
  • InitFailure : Any non component failures like actor init failures
  • CreateComponentFailure : component failed to be added to the registry

Issue reference

Part of the work involved with this.
dapr/components-contrib#1752

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

Merging #4975 (c38343a) into master (29233c6) will decrease coverage by 0.03%.
The diff coverage is 46.60%.

@@            Coverage Diff             @@
##           master    #4975      +/-   ##
==========================================
- Coverage   65.47%   65.44%   -0.04%     
==========================================
  Files         113      114       +1     
  Lines       12723    12735      +12     
==========================================
+ Hits         8331     8334       +3     
- Misses       3809     3816       +7     
- Partials      583      585       +2     
Impacted Files Coverage Δ
pkg/runtime/runtime.go 67.62% <40.69%> (-0.12%) ⬇️
pkg/diagnostics/service_monitoring.go 49.23% <50.00%> (ø)
pkg/http/api.go 79.53% <66.66%> (+0.01%) ⬆️
pkg/runtime/initError.go 71.42% <71.42%> (ø)
pkg/grpc/api.go 72.85% <100.00%> (-0.09%) ⬇️
pkg/actors/internal/placement.go 87.56% <0.00%> (-1.56%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
@berndverst
Copy link
Member

For standardizing the error - what if instead we created a new error type DaprComponentInitError or something like that and change the Init interface to return that. And then we update the runtime to just log that error if Init returns an error of this type.

Creating a new error type and expecting this in the return is a much more guaranteed way to keep things consistent going forward. Nothing prevents a new component from failing to return the right error type with your proposal here. This puts unnecessary burden on approvers and maintainers to ensure Init errors are consistent.

addjuarez and others added 5 commits August 10, 2022 14:53
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: addjuarez <addiajuarez@gmail.com>
@addjuarez addjuarez marked this pull request as ready for review August 11, 2022 14:22
@addjuarez addjuarez requested review from a team as code owners August 11, 2022 14:22
@addjuarez addjuarez marked this pull request as draft August 11, 2022 18:35
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
@@ -1356,18 +1356,18 @@ func (a *DaprRuntime) isAppSubscribedToBinding(binding string) bool {
func (a *DaprRuntime) initInputBinding(c components_v1alpha1.Component) error {
binding, err := a.bindingsRegistry.CreateInputBinding(c.Spec.Type, c.Spec.Version)
if err != nil {
log.Warnf("failed to create input binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you removed this logging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is log multiple times and will eventually get captured here. https://github.com/dapr/dapr/blob/master/pkg/runtime/runtime.go#L2158

log.Errorf("failed to init input binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err)
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init")
return err
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init", c.ObjectMeta.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is log multiple times and will eventually get captured here. https://github.com/dapr/dapr/blob/master/pkg/runtime/runtime.go#L2158
Can add it back but it seemed redundant to print the same message multiple times

log.Warnf("failed to create output binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err)
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation")
return err
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation", c.ObjectMeta.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above

log.Warnf("failed to init secret store %s/%s named %s: %s", c.Spec.Type, c.Spec.Version, c.ObjectMeta.Name, err)
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init")
return err
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init", c.ObjectMeta.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these breaking changes? The metrics are part of the contracts and should be compatible, it means that someone with alerts on the existing metrics should not have to change their alerts when they upgrade to this version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should not be breaking changes, I have validated that our dapr grafana dashboards are able to still recieve this metric and trigger alerts unchanged.

type InitErrorKind string

const (
InitFailure InitErrorKind = "InitFailure"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want these to be printed out, we should agree if these should be app caps, camel case or pascal case. Last but not least, we should agree on these error types since these become contractual.

I personally think these should be ALL_CAPS and we can keep these failure types here.

/cc @yaron2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on ALL_CAPS. cc @daixiang0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on ALL_CAPS as well.

@addjuarez addjuarez marked this pull request as ready for review August 19, 2022 03:09
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>

func (e *InitError) Error() string {
if e.entity == "" {
return fmt.Sprintf("(%s): %s", e.kind, e.err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use square brackets for e.kind please


func (e *InitError) Error() string {
if e.entity == "" {
return fmt.Sprintf("(%s): %s", e.kind, e.err)
Copy link
Contributor

@artursouza artursouza Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[ ]" is better.

return fmt.Sprintf("(%s): %s", e.kind, e.err)
}

return fmt.Sprintf("(%s): initialization error occurred for %s: %s", e.kind, e.entity, e.err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[ ]" is better.

addjuarez and others added 5 commits August 19, 2022 16:13
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
@artursouza artursouza merged commit 25f74fd into dapr:master Aug 30, 2022
Taction added a commit to Taction/dapr that referenced this pull request Sep 5, 2022
* master: (134 commits)
  Record metric for service updates (dapr#5114)
  chore: remove duplicate word in comments (dapr#5099)
  Run e2e tests in previous version of K8s on AKS to see if it's breaking Windows tests (dapr#5104)
  add {namespace} option to set consumerID
  Add tracing passthrough for bindings (dapr#5049)
  Improve component init error logs and metrics (dapr#4975)
  Improvements to E2E test apps (dapr#5042)
  Fix component name for aws dynamodb (dapr#5095)
  Fix perf tests not building (dapr#5096)
  Fix missing components in Metadata API result. (dapr#5052)
  Fixed log message (dapr#5088)
  Disable IPFS binding for now (dapr#5092)
  optimize/concurrency: pubsub concurrency map writes in runtime unit test. (dapr#5086)
  Configuration API response to use dictionaries - based on PR dapr#4695 (dapr#5083)
  correct compile error
  correct checking for actors with same id
  fix type error in print
  fix typo
  add log when actor start time and end time is not expected
  Components auto-registration (dapr#5062)
  ...

Signed-off-by: zhangchao <zchao9100@gmail.com>

# Conflicts:
#	go.mod
#	pkg/grpc/api.go
#	pkg/grpc/api_test.go
#	pkg/http/api.go
#	pkg/runtime/runtime.go
@addjuarez addjuarez deleted the initLogging branch October 19, 2022 21:16
@addjuarez addjuarez restored the initLogging branch October 19, 2022 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants