-
Notifications
You must be signed in to change notification settings - Fork 2k
Improve component init error logs and metrics #4975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Codecov Report
@@ Coverage Diff @@
## master #4975 +/- ##
==========================================
- Coverage 65.47% 65.44% -0.04%
==========================================
Files 113 114 +1
Lines 12723 12735 +12
==========================================
+ Hits 8331 8334 +3
- Misses 3809 3816 +7
- Partials 583 585 +2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
For standardizing the error - what if instead we created a new error type DaprComponentInitError or something like that and change the Init interface to return that. And then we update the runtime to just log that error if Init returns an error of this type. Creating a new error type and expecting this in the return is a much more guaranteed way to keep things consistent going forward. Nothing prevents a new component from failing to return the right error type with your proposal here. This puts unnecessary burden on approvers and maintainers to ensure Init errors are consistent. |
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
@@ -1356,18 +1356,18 @@ func (a *DaprRuntime) isAppSubscribedToBinding(binding string) bool { | |||
func (a *DaprRuntime) initInputBinding(c components_v1alpha1.Component) error { | |||
binding, err := a.bindingsRegistry.CreateInputBinding(c.Spec.Type, c.Spec.Version) | |||
if err != nil { | |||
log.Warnf("failed to create input binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you removed this logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is log multiple times and will eventually get captured here. https://github.com/dapr/dapr/blob/master/pkg/runtime/runtime.go#L2158
log.Errorf("failed to init input binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err) | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init") | ||
return err | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init", c.ObjectMeta.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is log multiple times and will eventually get captured here. https://github.com/dapr/dapr/blob/master/pkg/runtime/runtime.go#L2158
Can add it back but it seemed redundant to print the same message multiple times
log.Warnf("failed to create output binding %s (%s/%s): %s", c.ObjectMeta.Name, c.Spec.Type, c.Spec.Version, err) | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation") | ||
return err | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation", c.ObjectMeta.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered above
log.Warnf("failed to init secret store %s/%s named %s: %s", c.Spec.Type, c.Spec.Version, c.ObjectMeta.Name, err) | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init") | ||
return err | ||
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init", c.ObjectMeta.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these breaking changes? The metrics are part of the contracts and should be compatible, it means that someone with alerts on the existing metrics should not have to change their alerts when they upgrade to this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be breaking changes, I have validated that our dapr grafana dashboards are able to still recieve this metric and trigger alerts unchanged.
pkg/runtime/initError.go
Outdated
type InitErrorKind string | ||
|
||
const ( | ||
InitFailure InitErrorKind = "InitFailure" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want these to be printed out, we should agree if these should be app caps, camel case or pascal case. Last but not least, we should agree on these error types since these become contractual.
I personally think these should be ALL_CAPS and we can keep these failure types here.
/cc @yaron2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on ALL_CAPS. cc @daixiang0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on ALL_CAPS as well.
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
pkg/runtime/initError.go
Outdated
|
||
func (e *InitError) Error() string { | ||
if e.entity == "" { | ||
return fmt.Sprintf("(%s): %s", e.kind, e.err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use square brackets for e.kind
please
pkg/runtime/initError.go
Outdated
|
||
func (e *InitError) Error() string { | ||
if e.entity == "" { | ||
return fmt.Sprintf("(%s): %s", e.kind, e.err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"[ ]" is better.
pkg/runtime/initError.go
Outdated
return fmt.Sprintf("(%s): %s", e.kind, e.err) | ||
} | ||
|
||
return fmt.Sprintf("(%s): initialization error occurred for %s: %s", e.kind, e.entity, e.err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"[ ]" is better.
Signed-off-by: Addison Juarez <adjuarez@microsoft.com>
* master: (134 commits) Record metric for service updates (dapr#5114) chore: remove duplicate word in comments (dapr#5099) Run e2e tests in previous version of K8s on AKS to see if it's breaking Windows tests (dapr#5104) add {namespace} option to set consumerID Add tracing passthrough for bindings (dapr#5049) Improve component init error logs and metrics (dapr#4975) Improvements to E2E test apps (dapr#5042) Fix component name for aws dynamodb (dapr#5095) Fix perf tests not building (dapr#5096) Fix missing components in Metadata API result. (dapr#5052) Fixed log message (dapr#5088) Disable IPFS binding for now (dapr#5092) optimize/concurrency: pubsub concurrency map writes in runtime unit test. (dapr#5086) Configuration API response to use dictionaries - based on PR dapr#4695 (dapr#5083) correct compile error correct checking for actors with same id fix type error in print fix typo add log when actor start time and end time is not expected Components auto-registration (dapr#5062) ... Signed-off-by: zhangchao <zchao9100@gmail.com> # Conflicts: # go.mod # pkg/grpc/api.go # pkg/grpc/api_test.go # pkg/http/api.go # pkg/runtime/runtime.go
Description
Added a componentName tag to the dapr_runtime_component_init_fail_total metric. The metric should now be produced when init failures are logged.
Also have standardized the component failure error logging for creation/init stages. All component init/create errors should now be logged.
Also removed some redundant logging of errors.
A new error type InitErrors was created and was modeled after custom etag errors. We have for initError codes and have a standardized format for error messages.
The new error message will look like :
(InitComponentFailure): initialization error occurred for statestore2 (state.redis/v1): init timeout for component statestore2 exceeded after 5s"
The new error codes.
Issue reference
Part of the work involved with this.
dapr/components-contrib#1752
Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list: