Skip to content

Conversation

lukedirtwalker
Copy link
Collaborator

@lukedirtwalker lukedirtwalker commented Oct 14, 2019

Adds metrics:

# HELP sd_path_request_duration_seconds The duration of path requests in sciond.
# TYPE sd_path_request_duration_seconds histogram
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.01"} 1
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.02"} 6
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.04"} 11
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.08"} 15
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.16"} 16
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.32"} 17
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.64"} 17
sd_path_request_duration_seconds_bucket{result="ok_success",le="1.28"} 18
sd_path_request_duration_seconds_bucket{result="ok_success",le="2.56"} 20
sd_path_request_duration_seconds_bucket{result="ok_success",le="5.12"} 21
sd_path_request_duration_seconds_bucket{result="ok_success",le="10.24"} 21
sd_path_request_duration_seconds_bucket{result="ok_success",le="+Inf"} 21
sd_path_request_duration_seconds_sum{result="ok_success"} 9.515635177999998
sd_path_request_duration_seconds_count{result="ok_success"} 21
# HELP sd_path_requests_total The amount of path requests sciond received.
# TYPE sd_path_requests_total counter
sd_path_requests_total{dst="1",result="ok_success"} 15
sd_path_requests_total{dst="2",result="ok_success"} 6

Also adds similar metrics for all other SD requests: as_info, if_info, service_info, and revocation.

Contributes #2156


This change is Reviewable

@lukedirtwalker lukedirtwalker requested a review from scrye October 14, 2019 11:25
@lukedirtwalker lukedirtwalker added c/observability Metrics, logging, tracing SCIOND labels Oct 14, 2019
Copy link
Contributor

@scrye scrye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @lukedirtwalker)


go/sciond/internal/metrics/metrics.go, line 45 at r1 (raw file):

// Result values
const (

Are these redefined instead of used directly for documentation purposes? I'm wondering if I'm missing something.

It's a nice idea, I really like it 👍


go/sciond/internal/metrics/metrics.go, line 52 at r1 (raw file):

var (
	resultLabel = []string{"result"}

prom.LabelResult?


go/sciond/internal/metrics/metrics.go, line 119 at r1 (raw file):

	return PathRequest{
		count: prom.NewCounterVec(Namespace, subsystemPath, "requests_total",
			"The amount of path requests sciond received.", PathRequestLabels{}.Labels()),

Can probably drop sciond out of the description, it's clear from the Namespace.

Similar comment for the other descriptions below.


go/sciond/internal/metrics/metrics.go, line 121 at r1 (raw file):

			"The amount of path requests sciond received.", PathRequestLabels{}.Labels()),
		latency: prom.NewHistogramVec(Namespace, subsystemPath, "request_duration_seconds",
			"The duration of path requests in sciond.", resultLabel, prom.DefaultLatencyBuckets),

It's not clear from the help string whether this is the "received request -> responded to client" duration, or the "received request -> request handler finished" duration. Due to early triggers, this gets a bit confusing, because goroutines created by the handler will still show up after the handler finished.

Also, it's not 100% clear if it refers to SCIOND Client API path requests, or segment requests sent out on the network (even though the different terminology does clear that up somewhat).

How do you feel about changing it to "Time between receiving a path request and replying to it" or "Time to handle a path request"?

Similar comment for some of the descriptions below.


go/sciond/internal/metrics/metrics.go, line 143 at r1 (raw file):

func newRevocation() Revocation {
	return Revocation{
		count: prom.NewCounterVec(Namespace, subsystemRevocation+"s", "total",

Can drop the +"s", it's not perfect English but it's cleaner for grepping.

Adds metrics:
```
# HELP sd_path_request_duration_seconds The duration of path requests in sciond.
# TYPE sd_path_request_duration_seconds histogram
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.01"} 1
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.02"} 6
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.04"} 11
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.08"} 15
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.16"} 16
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.32"} 17
sd_path_request_duration_seconds_bucket{result="ok_success",le="0.64"} 17
sd_path_request_duration_seconds_bucket{result="ok_success",le="1.28"} 18
sd_path_request_duration_seconds_bucket{result="ok_success",le="2.56"} 20
sd_path_request_duration_seconds_bucket{result="ok_success",le="5.12"} 21
sd_path_request_duration_seconds_bucket{result="ok_success",le="10.24"} 21
sd_path_request_duration_seconds_bucket{result="ok_success",le="+Inf"} 21
sd_path_request_duration_seconds_sum{result="ok_success"} 9.515635177999998
sd_path_request_duration_seconds_count{result="ok_success"} 21
# HELP sd_path_requests_total The amount of path requests sciond received.
# TYPE sd_path_requests_total counter
sd_path_requests_total{dst="1",result="ok_success"} 15
sd_path_requests_total{dst="2",result="ok_success"} 6
```

Also adds similar metrics for all other SD requests: `as_info`, `if_info`, `service_info`, and `revocation`.

Contributes scionproto#2156
Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @scrye)


go/sciond/internal/metrics/metrics.go, line 45 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

Are these redefined instead of used directly for documentation purposes? I'm wondering if I'm missing something.

It's a nice idea, I really like it 👍

It's in the design doc under https://github.com/scionproto/scion/blob/master/doc/Metrics.md#definition
I would like it more if it would copy the doc string of the prom vars.


go/sciond/internal/metrics/metrics.go, line 52 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

prom.LabelResult?

Done.


go/sciond/internal/metrics/metrics.go, line 119 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

Can probably drop sciond out of the description, it's clear from the Namespace.

Similar comment for the other descriptions below.

Done.


go/sciond/internal/metrics/metrics.go, line 121 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

It's not clear from the help string whether this is the "received request -> responded to client" duration, or the "received request -> request handler finished" duration. Due to early triggers, this gets a bit confusing, because goroutines created by the handler will still show up after the handler finished.

Also, it's not 100% clear if it refers to SCIOND Client API path requests, or segment requests sent out on the network (even though the different terminology does clear that up somewhat).

How do you feel about changing it to "Time between receiving a path request and replying to it" or "Time to handle a path request"?

Similar comment for some of the descriptions below.

Done.


go/sciond/internal/metrics/metrics.go, line 143 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

Can drop the +"s", it's not perfect English but it's cleaner for grepping.

Done.

Copy link
Contributor

@scrye scrye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@lukedirtwalker lukedirtwalker merged commit 76944ae into scionproto:master Oct 16, 2019
@lukedirtwalker lukedirtwalker deleted the pubSDMetrics branch October 16, 2019 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/observability Metrics, logging, tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants