otel: fixes and perf adjustments #4827
Conversation
Force-pushed from 3c45564 to c56a0a2.
response_status = ::grpc::Status(::grpc::StatusCode::UNAVAILABLE, "Server is unavailable");
break;
}
worker.post(msg);
What exactly will happen to the clients if we block and don't respond within a reasonable time? Previously, we answered UNAVAILABLE, which seemed more native to the gRPC/HTTP environment.
Also, what was the trigger of this change?
If a request timeout is not set on the client side, clients will wait indefinitely. If a timeout is set, the client gets a DEADLINE_EXCEEDED result, which should be handled as a temporary error (we do so in the otel dest).
https://grpc.io/docs/guides/deadlines/#deadlines-on-the-client
We have seen that syslog-ng "gives up" too quickly if it cannot forward the message. If we have multiple otel source workers and they receive requests at the exact same time, the destination that grabs those messages might not be able to grab them quickly enough. So even a devnull destination, which can process 2 million messages per second, can sometimes put backpressure on the otel source if there is a high transient load coming from it. We can raise log-iw-size() to buffer more messages and flatten these transients, but that raises the memory footprint considerably.
We could have a timeout for this, e.g. attempting a blocking post but giving up after x seconds and returning UNAVAILABLE. But I believe the current implementation is completely functional too, at least on the server side. It would be nice to have a request timeout option on the client side (otel dest).
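For illustration, the log-iw-size() workaround mentioned above would look roughly like the sketch below; the port and the window size are made-up values, the only point being that a larger initial window buffers more of the transient load at the cost of memory:

source s_otel {
  opentelemetry(
    port(4317)
    workers(4)
    log-iw-size(100000)  # larger window flattens transients, but raises the memory footprint
  );
};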
Sounds reasonable. My only concern is that the original otel-source implementation was not necessarily a "thread-per-client" implementation. By blocking, I think we make it so.
True.
However, concurrent-requests() lets us scale to more clients than workers(), which should be sufficient. With it, the workers() option is detached from limiting the number of clients (gRPC will handle them for us as long as there are enough ServiceCalls registered) and only configures the processing parallelization.
This Pull Request introduces config grammar changes
syslog-ng/be526b484d8913b98edbe043e0d5a783d6d0e7d7 -> alltilla/otel-perf-adjustment
--- a/destination
+++ b/destination
bigquery(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
loki(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
opentelemetry(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
syslog-ng-otlp(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
)
--- a/source
+++ b/source
opentelemetry(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
+ concurrent-requests(<positive-integer>)
+ log-fetch-limit(<nonnegative-integer>)
)
syslog-ng-otlp(
+ channel-args(
+ <empty>
+ <string> => <number>
+ <string> => <string>
+ )
+ concurrent-requests(<positive-integer>)
+ log-fetch-limit(<nonnegative-integer>)
)
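For illustration, the new channel-args() option could be set on a destination as sketched below; the channel argument names are standard gRPC channel args used only as examples, and the URL and values are made up, not recommendations from this PR:

destination d_otel {
  opentelemetry(
    url("localhost:12345")
    channel-args(
      "grpc.keepalive_time_ms" => 10000
      "grpc.lb_policy_name" => "round_robin"
    )
  );
};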
Force-pushed from c56a0a2 to bdb6990.
Pushed style fixes.