
Conversation

@alltilla (Collaborator) commented Feb 12, 2024

No description provided.

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Feb 12, 2024
alltilla force-pushed the otel-perf-adjustment branch from 3c45564 to c56a0a2 on February 12, 2024 09:38
response_status = ::grpc::Status(::grpc::StatusCode::UNAVAILABLE, "Server is unavailable");
break;
}
worker.post(msg);
Collaborator

What exactly will happen to the clients if we block and don't respond within a reasonable time?
Previously, we answered UNAVAILABLE, which seemed more native to the gRPC/HTTP environment.

Also, what was the trigger of this change?

Collaborator Author

If a request timeout is not set on the client side, the client will wait indefinitely.
If a timeout is set, the client will get a DEADLINE_EXCEEDED result, which should be handled as a temporary error (we do so in the otel dest).

https://grpc.io/docs/guides/deadlines/#deadlines-on-the-client


We have seen that syslog-ng "gives up" too quickly when it cannot forward a message. If multiple otel source workers receive requests at exactly the same time, the destination that consumes those messages might not be able to pick them up quickly enough. So even a devnull destination, which can process 2 million messages per second, can sometimes put backpressure on the otel source when a high transient load arrives from it. We can raise log-iw-size() to buffer more messages and flatten these transients, but that raises the memory footprint considerably.
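
A minimal sketch of that buffering alternative, just for illustration (the values are made up, and log-iw-size() is the generic source window option, not something this PR adds):

# illustrative values only: a larger source window absorbs transient bursts,
# but every buffered message stays in memory
source s_otel {
  opentelemetry(
    port(4317)
    workers(4)
    log-iw-size(100000)
  );
};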

We could add a timeout for this: attempt the blocking post, but give up after x seconds and return UNAVAILABLE. Still, I believe the current implementation is fully functional as well, at least on the server side. It would be nice to have a request timeout option on the client side (otel dest).

Collaborator

Sounds reasonable. My only concern is that the original otel-source implementation was not necessarily a "thread-per-client" implementation. By blocking, I think we make it so.

Collaborator Author

True.

However, concurrent-requests() lets us scale to more clients than workers(), which should be sufficient. With it, the workers() option no longer limits the number of clients (gRPC handles them for us as long as enough ServiceCalls are registered) and only configures the processing parallelism.
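
For example, with a configuration along these lines (a minimal sketch, option values are illustrative), gRPC can keep 16 requests in flight while four workers do the processing:

# illustrative values only
source s_otel {
  opentelemetry(
    port(4317)
    workers(4)                # processing parallelism
    concurrent-requests(16)   # ServiceCalls kept registered with gRPC; can exceed workers()
  );
};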

Contributor

This Pull Request introduces config grammar changes

syslog-ng/be526b484d8913b98edbe043e0d5a783d6d0e7d7 -> alltilla/otel-perf-adjustment

--- a/destination
+++ b/destination

 bigquery(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
 )

 loki(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
 )

 opentelemetry(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
 )

 syslog-ng-otlp(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
 )

--- a/source
+++ b/source

 opentelemetry(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
+    concurrent-requests(<positive-integer>)
+    log-fetch-limit(<nonnegative-integer>)
 )

 syslog-ng-otlp(
+    channel-args(
+        <empty>
+        <string> => <number>
+        <string> => <string>
+    )
+    concurrent-requests(<positive-integer>)
+    log-fetch-limit(<nonnegative-integer>)
 )
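
For reference, a minimal usage sketch of the new options (the gRPC channel argument names and all values below are illustrative examples, not defaults introduced by this PR):

source s_otel {
  opentelemetry(
    port(4317)
    log-fetch-limit(100)      # controls how often the batch callbacks are invoked
    channel-args(
      "grpc.max_concurrent_streams" => 128        # <string> => <number>
    )
  );
};

destination d_otel {
  syslog-ng-otlp(
    url("localhost:4317")
    channel-args(
      "grpc.keepalive_time_ms" => 10000           # <string> => <number>
      "grpc.lb_policy_name" => "round_robin"      # <string> => <string>
    )
  );
};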

alltilla force-pushed the otel-perf-adjustment branch from c56a0a2 to bdb6990 on February 12, 2024 10:37
@alltilla (Collaborator Author)

Pushed style fixes.

MrAnno merged commit 79ed5c9 into syslog-ng:master on Feb 12, 2024
bshifter pushed a commit to bshifter/syslog-ng that referenced this pull request Feb 19, 2024
bshifter pushed a commit to bshifter/syslog-ng that referenced this pull request Feb 22, 2024