Skip to content

Conversation

alltilla
Copy link
Collaborator

@alltilla alltilla commented Dec 15, 2023

BigQuery does not let us send a batch with more than 10 MB data: https://cloud.google.com/bigquery/quotas#write-api-limits

The new batch-bytes() option let's the user set an upper bytes size limit of the batch. By default it is 10 MB, and syslog-ng does not let the user set a larger value.

Please note that due to a framework limitation, the batch is possible to be at most 1 message larger than the set limit. You can set this limit to a lower value, but BigQuery never dropped a batch because of this in my manual E2E tests. This might be because maybe Google meant 10 MiB as the limit not 10 MB, but we are following what the doc says.


Example config:

  bigquery(
    project("syslog-ng-bigquery-project")
    dataset("syslog_ng_bigquery_dataset")
    table("syslog-ng-bigquery-table")
    schema(
      "my_string" STRING => "${MESSAGE}"
      "my_int" INTEGER => "${R_MSEC}"
      "my_bool" bool => "$(% ${R_MSEC} 2) == 0"
    )
    workers(16)
    log-fifo-size(1000000)

    batch-timeout(5000) # ms
    batch-lines(1000000) # Huge limit, batch-bytes() will limit us sooner

    # batch-bytes(21MiB) # syslog-ng will not start as it is larger than 10MB
    batch-bytes(1MB) # closes and flushes the batch after the last message pushed it above the 1MB limit
    # not setting batch-bytes() defaults to 10MB, which is the maximal allowed by BigQuery
  );

New stats are also added to the legacy CSV output:

$ ./sbin/syslog-ng-ctl stats | grep "msg\|batch"
dst.bigquery;d_bq#0;bigquery,bigquerystorage.googleapis.com,syslog-ng-test-project,syslog_ng_test_dataset,syslog-ng-test-table;a;batch_size_avg;27828
dst.bigquery;d_bq#0;bigquery,bigquerystorage.googleapis.com,syslog-ng-test-project,syslog_ng_test_dataset,syslog-ng-test-table;a;batch_size_max;27828
dst.bigquery;d_bq#0;bigquery,bigquerystorage.googleapis.com,syslog-ng-test-project,syslog_ng_test_dataset,syslog-ng-test-table;a;msg_size_avg;6957
dst.bigquery;d_bq#0;bigquery,bigquerystorage.googleapis.com,syslog-ng-test-project,syslog_ng_test_dataset,syslog-ng-test-table;a;msg_size_max;6957
global;msg_clones;;a;processed;0

No news file needed as bigquery() was not released in a stable version, yet.

Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
Copy link
Contributor

This Pull Request introduces config grammar changes

syslog-ng/a0eeab6bda21cc29dc73a56b468dff8a6947b8ca -> alltilla/bigquery-batch-bytes

--- a/destination
+++ b/destination

 bigquery(
+    batch-bytes(<positive-integer>)
 )

bazsi
bazsi previously approved these changes Dec 15, 2023
Copy link
Collaborator

@bazsi bazsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I had just one question but I'm fine with being this as is

@alltilla
Copy link
Collaborator Author

Added a comment to the code:

  /*
   * We check the last added row instead of storing it in a variable, checking its size then adding it to the batch,
   * because if we did that, add_serialized_rows() would copy the std::string instead of moving it, which hurts perf.
   */

@alltilla alltilla force-pushed the bigquery-batch-bytes branch from 6cef03d to 7efbdcd Compare December 15, 2023 12:41
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
Unfortunately, currently only the legacy CSV stats output shows these
counters.

Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
@alltilla alltilla force-pushed the bigquery-batch-bytes branch from 7efbdcd to e9c7850 Compare December 15, 2023 13:37
@alltilla
Copy link
Collaborator Author

Thanks to @OverOrion, I managed to make it cleaner with std::move() :)

@alltilla alltilla requested review from bazsi and OverOrion December 15, 2023 13:39
Copy link
Collaborator

@OverOrion OverOrion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this!

@OverOrion OverOrion merged commit 1857223 into syslog-ng:master Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants