Add (CSV) file logger #17692
Conversation
Partially fixes #17714. The problem was that we broke the existing http logging infrastructure when moving to the new logger. In this PR I partially restore the old behaviour for stdout logging by enabling the new http logger with the `stdout` storage whenever the `enable_http_logging` setting is set. This means that stdout logging now works similarly to how it did before. I did not manage to find a good way to restore the http logging to a file, though. However, in DuckDB v1.4 we will be able to do this by using #17692
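As a sketch of the restored behaviour (`enable_http_logging` is the existing setting; the stdout wiring is what this change hooks up):

```sql
-- With this change, the existing setting once again produces stdout HTTP logs,
-- now routed through the new logger with the stdout storage:
SET enable_http_logging = true;
```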
Thanks for the fixes - looks great, some more comments:
Thanks for the fixes! Looks great - some more comments below
Thanks! Looks good - some minor comments then this is good to go
Thanks!

Question / idea: should there be, by default, a UUID (randomized once per DuckDB instance) that decides the name of the files logs get written to? This would solve both N invocations of the same process appending to the same file, and multiple parallel processes conflicting on the log resources.
Add (CSV) file logger (duckdb/duckdb#17692)
feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641)
This PR introduces a major change to the LogStorage and adds:

- a `file` logging storage that appends log entries to csv file(s)
- `file` logging supports both a normalized and a denormalized mode, which results in a single file or multiple files
- configurable buffering of log entries (for both the `file` and `stdout` log storages)
- the `duckdb_logs` view is changed back to be a bind-replace function instead of a view: this means we can efficiently scan normalized log tables as well as denormalized log storages with the same function (using a join if normalized and a simple scan if denormalized)

## How to use the new enable_logging function
This PR moves towards using the `enable_logging` function as the primary way of configuring logging. The simplest way to enable logging is:
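A minimal sketch of the call (assuming the `CALL` syntax for the `enable_logging` pragma):

```sql
-- Enable logging with the defaults: the memory storage at level INFO
CALL enable_logging();
```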
This will enable logging in its default config, which is currently the `memory` storage with log level `INFO`. To use a different storage, one can use:
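Something like the following (a sketch: the `storage` parameter name is an assumption here; `storage_path` is one of the named parameters listed further below):

```sql
-- Switch to the csv file storage, writing under ./my_logs
CALL enable_logging(storage = 'file', storage_path = './my_logs');
```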
The above will make sure that all log entries at the default log level or higher (by default `D_INFO`) are written to two csv files, `./my_logs/duckdb_log_entries.csv` and `./my_logs/duckdb_log_contexts.csv`, which use the same schema as the `duckdb_logs()` and `duckdb_log_contexts()` table functions. To query the logs, the same table functions `duckdb_logs()` and `duckdb_log_contexts()` can be used, meaning that the default view `duckdb_logs` works as well:

```sql
FROM duckdb_logs;
```
The above query uses `bind_replace` to return a scan + join over the two csv files automatically.
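Conceptually, in normalized file mode the replacement behaves like the following sketch (the join column is hypothetical; the real plan is generated by the bind-replace):

```sql
-- Roughly what the bind-replace expands to for the normalized file storage
SELECT *
FROM read_csv('./my_logs/duckdb_log_entries.csv') AS entries
JOIN read_csv('./my_logs/duckdb_log_contexts.csv') AS contexts
    USING (context_id);  -- hypothetical join column
```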
## More ways to enable logging

The `enable_logging` pragma has been expanded to take various parameters, including a struct field `storage_config` which is passed through to the log storage to configure it in an extensible way. Let's go over some examples:
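For instance (a sketch: the struct keys are assumptions; the `storage_`-prefixed parameters are the ones listed below):

```sql
-- Pass storage-specific options through the storage_config struct
CALL enable_logging(storage = 'file',
                    storage_config = {'path': './my_logs', 'normalize': true});

-- Common options are also exposed directly as storage_-prefixed named parameters
CALL enable_logging(storage = 'file',
                    storage_path = './my_logs',
                    storage_normalize = true,
                    storage_buffer_size = 2048);
```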
Note that in the last query we can see that some log storage config params are pulled up into the `enable_logging` function as named parameters with a `storage_` prefix. These clean up the UX for some common options. Currently available are:

- `storage_path`
- `storage_normalize`
- `storage_buffer_size`
Another important thing to realize is that `enable_logging` will completely reset the logging config, so any manually specified logging-related config (e.g. set through the SET variables) will be lost once it is called. I feel this is a desirable trait though: given the number of states the logging mechanism can be in, allowing logging to be configured only once, when it is enabled, keeps things sane.
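To illustrate the reset (assuming a logging-related SET variable such as `logging_level`; exact names may differ):

```sql
SET logging_level = 'TRACE';  -- manually tweak the logging config
CALL enable_logging();        -- wipes the above and re-applies the defaults
```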
## Buffer sizes

As seen in the section before, buffer sizes are configurable. The default values can differ between log storages:

- `memory`: STANDARD_VECTOR_SIZE
- `stdout`: 1 (flush on every write)
- `file`: STANDARD_VECTOR_SIZE
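For example, using the `storage_buffer_size` parameter from above (the value here is arbitrary):

```sql
-- Flush to the csv file after every 16 entries instead of the default
CALL enable_logging(storage = 'file', storage_path = './my_logs',
                    storage_buffer_size = 16);
```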
## TODOs

There are some problems around deadlocking that need further attention and could use better testing. For example, the code can easily deadlock when a LogStorage tries to grab a lock while flushing. We need to make sure that writing a log entry is guaranteed not to grab any locks that the code calling the logger might hold. There's a hotfix coming in v1.3-ossivalis that needs to be applied to this PR too: we cannot use the buffer manager while flushing logs and should instead rely on the default allocator, to avoid grabbing a lock when flushing.