
Conversation

@samansmink (Contributor) commented May 28, 2025

This PR introduces a major change to the LogStorage:

  • adds a new file logging storage that appends log entries to CSV file(s)
    • file logging supports both a normalized mode (which results in multiple files) and a denormalized mode (which results in a single file)
  • reworks log storage to make implementing new log storages (hopefully) a blast
    • two base classes can be inherited from to reuse shared log storage code
      • BufferingLogStorage (used by all 3 available log storages)
      • CSVLogStorage (used by the file and stdout log storages)
  • adds a mechanism for passing/parsing configuration parameters to log storages in an extensible way
  • switches enable_logging, disable_logging and truncate_duckdb_logs to be table functions instead of pragmas (see the sketch after this list)
  • changes duckdb_logs back into a bind-replace function instead of a view: this means we can efficiently scan normalized log tables as well as denormalized log storages with the same function (using a join if normalized and a simple scan if denormalized)
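
A minimal sketch of calling the switched functions in their new table-function form (assuming they take no required arguments):

CALL disable_logging();
CALL truncate_duckdb_logs();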

How to use the new enable_logging function

This PR moves towards using the enable_logging function as the primary way of configuring logging. The simplest way to enable logging is:

CALL enable_logging();

This will enable logging with its default config, which is currently the memory storage with log level INFO.
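
For illustration, this should be roughly equivalent to spelling the defaults out explicitly (a sketch; 'memory' and 'info' are assumed spellings of the default storage and level):

CALL enable_logging(level='info', storage='memory');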

To use a different storage, one can use:

CALL enable_logging(storage='file', storage_path='./my_logs');

This will make sure that all log entries at the default log level or higher (D_INFO by default) are written to two CSV files, ./my_logs/duckdb_log_entries.csv and ./my_logs/duckdb_log_contexts.csv, which use the same schema as the duckdb_logs() and duckdb_log_contexts() table functions.

To query the logs, the same table functions duckdb_logs() and duckdb_log_contexts() can be used, meaning that the default view duckdb_logs works as well:

FROM duckdb_logs

The above query uses bind_replace to automatically return a scan + join over the two CSV files.
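
Conceptually, the bind-replaced query corresponds to an explicit scan + join over the two files, roughly like the following sketch (the shared column is assumed to be context_id, matching the duckdb_log_contexts() schema):

SELECT *
FROM read_csv('./my_logs/duckdb_log_entries.csv') entries
JOIN read_csv('./my_logs/duckdb_log_contexts.csv') contexts
USING (context_id);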

More ways to enable logging

The enable_logging function has been expanded to take various parameters, including a struct field storage_config which is passed through to the log storage to configure it in an extensible way. Let's go over some examples:

-- log all log messages of level `trace` to stdout
CALL enable_logging(level='trace', storage='stdout');
-- log only log messages of the `FileSystem` type, to a single, denormalized file `some/log/file.csv`
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file.csv'});
-- log only log messages of the `FileSystem` type, to two, normalized files in the path `some/log/file/path`. The buffer size of the log storage is 5000
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file/path', 'buffer_size': 5000});
-- Same as last query, but leverage some syntactic sugar to automatically forward common settings path and buffer size. Also infers storage='file' from presence of path
CALL enable_logging('FileSystem', storage_path='some/log/file/path', storage_buffer_size=5000);
-- Configure a custom log storage implemented in some extension
CALL enable_logging(storage='my_log_storage', storage_config={'my_custom_options': 'some_option'});
-- Enable logging only for the log types QueryLog and FileSystem
CALL enable_logging(['QueryLog', 'FileSystem']);

Note that, as the fourth query above shows, some log storage config params are pulled up into the enable_logging function as named parameters with a storage_ prefix. These clean up the UX for some common options. Currently available are (see the sketch after this list):

  • storage_path
  • storage_normalize
  • storage_buffer_size
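
For the path and buffer size, the two spellings from the examples above are intended to be equivalent; a sketch restating them side by side:

-- passing the options through storage_config
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file/path', 'buffer_size': 5000});
-- the same options via the storage_-prefixed named parameters (storage='file' is inferred from the path)
CALL enable_logging('FileSystem', storage_path='some/log/file/path', storage_buffer_size=5000);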

Another important thing to realize is that enable_logging completely resets the logging config, so any manually specified logging-related config (e.g. set through the SET variables) will be lost once it is called. I feel this is a desirable trait though: given the number of states the logging mechanism can be in, allowing logging to be configured only at the point where it is enabled keeps things sane.
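
A sketch of what that reset means in practice ('memory' is an assumed spelling of the default storage name):

-- enable trace-level logging to stdout
CALL enable_logging(level='trace', storage='stdout');
-- this second call resets the config: the trace level and stdout storage from the
-- previous call no longer apply, only what is specified here (plus the defaults)
CALL enable_logging(storage='memory');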

Buffer sizes

As seen in the section above, buffer sizes are configurable. The default values differ between log storages (see the sketch after this list):

  • memory: STANDARD_VECTOR_SIZE
  • stdout: 1 (flush on every write)
  • file: STANDARD_VECTOR_SIZE
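
For example, to make the file storage flush on every write like the stdout storage does, the buffer size can be lowered to 1 (a sketch using the storage_buffer_size shorthand from above):

CALL enable_logging(storage='file', storage_path='./my_logs', storage_buffer_size=1);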

TODOs

There are some problems around deadlocking that need further attention / could use better testing.

For example, the code can easily deadlock when a LogStorage tries to grab a lock while flushing. We need to make sure that writing a log entry is guaranteed not to grab any locks that the code calling the logger might hold. There's a hotfix coming in for v1.3-ossivalis, which needs to be applied to this PR too, where we can not use the buffer manager while flushing logs. Instead we should rely on the default allocator to avoid grabbing a lock when flushing.

@samansmink samansmink force-pushed the logging-improvements branch from 3495cf6 to a0c94ed May 30, 2025 11:21
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 30, 2025 11:24
@samansmink samansmink marked this pull request as ready for review June 2, 2025 12:32
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 3, 2025 19:03
Mytherin added a commit that referenced this pull request Jul 5, 2025
Partially fixes #17714.

The problem was that we broke the existing http logging infrastructure
when moving to the new logger.

In this PR I partially restore the old behaviour for stdout logging by enabling
the new http logger with the `stdout` storage whenever the
`enable_http_logging` setting is set. This means that the stdout logging
now works similarly to how it did before.

I did not manage to find a good way to restore the http logging to a
file though. However, in DuckDB v1.4 we will be able to do this by using
#17692
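
In other words, with that change the old setting routes HTTP log messages through the new logger's stdout storage; a sketch of the usage (only the setting name enable_http_logging is taken from the commit message above):

-- HTTP requests/responses are now logged via the new logger using the stdout storage
SET enable_http_logging = true;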
@Mytherin (Collaborator) left a comment

Thanks for the fixes - looks great, some more comments:

@samansmink samansmink marked this pull request as ready for review August 6, 2025 10:58
@Mytherin (Collaborator) left a comment


Thanks for the fixes! Looks great - some more comments below

@Mytherin Mytherin marked this pull request as ready for review August 20, 2025 10:20
@Mytherin (Collaborator) left a comment


Thanks! Looks good - some minor comments then this is good to go

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 20, 2025 11:10
@samansmink samansmink marked this pull request as ready for review August 21, 2025 08:19
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 09:29
@samansmink samansmink marked this pull request as ready for review August 22, 2025 09:29
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 12:40
@samansmink samansmink marked this pull request as ready for review August 22, 2025 13:16
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 15:47
@samansmink samansmink marked this pull request as ready for review August 25, 2025 09:26
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 25, 2025 15:48
@samansmink samansmink marked this pull request as ready for review August 26, 2025 10:03
@Mytherin Mytherin merged commit 4101338 into duckdb:main Aug 26, 2025
65 checks passed
@Mytherin (Collaborator)

Thanks!

@carlopi (Contributor) commented Aug 26, 2025

Question / idea: should there be, by default, a UUID (randomized once per duckdb instance) that decides the names of the files logs are written to?

This solves both the case of N invocations of the same process appending to the same file, and that of multiple parallel processes conflicting on the same log resources.

@carlopi carlopi added the Needs Documentation label Aug 26, 2025
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Aug 26, 2025
Add (CSV) file logger (duckdb/duckdb#17692)
feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641)
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Aug 27, 2025
Add (CSV) file logger (duckdb/duckdb#17692)
feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641)