
Conversation

@samansmink (Contributor) commented May 28, 2025

This PR introduces a major change to the LogStorage:

  • adds a new file logging storage that appends log entries to CSV file(s)
    • file logging supports both a normalized mode (which results in multiple files) and a denormalized mode (which results in a single file)
  • reworks log storage to make implementing new log storages (hopefully) a blast
    • two base classes can be inherited from to reuse shared log storage code
      • BufferingLogStorage (used by all 3 available log storages)
      • CSVLogStorage (used by the file and stdout log storages)
  • adds a mechanism for passing/parsing configuration parameters to log storages in an extensible way
  • switches enable_logging, disable_logging and truncate_duckdb_logs to be table functions instead of pragmas (see the sketch after this list)
  • changes duckdb_logs back into a bind-replace function instead of a view: this means we can efficiently scan normalized log tables as well as denormalized log storages with the same function (using a join if normalized and a simple scan if denormalized)
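
A minimal sketch of calling the switched functions in their new table-function form (assuming they take no required arguments):

CALL disable_logging();
CALL truncate_duckdb_logs();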

How to use the new enable_logging function

This PR moves towards using the enable_logging function as the primary way of configuring logging. The simplest way to enable logging is:

CALL enable_logging();

This will enable logging with its default config, which is currently the memory storage with log level INFO.
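
For illustration, this should be roughly equivalent to spelling the defaults out explicitly (a sketch; 'memory' and 'info' are assumed spellings of the default storage and level):

CALL enable_logging(level='info', storage='memory');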

To use a different storage, one can use:

CALL enable_logging(storage='file', storage_path='./my_logs');

This will make sure that all log entries at the default log level or higher (D_INFO by default) are written to two CSV files, ./my_logs/duckdb_log_entries.csv and ./my_logs/duckdb_log_contexts.csv, which use the same schema as the duckdb_logs() and duckdb_log_contexts() table functions.

To query the logs, the same table functions duckdb_logs() and duckdb_log_contexts() can be used, meaning that the default view duckdb_logs works as well:

FROM duckdb_logs

The above query uses bind_replace to automatically return a scan + join over the two CSV files.
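
Conceptually, the bind-replaced query corresponds to an explicit scan + join over the two files, roughly like the following sketch (the shared column is assumed to be context_id, matching the duckdb_log_contexts() schema):

SELECT *
FROM read_csv('./my_logs/duckdb_log_entries.csv') entries
JOIN read_csv('./my_logs/duckdb_log_contexts.csv') contexts
USING (context_id);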

More ways to enable logging

The enable_logging function has been expanded to take various parameters, including a struct field storage_config which is passed through to the log storage to configure it in an extensible way. Let's go over some examples:

-- log all log messages of level `trace` to stdout
CALL enable_logging(level='trace', storage='stdout');
-- log only log messages of the `FileSystem` type, to a single, denormalized file `some/log/file.csv`
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file.csv'});
-- log only log messages of the `FileSystem` type, to two, normalized files in the path `some/log/file/path`. The buffer size of the log storage is 5000
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file/path', 'buffer_size': 5000});
-- Same as last query, but leverage some syntactic sugar to automatically forward common settings path and buffer size. Also infers storage='file' from presence of path
CALL enable_logging('FileSystem', storage_path='some/log/file/path', storage_buffer_size=5000);
-- Configure a custom log storage implemented in some extension
CALL enable_logging(storage='my_log_storage', storage_config={'my_custom_options': 'some_option'});
-- Enable logging only for the log types QueryLog and FileSystem
CALL enable_logging(['QueryLog', 'FileSystem']);

Note that, as the fourth query above shows, some log storage config params are pulled up into the enable_logging function as named parameters with a storage_ prefix. These clean up the UX for some common options. Currently available are (see the sketch after this list):

  • storage_path
  • storage_normalize
  • storage_buffer_size
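
For the path and buffer size, the two spellings from the examples above are intended to be equivalent; a sketch restating them side by side:

-- passing the options through storage_config
CALL enable_logging('FileSystem', storage='file', storage_config={'path': 'some/log/file/path', 'buffer_size': 5000});
-- the same options via the storage_-prefixed named parameters (storage='file' is inferred from the path)
CALL enable_logging('FileSystem', storage_path='some/log/file/path', storage_buffer_size=5000);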

Another important thing to realize is that enable_logging completely resets the logging config, so any manually specified logging-related config (e.g. set through the SET variables) will be lost once it is called. I feel this is a desirable trait though: given the number of states the logging mechanism can be in, allowing logging to be configured only at the point where it is enabled keeps things sane.
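
A sketch of what that reset means in practice ('memory' is an assumed spelling of the default storage name):

-- enable trace-level logging to stdout
CALL enable_logging(level='trace', storage='stdout');
-- this second call resets the config: the trace level and stdout storage from the
-- previous call no longer apply, only what is specified here (plus the defaults)
CALL enable_logging(storage='memory');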

Buffer sizes

As seen in the section above, buffer sizes are configurable. The default values differ between log storages (see the sketch after this list):

  • memory: STANDARD_VECTOR_SIZE
  • stdout: 1 (flush on every write)
  • file: STANDARD_VECTOR_SIZE
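
For example, to make the file storage flush on every write like the stdout storage does, the buffer size can be lowered to 1 (a sketch using the storage_buffer_size shorthand from above):

CALL enable_logging(storage='file', storage_path='./my_logs', storage_buffer_size=1);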

TODOs

There are some problems around deadlocking that need further attention / could use better testing.

For example, the code can easily deadlock when a LogStorage tries to grab a lock while flushing. We need to make sure that writing a log entry is guaranteed not to grab any locks that the code calling the logger might hold. There's a hotfix coming in for v1.3-ossivalis, which needs to be applied to this PR too, where we can not use the buffer manager while flushing logs. Instead we should rely on the default allocator to avoid grabbing a lock when flushing.

@samansmink samansmink force-pushed the logging-improvements branch from 3495cf6 to a0c94ed May 30, 2025 11:21
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 30, 2025 11:24
@samansmink samansmink marked this pull request as ready for review June 2, 2025 12:32
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 3, 2025 19:03
Mytherin added a commit that referenced this pull request Jul 5, 2025
Partially fixes #17714.

The problem was that we broke the existing http logging infrastructure
when moving to the new logger.

In this PR I partially restore the old behaviour for stdout logging by enabling
the new http logger with the `stdout` storage whenever the
`enable_http_logging` setting is set. This means that the stdout logging
now works similarly to how it did before.

I did not manage to find a good way to restore the http logging to a
file though. However, in DuckDB v1.4 we will be able to do this by using
#17692
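
In other words, with that change the old setting routes HTTP log messages through the new logger's stdout storage; a sketch of the usage (only the setting name enable_http_logging is taken from the commit message above):

-- HTTP requests/responses are now logged via the new logger using the stdout storage
SET enable_http_logging = true;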
@Mytherin (Collaborator) left a comment

Thanks for the fixes - looks great, some more comments:

@samansmink samansmink marked this pull request as ready for review August 6, 2025 10:58
@Mytherin (Collaborator) left a comment


Thanks for the fixes! Looks great - some more comments below

@Mytherin Mytherin marked this pull request as ready for review August 20, 2025 10:20
@Mytherin (Collaborator) left a comment


Thanks! Looks good - some minor comments then this is good to go

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 20, 2025 11:10
@samansmink samansmink marked this pull request as ready for review August 21, 2025 08:19
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 09:29
@samansmink samansmink marked this pull request as ready for review August 22, 2025 09:29
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 12:40
@samansmink samansmink marked this pull request as ready for review August 22, 2025 13:16
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2025 15:47
@samansmink samansmink marked this pull request as ready for review August 25, 2025 09:26
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 25, 2025 15:48
@samansmink samansmink marked this pull request as ready for review August 26, 2025 10:03
@Mytherin Mytherin merged commit 4101338 into duckdb:main Aug 26, 2025
65 checks passed
@Mytherin (Collaborator)

Thanks!

@carlopi (Contributor) commented Aug 26, 2025

Question / idea: should there be, by default, a UUID (randomized once per duckdb instance) that decides the names of the files logs are written to?

This solves both the case of N invocations of the same process appending to the same file, and that of multiple parallel processes conflicting on the same log resources.

@carlopi carlopi added the Needs Documentation label Aug 26, 2025
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Aug 26, 2025
Add (CSV) file logger (duckdb/duckdb#17692)
feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641)
krlmlr added a commit to krlmlr/duckdb-r that referenced this pull request Aug 27, 2025
Add (CSV) file logger (duckdb/duckdb#17692)
feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641)