Skip to content

Conversation

lnkuiper
Copy link
Contributor

@lnkuiper lnkuiper commented Jun 2, 2025

Very basic initial implementation. Adds a new log type (PhysicalOperator) and adds logging for the hash join and parquet writer. I've implemented a utility that can be passed into classes that we use during execution such as JoinHashTable and ParquetWriter that logs messages, but we can see which operator this belongs to:

D pragma enable_logging;
D set logging_level='DEBUG';
D set debug_force_external=true;
D set threads=1;
D copy (
      select t1.i
      from range(3_000_000) t1(i)
      join range(3_000_000) t2(i)
      using (i)
  ) to 'physical_operator_logging.parquet';
D pragma disable_logging;
┌──────────────────┬───────────────┬──────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│       type       │ operator_type │                                parameters                                │                                                           message                                                           │
│     varcharvarchar    │                          map(varchar, varchar)                           │                                                           varchar                                                           │
├──────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187250 rows, 7377554 bytes)                                                                         │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ External hash join: enabled. Size (118108864 bytes) greater than reservation (15782448 bytes)                               │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122896 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (64354 rows, 532480 bytes) to file "physical_operator_logging.parquet" (Combine)                         │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187018 rows, 7373610 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188044 rows, 7391052 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123530 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187019 rows, 7373627 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124556 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187663 rows, 7384575 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123531 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187163 rows, 7376075 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124175 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (188208 rows, 7393840 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123675 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187522 rows, 7382178 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124720 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124034 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187526 rows, 7382246 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187171 rows, 7376211 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124038 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187427 rows, 7380563 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123683 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187701 rows, 7385221 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123939 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187690 rows, 7385034 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124213 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187374 rows, 7379662 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (124202 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (122880 rows, 998400 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded)  │
│ PhysicalOperator │ HASH_JOIN     │ {Join Type=INNER, Conditions='i = i', __estimated_cardinality__=3000000} │ Building JoinHashTable (187534 rows, 7382382 bytes)                                                                         │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (123886 rows, 1015040 bytes) to file "physical_operator_logging.parquet" (Sink: ROW_GROUP_SIZE exceeded) │
│ PhysicalOperator │ COPY_TO_FILE  │ {}                                                                       │ Flushing row group (93326 rows, 765440 bytes) to file "physical_operator_logging.parquet" (Combine)                         │
├──────────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 42 rows                                                                                                                                                                                                                         4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

I'd be happy to receive any feedback on this :)

@lnkuiper lnkuiper requested a review from samansmink June 2, 2025 12:55
Copy link
Contributor

@samansmink samansmink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 2, 2025 15:48
@lnkuiper lnkuiper marked this pull request as ready for review June 3, 2025 06:41
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 3, 2025 07:24
@lnkuiper lnkuiper marked this pull request as ready for review June 3, 2025 12:35
@lnkuiper
Copy link
Contributor Author

lnkuiper commented Jun 4, 2025

@Mytherin Sam and I are happy with the way this looks right now but maybe you want to have a look as well. Sam commented that the current info messages are completely unstructured, and I think fully structuring them will be difficult, but maybe we can do better than just a string?

I could see us using a info MAP(VARCHAR, VARCHAR) instead of just info VARCHAR, which is slightly more structured. It could hold something like:

{class=PhysicalHashJoin, event=Finalize, external=True}
{class=JoinHashTable, event=Build, tuples=x, size=y}
{class=ParquetWriter, event=FlushRowGroup, tuples=x, size=y, reason=ROW_GROUP_SIZE}

EDIT: or maybe add the class/event as columns:

class event info
PhysicalHashJoin Finalize {external=True}
JoinHashTable Build {tuples=x, size=y}
ParquetWriter FlushRowGroup {tuples=x, size=y, reason=ROW_GROUP_SIZE}

@samansmink
Copy link
Contributor

@lnkuiper Yea that sounds good to me!

@Mytherin
Copy link
Collaborator

Mytherin commented Jun 4, 2025

Sounds good to me. Is this ready to merge?

@lnkuiper
Copy link
Contributor Author

lnkuiper commented Jun 4, 2025

@Mytherin I'll make the change I came up with in my comment to make the log entries slightly more structured. I'll mark ready to merge when I'm done!

@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 4, 2025 09:27
@lnkuiper lnkuiper marked this pull request as ready for review June 4, 2025 09:28
@Mytherin Mytherin merged commit 843ea85 into duckdb:v1.3-ossivalis Jun 4, 2025
52 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Jun 4, 2025

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Jun 5, 2025
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Jun 5, 2025
Physical operator logging (duckdb/duckdb#17752)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@lnkuiper lnkuiper deleted the physical_operator_logging branch July 8, 2025 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants