Skip to content

Conversation

hmeriann
Copy link
Contributor

@hmeriann hmeriann commented Aug 7, 2024

This PR adds some benchmark queries to the RealNest JSON data as it discussed in the issue.

For now there are queries to only one of the files - hep-adl-ethz-Run2012B_SingleMu, which is not a part of the PR, but can be found here and should be placed in duckdb_benchmark_data/ directory.
Probably, the JSON file path should be changed after it becomes clear where to store RealNest test data (maybe somewhere outside of the mount-point of aws machine? )

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 7, 2024 14:47
@hmeriann hmeriann force-pushed the real-nest-benchmarks branch from b7e48af to de07922 Compare August 7, 2024 14:49
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hmeriann! Thanks for starting to work on this! :) I've left some comments and added a more extensive comment to the original issue.

Currently, this PR focuses heavily on the UNNEST. In future PRs, it would be nice to see more operations directly on the nested types without unnesting them.

Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the one comment for now, will continue tomorrow

Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
This looks good for now. Let's wait until Tania gets back to see if we should add more.

Also, could you rename all of the single digit benchmarks so that they are prefixed with a 0?

I.e, instead of 1_list_sort_text.benchmark it's 01_list_sort_text.benchmark, then they show up as sorted in the file system

Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side now 😄 - I only added a small comment.

@hmeriann hmeriann marked this pull request as ready for review August 20, 2024 11:08
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you add this as a regression test to the Regression.yml
You can add it after the regression test csv step that looks like this

    - name: Regression Test CSV
      if: always()
      shell: bash
      run: |
        python scripts/regression_test_runner.py --old=duckdb/build/release/benchmark/benchmark_runner --new=build/release/benchmark/benchmark_runner --benchmarks=.github/regression/csv.csv --verbose --threads=2

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2024 07:30
@hmeriann hmeriann marked this pull request as ready for review August 22, 2024 07:35
@hmeriann hmeriann requested a review from Tmonster August 22, 2024 07:35
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2024 07:46
@hmeriann
Copy link
Contributor Author

Also can you add this as a regression test to the Regression.yml

Done

@hmeriann hmeriann marked this pull request as ready for review August 22, 2024 08:44
@hmeriann hmeriann requested a review from Tmonster August 23, 2024 12:59
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 23, 2024 13:23
@hmeriann hmeriann marked this pull request as ready for review August 27, 2024 07:48
@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 27, 2024 08:04
@hmeriann hmeriann marked this pull request as ready for review August 27, 2024 09:33
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hmeriann, I had a look at the changes and added a few more comments. Great work!

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 13, 2024 12:11
@hmeriann hmeriann marked this pull request as ready for review September 13, 2024 14:43
Copy link
Contributor

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks 👍

@hmeriann hmeriann requested a review from Tmonster September 16, 2024 11:08
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now! Great work.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 23, 2024 08:06
@hmeriann hmeriann marked this pull request as ready for review September 23, 2024 08:08
Mytherin and others added 11 commits September 24, 2024 13:30
…db#14038)

To further override `EnumTypeInfoTemplated` it needs to be either
vendored (bad) or we move it to a header.
Relevant `pg_duckdb` PR: <duckdb/pg_duckdb#193>
Add some spaces and other seperators.

I spotted in the docs that the formatting of the sniffer's error
messages is inconsistent, see
https://duckdb.org/docs/data/csv/reading_faulty_csv_files#anatomy-of-a-csv-error

```
  file= people.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \r\n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  date_format = (DD-MM-YYYY) (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding=0
  sample_size=20480
  ignore_errors=false
  all_varchar=0
```

This PR adds a bunch of whitespace and also other separators for similar
error messages.
Seems like some `NULL` values snuck in there. This was causing issues on
the weekly regression runner
@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 25, 2024 10:31
@hmeriann hmeriann marked this pull request as ready for review September 25, 2024 10:32
@hannes hannes changed the base branch from main to feature October 2, 2024 10:29
@hannes hannes merged commit 36f3716 into duckdb:feature Oct 2, 2024
42 checks passed
@hannes
Copy link
Member

hannes commented Oct 2, 2024

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants