-
Notifications
You must be signed in to change notification settings - Fork 2.5k
add some RealNest benchmarks #13345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add some RealNest benchmarks #13345
Conversation
b7e48af
to
de07922
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hmeriann! Thanks for starting to work on this! :) I've left some comments and added a more extensive comment to the original issue.
Currently, this PR focuses heavily on the UNNEST
. In future PRs, it would be nice to see more operations directly on the nested types without unnesting them.
benchmark/micro/json/RealNest/1_unnest_and_sum_one_json_struct.benchmark
Outdated
Show resolved
Hide resolved
benchmark/micro/json/RealNest/2_create_table_combined_all_json_fields.benchmark
Outdated
Show resolved
Hide resolved
benchmark/micro/json/RealNest/3_select_avg_from_unnested_structure.benchmark
Outdated
Show resolved
Hide resolved
benchmark/micro/json/RealNest/5_list_filter_on_unnested_structure.benchmark
Outdated
Show resolved
Hide resolved
benchmark/micro/json/RealNest/4_filter_unnested_struct.benchmark
Outdated
Show resolved
Hide resolved
benchmark/micro/json/RealNest/6_select_sum_from_unnested_structure.benchmark
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the one comment for now, will continue tomorrow
benchmark/micro/json/RealNest/2_create_table_combined_all_json_fields.benchmark
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This looks good for now. Let's wait until Tania gets back to see if we should add more.
Also, could you rename all of the single digit benchmarks so that they are prefixed with a 0
?
I.e, instead of 1_list_sort_text.benchmark
it's 01_list_sort_text.benchmark
, then they show up as sorted in the file system
benchmark/micro/json/RealNest/2_create_table_combined_all_json_fields.benchmark
Outdated
Show resolved
Hide resolved
benchmark/realnest/3_list_transform_plus_list_aggregate.benchmark
Outdated
Show resolved
Hide resolved
benchmark/realnest/9_select_sum_from_unnested_structure.benchmark
Outdated
Show resolved
Hide resolved
benchmark/realnest/3_list_transform_plus_list_aggregate.benchmark
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side now 😄 - I only added a small comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can you add this as a regression test to the Regression.yml
You can add it after the regression test csv
step that looks like this
- name: Regression Test CSV
if: always()
shell: bash
run: |
python scripts/regression_test_runner.py --old=duckdb/build/release/benchmark/benchmark_runner --new=build/release/benchmark/benchmark_runner --benchmarks=.github/regression/csv.csv --verbose --threads=2
benchmark/realnest/01_aggregate-on-one-deep-struct-members.benchmark
Outdated
Show resolved
Hide resolved
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @hmeriann, I had a look at the changes and added a few more comments. Great work!
benchmark/realnest/04_list_transform_plus_list_aggregate.benchmark
Outdated
Show resolved
Hide resolved
benchmark/realnest/03_create_table_combined_all_json_fields.benchmark
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now! Great work.
…w row group segment tree after vacuum
…db#14038) To further override `EnumTypeInfoTemplated` it needs to be either vendored (bad) or we move it to a header. Relevant `pg_duckdb` PR: <duckdb/pg_duckdb#193>
…w row group segment tree after vacuum (duckdb#14092) Fixes duckdb#14077
Add some spaces and other seperators. I spotted in the docs that the formatting of the sniffer's error messages is inconsistent, see https://duckdb.org/docs/data/csv/reading_faulty_csv_files#anatomy-of-a-csv-error ``` file= people.csv delimiter = , (Auto-Detected) quote = " (Auto-Detected) escape = " (Auto-Detected) new_line = \r\n (Auto-Detected) header = true (Auto-Detected) skip_rows = 0 (Auto-Detected) date_format = (DD-MM-YYYY) (Auto-Detected) timestamp_format = (Auto-Detected) null_padding=0 sample_size=20480 ignore_errors=false all_varchar=0 ``` This PR adds a bunch of whitespace and also other separators for similar error messages.
Seems like some `NULL` values snuck in there. This was causing issues on the weekly regression runner
thanks! |
This PR adds some benchmark queries to the RealNest JSON data as it discussed in the issue.
For now there are queries to only one of the files -
hep-adl-ethz-Run2012B_SingleMu
, which is not a part of the PR, but can be found here and should be placed induckdb_benchmark_data/
directory.Probably, the JSON file path should be changed after it becomes clear where to store RealNest test data (maybe somewhere outside of the
mount-point
of aws machine? )