-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Ingestion benchmark framework #10341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… into tpch_ingestion_benchmark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I'm not so certain if there's much of a benefit to benchmarking all of the tables. Many of these tables are tiny (e.g. nation
has 25 rows?). Furthermore having lots of benchmarks means the chance of false negatives is higher which generates noise in the CI.
Can we perhaps limit this to the largest tables, and try to go for more of a variety of benchmarks? e.g. reading from Parquet, reading from CSV, copying from a separate table, running in-memory, running with persistent storage, etc.
…, then you need special handling for the load script as well
The now benchmarks the following matrix Format: CSV, PARQUET, NATIVE Getting In-memory native storage to work is not super straightforward in the current benchmark framework. I saw two options,
(2) seems confusing and potentially error prone. I implemented a cache {db_file} no_reconnect function to do steps 1-3. This works for benchmarking ingestion of csv and parquet files since you don't need to data natively. |
This fails regression now because the current duckdb doesn't have the new cache {db_file} no_connect command |
Thanks for the updates! Looks good - could we perhaps just remove the ingestion benchmarks that aren't run (i.e. leaving only lineitem, store_sales, inventory and orders)? I think adding the small tables - even if they are not run - adds a lot of noise to when you want to run these benchmarks and I can't really see a scenario in which we want to run them. |
sure 👍 |
Thanks! |
Merge pull request duckdb/duckdb#10658 from hannes/csvpathlength Merge pull request duckdb/duckdb#10756 from Mytherin/preserveinsertionordermemory Merge pull request duckdb/duckdb#10746 from samansmink/enable-azure-autoload Merge pull request duckdb/duckdb#10747 from maiadegraaf/list_reverse_bug Merge pull request duckdb/duckdb#10748 from taniabogatsch/capi-tests Merge pull request duckdb/duckdb#10739 from peterboncz/pb/immmedate_mode_only_in_non_autocommit Merge pull request duckdb/duckdb#10688 from Tmonster/union_exclude Merge pull request duckdb/duckdb#10710 from samansmink/comment-on-column Merge pull request duckdb/duckdb#10725 from hawkfish/fuzzer-preceding-frame Merge pull request duckdb/duckdb#10723 from hawkfish/fuzzer-null-timestamp Merge pull request duckdb/duckdb#10436 from taniabogatsch/map-fixes Merge pull request duckdb/duckdb#10587 from kryonix/main Merge pull request duckdb/duckdb#10738 from TinyTinni/fix-assert-in-iscntrl Merge pull request duckdb/duckdb#10708 from carlopi/ci_fixes Merge pull request duckdb/duckdb#10726 from hawkfish/fuzzer-to-weeks Merge pull request duckdb/duckdb#10727 from hawkfish/fuzzer-window-bind Merge pull request duckdb/duckdb#10733 from TinyTinni/remove-static-string Merge pull request duckdb/duckdb#10715 from Tishj/python_tpch_regression_rework Merge pull request duckdb/duckdb#10728 from hawkfish/fuzzer-argminmax-decimal Merge pull request duckdb/duckdb#10717 from carlopi/fix_extension_deploy Merge pull request duckdb/duckdb#10694 from Mytherin/castquerylocation Merge pull request duckdb/duckdb#10448 from peteraisher/feature/use-assertThrows-for-jdbc-tests Merge pull request duckdb/duckdb#10691 from Mytherin/issue10685 Merge pull request duckdb/duckdb#10684 from Mytherin/distincton Merge pull request duckdb/duckdb#9539 from Tishj/timestamp_unit_to_tz Merge pull request duckdb/duckdb#10341 from Tmonster/tpch_ingestion_benchmark Merge pull request duckdb/duckdb#10689 from Mytherin/juliaversion Merge pull request duckdb/duckdb#10669 from Mytherin/skippedtests Merge pull request duckdb/duckdb#10679 from Tishj/reenable_window_rows_overflow Merge pull request duckdb/duckdb#10672 from carlopi/wasm_extensions_ci Merge pull request duckdb/duckdb#10660 from szarnyasg/update-storage-info-for-v0100 Merge pull request duckdb/duckdb#10643 from bleskes/duck_transaction_o11y Merge pull request duckdb/duckdb#10654 from carlopi/fix_10548 Merge pull request duckdb/duckdb#10650 from hannes/noprintf Merge pull request duckdb/duckdb#10649 from Mytherin/explicitenumnumbers
Adds a new benchmark directory to our regression scripts to test ingestion performance. Some questions I still have
These things can also be added in a future PR.