-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix OrderPreservationType issue of MATERIALIZED CTEs #10587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I did investigate the failing test case a bit further (simplified query):
This query contains two bugs. First, column eeb0093 fixes the second bug, where Again, this is totally unrelated to the initial issue I am trying to address with this pr. |
Alright. I think I fixed it. However, I'm not sure if this fix of the array slicing code has a negative effect on performance due to copy overhead. It now copies the input vector just as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! LGTM. Perhaps @taniabogatsch can have a look at the changes to the list functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kryonix, thanks for the PR and for uncovering these bugs!
I think we need to look at them more in-depth before merging this. Especially for the second bug (list_reverse
inside list_concat
). I think the bug is not in the list_concat
code, i.e., that code just receives incorrect input data.
Is it possible to merge these fixes without the changes to list_concat
/list_slice
? If so, I'd suggest that we open a separate issue/PR to address them separately?
Vector &list_or_str_vector = args.data[0]; | ||
Vector &list_or_str_vector = result; | ||
// this ensures that we do not change the input chunk | ||
VectorOperations::Copy(args.data[0], list_or_str_vector, count, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I'm not sure if this fix of the array slicing code has a negative effect on performance
Indeed, fixing this bug by copying the input vector is not the most efficient way to address this. However, I think we can merge this, and address the performance in a separate PR. cc @maiadegraaf
@@ -55,14 +55,12 @@ static void ListConcatFunction(DataChunk &args, ExpressionState &state, Vector & | |||
if (lhs_data.validity.RowIsValid(lhs_list_index)) { | |||
const auto &lhs_entry = lhs_entries[lhs_list_index]; | |||
result_entries[i].length += lhs_entry.length; | |||
ListVector::Append(result, lhs_child, *lhs_child_data.sel, lhs_entry.offset + lhs_entry.length, | |||
lhs_entry.offset); | |||
ListVector::Append(result, lhs_child, lhs_entry.offset + lhs_entry.length, lhs_entry.offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot assume the child vector to be a flat vector, i.e., removing the *lhs_child_data.sel
. I may be missing something. But I think this might be related (to an extent) to the other bug in list_slice
, as list_reverse
uses list_slice
under the hood. I.e., I think we need to fix this somewhere else, so that we retrieve a correct selection vector for our child vector here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to revert this. I'm not too familiar with this part of the system. For my test cases (and the unit tests) it worked fine. But I may not see the whole picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be missing a unit test here that would cause this to fail if the child vector is not a flat vector. Let's double-check this and then either address the fix properly or we can confirm that this fix works! I think @maiadegraaf already confirmed that she'll have a closer look. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I reverted the change.
Yes it is. I only had a look at these issues because I introduced a new test case that does not work because of these bugs. The important part of this pr is 9543363 |
Great - would it be okay for you to revert the changes here, and then we can merge this? |
Thanks! |
Merge pull request duckdb/duckdb#10658 from hannes/csvpathlength Merge pull request duckdb/duckdb#10756 from Mytherin/preserveinsertionordermemory Merge pull request duckdb/duckdb#10746 from samansmink/enable-azure-autoload Merge pull request duckdb/duckdb#10747 from maiadegraaf/list_reverse_bug Merge pull request duckdb/duckdb#10748 from taniabogatsch/capi-tests Merge pull request duckdb/duckdb#10739 from peterboncz/pb/immmedate_mode_only_in_non_autocommit Merge pull request duckdb/duckdb#10688 from Tmonster/union_exclude Merge pull request duckdb/duckdb#10710 from samansmink/comment-on-column Merge pull request duckdb/duckdb#10725 from hawkfish/fuzzer-preceding-frame Merge pull request duckdb/duckdb#10723 from hawkfish/fuzzer-null-timestamp Merge pull request duckdb/duckdb#10436 from taniabogatsch/map-fixes Merge pull request duckdb/duckdb#10587 from kryonix/main Merge pull request duckdb/duckdb#10738 from TinyTinni/fix-assert-in-iscntrl Merge pull request duckdb/duckdb#10708 from carlopi/ci_fixes Merge pull request duckdb/duckdb#10726 from hawkfish/fuzzer-to-weeks Merge pull request duckdb/duckdb#10727 from hawkfish/fuzzer-window-bind Merge pull request duckdb/duckdb#10733 from TinyTinni/remove-static-string Merge pull request duckdb/duckdb#10715 from Tishj/python_tpch_regression_rework Merge pull request duckdb/duckdb#10728 from hawkfish/fuzzer-argminmax-decimal Merge pull request duckdb/duckdb#10717 from carlopi/fix_extension_deploy Merge pull request duckdb/duckdb#10694 from Mytherin/castquerylocation Merge pull request duckdb/duckdb#10448 from peteraisher/feature/use-assertThrows-for-jdbc-tests Merge pull request duckdb/duckdb#10691 from Mytherin/issue10685 Merge pull request duckdb/duckdb#10684 from Mytherin/distincton Merge pull request duckdb/duckdb#9539 from Tishj/timestamp_unit_to_tz Merge pull request duckdb/duckdb#10341 from Tmonster/tpch_ingestion_benchmark Merge pull request duckdb/duckdb#10689 from Mytherin/juliaversion Merge pull request duckdb/duckdb#10669 from Mytherin/skippedtests Merge pull request duckdb/duckdb#10679 from Tishj/reenable_window_rows_overflow Merge pull request duckdb/duckdb#10672 from carlopi/wasm_extensions_ci Merge pull request duckdb/duckdb#10660 from szarnyasg/update-storage-info-for-v0100 Merge pull request duckdb/duckdb#10643 from bleskes/duck_transaction_o11y Merge pull request duckdb/duckdb#10654 from carlopi/fix_10548 Merge pull request duckdb/duckdb#10650 from hannes/noprintf Merge pull request duckdb/duckdb#10649 from Mytherin/explicitenumnumbers
Materialized CTEs did sometimes create the incorrect
OrderPreservationType
. This is becauseOrderPreservationRecursive
tried to take the materialization phase into account, which should not be done.Fixes #10576, reverts and fixes #10560.
The tests will likely fail, however, because this revealed another—unrelated—bug:
test/sql/function/list/list_reverse.test