Skip to content

Conversation

kryonix
Copy link
Contributor

@kryonix kryonix commented Feb 12, 2024

Materialized CTEs did sometimes create the incorrect OrderPreservationType. This is because OrderPreservationRecursive tried to take the materialization phase into account, which should not be done.

Fixes #10576, reverts and fixes #10560.

The tests will likely fail, however, because this revealed another—unrelated—bug:

test/sql/function/list/list_reverse.test

WITH example AS (
  SELECT [1, 2, 3] AS arr UNION ALL
  SELECT [4, 5] AS arr UNION ALL
  SELECT [] AS arr
)
SELECT
  list_reverse(arr) AS reverse_arr
FROM example ORDER BY length(reverse_arr) DESC;
Invalid Error: Unoptimized statement differs from original result!
Original Result:
reverse_arr
INTEGER[]
[ Rows: 3]
[3, 2, 1]
[5, 4]
[]

Unoptimized:
reverse_arr
INTEGER[]
[ Rows: 3]
[1, 2, 3]
[4, 5]
[]

@szarnyasg szarnyasg requested a review from Mytherin February 12, 2024 09:59
@kryonix
Copy link
Contributor Author

kryonix commented Feb 12, 2024

I did investigate the failing test case a bit further (simplified query):

SELECT arr, list_concat(list_reverse(arr), arr) FROM (SELECT [1,2,3]) AS _(arr);
┌───────────┬─────────────────────────────────────┐
│ arr       ┆ list_concat(list_reverse(arr), arr) │
╞═══════════╪═════════════════════════════════════╡
│ [3, 2, 1] ┆ [1, 2, 3, 1, 2, 3]                  │
└───────────┴─────────────────────────────────────┘
   ↑  ↑  ↑     ↑  ↑  ↑ 
   Bug 1       Bug 2

This query contains two bugs. First, column arr should be [1,2,3], but it is reversed. This is a bug in array_slice.cpp, line 349: ListVector::ReferenceEntry(result, list_or_str_vector);. The input is just referenced, but I think the vector has to be copied. Otherwise, the input is changed.

eeb0093 fixes the second bug, where list_reverse has no effect when nested in list_concat.

Again, this is totally unrelated to the initial issue I am trying to address with this pr.

@github-actions github-actions bot marked this pull request as draft February 12, 2024 15:11
@Mytherin Mytherin marked this pull request as ready for review February 12, 2024 19:37
@Mytherin Mytherin marked this pull request as draft February 12, 2024 19:40
@kryonix kryonix marked this pull request as ready for review February 12, 2024 20:42
@kryonix
Copy link
Contributor Author

kryonix commented Feb 12, 2024

Alright. I think I fixed it. However, I'm not sure if this fix of the array slicing code has a negative effect on performance due to copy overhead. It now copies the input vector just as list_sort does.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes! LGTM. Perhaps @taniabogatsch can have a look at the changes to the list functions?

Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kryonix, thanks for the PR and for uncovering these bugs!

I think we need to look at them more in-depth before merging this. Especially for the second bug (list_reverse inside list_concat). I think the bug is not in the list_concat code, i.e., that code just receives incorrect input data.

Is it possible to merge these fixes without the changes to list_concat/list_slice? If so, I'd suggest that we open a separate issue/PR to address them separately?

Vector &list_or_str_vector = args.data[0];
Vector &list_or_str_vector = result;
// this ensures that we do not change the input chunk
VectorOperations::Copy(args.data[0], list_or_str_vector, count, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm not sure if this fix of the array slicing code has a negative effect on performance

Indeed, fixing this bug by copying the input vector is not the most efficient way to address this. However, I think we can merge this, and address the performance in a separate PR. cc @maiadegraaf

@@ -55,14 +55,12 @@ static void ListConcatFunction(DataChunk &args, ExpressionState &state, Vector &
if (lhs_data.validity.RowIsValid(lhs_list_index)) {
const auto &lhs_entry = lhs_entries[lhs_list_index];
result_entries[i].length += lhs_entry.length;
ListVector::Append(result, lhs_child, *lhs_child_data.sel, lhs_entry.offset + lhs_entry.length,
lhs_entry.offset);
ListVector::Append(result, lhs_child, lhs_entry.offset + lhs_entry.length, lhs_entry.offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot assume the child vector to be a flat vector, i.e., removing the *lhs_child_data.sel. I may be missing something. But I think this might be related (to an extent) to the other bug in list_slice, as list_reverse uses list_slice under the hood. I.e., I think we need to fix this somewhere else, so that we retrieve a correct selection vector for our child vector here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to revert this. I'm not too familiar with this part of the system. For my test cases (and the unit tests) it worked fine. But I may not see the whole picture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be missing a unit test here that would cause this to fail if the child vector is not a flat vector. Let's double-check this and then either address the fix properly or we can confirm that this fix works! I think @maiadegraaf already confirmed that she'll have a closer look. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I reverted the change.

@kryonix
Copy link
Contributor Author

kryonix commented Feb 13, 2024

Is it possible to merge these fixes without the changes to list_concat/list_slice? If so, I'd suggest that we open a separate issue/PR to address them separately?

Yes it is. I only had a look at these issues because I introduced a new test case that does not work because of these bugs.

The important part of this pr is 9543363

@taniabogatsch
Copy link
Contributor

Yes it is. I only had a look at these issues because I introduced a new test case that does not work because of these bugs.

Great - would it be okay for you to revert the changes here, and then we can merge this?

@github-actions github-actions bot marked this pull request as draft February 13, 2024 11:39
@kryonix kryonix marked this pull request as ready for review February 13, 2024 12:00
@Mytherin Mytherin merged commit 6f1d82f into duckdb:main Feb 19, 2024
@Mytherin
Copy link
Collaborator

Thanks!

@maiadegraaf maiadegraaf mentioned this pull request Feb 19, 2024
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Mar 15, 2024
Merge pull request duckdb/duckdb#10658 from hannes/csvpathlength
Merge pull request duckdb/duckdb#10756 from Mytherin/preserveinsertionordermemory
Merge pull request duckdb/duckdb#10746 from samansmink/enable-azure-autoload
Merge pull request duckdb/duckdb#10747 from maiadegraaf/list_reverse_bug
Merge pull request duckdb/duckdb#10748 from taniabogatsch/capi-tests
Merge pull request duckdb/duckdb#10739 from peterboncz/pb/immmedate_mode_only_in_non_autocommit
Merge pull request duckdb/duckdb#10688 from Tmonster/union_exclude
Merge pull request duckdb/duckdb#10710 from samansmink/comment-on-column
Merge pull request duckdb/duckdb#10725 from hawkfish/fuzzer-preceding-frame
Merge pull request duckdb/duckdb#10723 from hawkfish/fuzzer-null-timestamp
Merge pull request duckdb/duckdb#10436 from taniabogatsch/map-fixes
Merge pull request duckdb/duckdb#10587 from kryonix/main
Merge pull request duckdb/duckdb#10738 from TinyTinni/fix-assert-in-iscntrl
Merge pull request duckdb/duckdb#10708 from carlopi/ci_fixes
Merge pull request duckdb/duckdb#10726 from hawkfish/fuzzer-to-weeks
Merge pull request duckdb/duckdb#10727 from hawkfish/fuzzer-window-bind
Merge pull request duckdb/duckdb#10733 from TinyTinni/remove-static-string
Merge pull request duckdb/duckdb#10715 from Tishj/python_tpch_regression_rework
Merge pull request duckdb/duckdb#10728 from hawkfish/fuzzer-argminmax-decimal
Merge pull request duckdb/duckdb#10717 from carlopi/fix_extension_deploy
Merge pull request duckdb/duckdb#10694 from Mytherin/castquerylocation
Merge pull request duckdb/duckdb#10448 from peteraisher/feature/use-assertThrows-for-jdbc-tests
Merge pull request duckdb/duckdb#10691 from Mytherin/issue10685
Merge pull request duckdb/duckdb#10684 from Mytherin/distincton
Merge pull request duckdb/duckdb#9539 from Tishj/timestamp_unit_to_tz
Merge pull request duckdb/duckdb#10341 from Tmonster/tpch_ingestion_benchmark
Merge pull request duckdb/duckdb#10689 from Mytherin/juliaversion
Merge pull request duckdb/duckdb#10669 from Mytherin/skippedtests
Merge pull request duckdb/duckdb#10679 from Tishj/reenable_window_rows_overflow
Merge pull request duckdb/duckdb#10672 from carlopi/wasm_extensions_ci
Merge pull request duckdb/duckdb#10660 from szarnyasg/update-storage-info-for-v0100
Merge pull request duckdb/duckdb#10643 from bleskes/duck_transaction_o11y
Merge pull request duckdb/duckdb#10654 from carlopi/fix_10548
Merge pull request duckdb/duckdb#10650 from hannes/noprintf
Merge pull request duckdb/duckdb#10649 from Mytherin/explicitenumnumbers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants