Scan validity from dictionary vectors directly, and skip scanning validity when we encounter a dictionary vector #15737

Mytherin · 2025-01-15T21:47:35Z

This prevents unnecessarily flattening dictionary vectors when scanning.

The two test changes are unrelated; they are minor fixes for issues encountered while testing this change.

Tishj

Ah I see, that works 👍
Dictionary compression is the only compression method that can create a dictionary vector, and when we create a `DictionaryVector` we make sure its validity is already set.

In the validity compression methods, we recognize this and return immediately.

(This also slightly optimizes fetching by not initializing the dictionary.)
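The scan-side idea above can be illustrated with a small toy model. This is a minimal, hypothetical Python sketch, not DuckDB's C++ implementation: the names `FlatVector`, `DictionaryVector`, `scan_validity`, and `flatten`, and the convention that a `None` entry at dictionary offset 0 marks NULL, are assumptions made for illustration. It shows a validity scan that returns immediately when the result is already a dictionary vector, instead of flattening it to attach a separate validity mask.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class FlatVector:
    """Hypothetical flat vector: one value and one validity bit per row."""
    values: List[Optional[str]]
    validity: List[bool] = field(default_factory=list)


@dataclass
class DictionaryVector:
    """Hypothetical dictionary vector: each row is an index into a dictionary.

    NULLs are encoded inside the dictionary (here as a None entry at
    offset 0), so the vector carries no separate validity mask.
    """
    dictionary: List[Optional[str]]
    selection: List[int]

    def value(self, row: int) -> Optional[str]:
        return self.dictionary[self.selection[row]]


def scan_validity(validity_segment: List[bool],
                  result: Union[FlatVector, DictionaryVector]) -> None:
    """Toy validity scan that runs after the data scan has filled `result`."""
    if isinstance(result, DictionaryVector):
        # Validity is already encoded in the dictionary: return immediately
        # instead of flattening the vector to attach a validity mask.
        return
    result.validity = list(validity_segment)


def flatten(vec: DictionaryVector) -> FlatVector:
    """The work the early return avoids: materialize every row and its bit."""
    values = [vec.value(row) for row in range(len(vec.selection))]
    return FlatVector(values, [v is not None for v in values])


# Rows 0..3 point at dictionary entries "a", NULL, "b", "a".
vec = DictionaryVector(dictionary=[None, "a", "b"], selection=[1, 0, 2, 1])
scan_validity([True, False, True, True], vec)  # no-op for dictionary vectors
print([vec.value(row) for row in range(4)])  # ['a', None, 'b', 'a']
```

In this toy model, NULL-ness is read through the dictionary, so the dictionary vector never has to be materialized row by row just to carry validity.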

…odifies the validity (#16851)

Fixes #16836

This regression was caused by #15737. Effectively, that change introduced an optimization for dictionary-compressed data where the validity data would be read directly from the dictionary instead of from the separate validity data. This is possible because dictionary-compressed data stores validity data (at offset 0 in the dictionary).

However, when doing an `UPDATE` that changes only the validity, we would not rewrite the dictionary data, so the dictionary column would no longer contain the new (updated) validity data.

The fix here is to also rewrite the main column data when updating the validity data. Note that we currently do this for all primitive types. We could limit this to the compression methods that need it (like dictionary), but we can leave that for a future PR. (CC @Tishj)
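To make the failure mode concrete, here is a minimal, hypothetical Python toy model (not DuckDB code). `DictSegment`, `update_set_null`, the `rewrite_main_data` flag, and the assumption that dictionary offset 0 is the NULL entry are illustrative only. The segment records NULL-ness twice: in the separate validity data and implicitly in the main column data. An update that writes only the separate validity leaves the main data stale, so a scan that reads validity from the dictionary returns the old answer.

```python
from dataclasses import dataclass
from typing import List, Optional

NULL_INDEX = 0  # assumption: dictionary offset 0 is the NULL entry


@dataclass
class DictSegment:
    """Hypothetical dictionary-compressed column segment.

    `indices` is the main column data; `validity` is the separately stored
    validity data. Both encode which rows are NULL.
    """
    dictionary: List[Optional[str]]
    indices: List[int]
    validity: List[bool]

    def validity_from_dictionary(self) -> List[bool]:
        # Shortcut in the style of #15737: derive validity from the main data.
        return [index != NULL_INDEX for index in self.indices]


def update_set_null(seg: DictSegment, row: int, rewrite_main_data: bool) -> None:
    """Toy UPDATE that only changes validity: set one row to NULL."""
    seg.validity[row] = False
    if rewrite_main_data:
        # The fix: also rewrite the main column data so both copies agree.
        seg.indices[row] = NULL_INDEX


buggy = DictSegment(dictionary=[None, "a"], indices=[1, 1], validity=[True, True])
update_set_null(buggy, 0, rewrite_main_data=False)
print(buggy.validity)                    # [False, True]
print(buggy.validity_from_dictionary())  # [True, True]  (stale: row 0 still valid)

fixed = DictSegment(dictionary=[None, "a"], indices=[1, 1], validity=[True, True])
update_set_null(fixed, 0, rewrite_main_data=True)
print(fixed.validity_from_dictionary())  # [False, True]
```

Setting `rewrite_main_data=True` corresponds to the fix described above: both copies of the NULL information stay in sync.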

Scan validity from dictionary vectors directly, and skip scanning validity when we encounter a dictionary vector

a54163c

Tishj approved these changes Jan 15, 2025

Mytherin merged commit a597e41 into duckdb:main Jan 16, 2025
47 checks passed

Mytherin deleted the dictvalidity branch January 16, 2025 16:53

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Feb 2, 2025

vendor: Update vendored sources to duckdb/duckdb@a597e41

362de0a

Scan validity from dictionary vectors directly, and skip scanning validity when we encounter a dictionary vector (duckdb/duckdb#15737)

Mytherin mentioned this pull request Mar 26, 2025

Fix #16836: rewrite main column data in case of an update that only modifies the validity #16851

Merged

Mytherin added a commit to Mytherin/duckdb that referenced this pull request Mar 26, 2025

Revert duckdb#15737 instead - otherwise we can read incorrect NULL values when reading files created by older versions of DuckDB

4d0435c

Mytherin commented Jan 15, 2025

Tishj left a comment
