Mytherin
Collaborator

This prevents unnecessarily flattening dictionary vectors when scanning.

The two test changes are unrelated to this change; they are minor fixes for issues encountered while testing it.

CC @Tishj

Contributor

@Tishj Tishj left a comment

Ah I see, that works 👍
Dictionary compression is the only compression method that can create a Dictionary Vector. When we create a DictionaryVector, we make sure its validity is already set.

The validity compression methods recognize this and return immediately.

(also slightly optimize fetching by not initializing the dictionary)
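
To illustrate the early return described above, here is a minimal toy sketch in Python. It is not DuckDB's C++ code: `ToyVector`, `scan_dictionary_data`, and `scan_validity` are hypothetical names, and the layout (dictionary slot 0 standing for NULL) is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyVector:
    kind: str                     # "flat" or "dictionary" (toy stand-in for the vector type)
    values: List[Optional[str]]   # None marks a NULL row
    validity_set: bool = False    # True once NULLs have been applied to `values`


def scan_dictionary_data(dictionary: List[Optional[str]], codes: List[int]) -> ToyVector:
    # Assumed toy layout: dictionary slot 0 is reserved for NULL, so the
    # dictionary scan already yields the correct validity for every row.
    values = [dictionary[code] for code in codes]
    return ToyVector("dictionary", values, validity_set=True)


def scan_validity(vector: ToyVector, validity_bits: List[bool]) -> ToyVector:
    # Early return: a dictionary vector already carries its validity,
    # so the separate validity data does not need to be applied.
    if vector.kind == "dictionary" and vector.validity_set:
        return vector
    # Otherwise apply the separately stored validity bits row by row.
    vector.values = [v if ok else None for v, ok in zip(vector.values, validity_bits)]
    vector.validity_set = True
    return vector
```

In this sketch the dictionary vector is returned untouched, which mirrors the idea of not flattening it just to apply validity; a flat vector still goes through the regular validity path.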

@Mytherin Mytherin merged commit a597e41 into duckdb:main Jan 16, 2025
47 checks passed
@Mytherin Mytherin deleted the dictvalidity branch January 16, 2025 16:53
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Feb 2, 2025
Scan validity from dictionary vectors directly, and skip scanning validity when we encounter a dictionary vector (duckdb/duckdb#15737)
Mytherin added a commit to Mytherin/duckdb that referenced this pull request Mar 26, 2025
…lues when reading files created by older versions of DuckDB
Mytherin added a commit that referenced this pull request Mar 27, 2025
…odifies the validity (#16851)

Fixes #16836

This regression was caused by #15737.

Effectively that change introduced an optimization for
dictionary-compressed data where the validity data would be read
directly from the dictionary - instead of being read from the separate
validity data. This is possible because dictionary-compressed data
stores validity data (at offset 0 in the dictionary).

However, when doing an `UPDATE`, we would not rewrite the dictionary
data when changing only the validity - which would then cause the
dictionary column to no longer contain the new (updated) validity data.
The fix here is to also rewrite the main column data when updating the
validity data.

Note that we currently do this for all primitive types - we could limit
this to compression methods (like dictionary) that need this - but we
can leave that for a future PR. (CC @Tishj).