Fix #16836: rewrite main column data in case of an update that only modifies the validity #16851
Fixes #16836
This regression was caused by #15737
That change introduced an optimization for dictionary-compressed data: the validity is read directly from the dictionary instead of from the separate validity data. This is possible because dictionary-compressed data stores its own validity information (at offset 0 in the dictionary).
However, when doing an `UPDATE`, we would not rewrite the dictionary data when changing only the validity, which would leave the dictionary column without the new (updated) validity data. The fix here is to also rewrite the main column data when updating the validity data.

Note that we currently do this for all primitive types. We could limit this to the compression methods (like dictionary) that need it, but we can leave that for a future PR. (CC @Tishj).
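
To make the failure mode concrete, here is a rough toy model in Python. It is only a sketch and not DuckDB's actual storage code: the `ToyColumn` class, its method names, and the choice to model "validity at offset 0 in the dictionary" as a reserved NULL entry at dictionary index 0 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyColumn:
    """Hypothetical toy model of a dictionary-compressed column.

    Assumption: dictionary index 0 is reserved for NULL, so the per-row
    codes (the "main column data") also encode validity.
    """
    dictionary: List[Optional[str]]  # entry 0 is the reserved NULL entry
    codes: List[int]                 # per-row index into the dictionary
    validity: List[bool]             # separate per-row validity mask

    @classmethod
    def from_values(cls, values: List[Optional[str]]) -> "ToyColumn":
        dictionary: List[Optional[str]] = [None]
        index = {}
        codes = []
        for v in values:
            if v is None:
                codes.append(0)
                continue
            if v not in index:
                index[v] = len(dictionary)
                dictionary.append(v)
            codes.append(index[v])
        return cls(dictionary, codes, [v is not None for v in values])

    def scan_from_dictionary(self) -> List[Optional[str]]:
        # Optimized scan: NULLs are derived from the codes alone and the
        # separate validity mask is never consulted.
        return [self.dictionary[c] for c in self.codes]

    def set_null_validity_only(self, row: int) -> None:
        # Buggy update: only the separate validity mask changes.
        self.validity[row] = False

    def set_null_rewrite_main(self, row: int) -> None:
        # Fixed update: the main column data is rewritten as well.
        self.validity[row] = False
        self.codes[row] = 0


col = ToyColumn.from_values(["a", "b", "a"])
col.set_null_validity_only(1)
stale = col.scan_from_dictionary()   # row 1 still reads as "b"

col.set_null_rewrite_main(1)
fixed = col.scan_from_dictionary()   # row 1 now reads as NULL
```

In this model the validity-only update leaves the optimized scan returning the old value for the updated row, while rewriting the main column data alongside the validity keeps both views consistent.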