Skip to content

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Oct 17, 2024

When we serialize the ColumnDataCollection, we will write the validity_t *validity_mask regardless of the validity being uninitialized or not.
We do this because it's difficult to keep track of the validity status when appending multiple Vectors.

When we read this back however, we receive a ValidityMask that returns false for AllValid which could be costly as most execution paths of the engine have optimized paths for AllValid() being true.

Now we'll efficiently check the validity mask when reading and replace it will nullptr if all entries are valid, enabling the AllValid() == true optimized path for these vectors.

}
}
if ((count % ValidityMask::BITS_PER_VALUE) != 0) {
auto entry = validity_mask[(count / ValidityMask::BITS_PER_VALUE)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could do this with a single bitwise operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm constructing the mask and then comparing against that in one go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was wondering if some kind of shifted mask comparison would work: entry & mask = mask. I think mask could be constructed from all bits shifted by the ragged count?

@hawkfish
Copy link
Contributor

Love this BTW - it may let me remove some bookkeeping in the window spooling logic.

@Mytherin Mytherin merged commit 3673cb1 into duckdb:feature Oct 19, 2024
40 checks passed
@Mytherin
Copy link
Collaborator

Thanks! LGTM

@Tishj
Copy link
Contributor Author

Tishj commented Oct 19, 2024

I was quickly looking into the optimization that Richard recommended, will push that as a separate PR 👍

Mytherin added a commit that referenced this pull request Oct 20, 2024
This PR implements the suggestion made by
[Richard](#14416 (comment))

When a `validity_t` is used for < `ValidityMask::BITS_PER_VALUE` values,
the least significant bits are set, i.e if
`ValidityMask::BITS_PER_VALUE` was 8

5 values uses :
`xxx4 3210`

For that we can construct the mask:
`0001 1111`

To check that all bits are set for those values

Given that `ValidityMask::BITS_PER_VALUE` is 64, we save 1<=n<64
function calls by doing this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants