

Mytherin
Collaborator

This effectively restores a previous optimization where we skip reading elements that were previously filtered out. For now we only enable this for strings, since that has by far the largest performance benefit: we can skip UTF-8 validation for any strings that we don't need to read.
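A minimal sketch of why strings benefit, assuming Parquet PLAIN encoding for BYTE_ARRAY values (a 4-byte little-endian length followed by the bytes) and a sorted selection vector of row indices that survived the filter. `SelectStrings`, `IsValidUtf8` and `EncodePlain` are hypothetical names for illustration, not DuckDB's Parquet reader API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified UTF-8 check: validates lead/continuation byte structure only
// (it does not reject overlong forms or surrogates; a real validator would).
static bool IsValidUtf8(const uint8_t *p, size_t n) {
    size_t i = 0;
    while (i < n) {
        uint8_t c = p[i];
        size_t len = c < 0x80 ? 1 : (c >> 5) == 0x6 ? 2 : (c >> 4) == 0xE ? 3 : (c >> 3) == 0x1E ? 4 : 0;
        if (len == 0 || i + len > n) return false;
        for (size_t k = 1; k < len; k++) {
            if ((p[i + k] & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical helper: PLAIN-encodes strings as <4-byte LE length><bytes>.
static std::vector<uint8_t> EncodePlain(const std::vector<std::string> &values) {
    std::vector<uint8_t> buf;
    for (const auto &v : values) {
        uint32_t len = static_cast<uint32_t>(v.size());
        for (int b = 0; b < 4; b++) buf.push_back(static_cast<uint8_t>(len >> (8 * b)));
        buf.insert(buf.end(), v.begin(), v.end());
    }
    return buf;
}

// Hypothetical: decodes only the rows listed in `sel` (sorted ascending).
// `validated` counts how many strings actually went through UTF-8 validation.
static std::vector<std::string> SelectStrings(const std::vector<uint8_t> &buf, size_t row_count,
                                              const std::vector<uint32_t> &sel, size_t &validated) {
    std::vector<std::string> out;
    out.reserve(sel.size());
    size_t offset = 0, next = 0;
    for (size_t row = 0; row < row_count && next < sel.size(); row++) {
        if (offset + 4 > buf.size()) throw std::runtime_error("truncated length prefix");
        const uint8_t *p = buf.data() + offset;
        uint32_t len = uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
        offset += 4;
        if (offset + len > buf.size()) throw std::runtime_error("truncated string");
        if (sel[next] == row) {
            // Selected row: pay for UTF-8 validation and the copy.
            validated++;
            if (!IsValidUtf8(buf.data() + offset, len)) throw std::runtime_error("invalid UTF-8");
            out.emplace_back(reinterpret_cast<const char *>(buf.data() + offset), len);
            next++;
        }
        // Unselected row: only advance past its bytes; no validation, no copy.
        offset += len;
    }
    return out;
}
```

Because PLAIN-encoded strings are variable-length, every length prefix up to the last selected row still has to be read to locate the next value; what the selection vector saves is the validation and the copy, which is where the cost of reading strings lies.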

For simple types like integers this optimization is less obviously useful, as we effectively replace a memcpy with a branchy lookup. I haven't run any benchmarks on this yet, but I suspect that its usefulness depends on selectivity, i.e. it might perform better when the selectivity is below 10% (or some other, yet to be determined, threshold). I will leave that for a future PR.
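One way to picture the trade-off for fixed-width values, as a sketch rather than DuckDB's implementation: a bulk memcpy of every value versus a per-row loop that branches on whether each row is selected. `ReadAllThenSelect`, `ReadSelectedOnly`, `ReadInts` and the `threshold` parameter are hypothetical; the 0.10 default only echoes the untested guess above and is not a measured value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Strategy A: one bulk memcpy of every value; the selection (sorted ascending)
// is applied afterwards as a plain gather.
static std::vector<int32_t> ReadAllThenSelect(const int32_t *data, size_t count, const std::vector<uint32_t> &sel) {
    std::vector<int32_t> all(count);
    if (count > 0) std::memcpy(all.data(), data, count * sizeof(int32_t));
    std::vector<int32_t> out;
    out.reserve(sel.size());
    for (uint32_t idx : sel) out.push_back(all[idx]);
    return out;
}

// Strategy B: walk the rows and branch on whether each one is selected,
// copying only the selected values.
static std::vector<int32_t> ReadSelectedOnly(const int32_t *data, size_t count, const std::vector<uint32_t> &sel) {
    std::vector<int32_t> out;
    out.reserve(sel.size());
    size_t next = 0;
    for (size_t row = 0; row < count && next < sel.size(); row++) {
        if (sel[next] == row) {  // data-dependent branch: hard to predict at mid selectivity
            out.push_back(data[row]);
            next++;
        }
    }
    return out;
}

// Hypothetical cut-over: take the branchy path only when few rows survive.
static std::vector<int32_t> ReadInts(const int32_t *data, size_t count, const std::vector<uint32_t> &sel,
                                     double threshold = 0.10) {
    double selectivity = count == 0 ? 0.0 : double(sel.size()) / double(count);
    return selectivity < threshold ? ReadSelectedOnly(data, count, sel) : ReadAllThenSelect(data, count, sel);
}
```

Both paths return the same values, so the choice between them is purely a cost question; where the cut-over actually lies would have to come from benchmarks.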

Mytherin changed the title from "Add dedicated Select method that can be used to push selection vectors into the read" to "Parquet: Add dedicated Select method that can be used to push selection vectors into the read" on Feb 11, 2025
Mytherin merged commit 4c77e9c into duckdb:main on Feb 11, 2025
47 checks passed
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
Mytherin deleted the parquetselect branch on April 2, 2025 09:25
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)