Skip to content

Conversation

peterboncz
Copy link
Contributor

@peterboncz peterboncz commented Oct 9, 2024

add a new named_parameter for the parquet_scan() called "explicit_cardinality"

this ubigint is the exact cardinality of the parquet_scan(), which can span multiple files that one may know from external metadata

like the "schema" named_parameter, it is not really public, but intended for duckdb_iceberg

…dinality"

this ubigint sthe exact cardinality of the parquet_scan(), which can span multiple files
that one may know from external metadata

like the "schema" named_parameter, it is not really public, but intended for duckdb_iceberg
Copy link
Contributor

@samansmink samansmink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay to add, it is is pretty low profile and will improve iceberg queries quite a bit once it uses this.

However, ideally we still just switch iceberg to the MultiFileReader approach unifying delta and iceberg and avoid needing this parameter altogether.

@peterboncz
Copy link
Contributor Author

peterboncz commented Oct 9, 2024

the MultiFileReader approach will an improvement, but it will happen in the longer term, so I think this still is useful for merge.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 10, 2024 09:25
@Mytherin Mytherin marked this pull request as ready for review October 10, 2024 09:25
@@ -70,8 +70,8 @@ struct ParquetReadBindData : public TableFunctionData {
// These come from the initial_reader, but need to be stored in case the initial_reader is removed by a filter
idx_t initial_file_cardinality;
idx_t initial_file_row_groups;
idx_t explicit_cardinality; // can be set to inject exterior knowledge on total cardinality (e.g. from a data lake)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explicitly intitialize this with 0? It seems this is currently uninitialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, sorry I thought the whole struct was initialized 0

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 11, 2024 12:38
@peterboncz peterboncz marked this pull request as ready for review October 11, 2024 12:38
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 11, 2024 12:45
@peterboncz peterboncz marked this pull request as ready for review October 11, 2024 12:45
@Mytherin Mytherin merged commit 35dfcc0 into duckdb:main Oct 11, 2024
43 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 17, 2024
allow external cardinality information (e.g. from iceberg) (duckdb/duckdb#14292)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Oct 17, 2024
allow external cardinality information (e.g. from iceberg) (duckdb/duckdb#14292)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants