
Mytherin
Collaborator

Implements #2534

This PR adds the last missing piece of Parquet metadata scanning: the top-level file metadata.

D FROM parquet_file_metadata('data/parquet-testing/arrow/alltypes_dictionary.parquet');
┌──────────────────────┬─────────────────────────────────────────────────┬──────────┬────────────────┬────────────────┬──────────────────────┬─────────────────────────────┐
│      file_name       │                   created_by                    │ num_rows │ num_row_groups │ format_version │ encryption_algorithm │ footer_signing_key_metadata │
│       varchar        │                     varchar                     │  int64   │     int64      │     int64      │       varchar        │           varchar           │
├──────────────────────┼─────────────────────────────────────────────────┼──────────┼────────────────┼────────────────┼──────────────────────┼─────────────────────────────┤
│ data/parquet-testi…  │ impala version 1.3.0-INTERNAL (build 8a48ddb1…  │        2 │              1 │              1 │ NULL                 │ NULL                        │
└──────────────────────┴─────────────────────────────────────────────────┴──────────┴────────────────┴────────────────┴──────────────────────┴─────────────────────────────┘

In addition, we add a file_name column to the output of the parquet_kv_metadata function to make it consistent with the other metadata functions.

With this change, all Parquet file metadata should be scannable using the functions below.

| Function | Metadata |
|---|---|
| parquet_file_metadata | Top-Level File Meta Data |
| parquet_metadata | Row-Group Metadata |
| parquet_schema | Schema Metadata |
| parquet_kv_metadata | Optional Key-Value Pairs |
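
As a rough illustration of how the four functions fit together, the sketch below runs each of them against the same test file used in the example above. It assumes a DuckDB build that includes this PR and that the file path resolves from the current working directory. Only the columns shown in the output above are named explicitly; the other queries use `SELECT *` so as not to assume column names that this PR does not show.

```sql
-- Top-level file metadata (added in this PR): expected to yield one row per file.
SELECT file_name, created_by, num_rows, num_row_groups, format_version
FROM parquet_file_metadata('data/parquet-testing/arrow/alltypes_dictionary.parquet');

-- Row-group metadata.
SELECT *
FROM parquet_metadata('data/parquet-testing/arrow/alltypes_dictionary.parquet');

-- Schema metadata.
SELECT *
FROM parquet_schema('data/parquet-testing/arrow/alltypes_dictionary.parquet');

-- Optional key-value pairs; with this PR the output also carries file_name.
SELECT *
FROM parquet_kv_metadata('data/parquet-testing/arrow/alltypes_dictionary.parquet');
```

Because every function now reports file_name, the results can in principle be joined or grouped per file when scanning several Parquet files at once.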

@github-actions github-actions bot marked this pull request as draft November 24, 2023 13:36
@Mytherin Mytherin marked this pull request as ready for review November 24, 2023 13:37
@github-actions github-actions bot marked this pull request as draft November 24, 2023 15:51
@Mytherin Mytherin marked this pull request as ready for review November 24, 2023 16:08
@github-actions github-actions bot marked this pull request as draft November 25, 2023 11:22
@Mytherin Mytherin marked this pull request as ready for review November 25, 2023 11:22
@Mytherin Mytherin merged commit 40d29a5 into duckdb:main Nov 27, 2023
@Mytherin Mytherin deleted the allparquetmetadata branch December 4, 2023 11:44
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 14, 2023
Merge pull request duckdb/duckdb#9793 from Mytherin/allparquetmetadata
Merge pull request duckdb/duckdb#9797 from Mause/bugfix/python-3.12-minimal
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation
Development

Successfully merging this pull request may close these issues.

Enhancement: support all parquet file metadata
1 participant