@Mytherin Mytherin commented May 8, 2025

This PR allows the DuckDB shell to directly "attach" to raw files (Parquet, JSON, or CSV), e.g.:

duckdb region.parquet -c "FROM region LIMIT 1"

┌─────────────┬─────────┬────────────────────────────────────────────────────────────────────────────────────────────────┐
│ r_regionkey │ r_name  │                                           r_comment                                            │
│    int32    │ varchar │                                            varchar                                             │
├─────────────┼─────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
│      0      │ AFRICA  │ ar packages. regular excuses among the ironic requests cajole fluffily blithely final reques…  │
└─────────────┴─────────┴────────────────────────────────────────────────────────────────────────────────────────────────┘

What actually happens is that we launch an in-memory DuckDB database and create two views over the given file:

  • file - this view is always named the same, regardless of the name of the file
  • [base_file_name] - this view depends on the name of the file, e.g. for region.parquet this is region

Both views can be queried like any other view, as in the FROM region query above.
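
To make the two-view setup concrete, here is a rough sketch of the SQL one could write by hand to get a similar result in a regular in-memory session. The `view_sql` helper is hypothetical and not part of the PR. The naming rule it uses (strip the directory and the last extension) is an assumption based on the region.parquet example, and the statements actually generated by the shell may differ.

```shell
#!/bin/sh
# Sketch only: print SQL that defines two views over a single file,
# mirroring the behaviour described in this PR.
# Assumption: the base-name view is the file name without its directory
# and without its last extension (region.parquet -> region).
view_sql() {
    file_path="$1"
    base_name="$(basename "$file_path")"   # drop any leading directories
    view_name="${base_name%.*}"            # drop the last extension
    # View that is always named "file", regardless of the input file
    printf "CREATE VIEW \"file\" AS SELECT * FROM '%s';\n" "$file_path"
    # View named after the file itself
    printf "CREATE VIEW \"%s\" AS SELECT * FROM '%s';\n" "$view_name" "$file_path"
}

view_sql region.parquet
```

The printed statements can be pasted into a plain `duckdb` session, after which both `FROM file` and `FROM region` should resolve to the same data. The sketch does not escape quotes in file names, so it is only suitable for simple paths.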

The main advantage of this is usability - we can use the regular shell to navigate to a file, and then use DuckDB to open that file without having to refer to the path of the file at the SQL level.

CC @szarnyasg

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 9, 2025 08:09
@Mytherin Mytherin marked this pull request as ready for review May 9, 2025 08:10
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 9, 2025 12:00
@Mytherin Mytherin marked this pull request as ready for review May 9, 2025 12:00
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 11, 2025 18:28
@Mytherin Mytherin marked this pull request as ready for review May 12, 2025 07:34
@Mytherin Mytherin merged commit cc69850 into duckdb:main May 12, 2025
49 checks passed
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Allow directly attaching of Parquet/CSV/JSON files (duckdb/duckdb#17415)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 19, 2025
Allow directly attaching of Parquet/CSV/JSON files (duckdb/duckdb#17415)
@Mytherin Mytherin deleted the attachparquetfile branch June 12, 2025 15:28
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation