Skip to content

Conversation

Maxxen
Copy link
Member

@Maxxen Maxxen commented Oct 9, 2024

Closes duckdb/duckdb-node#124
Closes duckdb/duckdb-spatial#423

This PR changes the GeoParquet reading behavior so that the spatial extension is no longer auto-loaded when GeoParquet metadata is detected. Spatial was not marked as autoloadable and the way the load was issued did not respect auto loading settings either. If spatial would become auto-loadable in the future we can change this behavior back (but this time also respect auto loading settings).

This PR also adds a new setting to the parquet extension enable_geoparquet_conversion = true (default) which controls whether geometry data will be converted when reading/writing GeoParquet files (and the spatial extension is loaded). This enables you to "fall back" to reading writing parquet files normally even if they have geoparquet metadata. This is primarily useful when trying to read a geoparquet file that is incompatible with the current geoparquet support in duckdb, either because it uses an unsupported version/encoding or because it is non-standards compliant (or the metadata is broken in some other way).

Additionally, I've also moved over the geoparquet tests and test files into this repository for now. Since they require spatial they won't be invoked during our normal CI runs (I think?), but it still useful when testing locally.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 9, 2024 18:51
@Maxxen Maxxen marked this pull request as ready for review October 9, 2024 18:51
const auto is_loaded = ExtensionHelper::TryAutoLoadExtension(context, "spatial");
if (!is_loaded) {
// Spatial extension is not available, we can't make use of the metadata anyway.
yyjson_doc_free(geo_metadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay in?

And possibly also the comment // Check if the spatial extension is loaded, or try to autoload it. could be fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 9, 2024 21:15
@Maxxen Maxxen force-pushed the geoparquet-disable branch from e51ed8a to 2c9dcd2 Compare October 9, 2024 21:16
@Maxxen Maxxen marked this pull request as ready for review October 9, 2024 21:17
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 10, 2024 10:24
@Maxxen Maxxen marked this pull request as ready for review October 10, 2024 10:24
@Mytherin Mytherin merged commit 6dc2e93 into duckdb:main Oct 11, 2024
44 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 16, 2024
Add option to ignore GeoParquet, disable spatial autoloading when reading GeoParquet (duckdb/duckdb#14297)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Oct 16, 2024
Add option to ignore GeoParquet, disable spatial autoloading when reading GeoParquet (duckdb/duckdb#14297)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Geoparquet column 'geometry' does not have geometry types Auto Install/Load extensions is causing schema inference discrepancies
3 participants