
Conversation

samansmink
Contributor

Fixes #5685

The addition of presigned S3 URLs caused some complications that broke the S3 upload. MinIO handles URL encoding slightly differently, which is why we didn't catch it.

Changes made to fix it:

  • '?' is no longer allowed in S3 URLs by default: the first occurrence of a '?' is considered the start of the query param string, like in a regular URL.
  • to support those who really want to use '?' in their S3 URLs, there is now an s3_url_compatibility_mode option that can be set to disable query params and globs on S3 URLs.

While simply disallowing '?' in S3 URLs may seem a bit crude, we already did not support them well: they were interpreted as glob characters and could cause unexpected behaviour.

In short:

By default, globbing and query params for fully qualified URLs are enabled:

SELECT * FROM "s3://bucket/path/file*.parquet?s3_region=eu-west-1";

Optionally, you can use the compatibility mode to reach those hard-to-reach places:

SET s3_url_compatibility_mode=true;
SELECT * FROM "s3://bucket/path/an?awfully*named[file].parquet";

If you were using '?' in globs on S3 URLs, the workaround is to use the [abcde] notation to match single characters.
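For example, a pattern that previously used '?' to match a single character can list the candidate characters explicitly instead (a minimal sketch with a made-up bucket and file name):

SELECT * FROM "s3://bucket/path/file_[0123456789].parquet";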


tobilg commented Feb 23, 2023

@samansmink When I look at the S3 prefix that the files are written to when using COPY TO PARTITION BY, I see keys like

data_0.parquet?partNumber=1&uploadId=1_X6JmBpcba6CpYIKsbeO9XbOK.eRfTJzXhMyx2OZKSQUJ.w2eFADSc8fyQec77c1tJK6bp381ESX__5lfAccVsUS.ovfQY0JzltvvsbFD.c5CBnP9zVr9xmhi.8pgOFMTYwOD6tDrpOBsK7WAsDKFTfW8rvMEZeIECWYc6wXXu._QutdzKs6AOEB.XAibGJ

Will this PR also handle these cases?

@Mytherin
Collaborator

Thanks for the PR! Looks good - it just seems like there are some CI failures remaining - could you have a look?
