-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add FileOpener parameter in some FileSystem method #11259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The file opener allow to access some client context data as secret and variables not exposing them in some method make them no implementable in extensions.
Hello! Let me explain a bit this PR. I'm working on the integration of the write capability in the DuckDB azure extension and will performing my initial test I discover that it was missing the file opener in some on the interfaces, and it is through it that I'm able to access the the current context and therefore secret, variables. |
Hey @quentingodeau! Yea I think breaking the API here may be justified. My first Intuition would be to do the whole API and indeed replace the raw pointers with optional pointers. However for this I would prefer to have this merged in feature and not main, which would mean that we can not get the azure writes working for duckdb v1.0.0 but at the earliest v1.1.0. The alternative would be to sneak this version of the PR in into main with the idea that the API methods that we break here are less used. Another possibility is to first add the non-breaking API to allow azure writes to work in v1.0.0, then refactor the api afterwards. This is a bit more work but perhaps the cleanest. Im a bit undecided here, @Mytherin any opinion on the matter? |
I do have some opinions here yeah, and have tried to tackle this myself in the past. I think we should preserve the old API methods and add a second overload that has an TryOpenFile, TryCreateDirectory, etcI think we also need to look at how these APIs are used at a more fundamental level. If you think about e.g. if (!DirectoryExists(dir)) {
CreateDirectory(dir);
} This is not great because (1) it is two file system operations, and (2) there is an obvious race condition here. A better API would be something like: TryCreateDirectory(dir); This is also something that I tried to tackle in the above changeset (although it has been a while). Ideally we remove all usages of OpenerFileSystemOne change I also made in that PR is that passing in an opener manually at all should not be necessary. After all, if you think about how we usually use the file system, it works like this: auto &fs = FileSystem::GetFileSystem(context);
auto file = fs.OpenFile(..., FileOpener::GetOpener(context)); Now what I have tried in the above changeset is made GetFileTypeFinally, |
I have been struggling with this a ton over in spatial trying to provide file handler hooks to GDAL and emulate |
Ok I have read all your comments I will make the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revisited my old PR and I did actually merge the OpenerFileSystem into main in #7423. I had forgotten about that. So this should greatly simplify the passing of the opener - the OpenerFileSystem
should do this automatically (as I see already happens in the PR).
I would say let's add a optional_ptr<FileOpener> opener = nullptr
to all these APIs and just break compatibility. The actual code using the file system should remain unchanged, and all that needs to happen is the file systems need to be updated.
@@ -125,7 +126,8 @@ BoundStatement Binder::BindCopyTo(CopyStatement &stmt) { | |||
if (is_remote_file) { | |||
use_tmp_file = false; | |||
} else { | |||
bool is_file_and_exists = config.file_system->FileExists(stmt.info->file_path); | |||
bool is_file_and_exists = | |||
config.file_system->FileExists(stmt.info->file_path, ClientData::Get(context).file_opener.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use FileSystem::GetFileSystem(context)
instead? That way we don't need to pass a FileOpener explicitly. It's unclear to me why this is using config.file_system
in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will make the change here :)
void RemoveDirectory(const string &directory) override { | ||
return GetFileSystem().RemoveDirectory(directory); | ||
void RemoveDirectory(const string &directory, FileOpener *opener = nullptr) override { | ||
return GetFileSystem().RemoveDirectory(directory, GetOpener().get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these all throw an exception if an opener is passed in, similar to the other code? Otherwise the opener parameter gets silently ignored which could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does a D_ASSERT
from src/include/duckdb/common/assert.hpp
is ok instead of an exception ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine as well
@Mytherin out of curiosity does the file opener can be null ? |
It is possible in some tests, but it shouldn't be in most code paths |
After looking at this some more I've decided I'm going to pick up my old |
I would like to work on it if that of with you (it's a way for me to get a bit more familiar with the duckdb code base). to give you an insight on what i'm doing:
I didn't go with the |
Thanks - your contribution is much appreciated, but the |
Ok I understand in that case I will drop it and let you manage it. @Mytherin Please when you do the PR for this can you add my name so I can see the changes and sync my duckdb_azure branch. |
Thanks for understanding - I'll keep you posted! |
I have an initial PR open here - #11297 This PR adds the TryOpenFile (although I turned it into a flag instead of a separate method after further consideration). I'm working on a second PR that adds the |
@quentingodeau Is this what you needed to please continue with duckdb/duckdb-azure#44 ? |
The file opener allow to access some client context data as secret and variables not exposing them in some method make them no implementable in extensions.