MultiFileReader Rework (part 18): Replace file path with `OpenFileInfo` struct #17071
I like the idea of having an `OpenFileInfo` struct that can carry more info than just the string path. But have you considered also making `Glob` return something more than just a vector of `OpenFileInfo`s? E.g. some sort of Also I think
The idea is to add extra members to The design I have for it in my follow-up branch is this:

```cpp
struct ExtendedOpenFileInfo {
	unordered_map<string, Value> options;
};

struct OpenFileInfo {
	OpenFileInfo() = default;
	OpenFileInfo(string path_p) // NOLINT: allow implicit conversion from string
	    : path(std::move(path_p)) {
	}

	string path;
	shared_ptr<ExtendedOpenFileInfo> extended_info;

public:
	bool operator<(const OpenFileInfo &rhs) const {
		return path < rhs.path;
	}
};
```

I think reworking glob to return an iterator is also a good idea, but it is somewhat orthogonal to this PR.
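As a rough illustration of how per-file options could travel with a path under this design, here is a minimal standalone sketch. It is not DuckDB code: `Value` is replaced by a plain `std::string` stand-in so the example compiles on its own, and `SetOption` and `TryGetOption` are hypothetical helpers that do not appear in the PR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for DuckDB's Value type (assumption: a plain string keeps the sketch self-contained).
using Value = std::string;

struct ExtendedOpenFileInfo {
	std::unordered_map<std::string, Value> options;
};

struct OpenFileInfo {
	OpenFileInfo() = default;
	OpenFileInfo(std::string path_p) // NOLINT: allow implicit conversion from string
	    : path(std::move(path_p)) {
	}

	std::string path;
	std::shared_ptr<ExtendedOpenFileInfo> extended_info;

	// Ordering only considers the path, as in the quoted design.
	bool operator<(const OpenFileInfo &rhs) const {
		return path < rhs.path;
	}
};

// Hypothetical helper: attach an option, allocating extended_info on first use.
void SetOption(OpenFileInfo &info, const std::string &key, Value value) {
	if (!info.extended_info) {
		info.extended_info = std::make_shared<ExtendedOpenFileInfo>();
	}
	info.extended_info->options[key] = std::move(value);
}

// Hypothetical helper: look up an option; returns false if it is absent.
bool TryGetOption(const OpenFileInfo &info, const std::string &key, Value &result) {
	if (!info.extended_info) {
		return false;
	}
	auto entry = info.extended_info->options.find(key);
	if (entry == info.extended_info->options.end()) {
		return false;
	}
	result = entry->second;
	return true;
}
```

Because `extended_info` is a `shared_ptr` that stays null until an option is set, a plain path costs no extra allocation, and the implicit constructor lets existing call sites that pass a string keep compiling.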
Ah alright, I guess that makes sense. I guess
…on keys and footer size to Parquet reader (#17085)

Follow-up to #17071

This PR extends the `OpenFileInfo` struct with `ExtendedOpenFileInfo`:

```cpp
struct ExtendedOpenFileInfo {
	unordered_map<string, Value> options;
};
```

The Parquet reader is extended to support two keys in this struct:

* `encryption_key`: used to pass a global encryption key for the Parquet file to the reader
* `footer_size`: used to pass the footer size of the Parquet file to the reader. This can be used to skip the separate read of the footer length from the back of the file; instead, the footer can be read directly. Note that this must be exactly correct: an error is thrown if an incorrect footer size is provided.
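To make the `footer_size` saving concrete, the sketch below models it on an in-memory buffer. A Parquet file ends with the footer, a 4-byte little-endian footer length, and the magic bytes `PAR1`. Everything else here is an assumption for illustration and not DuckDB's implementation: `FakeFile`, `MakeFakeFile`, `DecodeLength` and `ReadFooter` are hypothetical names, and validating the hint against the stored length is just one plausible way to detect an incorrect footer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Trailer layout: [footer bytes][4-byte little-endian footer length]["PAR1"]
static const uint64_t TRAILER_SIZE = 8;

// Hypothetical in-memory file that counts how many reads were issued.
struct FakeFile {
	explicit FakeFile(std::string data_p) : data(std::move(data_p)), read_count(0) {
	}
	std::string Read(uint64_t offset, uint64_t length) {
		if (offset > data.size() || length > data.size() - offset) {
			throw std::out_of_range("read past end of file");
		}
		read_count++;
		return data.substr(offset, length);
	}
	std::string data;
	int read_count;
};

// Build a buffer with a Parquet-style trailer (the body is arbitrary filler).
std::string MakeFakeFile(const std::string &body, const std::string &footer) {
	std::string result = body + footer;
	uint32_t len = static_cast<uint32_t>(footer.size());
	for (int i = 0; i < 4; i++) {
		result.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
	}
	return result + "PAR1";
}

// Decode a 4-byte little-endian length starting at `offset` in `buf`.
uint32_t DecodeLength(const std::string &buf, size_t offset) {
	const unsigned char *p = reinterpret_cast<const unsigned char *>(buf.data()) + offset;
	return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Return the footer bytes; `has_hint` and `footer_size_hint` model the optional option.
std::string ReadFooter(FakeFile &file, bool has_hint, uint32_t footer_size_hint) {
	const uint64_t file_size = file.data.size();
	if (file_size < TRAILER_SIZE) {
		throw std::runtime_error("file too small");
	}
	if (!has_hint) {
		// Two reads: the trailer to learn the length, then the footer itself.
		std::string trailer = file.Read(file_size - TRAILER_SIZE, TRAILER_SIZE);
		uint32_t footer_size = DecodeLength(trailer, 0);
		if (footer_size > file_size - TRAILER_SIZE) {
			throw std::runtime_error("stored footer size exceeds file size");
		}
		return file.Read(file_size - TRAILER_SIZE - footer_size, footer_size);
	}
	if (footer_size_hint > file_size - TRAILER_SIZE) {
		throw std::runtime_error("incorrect footer size");
	}
	// One read: footer and trailer together, then verify the hint against the stored length.
	std::string tail = file.Read(file_size - TRAILER_SIZE - footer_size_hint, footer_size_hint + TRAILER_SIZE);
	if (DecodeLength(tail, footer_size_hint) != footer_size_hint) {
		throw std::runtime_error("incorrect footer size");
	}
	return tail.substr(0, footer_size_hint);
}
```

With a correct hint the footer arrives in a single read instead of two, which matters most on remote file systems where each read is a round trip. A wrong hint is rejected rather than silently producing a misaligned footer.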
MultiFileReader Rework (part 18): Replace file path with `OpenFileInfo` struct (duckdb/duckdb#17071)
update julia to v1.2.2 (duckdb/duckdb#17074)
This PR reworks several parts of the file system and multi file reader to work on an extensible `OpenFileInfo` struct instead of a `string path`:

* `OpenFileExtended` takes an `OpenFileInfo`
* `Glob` returns a list of `OpenFileInfo`
* `OpenFileInfo` (including the `MultiFileList` that returns a list of `OpenFileInfo`)

The idea here is that we can pass per-file information down to the readers and file system that can avoid doing unnecessary work. For example:

This PR does not change anything aside from wrapping the paths into a struct - but that will allow these optimizations in a follow-up PR.