-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: update parser exception handling for extensions #7868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/parser/parser.cpp
Outdated
if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) { | ||
throw ParserException(result.error); | ||
} else if (result.type == ParserExtensionResultType::DISPLAY_ORIGINAL_ERROR) { | ||
parser_error = QueryErrorContext::Format(query, another_parser.error_message, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The offset and query will be off in the parser error produced here (it will refer to the individual statement instead of the original query). This might be desirable, calling out for discussion.
We should add some test cases around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good. I think the Tokenize
approach is slower than required, and this also turns every parse operation that involves a parser extension into a multi-pass hop, but I think this is fine for now (pending a bigger rework).
The loadable_extension_demo
has a parser extension. I think it is important to test that this works in different configurations, with things like comments, quotes, newlines, mixes of multiple statements, etc. Could you add the tests there?
This also seems to break a lot of the CI/tests which should be addressed as well.
src/parser/parser.cpp
Outdated
return; | ||
// if DuckDB fails to parse the entire sql string, break the string down into individual statements | ||
// so that parser extensions can help parse some statements | ||
if (parsing_failed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe only do this if parser extensions are registered (if (options.extensions)
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added options.extensions
to the condition for correctness.
src/parser/parser.cpp
Outdated
auto &t_prev = tokens[i - 1]; | ||
auto &t = tokens[i]; | ||
if (t_prev.type == SimplifiedTokenType::SIMPLIFIED_TOKEN_OPERATOR) { | ||
if (StringUtil::Contains(query.substr(t_prev.start, t.start - t_prev.start), ";")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a copy of the substring for every token - a straightforward loop (as we are looking for a single character) would be more performant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/parser/parser.cpp
Outdated
if (parsing_failed) { | ||
// If DuckDB fails to parse the entire sql string, break the string down into individual statements | ||
// using ; as the delimiter so that parser extensions can parse the statement | ||
if (parsing_failed && options.extensions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It a little hard to tell with nesting (be nice to split up for readability), but think we might be missing the else
block corresponding with this if
that throws the original error. If parsing failed but their is no options.extensions
[will need to split up this if conditional].
I suggest (to make it straight forward to check we have the right logic)
if(parsing_succeed) {
# return here would require refactoring into another function. o.w. will just no-op in order to run wrap up code at the end of this function
} else if(!options.extensions) {
throw parser_error
} else {
[split in to statement and reparse leveraging extension]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, a big if
is def bad. I refactored the logic here: 818e9cc Could you take another look please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Thank you for the review. CI failures should be fixed (they're still running...). I added a list of tests covering a variety of relevant use cases here in our extension. Please let me know if more changes are needed. |
Current code is not really working in case of more than one parsing extension, unsure if this is a real or only potential problem. Current: for (auto &ext : *options.extensions) {
D_ASSERT(ext.parse_function);
auto result = ext.parse_function(ext.parser_info.get(), query_statement);
if (result.type == ParserExtensionResultType::PARSE_SUCCESSFUL) {
auto statement = make_uniq<ExtensionStatement>(ext, std::move(result.parse_data));
statement->stmt_length = query_statement.size();
statement->stmt_location = 0;
statements.push_back(std::move(statement));
}
if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) {
throw ParserException(result.error);
} else if (result.type == ParserExtensionResultType::DISPLAY_ORIGINAL_ERROR) {
parser_error = QueryErrorContext::Format(query, another_parser.error_message,
another_parser.error_location - 1);
throw ParserException(parser_error);
}
} Proposed: bool parsed_single_statement = false;
for (auto &ext : *options.extensions) {
D_ASSERT(!parsed_single_statement);
D_ASSERT(ext.parse_function);
auto result = ext.parse_function(ext.parser_info.get(), query_statement);
if (result.type == ParserExtensionResultType::PARSE_SUCCESSFUL) {
auto statement = make_uniq<ExtensionStatement>(ext, std::move(result.parse_data));
statement->stmt_length = query_statement.size();
statement->stmt_location = 0;
statements.push_back(std::move(statement));
parsed_single_statement = true;
break;
} else if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) {
throw ParserException(result.error);
} else {
// We move to the next one!
}
}
if (!parsed_single_statement) {
parser_error = QueryErrorContext::Format(query, another_parser.error_message,
another_parser.error_location - 1);
throw ParserException(parser_error);
} Also, independent, |
Thank you for the review. I think it would be nice to make it work for when there're multiple parser extensions (though not needed for now). I adopted your feedback and put in |
Thanks! |
Fixes #7854