Skip to content

Conversation

stephaniewang526
Copy link
Contributor

Fixes #7854

if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) {
throw ParserException(result.error);
} else if (result.type == ParserExtensionResultType::DISPLAY_ORIGINAL_ERROR) {
parser_error = QueryErrorContext::Format(query, another_parser.error_message,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset and query will be off in the parser error produced here (it will refer to the individual statement instead of the original query). This might be desirable, calling out for discussion.

We should add some test cases around this.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good. I think the Tokenize approach is slower than required, and this also turns every parse operation that involves a parser extension into a multi-pass hop, but I think this is fine for now (pending a bigger rework).

The loadable_extension_demo has a parser extension. I think it is important to test that this works in different configurations, with things like comments, quotes, newlines, mixes of multiple statements, etc. Could you add the tests there?

This also seems to break a lot of the CI/tests which should be addressed as well.

return;
// if DuckDB fails to parse the entire sql string, break the string down into individual statements
// so that parser extensions can help parse some statements
if (parsing_failed) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only do this if parser extensions are registered (if (options.extensions))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added options.extensions to the condition for correctness.

auto &t_prev = tokens[i - 1];
auto &t = tokens[i];
if (t_prev.type == SimplifiedTokenType::SIMPLIFIED_TOKEN_OPERATOR) {
if (StringUtil::Contains(query.substr(t_prev.start, t.start - t_prev.start), ";")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a copy of the substring for every token - a straightforward loop (as we are looking for a single character) would be more performant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (parsing_failed) {
// If DuckDB fails to parse the entire sql string, break the string down into individual statements
// using ; as the delimiter so that parser extensions can parse the statement
if (parsing_failed && options.extensions) {
Copy link
Contributor

@rjatwal rjatwal Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It a little hard to tell with nesting (be nice to split up for readability), but think we might be missing the else block corresponding with this if that throws the original error. If parsing failed but their is no options.extensions [will need to split up this if conditional].

I suggest (to make it straight forward to check we have the right logic)

if(parsing_succeed) {
   # return here would require refactoring into another function. o.w. will just no-op in order to run wrap up code at the end of this function
} else if(!options.extensions) {
    throw parser_error
} else {
    [split in to statement and reparse leveraging extension]
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, a big if is def bad. I refactored the logic here: 818e9cc Could you take another look please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@stephaniewang526
Copy link
Contributor Author

Thanks for the PR! Looks good. I think the Tokenize approach is slower than required, and this also turns every parse operation that involves a parser extension into a multi-pass hop, but I think this is fine for now (pending a bigger rework).

The loadable_extension_demo has a parser extension. I think it is important to test that this works in different configurations, with things like comments, quotes, newlines, mixes of multiple statements, etc. Could you add the tests there?

This also seems to break a lot of the CI/tests which should be addressed as well.

Thank you for the review. CI failures should be fixed (they're still running...). I added a list of tests covering a variety of relevant use cases here in our extension. Please let me know if more changes are needed.

@carlopi
Copy link
Contributor

carlopi commented Jun 9, 2023

Current code is not really working in case of more than one parsing extension, unsure if this is a real or only potential problem.

Current:

for (auto &ext : *options.extensions) {
	D_ASSERT(ext.parse_function);
	auto result = ext.parse_function(ext.parser_info.get(), query_statement);
	if (result.type == ParserExtensionResultType::PARSE_SUCCESSFUL) {
		auto statement = make_uniq<ExtensionStatement>(ext, std::move(result.parse_data));
		statement->stmt_length = query_statement.size();
		statement->stmt_location = 0;
		statements.push_back(std::move(statement));
	}
	if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) {
		throw ParserException(result.error);
	} else if (result.type == ParserExtensionResultType::DISPLAY_ORIGINAL_ERROR) {
		parser_error = QueryErrorContext::Format(query, another_parser.error_message,
						                                         another_parser.error_location - 1);
		throw ParserException(parser_error);
	}
}

Proposed:

bool parsed_single_statement = false;
for (auto &ext : *options.extensions) {
	D_ASSERT(!parsed_single_statement);
	D_ASSERT(ext.parse_function);
	auto result = ext.parse_function(ext.parser_info.get(), query_statement);
	if (result.type == ParserExtensionResultType::PARSE_SUCCESSFUL) {
		auto statement = make_uniq<ExtensionStatement>(ext, std::move(result.parse_data));
		statement->stmt_length = query_statement.size();
		statement->stmt_location = 0;
		statements.push_back(std::move(statement));
		parsed_single_statement = true;
		break;
	} else if (result.type == ParserExtensionResultType::DISPLAY_EXTENSION_ERROR) {
		throw ParserException(result.error);
	} else {
		// We move to the next one!
	}
}
if (!parsed_single_statement) {
	parser_error = QueryErrorContext::Format(query, another_parser.error_message,
					                                         another_parser.error_location - 1);
	throw ParserException(parser_error);
}

Also, independent, // LCOV_EXCL_START and // LCOV_EXCL_END will need to be added around the untested code (otherwise CodeCoverage CI will fail, check latest run to discover which lines are uncovered).

@stephaniewang526
Copy link
Contributor Author

Thank you for the review. I think it would be nice to make it work for when there're multiple parser extensions (though not needed for now). I adopted your feedback and put in // LCOV_EXCL_START and // LCOV_EXCL_STOP.

@Mytherin Mytherin merged commit b0a1a5a into duckdb:master Jun 11, 2023
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants