-
Notifications
You must be signed in to change notification settings - Fork 129
Disable allow_stream_result
to force a materialized DuckDB execution results
#877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/pgduckdb_node.cpp
Outdated
// This is required for cases like CTAS from a Postgres table, where allowing streaming results | ||
// could lead to race conditions on Postgres resources. | ||
// Checkout discussion: https://github.com/duckdb/pg_duckdb/discussions/866 | ||
auto pending = prepared.PendingQuery(named_values, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to only do this if postgres tables are actually involved in the query that we send to duckdb. Otherwise dataloading steps (e.g. loading a few GB parquet file) will use much more memory for no real reason, because now the result cannot be streamed anymore. The only way to really know whether a postgres query is involved in the query (given that duckdb.query
exists) is to do this detection while preparing the statement, i.e. if it the plan involves a postgres scan, then we should not allow streaming results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a check on the rtables of the Query instance. And as long as a Postgres table is referenced, the stream is disabled.
src/scan/postgres_scan.cpp
Outdated
PostgresScopedStackReset scoped_stack_reset; | ||
std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these changes suddenly needed? (as well as the one below in PostgresScanFunction
) It seems like these additions could very well be the cause for the test failures in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code's been removed. I hit a stack limit error when I was testing, but now cannot reproduce it again
src/pgduckdb_hooks.cpp
Outdated
@@ -149,6 +149,20 @@ namespace pgduckdb { | |||
|
|||
int64_t executor_nest_level = 0; | |||
|
|||
bool | |||
ContainsPostgresTable(const Query *query) { | |||
List *rtable = query->rtable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do a full traversal of the tree to look for more queries, not just look in the outermost query. Similarly to ContainsDuckdbItems
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YuweiXiao are you still fixing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sure, will get back on it this week
8ea040f
to
7bea7dd
Compare
We don't correctly take the global Postgres lock in the main Postgres thread when fetching rows one by one from the chunks in
Duckdb_ExecCustomScan_Cpp
. Background threads might be calling postgres functions (while holding the lock), but the main thread never takes the lock so race conditions are bound to happen when that's the case. Fixing this requires some refactoring and careful testing. So for now we work around the issue by disallowing streaming results, which makes sure that the only the background threads will call Postgres functions, and the main thread will only continue running when all background threads are finished.Discussion: #866