Skip to content

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Sep 19, 2024

This PR makes ExecutePushInternal respect the max_chunks set in the call to PipelineExecutor::Execute(idx_t max_chunks)

When running DuckDB inside an environment where the control over received signals (like SIGINT) is not instant (like Python or Postgres), we rely on ExecuteTask to return after doing a small amount of processing work.

In the DuckDB CLI this problem was not apparent because SIGINT is handled and Interrupt() is called instantly, which is respected by the execution.
This problem only appeared because the interrupt flag could not be set as the application was first waiting for ExecuteTask to return.

…shInternal knows to return early, respecting the 'max_chunks' set by PipelineExecutor::Execute()
@Tishj
Copy link
Contributor Author

Tishj commented Sep 19, 2024

This is definitely out of my depth and I'm not confident all cases are considered or the necessary state in the PipelineExecutor is preserved to support pausing this process.

The original problem was that ExecutePushInternal would only return when the Execute(chunk, chunk, idx_t) call would not return HAVE_MORE_OUTPUT

@Mytherin Mytherin changed the base branch from main to feature September 19, 2024 12:17
@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 19, 2024 12:24
@Tishj Tishj marked this pull request as ready for review September 19, 2024 12:27
@Tishj
Copy link
Contributor Author

Tishj commented Sep 20, 2024

TryFlushCachingOperators is not prepared to deal with ExecutePushInternal returning HAVE_MORE_OUTPUT, that's the main thing I see that still needs to be fixed

@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 20, 2024 14:52
@Tishj
Copy link
Contributor Author

Tishj commented Sep 23, 2024

TryFlushCachingOperators is not prepared to deal with ExecutePushInternal returning HAVE_MORE_OUTPUT, that's the main thing I see that still needs to be fixed

I fixed this by detecting that HAVE_MORE_OUTPUT gets returned from ExecutePushInternal, then returning false without setting remaining_sink_chunk to indicate that the flushing did not complete yet, but the sink was succesful.

Outside of TryFlushCachingOperator we check if remaining_sink_chunk was set:

				if (remaining_sink_chunk) {
					return PipelineExecuteResult::INTERRUPTED;
				} else {
					return PipelineExecuteResult::NOT_FINISHED;
				}

By not setting remaining_sink_chunk we don't try to re-sink the chunk, enter TryFlushCachingOperators as normal, and because in_process_operators is non-empty (otherwise HAVE_MORE_OUTPUT would not be returned) we will get back into ExecutePushInternal where we continue pushing the produced chunks into the Sink.

@Tishj Tishj marked this pull request as ready for review September 23, 2024 12:45
@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 23, 2024 15:34
@Tishj Tishj marked this pull request as ready for review September 23, 2024 15:46
@duckdb-draftbot duckdb-draftbot marked this pull request as draft September 24, 2024 07:48
@Tishj Tishj marked this pull request as ready for review September 24, 2024 07:48
@hannes hannes requested review from lnkuiper and removed request for samansmink and Mytherin October 2, 2024 10:28
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This looks great. I just want to add a few assertions:

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 4, 2024 08:18
@Tishj Tishj marked this pull request as ready for review October 4, 2024 08:49
@Tishj Tishj requested a review from samansmink October 4, 2024 12:06
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! LGTM now

@Mytherin Mytherin merged commit 92cf6f1 into duckdb:feature Oct 7, 2024
41 of 42 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Oct 7, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants