Skip to content

Conversation

lnkuiper
Copy link
Contributor

@lnkuiper lnkuiper commented Jun 13, 2025

Following the discussion here:
https://github.com/duckdblabs/motherduck/issues/320

Our eviction queue uses an LRU policy to evict pages. However, we need to "purge dead nodes" (pages that have been (un)pinned more than once before they are evicted), which re-adds some of the "alive nodes" to the back of the queue. This means that the LRU behavior is not maintained for a small portion of the pages.

This PR sorts the alive nodes to slightly improve the LRU behavior, and re-adds them in bulk to slightly improve performance (instead of re-adding them one-by-one).

EDIT: @taniabogatsch Do you remember if there was any reason we did not bulk re-add already?

@lnkuiper lnkuiper requested a review from taniabogatsch June 13, 2025 07:32
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I don't remember a particular reason - I'd say if the tests pass, all good :))

@Mytherin
Copy link
Collaborator

Thanks for the PR! The bulk enqueue makes sense to me and seems like an improvement, but the sort on sequence numbers does not make that much sense to me. Sequence numbers are kind of arbitrary, they are incremented when a block is pinned, and they are never decremented. That means if you have a block that is pinned/unpinned many times in a short period of time and then never used again it will have a very high sequence number, versus a block that is pinned/unpinned constantly but at a lower frequency. If we want to do this kind of sorting we could add a timestamp or similar to the buffer nodes instead.

@lnkuiper
Copy link
Contributor Author

@Mytherin Thanks, I did not realize that. I thought it was a global thing. I've removed the sort for now, we can add timestamps and sort in v1.4.0

@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 13, 2025 14:01
@lnkuiper lnkuiper marked this pull request as ready for review June 13, 2025 14:02
@duckdb-draftbot duckdb-draftbot marked this pull request as draft June 18, 2025 14:32
@lnkuiper lnkuiper marked this pull request as ready for review June 18, 2025 14:32
@Mytherin Mytherin merged commit 8c40a99 into duckdb:v1.3-ossivalis Jun 19, 2025
52 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Jun 19, 2025
Eviction queue: Sort purged nodes and bulk re-add (duckdb/duckdb#17913)
[CI] adding DONT_LINK parameter to the test extension configuration for `inet` extension (duckdb/duckdb#17967)
bump julia to v1.3.1 (duckdb/duckdb#17966)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Jun 19, 2025
Eviction queue: Sort purged nodes and bulk re-add (duckdb/duckdb#17913)
[CI] adding DONT_LINK parameter to the test extension configuration for `inet` extension (duckdb/duckdb#17967)
bump julia to v1.3.1 (duckdb/duckdb#17966)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
}
}

// bulk re-add (TODO order them by timestamp to better retain the LRU behavior)
q.enqueue_bulk(purge_nodes.begin(), alive_nodes);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lnkuiper @Mytherin @taniabogatsch This function call will cause the DuckDB WASM build to fail. Because the MVP/EH feature of WASM doesn't support thread (DUCKDB_NO_THREADS=1)

In this case, the file https://github.com/duckdb/duckdb/blob/main/third_party/concurrentqueue/concurrentqueue.h would not be included. Instead a simple polyfill for the ConcurrentQueue class in the file https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parallel/concurrentqueue.hpp will be used.

However, this class doesn't implement a enqueue_bulk method, so the following error will be thrown:

/path/to/duckdb-wasm/submodules/duckdb/src/storage/buffer/buffer_pool.cpp:227:4: error: no member named 'enqueue_bulk' in 'duckdb_moodycamel::ConcurrentQueue<duckdb::BufferEvictionNode>'

I think that a method with the same name enqueue_bulk needs to be added into the ConcurrentQueue class in DuckDB src directory.
Or checking the macro variable DUCKDB_NO_THREADS or other macro variables which could determine whether the real ConcurrentQueue class has been included inthe file https://github.com/duckdb/duckdb/blob/main/src/storage/buffer/buffer_pool.cpp

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants