Jemalloc configuration, more buffer allocator, and remove redundant string copying in parquet dictionary #7697
Conversation
…h jemalloc's memory usage overall
@carlopi had an idea that we could have a budget of 100ms rather than a fixed 100ms per task, so after 2x50ms tasks it would purge. I like the idea, but this regresses IMDB q07c more. Not sure what's best, since this pesky 07c is the only query that really regresses with these thread-local purges.
I'm okay with that regression. @Mytherin?
I think that's fine as well |
Thanks for the PR! LGTM. Some comments below. I agree that it is likely better to run this every n milliseconds instead of only after a single task runs for more than 100ms, perhaps with a higher threshold: for example, after executing tasks totalling >250ms.
An alternative would be to look at the arena size somehow, e.g. using `thread.allocated`. That might be even better, as a task being long-running does not necessarily mean a lot of memory is allocated (e.g. we could be aggregating a large data source into a small hash table).
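For illustration, here is a minimal sketch of the accumulated-budget variant discussed above; all names are hypothetical, not the PR's actual code, and `FlushThreadCacheAndPurgeArena` is sketched further below with the PR description:

```cpp
#include <cstdint>

// Hypothetical per-thread accounting: accumulate task wall time and flush
// jemalloc's thread-local state once the total crosses a budget (the 250ms
// figure is the example from this discussion).
static thread_local int64_t accumulated_task_ms = 0;
static constexpr int64_t FLUSH_BUDGET_MS = 250;

void FlushThreadCacheAndPurgeArena(); // hypothetical helper, sketched further below

void OnTaskFinished(int64_t task_duration_ms) {
	accumulated_task_ms += task_duration_ms;
	if (accumulated_task_ms >= FLUSH_BUDGET_MS) {
		FlushThreadCacheAndPurgeArena();
		accumulated_task_ms = 0; // reset the budget after purging
	}
}
```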
src/parallel/task_scheduler.cpp
Outdated
shared_ptr<Task> task;
// loop until the marker is set to false
while (*marker) {
	// wait for a signal with a timeout
	queue->semaphore.wait();
	if (queue->q.try_dequeue(task)) {
		const auto timestamp_before_task = CurrentTimeMS();
Could we use `Timestamp::GetCurrentTimestamp` here?
src/parallel/task_scheduler.cpp
Outdated
void TaskScheduler::ExecuteForever(atomic<bool> *marker) {
#ifndef DUCKDB_NO_THREADS
	constexpr static int64_t TASK_DURATION_FLUSH_THRESHOLD_MS = 100;
Perhaps this should be a configuration parameter, with an optional way to disable it?
I'm now using `Timestamp::GetCurrentTimestamp`. I've also made the flush threshold configurable.
Thanks for the changes! LGTM. Ready to merge after CI passes.
This is ready to go; the one failure is the regression test we OK'd.
This PR fixes a bunch of allocation-related issues.
Jemalloc
We use jemalloc pretty much out-of-the-box, which is very efficient, but it caused DuckDB to use more memory than it should. I've now set the decay rate of jemalloc's pages to 1 second rather than the default 10 seconds. This should reduce memory usage, mostly for long-running queries.
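As a rough illustration (the PR may wire this up differently), jemalloc's page decay can be lowered via its documented `malloc_conf` startup symbol:

```cpp
// Sketch: override jemalloc's compiled-in defaults at startup, lowering the
// dirty/muzzy page decay from the 10s default to 1s so unused pages are
// returned to the OS sooner. malloc_conf is jemalloc's documented mechanism;
// whether a build reads this symbol or the je_-prefixed variant depends on
// how jemalloc was configured.
extern "C" const char *malloc_conf = "dirty_decay_ms:1000,muzzy_decay_ms:1000";
```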
I've also configured jemalloc to use one arena per thread, and we now flush thread-local caches and purge thread-local arenas after every task that takes longer than 100ms. This is needed because the decay discussed above is only triggered by performing allocations or by threads exiting. DuckDB's threads do not exit but stay idle when no queries are running, so currently some of jemalloc's allocations are never cleaned up after queries are done.
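A hedged sketch of that per-task cleanup, using jemalloc's documented `mallctl` names (the PR's actual code may be structured differently):

```cpp
#include <jemalloc/jemalloc.h>

#include <cstdio>

// Flush this thread's cache and purge its arena's unused dirty pages.
// "thread.tcache.flush", "thread.arena", and "arena.<i>.purge" are all
// documented jemalloc mallctl entries; error handling is omitted for brevity.
static void FlushThreadCacheAndPurgeArena() {
	// Return cached small allocations from the thread cache to the arena.
	mallctl("thread.tcache.flush", nullptr, nullptr, nullptr, 0);

	// With one arena per thread, look up the arena this thread is bound to.
	unsigned arena_idx;
	size_t sz = sizeof(arena_idx);
	if (mallctl("thread.arena", &arena_idx, &sz, nullptr, 0) == 0) {
		// Purge that arena's unused dirty pages back to the OS.
		char name[64];
		std::snprintf(name, sizeof(name), "arena.%u.purge", arena_idx);
		mallctl(name, nullptr, nullptr, nullptr, 0);
	}
}
```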
One clear downside to this approach is when a specific query performs many small tasks that each run for just over 100ms. This is a rare case because DuckDB is made for bulk processing, so tasks should run for longer and a flush is not triggered very often. In all of our regression tests, this only really regresses one query, `07c` in the IMDB benchmark (I hope that's ok?).

Buffer size counting
I've found two places where we didn't route `ColumnDataCollection` allocations through the buffer manager, namely in the local states of the parquet writer and in the `BatchedDataCollection` of the `PhysicalLimit` operator. I've ensured we now count these buffers towards our total memory usage, so we shouldn't go over the memory limit there anymore.
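For context, a minimal sketch of routing a `ColumnDataCollection` through the buffer manager, assuming DuckDB's `BufferAllocator::Get` API and simplified from the actual call sites:

```cpp
#include "duckdb.hpp"

using namespace duckdb;

// Sketch (simplified): construct the collection with the buffer manager's
// allocator instead of a raw Allocator, so its buffers are counted towards
// DuckDB's total memory usage and respect the configured memory limit.
unique_ptr<ColumnDataCollection> MakeCountedCollection(ClientContext &context,
                                                       vector<LogicalType> types) {
	auto &buffer_allocator = BufferAllocator::Get(context);
	return make_uniq<ColumnDataCollection>(buffer_allocator, std::move(types));
}
```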
Redundant string copying
The parquet writer does dictionary compression when writing strings, and builds a dictionary when creating a row group. We used to not copy strings when creating the row group from a `ColumnDataCollection`, but we found out we needed to, because the strings weren't guaranteed to be in memory (we were using a buffer-managed `ColumnDataCollection`).
However, we've since switched to an in-memory `ColumnDataCollection` (using the `BufferAllocator` so we can count the size!), so the strings are now guaranteed to be in memory again! I've removed the redundant string copying, which reduces the parquet writer's memory usage and improves performance when writing strings.
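A minimal sketch of the idea, with illustrative standard-library types rather than DuckDB's: the dictionary keys on views into the row group's string data instead of owned copies, which is only safe while the backing buffers are pinned in memory, as the in-memory `ColumnDataCollection` now guarantees.

```cpp
#include <cstdint>
#include <string_view>
#include <unordered_map>

using DictId = uint32_t;

// Insert a string into the dictionary without copying its bytes: the view
// itself is the key. With a buffer-managed collection the backing block
// could be evicted and these views would dangle; with an in-memory
// collection they stay valid for the row group's lifetime.
DictId GetOrInsert(std::unordered_map<std::string_view, DictId> &dictionary,
                   std::string_view str /* points into pinned buffer */) {
	auto result = dictionary.try_emplace(str, static_cast<DictId>(dictionary.size()));
	return result.first->second;
}
```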
Other
I've replaced a bunch of `duckdb::unique_ptr` with `unique_ptr` in the parquet extension, as we always use the `duckdb` namespace. This was a side effect of switching to our own, safer `unique_ptr` implementation.