Skip to content

Using ORDER BY within aggregate functions is slow and consumes a lot of memory. #9950

@yiyuanliu

Description

@yiyuanliu

What happens?

When executing the following query inside a container with a memory limit of 64GB, duckdb consumed a large amount of memory and began to use swap space. The command SET memory_limit = '50GB' didn't work. The query ran for a long time without returning any results, so I had to kill the process.

SET memory_limit = '50GB';
create table tbl as (select random() as key, uuid() as uuid, random() as score from (select unnest(generate_series(100000000))));
explain analyze select key, list(uuid order by score) from tbl group by key;
# cat /proc/`pgrep duckdb`/status | grep Vm
VmPeak:	133817964 kB
VmSize:	133799764 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	66484908 kB
VmRSS:	66359428 kB
VmData:	132941568 kB
VmStk:	     136 kB
VmExe:	   21464 kB
VmLib:	    2812 kB
VmPTE:	  259980 kB
VmSwap:	65703480 kB

After increasing the container's memory limit to 256GB, the query could be executed successfully. However, after the query finished, duckdb still occupied a large amount of memory.

# cat /proc/`pgrep duckdb`/status | grep Vm
VmPeak: 141865136 kB
VmSize: 97442716 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:  139743112 kB
VmRSS:  95574208 kB
VmData: 97288552 kB
VmStk:       136 kB
VmExe:     21464 kB
VmLib:      2812 kB
VmPTE:    256956 kB
VmSwap:        0 kB

The query can finish very quickly if ORDER BY clause is removed,.

To Reproduce

see above

OS:

ubuntu 2204

DuckDB Version:

v0.9.3-dev1411 7d5150c

DuckDB Client:

cli

Full Name:

Yiyuan Liu

Affiliation:

High-Flyer AI

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions