-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
What happens?
When query is fired on duckdb database that uses "approx_count_distinct" function, seems that duckdb is consuming way too much memory than what is specified in max_memory parameter
To Reproduce
-
Launch terminal/command prompt/shell and go the directory where duckdb exe is present
-
Attach to the duckdb database using below command. You can find lineorder_flat.db file in the attachment (attachment contains .txt extension, so please remove .txt extension): lineorder_flat.db.txt
lineorder_flat.db.txtThis file contains empty table
duckdb lineorder_flat.db
-
insert data to the table using below command and csv file, de-normalised SSB line order data
INSERT INTO lineorder_flat SELECT * FROM read_csv('lineorder_flat.tbl', AUTO_DETECT=TRUE);
-
Set max memory to the database using following SQL.
set max_memory="2GB";
-
Run the query that uses "approx_count_distinct" function. In this case I ran below query
select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_commitdate, c_nation, c_name, c_address, c_city, c_nation, c_region, c_phone, c_mktsegment, s_name, s_address, s_city, s_nation, s_region, s_phone, p_name, p_mfgr, p_category, p_brand, p_color, p_type, p_size, p_container, sum(lo_tax), sum(lo_discount), sum(lo_extendedprice), approx_count_distinct(c_name) from lineorder_flat group by all;
-
Monitor the duckdb usage using top command or on activity monitor in Mac
-
In my case memory usage is going beyond 30GB that duckdb process is consuming. Please see the attached screenshot of Activity monitor of my MacBook
-
Please note that if instead of "approx_count_distinct", if I use "count" function then memory usage is within 2GB limit.
OS:
MACOS
DuckDB Version:
Latest
DuckDB Client:
Command Line
Full Name:
Mozz3
Affiliation:
Self
Have you tried this on the latest main
branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have