Skip to content

Duckdb is consuming too much memory way beyond what is specified using max_memory parameter in case of approx_count_distinct function #10023

@mozz3

Description

@mozz3

What happens?

When query is fired on duckdb database that uses "approx_count_distinct" function, seems that duckdb is consuming way too much memory than what is specified in max_memory parameter

To Reproduce

  1. Launch terminal/command prompt/shell and go the directory where duckdb exe is present

  2. Attach to the duckdb database using below command. You can find lineorder_flat.db file in the attachment (attachment contains .txt extension, so please remove .txt extension): lineorder_flat.db.txt
    lineorder_flat.db.txt

    This file contains empty table

    duckdb lineorder_flat.db
    
  3. insert data to the table using below command and csv file, de-normalised SSB line order data

    INSERT INTO lineorder_flat SELECT * FROM read_csv('lineorder_flat.tbl', AUTO_DETECT=TRUE);
  4. Set max memory to the database using following SQL.

    set max_memory="2GB";
  5. Run the query that uses "approx_count_distinct" function. In this case I ran below query

    select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_commitdate, c_nation, c_name, c_address, c_city, c_nation, c_region, c_phone, c_mktsegment, s_name, s_address, s_city, s_nation, s_region, s_phone, p_name, p_mfgr, p_category, p_brand, p_color, p_type, p_size, p_container, sum(lo_tax), sum(lo_discount), sum(lo_extendedprice), approx_count_distinct(c_name)
    from lineorder_flat
    group by all;
  6. Monitor the duckdb usage using top command or on activity monitor in Mac

  7. In my case memory usage is going beyond 30GB that duckdb process is consuming. Please see the attached screenshot of Activity monitor of my MacBook

  8. Please note that if instead of "approx_count_distinct", if I use "count" function then memory usage is within 2GB limit.

OS:

MACOS

DuckDB Version:

Latest

DuckDB Client:

Command Line

Full Name:

Mozz3

Affiliation:

Self

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
memory utilisation

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions