Skip to content

Out of Memory Error When Unnesting Large Array Field #16277

@kylejbrk

Description

@kylejbrk

What happens?

I have large dataset that contains a deeply nested json field I am attempting to unnest. It is the ingredients column that is part of this dataset:
https://huggingface.co/datasets/openfoodfacts/product-database

I have loaded this parquet file into my database and I am attempting to unnest it with the following query:

with
    parsed_ingredients as (
        select code, ingredients::json[] as json_array
        from "mydb"."main_raw"."raw__open_food_facts"
    ),
    ingredient_hierarchy as (
        select
            code,
            unnest(json_array, max_depth := 2) as ingredient_name,
            generate_subscripts(json_array, 1) as ingredient_order,
        from parsed_ingredients
    )

select *
from ingredient_hierarchy
order by code, ingredient_order

When running it results in the following error (note I am limiting my duckdb instance to only use 8gb to run on my hardware):
Out of Memory Error: could not allocate block of size 256.0 KiB (7.4 GiB/7.4 GiB used)

I have also tried converting the field to a struct first but this causes it run out of memory before I even begin unnesting. How can i properly process this field without running OOM?

To Reproduce

Download the parquet data and run the code above to reporoduce.

OS:

Windows

DuckDB Version:

1.7.0

DuckDB Client:

DuckDB-DBT

Hardware:

No response

Full Name:

Kyle Burke

Affiliation:

None

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions