
Segfault when a query specifying an absolute path with globs for inputs is executed with limited memory #6881

@david-cortes

Description


What happens?

On a mac-ARM machine, when building a query that reads multiple parquet files specified as an absolute path in the OS file system but ending in *.parquet (e.g. /path/to/the/files/*.parquet), executing that query in a DuckDB process that has limited its own working memory through SET memory_limit ends up segfaulting when it needs to spill data to disk.

The bug does not occur if the path is instead specified as a relative path from the working directory (i.e. just *.parquet).

The same query executes successfully on linux-amd64 without crashing, and removing the memory limit avoids the crash on mac-ARM. I'm not sure whether it's related to ARM, to macOS, or to something else, but it does end up segfaulting, whether it's executed from Python or from R.
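A minimal sketch of the two variants (the directory below is a placeholder for wherever the files actually live; on the affected machine the first form crashes and the second doesn't):

import duckdb

duckdb.query("SET memory_limit='1GB';")

# Absolute path plus glob: segfaults on mac-ARM once the query spills to disk
# ('/path/to/the/files' stands in for the real directory):
duckdb.query("select * from read_parquet('/path/to/the/files/*.parquet')").df()

# The same call with a relative glob, run from that directory, completes fine:
duckdb.query("select * from read_parquet('*.parquet')").df()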

In case it's related, when compiling DuckDB from source on the machine where it crashes, I see many compilation warnings about jemalloc.

And in case it helps: when executed in R, the crash is always blamed on address 0x4 (with "invalid permissions"), which doesn't look like something a memory allocation would return.

To Reproduce

Code to generate files for a query:

import numpy as np, pandas as pd
import string

rng = np.random.default_rng(seed=123)

def gen_large_parquet(file_number):
    n = int(5e6)
    pct_missing = 0.05
    
    def gen_ts_col():
        low = np.datetime64("2020-01-01 00:00:00")
        high = np.datetime64("2020-12-31 23:59:59")
        out = rng.integers(
            low.astype(int),
            high.astype(int),
            size=n
        ).astype("datetime64[s]")
        set_missing = rng.uniform(size=n) <= pct_missing
        out[set_missing] = np.datetime64("NaT")
        return out
    
    def gen_num_col():
        out = rng.standard_normal(size=n)
        set_missing = rng.uniform(size=n) <= pct_missing
        out[set_missing] = np.nan
        return out

    def gen_random_word():
        # random word of 1-19 uppercase letters/digits
        str_length = rng.integers(low=1, high=20)
        choices = [c for c in string.ascii_uppercase + string.digits]
        return "".join(rng.choice(choices, size=str_length))

    def gen_cat_col():
        ncateg = rng.integers(low=3, high=20)
        choices = [gen_random_word() for cat in range(ncateg)]
        out = rng.choice(choices, size=n).astype(object)
        set_missing = rng.uniform(size=n) <= pct_missing
        out[set_missing] = None
        return out

    df = pd.DataFrame({
        "dt" : gen_ts_col(),
        "cat1" : gen_cat_col(),
        "cat2" : gen_cat_col(),
        "cat3" : gen_cat_col(),
        "num1" : gen_num_col(),
        "num2" : gen_num_col(),
        "num3" : gen_num_col(),
    })

    n_additional = 10
    for i in range(n_additional):
        name_num = "additional_num_" + str(i)
        name_cat = "additional_cat_" + str(i)
        name_dt = "additional_dt_" + str(i)
        df[name_num] = gen_num_col()
        df[name_cat] = gen_cat_col()
        df[name_dt] = gen_ts_col()

    df.to_parquet(f"df{file_number}.parquet")

for file_number in range(3):
    gen_large_parquet(file_number)

The query:

import duckdb

q = """
with tbl0 as (
select
    *
from read_parquet(
    '*.parquet'
    , filename=true
    , file_row_number=true
    , union_by_name=true
)
where
    dt is not null
    and dt >= '2020-04-01'
    and dt <= '2020-08-01'
    and cat1 is not null
    and num1 is not null
)
, tbl_join1 as (
    select
        cat1
        , favg("num2") as "num2_avg"
        , count(*) as num2_cnt
    from tbl0
    where
        "num2" is not null
        and not isnan("num2"::float)
    group by
        cat1
)
, tbl_join2 as (
    select
        cat3
        , favg("num3") as "num3_avg"
        , count(*) as num3_cnt
    from tbl0
    where
        "num3" is not null
        and not isnan("num3"::float)
    group by cat3
)
select
    tbl0.cat1
    , tbl0.cat3
    , cat2
    , num1
    , num3
    , num2
    , dt
    , num2_avg
    , num2_cnt
    , num3_avg
    , num3_cnt
    , row_number() over(order by filename, file_row_number) as row_num
from tbl0
join tbl_join1
    on tbl_join1.cat1 = tbl0.cat1
join tbl_join2
    on tbl_join2.cat3 = tbl0.cat3
where
    "dt" is not null
order by filename, file_row_number
""".strip()

duckdb.query("SET memory_limit='1GB';")
res = duckdb.query(q).df()
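Note that the query above uses the relative glob, which is the variant that does not crash. To trigger the segfault, point read_parquet at the absolute path of the directory containing the generated files; a sketch of that substitution (assuming q is the query string above and the files were generated in the current directory):

import os

# Swap the relative glob for its absolute form; under the 1GB limit this
# is the variant that segfaults on mac-ARM when spilling to disk.
abs_glob = os.path.join(os.getcwd(), "*.parquet")
res = duckdb.query(q.replace("'*.parquet'", f"'{abs_glob}'")).df()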

OS:

macOS-aarch64

DuckDB Version:

0.7.2-dev1146

DuckDB Client:

Python and R

Full Name:

David Cortes

Affiliation:

None

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
