Python bus error when using a UDF with a LIMIT clause #9786

@dino-rodriguez

What happens?

When I run a query that calls a Python user-defined function (UDF) and add a LIMIT clause, the process crashes with the following error:

[1]    49082 bus error  python udf.py

When I remove the LIMIT clause, the query works as expected.

This only happens when the query reads a significant amount of data (in my case, a folder of Parquet files).

To Reproduce

import duckdb
from duckdb.functional import FunctionNullHandling


def compare_nums(a: str | None, b: str | None) -> bool:
    if not a or not b:
        return False

    return int(a) > int(b)


con = duckdb.connect()
con.create_function(
    "compare_nums",
    compare_nums,
    null_handling=FunctionNullHandling.SPECIAL,
)
res = con.execute(
    'SELECT * FROM read_parquet("parquet_test/*.parquet") WHERE compare_nums(_amountIn, -1) LIMIT 1'
).fetchall()
print(len(res))

This is the offending code. The _amountIn column is a VARCHAR that contains only digits. I need the UDF because the values are integers wider than 128 bits. Without the LIMIT clause, the query works as expected.
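For context on why the comparison lives in Python: DuckDB's widest integer type, HUGEINT, is 128 bits, while Python's int has arbitrary precision. Below is a minimal standalone sketch of the same comparison logic with no DuckDB involved. The constant INT128_MAX and the sample values are illustrative assumptions, not data from the attached files.

```python
from typing import Optional

# Largest value a signed 128-bit integer can hold (illustrative constant).
INT128_MAX = 2**127 - 1


def compare_nums(a: Optional[str], b: Optional[str]) -> bool:
    """Return True when a > b; missing or empty input counts as False."""
    if not a or not b:
        return False
    # Python ints are arbitrary precision, so values wider than
    # 128 bits parse and compare without overflow.
    return int(a) > int(b)


# A made-up value one past the 128-bit range, passed as a string
# the way a VARCHAR column value would arrive.
too_big = str(INT128_MAX + 1)

print(compare_nums(too_big, "-1"))  # True
print(compare_nums(None, "-1"))     # False
```

This only shows that the function itself handles oversized values and NULL-like input; it does not exercise the LIMIT code path where the crash occurs.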

I would like to use the LIMIT clause because I only need a small number of results, and it would speed up the query. I have zipped and attached the folder of Parquet files so the issue is reproducible.

parquet_test.zip

OS:

macOS

DuckDB Version:

0.9.2

DuckDB Client:

Python

Full Name:

Dino Rodriguez

Affiliation:

Yoz Labs

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
