Description
What happens?
When I run a query that calls a Python user-defined function (UDF) and add a LIMIT clause, the process crashes with the following error:
[1] 49082 bus error python udf.py
When I remove the LIMIT clause, the query works as expected.
This only happens when querying a significant amount of data (a folder of Parquet files).
To Reproduce
import duckdb
from duckdb.functional import FunctionNullHandling


def compare_nums(a: str | None, b: str | None) -> bool:
    if not a or not b:
        return False
    return int(a) > int(b)


con = duckdb.connect()
# SPECIAL null handling passes NULL inputs to the function itself
# instead of returning NULL without calling it.
con.create_function(
    "compare_nums",
    compare_nums,
    null_handling=FunctionNullHandling.SPECIAL,
)
# Crashes with "bus error"; the same query without LIMIT 1 succeeds.
res = con.execute(
    'SELECT * FROM read_parquet("parquet_test/*.parquet") WHERE compare_nums(_amountIn, -1) LIMIT 1'
).fetchall()
print(len(res))
This is the offending code. The _amountIn column is a VARCHAR containing only digits. I need the UDF because some values are larger than 128 bits, which is beyond what DuckDB's HUGEINT type can hold. Without the LIMIT clause, this works as expected.
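To illustrate why a Python UDF is needed here, the sketch below reuses the comparison logic from the report and checks a value against the HUGEINT range (a signed 128-bit integer, so -(2**127) to 2**127 - 1). Python's int has arbitrary precision, so the comparison itself cannot overflow. The names HUGEINT_MIN, HUGEINT_MAX and fits_in_hugeint are hypothetical helpers introduced only for this illustration, and Optional[str] is used instead of str | None so the sketch also runs on Python versions before 3.10.

```python
from typing import Optional

# Hypothetical constants: DuckDB's HUGEINT is a signed 128-bit integer.
HUGEINT_MIN = -(2**127)
HUGEINT_MAX = 2**127 - 1


def compare_nums(a: Optional[str], b: Optional[str]) -> bool:
    """Same logic as the UDF in the report: NULL or empty input -> False."""
    if not a or not b:
        return False
    # Python ints are arbitrary precision, so values past 128 bits are fine.
    return int(a) > int(b)


def fits_in_hugeint(s: str) -> bool:
    """Hypothetical helper: True if the decimal string fits in a HUGEINT."""
    return HUGEINT_MIN <= int(s) <= HUGEINT_MAX


big = str(2**130)  # a 40-digit value, outside HUGEINT's range
print(fits_in_hugeint(big))     # False
print(compare_nums(big, "-1"))  # True
print(compare_nums(None, "-1"))  # False
```

Note that the query compares against -1, which DuckDB passes to the VARCHAR parameter as the string "-1"; the UDF converts both sides back to int before comparing.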
I would like to use LIMIT because I only need a small number of results, and limiting the output should make the query faster. I have zipped and attached the folder of Parquet files so the issue can be reproduced.
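One possible way to keep LIMIT while not calling a Python UDF at all (an untested assumption about this bug, not a confirmed workaround) is to compare the digit strings in plain SQL. If both values are non-negative integers written without a sign or leading zeros, the longer string is the larger number, and equal-length strings order the same lexicographically as numerically. In SQL this would read roughly length(_amountIn) > length(x) OR (length(_amountIn) = length(x) AND _amountIn > x), where x is a hypothetical threshold string, assuming the default binary string collation. For the specific "> -1" filter in the report, and assuming _amountIn never holds negative values, the UDF reduces to a non-NULL, non-empty check. The sketch below checks the length-then-lexicographic rule in Python against int comparison; compare_digit_strings is a hypothetical name.

```python
def compare_digit_strings(a: str, b: str) -> bool:
    """Return True if a > b numerically.

    Assumes both are non-negative decimal integers with no sign and no
    leading zeros. Under that assumption a longer string is the larger
    number, and equal-length strings order the same way lexicographically
    as numerically.
    """
    if len(a) != len(b):
        return len(a) > len(b)
    return a > b


# Cross-check the rule against Python's arbitrary-precision int comparison.
samples = ["0", "7", "10", "99", "100", str(2**130), str(2**130 + 1)]
for x in samples:
    for y in samples:
        assert compare_digit_strings(x, y) == (int(x) > int(y))
print("rule agrees with int comparison")
```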
OS:
Mac OS
DuckDB Version:
0.9.2
DuckDB Client:
Python
Full Name:
Dino Rodriguez
Affiliation:
Yoz Labs
Have you tried this on the latest main branch?
I have tested with a release build (and could not test with a main build)
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have