Python UDF with arrow type incorrectly returns the cache result of the first row #9921

@chilang

Description

What happens?

When defining a Python UDF with type="arrow", e.g.

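import random as r

import duckdb
from duckdb.typing import INTEGER, VARCHAR

def random_arrow(x):
    # x is an Arrow array; return one random integer per input element
    return [r.randint(0, 10) for item in x.to_pylist()]
...
duckdb.create_function(
    "random_arrow",
    random_arrow,
    [VARCHAR],
    INTEGER,
    side_effects=True,
    type="arrow",
)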

executing a query that calls the UDF with the same constant argument for every row returns incorrect results: the value computed for the first row is duplicated across the whole result set. For example,

SELECT random_arrow('') FROM range(10)

returns [(1,), (1,), (1,), ...] instead of random numbers.
This is incorrect and doesn't seem to honor the side_effects=True flag.

Note: Using SELECT random_arrow(range) FROM range(10) returns correct results.
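
One possible reading of these symptoms, offered as a guess and not a confirmed diagnosis: the literal '' may reach the UDF as a single constant value, so the function runs once and its one result is repeated for every row, while a column argument such as range is passed as a full batch. The pure-Python sketch below only simulates that suspected behavior. `call_as_constant` and `call_as_batch` are hypothetical helpers, not DuckDB APIs, and plain lists stand in for Arrow arrays.

```python
import random as r


def random_list(values):
    # Same idea as random_arrow, but takes a plain list instead of an Arrow array.
    return [r.randint(0, 10) for _ in values]


def call_as_batch(udf, values):
    # Hypothetical: every row's argument is passed to the UDF in one batch.
    return udf(values)


def call_as_constant(udf, value, row_count):
    # Hypothetical: the UDF sees the constant once, and that single result
    # is repeated for every output row.
    first = udf([value])[0]
    return [first] * row_count


constant_rows = call_as_constant(random_list, "", 10)
batch_rows = call_as_batch(random_list, list(range(10)))
print(constant_rows)  # ten copies of one number
print(batch_rows)     # ten independent draws
```

If this guess is right, it would explain why the result set has the correct length but only one distinct value.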

Attached is a minimally reproducing test.

To Reproduce

import duckdb
import random as r

from duckdb.typing import INTEGER, VARCHAR

def random_arrow(x):
    return [r.randint(0, 10) for item in x.to_pylist()]

def random_non_arrow(x):
    return r.randint(0, 10)

con = duckdb.connect()

con.create_function(
    "random_non_arrow",
    random_non_arrow,
    [VARCHAR],
    INTEGER,
    side_effects=True,
)

con.create_function(
    "random_arrow",
    random_arrow,
    [VARCHAR],
    INTEGER,
    side_effects=True,
    type="arrow",
)

res = con.sql("select random_non_arrow('') from range(10)").fetchall()
assert len(set(res)) > 1

res = con.sql("select random_arrow(range) from range(10)").fetchall()
assert len(set(res)) > 1

res = con.sql("select random_arrow('') from range(10)").fetchall()
assert len(set(res)) > 1 # should pass

The last assertion fails:

>       assert len(set(res)) > 1
E       assert 1 > 1
E        +  where 1 = len({(10,)})
E        +    where {(10,)} = set([(10,), (10,), (10,), (10,), (10,), (10,), ...])
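
To narrow down where the duplication happens, it may help to record how many values each UDF call receives. This is a minimal sketch, assuming the Arrow input supports len() (pyarrow arrays do). `record_batch_sizes` and `_StandInArray` are hypothetical names introduced here; the stand-in class exists only so the snippet runs without DuckDB installed.

```python
import functools
import random as r


def record_batch_sizes(udf, sizes):
    # Wrap a single-argument UDF so each call appends its input length to `sizes`.
    @functools.wraps(udf)
    def wrapper(x):
        sizes.append(len(x))
        return udf(x)
    return wrapper


class _StandInArray:
    # Hypothetical stand-in for an Arrow array: only len() and to_pylist().
    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def to_pylist(self):
        return list(self._values)


def random_arrow(x):
    return [r.randint(0, 10) for _ in x.to_pylist()]


sizes = []
traced = record_batch_sizes(random_arrow, sizes)
result = traced(_StandInArray([""] * 10))
print(sizes)        # [10]
print(len(result))  # 10
```

Registering `traced` in place of `random_arrow` in the reproduction and inspecting `sizes` after each query should show whether the literal-argument query calls the UDF with fewer values than the number of rows returned.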

OS:

arm64

DuckDB Version:

0.9.2

DuckDB Client:

Python

Full Name:

Chi Lang Ngo

Affiliation:

NA

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
