Variables can 'leak' through the stack in Python #11687

@davetapley

What happens?

The Python API resolves Python variables referenced in SQL even when they are not available in the local scope of the calling function;
they only have to be in scope somewhere in the call stack.

This makes code unpredictable: when using SQL you need to be aware of every in-scope variable in the entire call stack, and a change in a caller can easily 'leak' a variable into SQL if the names happen to match 😱
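
To make the lookup behavior concrete, here is a minimal pure-Python sketch of a name lookup that walks the call stack. It only imitates the behavior described above and is not DuckDB's actual implementation; `lookup_in_call_stack`, `inner`, and `outer` are hypothetical names made up for illustration.

```python
import inspect


def lookup_in_call_stack(name):
    """Return (function_name, value) for the nearest caller frame that binds `name`.

    Hypothetical helper: it imitates the reported behavior, it is not DuckDB code.
    """
    frame = inspect.currentframe().f_back  # start at the direct caller
    while frame is not None:
        if name in frame.f_locals:
            return frame.f_code.co_name, frame.f_locals[name]
        frame = frame.f_back  # move one level up the call stack
    raise LookupError(name)


def inner():
    # `data` is neither a local nor a global here, so plain Python code in
    # this function would raise NameError when reading it. The stack walk
    # still finds the binding that lives in the caller.
    return lookup_in_call_stack("data")


def outer():
    data = [1, 2, 3]  # local to outer only
    return inner()


print(outer())
```

With a lookup like this, whether `inner` sees `data` depends entirely on who called it, which is the unpredictability this issue is about.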

To Reproduce

Here I demonstrate how I was handling the optional existence of a data table.
I then introduce a data variable in the calling main function, which 'leaks' into foo
even though Python itself treats it as out of scope there (foo gets a NameError):

from duckdb import CatalogException, connect


def foo():
    try:
        print(f'foo {data}')
    except NameError:
        print('foo name error')

    con = connect()

    try:
        con.sql('select * from data')
    except CatalogException:
        print('no data table')


def main():
    try:
        print(f'main {data}')
    except NameError:
        print('main name error')

    foo()  # no data table, correct

    data = [1, 2, 3, 4, 5]
    print(f'main {data}')

    foo() # Unexpected InvalidInputException


if __name__ == '__main__':
    main()

Output:

main name error
foo name error
no data table
main [1, 2, 3, 4, 5]
foo name error
Traceback (most recent call last):
  File "/workspaces/ng/leaky_var.py", line 33, in <module>
    main()
  File "/workspaces/ng/leaky_var.py", line 29, in main
    foo()  # InvalidInputException: Invalid Input Error: Python Object "data" of type "list"
    ^^^^^
  File "/workspaces/ng/leaky_var.py", line 13, in foo
    con.sql('select * from data')
duckdb.duckdb.InvalidInputException: Invalid Input Error: Python Object "data" of type "list" found on line "/workspaces/ng/leaky_var.py:29" not suitable for replacement scans.
Make sure that "data" is either a pandas.DataFrame, duckdb.DuckDBPyRelation, pyarrow Table, Dataset, RecordBatchReader, Scanner, or NumPy ndarrays with supported format

OS:

Ubuntu

DuckDB Version:

0.9.2

DuckDB Client:

Python

Full Name:

Dave Tapley

Affiliation:

JE Fuller

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
