Skip to content

Overflow when exporting far-future dates to Arrow (IFF registered with pandas) #9520

@gforsyth

Description

@gforsyth

What happens?

If I create a table with a Pandas DataFrame that contains a date value of 9999-12-31, when those values are retrieved in an Arrow table the conversion isn't handled correctly.

This appears to be a relatively recent regression (doesn't happen in 0.8.1). I also tested this dropping the Pandas version down to 1.5.1 and the issue is still there with DuckDB 0.9.1 (I thought it might be related to some of the dtype changes in Pandas 2, but apparently not)

If I first convert the DataFrame to a PyArrow table, then everything works fine.

[ins] In [1]: import pandas as pd

[ins] In [2]: from datetime import date

[ins] In [3]: import duckdb

[ins] In [4]: pd.__version__
Out[4]: '2.1.0'

[ins] In [5]: duckdb.__version__
Out[5]: '0.9.1'

[ins] In [6]: df = pd.DataFrame({"val": [date(9999, 12, 30), date(9999, 12, 31)]})

[ins] In [7]: con = duckdb.connect()

[ins] In [8]: con.execute("CREATE TABLE t as SELECT * FROM df");

[ins] In [9]: con.execute("SELECT * FROM t").fetchall()
Out[9]: [(datetime.date(9999, 12, 30),), (datetime.date(9999, 12, 31),)]

[ins] In [10]: con.execute("SELECT * FROM t").arrow()
Out[10]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,<value out of range: 2147483647>]]

[ins] In [11]: import pyarrow as pa

[ins] In [12]: pat = pa.Table.from_pandas(df)

[ins] In [13]: pat
Out[13]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,9999-12-31]]

[ins] In [14]: con.execute("CREATE TABLE t2 as SELECT * FROM pat")
Out[14]: <duckdb.duckdb.DuckDBPyConnection at 0x7fa934c79530>

[ins] In [15]: con.execute("SELECT * FROM t2").fetchall()
Out[15]: [(datetime.date(9999, 12, 30),), (datetime.date(9999, 12, 31),)]

[ins] In [16]: con.execute("SELECT * FROM t2").arrow()
Out[16]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,9999-12-31]]

To Reproduce

import pandas as pd
from datetime import date
import duckdb

df = pd.DataFrame({"val": [date(9999, 12, 30), date(9999, 12, 31)]})

con = duckdb.connect()

con.execute("CREATE TABLE t as SELECT * FROM df");
con.execute("SELECT * FROM t").fetchall()  # works
con.execute("SELECT * FROM t").arrow()  # doesn't work

OS:

Ubuntu 22.04 x64

DuckDB Version:

0.9.1 and dev

DuckDB Client:

Python

Full Name:

Gil Forsyth

Affiliation:

Voltron Data

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Labels

    Needs DocumentationUse for issues or PRs that require changes in the documentationunder review

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions