-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Closed
Labels
Needs DocumentationUse for issues or PRs that require changes in the documentationUse for issues or PRs that require changes in the documentationunder review
Description
What happens?
If I create a table with a Pandas DataFrame that contains a date value of 9999-12-31
, when those values are retrieved in an Arrow table the conversion isn't handled correctly.
This appears to be a relatively recent regression (doesn't happen in 0.8.1). I also tested this dropping the Pandas version down to 1.5.1 and the issue is still there with DuckDB 0.9.1 (I thought it might be related to some of the dtype changes in Pandas 2, but apparently not)
If I first convert the DataFrame to a PyArrow table, then everything works fine.
[ins] In [1]: import pandas as pd
[ins] In [2]: from datetime import date
[ins] In [3]: import duckdb
[ins] In [4]: pd.__version__
Out[4]: '2.1.0'
[ins] In [5]: duckdb.__version__
Out[5]: '0.9.1'
[ins] In [6]: df = pd.DataFrame({"val": [date(9999, 12, 30), date(9999, 12, 31)]})
[ins] In [7]: con = duckdb.connect()
[ins] In [8]: con.execute("CREATE TABLE t as SELECT * FROM df");
[ins] In [9]: con.execute("SELECT * FROM t").fetchall()
Out[9]: [(datetime.date(9999, 12, 30),), (datetime.date(9999, 12, 31),)]
[ins] In [10]: con.execute("SELECT * FROM t").arrow()
Out[10]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,<value out of range: 2147483647>]]
[ins] In [11]: import pyarrow as pa
[ins] In [12]: pat = pa.Table.from_pandas(df)
[ins] In [13]: pat
Out[13]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,9999-12-31]]
[ins] In [14]: con.execute("CREATE TABLE t2 as SELECT * FROM pat")
Out[14]: <duckdb.duckdb.DuckDBPyConnection at 0x7fa934c79530>
[ins] In [15]: con.execute("SELECT * FROM t2").fetchall()
Out[15]: [(datetime.date(9999, 12, 30),), (datetime.date(9999, 12, 31),)]
[ins] In [16]: con.execute("SELECT * FROM t2").arrow()
Out[16]:
pyarrow.Table
val: date32[day]
----
val: [[9999-12-30,9999-12-31]]
To Reproduce
import pandas as pd
from datetime import date
import duckdb
df = pd.DataFrame({"val": [date(9999, 12, 30), date(9999, 12, 31)]})
con = duckdb.connect()
con.execute("CREATE TABLE t as SELECT * FROM df");
con.execute("SELECT * FROM t").fetchall() # works
con.execute("SELECT * FROM t").arrow() # doesn't work
OS:
Ubuntu 22.04 x64
DuckDB Version:
0.9.1 and dev
DuckDB Client:
Python
Full Name:
Gil Forsyth
Affiliation:
Voltron Data
Have you tried this on the latest main
branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have
Metadata
Metadata
Assignees
Labels
Needs DocumentationUse for issues or PRs that require changes in the documentationUse for issues or PRs that require changes in the documentationunder review