-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
Labels
Description
What happens?
I have a Pandas dataframe that I load from a Parquet file. If a column in that file is of pd.Float64DType, DuckDB does not convert pd.NA values to NULL
, which would be an expected behaviour.
To Reproduce
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
testdf = pd.DataFrame({"value": [26.0, 26.0, 26.0, pd.NA, 27.0, pd.NA, pd.NA, pd.NA, pd.NA, 29.0]})
testdf["value"] = testdf["value"].astype(pd.Float64Dtype())
pq.write_table(pa.Table.from_pandas(testdf), "./testdf.parquet")
testdf2 = pd.read_parquet("./testdf.parquet")
con = duckdb.connect(database=":memory:")
con.register("testdf", testdf)
con.register("testdf2", testdf2)
con.sql("SELECT MIN(value) FROM testdf")
┌──────────────┐
│ min("value") │
│ double │
├──────────────┤
│ 26.0 │
└──────────────┘
testdf2["value"].min()
26.0
con.sql("SELECT MIN(value) FROM testdf2")
┌──────────────┐
│ min("value") │
│ double │
├──────────────┤
│ 0.0 │
└──────────────┘
OS:
MacOS, x86_64
DuckDB Version:
1.1.1.
DuckDB Client:
Python
Hardware:
No response
Full Name:
Dmitry Kryuchkov
Affiliation:
SHERLOK TECHNOLOGY PTY LTD
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have