pd.NA values get coerced to 0.0 for a pd.Float64Dtype column  #14129

@dmitry-sherlok

What happens?

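I have a pandas DataFrame that I load from a Parquet file. If a column in that file has the dtype pd.Float64Dtype, DuckDB does not convert its pd.NA values to NULL, which is the behaviour I would expect. Instead, the missing values are read as 0.0, so an aggregate such as MIN returns 0.0. The same column queried from a DataFrame built in memory, without the Parquet round trip, gives the correct result.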

To Reproduce

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
testdf = pd.DataFrame({"value": [26.0, 26.0, 26.0, pd.NA, 27.0, pd.NA, pd.NA, pd.NA, pd.NA, 29.0]})
testdf["value"] = testdf["value"].astype(pd.Float64Dtype())
pq.write_table(pa.Table.from_pandas(testdf), "./testdf.parquet")
testdf2 = pd.read_parquet("./testdf.parquet")
con = duckdb.connect(database=":memory:")
con.register("testdf", testdf)
con.register("testdf2", testdf2)
con.sql("SELECT MIN(value) FROM testdf")
┌──────────────┐
│ min("value") │
│    double    │
├──────────────┤
│         26.0 │
└──────────────┘
testdf2["value"].min()
26.0
con.sql("SELECT MIN(value) FROM testdf2")
┌──────────────┐
│ min("value") │
│    double    │
├──────────────┤
│          0.0 │
└──────────────┘
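
A possible stopgap, sketched here rather than confirmed against DuckDB 1.1.1: convert the nullable Float64 columns to plain NumPy float64 with NaN in the missing slots before registering the DataFrame. This assumes DuckDB maps NaN in a plain float64 pandas column to NULL, which this report does not verify. The helper name `nullable_floats_to_nan` is hypothetical, and the block uses only pandas and NumPy.

```python
import numpy as np
import pandas as pd


def nullable_floats_to_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which nullable Float64 columns are plain float64 with NaN."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.Float64Dtype):
            # to_numpy(na_value=...) writes NaN into every masked slot,
            # so the result carries no separate validity mask.
            out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
    return out


# Small stand-in for the column in the report (hypothetical data).
testdf = pd.DataFrame(
    {"value": pd.array([26.0, None, 27.0, None, 29.0], dtype="Float64")}
)
plain = nullable_floats_to_nan(testdf)

print(plain["value"].dtype)              # float64
print(int(plain["value"].isna().sum()))  # 2
print(plain["value"].min())              # 26.0
```

The converted frame could then be passed to `con.register(...)` in place of `testdf2`. Whether that yields the expected `MIN(value)` of 26.0 has not been checked here.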

OS:

macOS, x86_64

DuckDB Version:

1.1.1

DuckDB Client:

Python

Hardware:

No response

Full Name:

Dmitry Kryuchkov

Affiliation:

SHERLOK TECHNOLOGY PTY LTD

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
