Skip to content

sum( ... RESPECT NULLS) should throw error (and the docs shouldn't say that RESPECT NULLS is the default) #10965

@soerenwolfers

Description

@soerenwolfers

What happens?

The documentation says "The inverse of IGNORE NULLS is RESPECT NULLS, which is the default for all functions."

(1) I don't think count/sum and most other aggregate functions "respect nulls" in any sense of the word. A NULL-respecting count would just count all rows like count(*) does or pandas.shape does; a null-respecting sum would return NULL if any value was NULL, analogous to how 3.14::FLOAT + NULL evaluates to NULL in duckdb and how df.sum(skipna=False) behaves in pandas.
In fact, the documentation also says "[...] will not work as expected when using an aggregate function that does not ignore NULL values. The first function falls into this category" when talking about count.

(2) I therefore don't think it makes sense to allow count/sum/...(... RESPECT NULLS), which currently does the same as count/sum/...(...) and is better described as IGNORE NULLS

(3) Also, the documentation should be updated, but I can open an issue for that once I get confirmation on the technical side here

To Reproduce

FROM
(VALUES (NULL), (1)) t(b)
SELECT
sum(b RESPECT NULLS)
GROUP BY ()

and

FROM
(VALUES (NULL), (1)) t(b)
SELECT
sum(b RESPECT NULLS) OVER ()

work, but behave like IGNORE NULLS (i.e., return 1).

OS:

Linux

DuckDB Version:

'0.10.1-dev717'

DuckDB Client:

Python

Full Name:

Soeren Wolfers

Affiliation:

G-Research

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions