Skip to content

min() and max() do not respect default collation #13824

@arifsaslam

Description

@arifsaslam

What happens?

The min() and max() aggregate functions in DuckDB do not respect the default collation set using the set default_collation = 'EN_US'; statement. This causes the results to be different from what one would expect based on the sort order.

To Reproduce

  1. Create a table with some sample data:
create table test(id int, name text);
insert into test values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'A'), (5, 'G'), (6, 'd');
  1. Observe the default sort order
select * from test order by name;
  1. This returns the expected sorting:
┌───────┬─────────┐
│  id   │  name   │
│ int32 │ varchar │
├───────┼─────────┤
│     4 │ A       │
│     5 │ G       │
│     1 │ a       │
│     2 │ b       │
│     3 │ c       │
│     6 │ d       │
└───────┴─────────┘
  1. Check the min() and max() functions:
select min(name) from test;
select max(name) from test;

This also returns the expected results:

┌─────────────┐
│ min("name") │
│   varchar   │
├─────────────┤
│ A           │
└─────────────┘

┌─────────────┐
│ max("name") │
│   varchar   │
├─────────────┤
│ d           │
└─────────────┘
  1. Set the default collation to 'EN_US':
set default_collation = 'EN_US';
  1. Observe the new sort order:
select * from test order by name;

This now returns a different sort order (which is expected):

┌───────┬─────────┐
│  id   │  name   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ a       │
│     4 │ A       │
│     2 │ b       │
│     3 │ c       │
│     6 │ d       │
│     5 │ G       │
└───────┴─────────┘
  1. Check the min() and max() functions again:
select min(name) from test;
select max(name) from test;

The results are not what one would expect based on the new sort order:

┌─────────────┐
│ min("name") │
│   varchar   │
├─────────────┤
│ A           │
└─────────────┘

┌─────────────┐
│ max("name") │
│   varchar   │
├─────────────┤
│ d           │
└─────────────┘

The min() should return 'a' and the max() should return 'G', but they are returning 'A' and 'd' respectively.

OS:

Ubuntu 20.04.5 LTS

DuckDB Version:

1.1

DuckDB Client:

python

Hardware:

No response

Full Name:

Arif Aslam

Affiliation:

Mammoth Analytics Inc.

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions