-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Closed
duckdb/duckdb-r
#415Labels
Description
What happens?
The min()
and max()
aggregate functions in DuckDB do not respect the default collation set using the set default_collation = 'EN_US'
; statement. This causes the results to be different from what one would expect based on the sort order.
To Reproduce
- Create a table with some sample data:
create table test(id int, name text);
insert into test values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'A'), (5, 'G'), (6, 'd');
- Observe the default sort order
select * from test order by name;
- This returns the expected sorting:
┌───────┬─────────┐
│ id │ name │
│ int32 │ varchar │
├───────┼─────────┤
│ 4 │ A │
│ 5 │ G │
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
│ 6 │ d │
└───────┴─────────┘
- Check the min() and max() functions:
select min(name) from test;
select max(name) from test;
This also returns the expected results:
┌─────────────┐
│ min("name") │
│ varchar │
├─────────────┤
│ A │
└─────────────┘
┌─────────────┐
│ max("name") │
│ varchar │
├─────────────┤
│ d │
└─────────────┘
- Set the default collation to 'EN_US':
set default_collation = 'EN_US';
- Observe the new sort order:
select * from test order by name;
This now returns a different sort order (which is expected):
┌───────┬─────────┐
│ id │ name │
│ int32 │ varchar │
├───────┼─────────┤
│ 1 │ a │
│ 4 │ A │
│ 2 │ b │
│ 3 │ c │
│ 6 │ d │
│ 5 │ G │
└───────┴─────────┘
- Check the min() and max() functions again:
select min(name) from test;
select max(name) from test;
The results are not what one would expect based on the new sort order:
┌─────────────┐
│ min("name") │
│ varchar │
├─────────────┤
│ A │
└─────────────┘
┌─────────────┐
│ max("name") │
│ varchar │
├─────────────┤
│ d │
└─────────────┘
The min()
should return 'a' and the max()
should return 'G', but they are returning 'A' and 'd' respectively.
OS:
Ubuntu 20.04.5 LTS
DuckDB Version:
1.1
DuckDB Client:
python
Hardware:
No response
Full Name:
Arif Aslam
Affiliation:
Mammoth Analytics Inc.
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have