-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Closed
Closed
Copy link
Labels
Description
What happens?
When generating a pivot table with more than 20 columns using the SUM()
aggregation, DuckDB forces the result into a HUGEINT
(INT128), even when the values are explicitly cast to a smaller numeric type like UTINYINT
.
This behavior does not occur when using FIRST()
instead of SUM()
, nor when the pivoted result has 20 columns or fewer.
To Reproduce
- Create a table of 20 column names.
- Create a matrix table with pairs of columns and assign random values < 250 using
UTINYINT
. - Run a pivot query using
SUM(val)::UTINYINT
. - Observe correct types for 20 columns, and unexpected promotion to
HUGEINT
for 21 columns.
CREATE TABLE cols (c VARCHAR);
INSERT INTO cols (SELECT 'c'||i FROM range(20) t1(i));
CREATE TABLE matrix (col1 VARCHAR, col2 VARCHAR, val UTINYINT);
INSERT INTO matrix (SELECT c1.c, c2.c, random()*250 FROM cols c1 JOIN cols c2 ON c1 != c2);
.print 'with 20 lines/cols each value is keept in UTINYINT'
PIVOT matrix ON col1 USING sum(val)::UTINYINT ORDER BY col2 LIMIT 2;
-- adding just one value to cols translate INTO one more line / column
INSERT INTO cols VALUES ('c20');
CREATE or replace TABLE matrix (col1 VARCHAR, col2 VARCHAR, val UTINYINT);
INSERT INTO matrix (SELECT c1.c, c2.c, random()*200 FROM cols c1 JOIN cols c2 ON c1 != c2);
.print 'with 21 lines/cols each value are forced INTO HUGEINT even when CASTING INTO UTINYINT'
PIVOT matrix ON col1 USING sum(val)::UTINYINT ORDER BY col2 LIMIT 2;
.print 'the problem doesn't occur with first instead of sum'
PIVOT matrix ON col1 USING first(val)::UTINYINT ORDER BY col2 LIMIT 2;
Expected Behavior:
Values should remain cast to UTINYINT
, even with more than 20 columns.
OS:
Debian GNU/Linux 12 (aarch64), Ubuntu 25.04 (x86_64)
DuckDB Version:
1.3.1, 1.4.0-dev861
DuckDB Client:
duckdb
Hardware:
No response
Full Name:
Cyril Chaboisseau
Affiliation:
none
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have