PIVOT with SUM on >20 columns coerces values to HUGEINT despite explicit cast #18007

Description

What happens?

When generating a pivot table with more than 20 columns using the SUM() aggregation, DuckDB forces the result into a HUGEINT (INT128), even when the values are explicitly cast to a smaller numeric type like UTINYINT.

This behavior does not occur when using FIRST() instead of SUM(), nor when the pivoted result has 20 columns or fewer.

To Reproduce

  1. Create a table of 20 column names.
  2. Create a matrix table of column pairs and assign random UTINYINT values below 250.
  3. Run a pivot query using SUM(val)::UTINYINT.
  4. Observe correct types with 20 columns, and an unexpected promotion to HUGEINT with 21 columns.
CREATE TABLE cols (c VARCHAR);
INSERT INTO cols (SELECT 'c'||i FROM range(20) t1(i));

CREATE TABLE matrix (col1 VARCHAR, col2 VARCHAR, val UTINYINT);
INSERT INTO matrix (SELECT c1.c, c2.c, random()*250 FROM cols c1 JOIN cols c2 ON c1.c != c2.c);

.print 'with 20 rows/columns each value is kept as UTINYINT'
PIVOT matrix ON col1 USING sum(val)::UTINYINT ORDER BY col2 LIMIT 2;


-- adding just one value to cols translates into one more row / column
INSERT INTO cols VALUES ('c20');
CREATE OR REPLACE TABLE matrix (col1 VARCHAR, col2 VARCHAR, val UTINYINT);
INSERT INTO matrix (SELECT c1.c, c2.c, random()*200 FROM cols c1 JOIN cols c2 ON c1.c != c2.c);

.print 'with 21 rows/columns each value is forced into HUGEINT even when casting to UTINYINT'
PIVOT matrix ON col1 USING sum(val)::UTINYINT ORDER BY col2 LIMIT 2;

.print "the problem doesn't occur with first() instead of sum()"
PIVOT matrix ON col1 USING first(val)::UTINYINT ORDER BY col2 LIMIT 2;
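
For reference, the promoted types can be inspected directly; a minimal sketch, assuming a PIVOT can be wrapped in a parenthesized subquery and described like any other query:

-- hypothetical check (not part of the original repro): list the result column types
DESCRIBE SELECT * FROM (PIVOT matrix ON col1 USING sum(val)::UTINYINT);
-- with 20 pivoted columns these are expected to report UTINYINT;
-- with 21 they come back as HUGEINT instead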

Expected Behavior:

Values should remain cast to UTINYINT, even with more than 20 columns.
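
A possible workaround, as a sketch only (not verified on the affected versions): spelling the pivot out with filtered aggregates keeps the explicit cast no matter how many columns there are. The column names c0/c1/c20 below are just the ones generated by the repro above.

-- hypothetical manual pivot; only a few of the 21 columns are written out
SELECT col2,
       (sum(val) FILTER (WHERE col1 = 'c0'))::UTINYINT  AS c0,
       (sum(val) FILTER (WHERE col1 = 'c1'))::UTINYINT  AS c1,
       -- ... one filtered aggregate per remaining pivoted column ...
       (sum(val) FILTER (WHERE col1 = 'c20'))::UTINYINT AS c20
FROM matrix
GROUP BY col2
ORDER BY col2
LIMIT 2;

The fact that the behavior changes at exactly 21 columns suggests the pivot_filter_threshold setting (the threshold for switching from filtered aggregates to the dedicated pivot operator) may be involved; raising it might be another workaround, but I have not verified that.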

OS:

Debian GNU/Linux 12 (aarch64), Ubuntu 25.04 (x86_64)

DuckDB Version:

1.3.1, 1.4.0-dev861

DuckDB Client:

duckdb

Hardware:

No response

Full Name:

Cyril Chaboisseau

Affiliation:

none

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
