Unimplemented type for cast (VARCHAR -> NULL) error for aggregation on large string column

### What happens?

For a query involving a string column with NULLs, on a relatively large DataFrame (3.2 million rows), I receive the following error:

```
InvalidInputException: Invalid Input Error: Failed to cast value: Unimplemented type for cast (VARCHAR -> NULL)
```
The query completes successfully when the dataset is slightly smaller.

### To Reproduce

Load an initial 3201 row dataset and remove all but one column (`Major_Genre`). This is a VARCHAR column that contains NULL values.

```python
import pandas as pd
import duckdb

# Load movies dataset using pandas and remove all but one string column
movies = pd.read_json('https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json')
movies = movies[["Major_Genre"]]
duckdb.query("SELECT * from movies")
```
```
┌───────────────────┐
│    Major_Genre    │
│      varchar      │
├───────────────────┤
│ NULL              │
│ Drama             │
│ Comedy            │
│ Comedy            │
│ Drama             │
│ NULL              │
│ NULL              │
│ Comedy            │
│ NULL              │
│ NULL              │
│  ·                │
│  ·                │
│  ·                │
│ Drama             │
│ Adventure         │
│ Comedy            │
│ Comedy            │
│ Comedy            │
│ Comedy            │
│ Thriller/Suspense │
│ Adventure         │
│ Adventure         │
│ Adventure         │
├───────────────────┤
│     3201 rows     │
│    (20 shown)     │
└───────────────────┘
```
Duplicate and concat the original dataset `n` times to create a larger dataset, then count the number of each unique value of the Major_Genera column.

When n is 999, the total number of rows of the dataset is 3197799 and the query completes as expected.
```python
n = 999
movies_medium = pd.concat([movies] * n, axis=0).reset_index()
print(len(movies_medium))
duckdb.query("SELECT Major_Genre, COUNT(*) from movies_medium GROUP BY Major_Genre")
```
```
3197799
┌─────────────────────┬──────────────┐
│     Major_Genre     │ count_star() │
│       varchar       │    int64     │
├─────────────────────┼──────────────┤
│ Action              │       419580 │
│ Drama               │       788211 │
│ NULL                │       274725 │
│ Comedy              │       674325 │
│ Black Comedy        │        35964 │
│ Adventure           │       273726 │
│ Thriller/Suspense   │       238761 │
│ Musical             │        52947 │
│ Romantic Comedy     │       136863 │
│ Horror              │       218781 │
│ Western             │        35964 │
│ Documentary         │        42957 │
│ Concert/Performance │         4995 │
├─────────────────────┴──────────────┤
│ 13 rows                  2 columns │
└────────────────────────────────────┘
```
When n is 1000, the total number of rows of the dataset is 3201000 and the query raises an error:

```python
n = 1000
movies_medium = pd.concat([movies] * n, axis=0).reset_index()
print(len(movies_medium))
duckdb.query("SELECT Major_Genre, COUNT(*) from movies_medium GROUP BY Major_Genre")
```
```
3201000
---------------------------------------------------------------------------
InvalidInputException                     Traceback (most recent call last)
...
InvalidInputException: Invalid Input Error: Failed to cast value: Unimplemented type for cast (VARCHAR -> NULL)
```

If nulls are removed, the query works for even much larger datasets
```
n = 2000
movies_medium = pd.concat([movies] * n, axis=0).reset_index().dropna()
print(len(movies_medium))
duckdb.query("SELECT Major_Genre, COUNT(*) from movies_medium GROUP BY Major_Genre")
```
```
5852000
┌─────────────────────┬──────────────┐
│     Major_Genre     │ count_star() │
│       varchar       │    int64     │
├─────────────────────┼──────────────┤
│ Thriller/Suspense   │       478000 │
│ Drama               │      1578000 │
│ Horror              │       438000 │
│ Adventure           │       548000 │
│ Action              │       840000 │
│ Romantic Comedy     │       274000 │
│ Comedy              │      1350000 │
│ Documentary         │        86000 │
│ Black Comedy        │        72000 │
│ Musical             │       106000 │
│ Western             │        72000 │
│ Concert/Performance │        10000 │
├─────────────────────┴──────────────┤
│ 12 rows                  2 columns │
└────────────────────────────────────┘
```


### OS:

macOS Ventura 13.2.1

### DuckDB Version:

0.7.2-dev586. The error is present in 0.7.0 and beyond, but there is no error in 0.6.0.

### DuckDB Client:

Python

### Full Name:

Jon Mease

### Affiliation:

VegaFusion / Hex

### Have you tried this on the latest `master` branch?

- [X] I agree

### Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

- [X] I agree

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unimplemented type for cast (VARCHAR -> NULL) error for aggregation on large string column #6669

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Full Name:

Affiliation:

Have you tried this on the latest `master` branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Unimplemented type for cast (VARCHAR -> NULL) error for aggregation on large string column #6669

Description

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Full Name:

Affiliation:

Have you tried this on the latest master branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Have you tried this on the latest `master` branch?