Potential regression from 0.8 to 0.9 around partitioned writes by timestamp columns

### What happens?

When performing a `COPY` using a `PARTITION_BY(<timestamp-column>)` we end up losing the time information of `<timestamp-column>`.

After some preliminary testing I found out that the issue is with 0.9.x and 0.8.1 has what I consider to be the correct behavior based on the test case below.

### To Reproduce

Create a script `duck-issue.sh` to generate a fake dataset of page views per minute that is rolled up into hourly buckets:

```sh
# Ensure a clean state
output="/tmp/duckdb-issue/$(${DUCK_CLI:-duckdb} --version | awk '{print $1}')"
rm -rf "${output}"
mkdir -p "${output}"

# Generate the data and do the partitioned export
cat <<-SQL | "${DUCK_CLI:-duckdb}"
CREATE TABLE raw_data (
  ts TIMESTAMP_S NOT NULL,
  hits INTEGER NOT NULL
);
INSERT INTO raw_data
SELECT *, (random() * 500)::INTEGER
FROM RANGE(TIMESTAMP '2023-11-01', TIMESTAMP '2023-11-06', INTERVAL 1 MINUTE);
CREATE TABLE timeseries AS (
  SELECT DATE_TRUNC('hour', ts) AS bucket, SUM(hits)::BIGINT AS total
  FROM raw_data
  GROUP BY bucket
);
COPY (
  SELECT * FROM timeseries
) TO '${output}' (
  FORMAT 'PARQUET', COMPRESSION 'SNAPPY', PARTITION_BY (bucket), OVERWRITE_OR_IGNORE
);
SQL
```

Run the script with each version:

```sh
DUCK_CLI=duckdb-0.8.1 ./duck-issue.sh
DUCK_CLI=duckdb-0.9.1 ./duck-issue.sh
DUCK_CLI=duckdb-latest ./duck-issue.sh
```

Then query partitioned parquets, notice that the time is present for the bucket column on 0.8.1:

```
$ duckdb -table -c "SELECT * FROM read_parquet('/tmp/duckdb-issue/v0.8.1/*/*.parquet') LIMIT 5"

+---------------------+-------+
|       bucket        | total |
+---------------------+-------+
| 2023-11-02 02:00:00 | 16181 |
| 2023-11-05 10:00:00 | 13685 |
| 2023-11-01 15:00:00 | 13948 |
| 2023-11-04 05:00:00 | 16777 |
| 2023-11-04 08:00:00 | 15608 |
+---------------------+-------+
```

But not for 0.9.1:

```
$ duckdb -table -c "SELECT * FROM read_parquet('/tmp/duckdb-issue/v0.9.1/*/*.parquet') LIMIT 5"

+------------+-------+
|   bucket   | total |
+------------+-------+
| 2023-11-02 | 12711 |
| 2023-11-05 | 14800 |
| 2023-11-01 | 12593 |
| 2023-11-04 | 15917 |
| 2023-11-04 | 15025 |
+------------+-------+
```

There is a workaround to make things work, but definitely not optimal and would require downstream consumers to know about it:

```
$ duckdb -table -c "SELECT * FROM read_parquet('/tmp/duckdb-issue/v0.9.1/*/*.parquet', hive_partitioning=1, hive_types={'bucket': timestamp}) LIMIT 5"
+---------------------+-------+
|       bucket        | total |
+---------------------+-------+
| 2023-11-02 02:00:00 | 12711 |
| 2023-11-05 10:00:00 | 14800 |
| 2023-11-01 15:00:00 | 12593 |
| 2023-11-04 05:00:00 | 15917 |
| 2023-11-04 08:00:00 | 15025 |
+---------------------+-------+
```

Looks like the problem is still there on 0.9.2-dev242:

```
$ duckdb -table -c "SELECT * FROM read_parquet('/tmp/duckdb-issue/v0.9.2-dev242/*/*.parquet') LIMIT 5"

+------------+-------+
|   bucket   | total |
+------------+-------+
| 2023-11-02 | 16394 |
| 2023-11-05 | 14492 |
| 2023-11-01 | 14577 |
| 2023-11-04 | 14179 |
| 2023-11-04 | 13432 |
+------------+-------+
```

### OS:

x64

### DuckDB Version:

0.9.1

### DuckDB Client:

CLI

### Full Name:

Fabio Rehm

### Affiliation:

Koala

### Have you tried this on the latest `main` branch?

I have tested with a main build

### Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

- [X] Yes, I have

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Potential regression from 0.8 to 0.9 around partitioned writes by timestamp columns #9589

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Full Name:

Affiliation:

Have you tried this on the latest `main` branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Potential regression from 0.8 to 0.9 around partitioned writes by timestamp columns #9589

Description

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Full Name:

Affiliation:

Have you tried this on the latest main branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Have you tried this on the latest `main` branch?