Closed
What happens?
Consider the following call to read_csv:
from read_csv('bug.csv', header=false, names=['','']);
When run with any table in bug.csv with at least two rows and at least two columns, it sometimes segfaults, sometimes prints partial garbage, and sometimes returns wrong answers.
To Reproduce
For example, with the following CSV file, bug.csv:
a,b
c,d
running the query above sometimes yields:
$ duckdb
v1.1.2 f680b7d08f
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D from read_csv('bug.csv', header=false, names=['', '']);
┌─────────┬─────────┐
│   C0    │   C1    │
│ varchar │ varchar │
├─────────┼─────────┤
│ b       │         │
│ d       │         │
└─────────┴─────────┘
which is just wrong: the first column shows the values from the second column, and the second column is empty. Other times, it segfaults.
If we add a few more rows to the CSV:
a,b
c,d
e,f
g,h
then we sometimes get a similar wrong answer like this:
D from read_csv('bug.csv', header=false, names=['', '']);
┌─────────┬─────────┐
│   C0    │   C1    │
│ varchar │ varchar │
├─────────┼─────────┤
│ b       │         │
│ d       │         │
│ f       │         │
│ h       │         │
└─────────┴─────────┘
And sometimes we segfault.
But sometimes we also get exciting stuff like this:
D from read_csv('bug.csv', header=false, names=['', '']);
┌─────────┬─────────┐
│   C0    │   C1    │
│ varchar │ varchar │
├─────────┼─────────┤
│ b       │         │
│ d       │         │
│ f       │ \0\0    │
│ h       │ \0\0    │
└─────────┴─────────┘
or this:
D from read_csv('bug.csv', header=false, names=['', '']);
┌─────────┬─────────┐
│   C0    │   C1    │
│ varchar │ varchar │
├─────────┼─────────┤
│ b       │         │
│ d       │         │
│ f       │ f       │
│ h       │ h       │
└─────────┴─────────┘
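Since the outcome changes from run to run, it may help to run the query many times and tally the distinct results. The following is only a rough sketch, not part of the original report: it assumes the duckdb CLI is on PATH and that it accepts the -csv and -c options, and the helper names (write_csv, run_once, tally) are made up for illustration. Output is captured as raw bytes because the garbage output may not be valid text. On POSIX systems, a negative exit code means the process was killed by a signal, which is how a segfault would show up.

```python
import collections
import pathlib
import shutil
import subprocess
import tempfile

# Query and four-row CSV taken from the report above.
QUERY = "from read_csv('bug.csv', header=false, names=['', '']);"
CSV_TEXT = "a,b\nc,d\ne,f\ng,h\n"


def write_csv(directory):
    """Write bug.csv into `directory` and return its path."""
    path = pathlib.Path(directory) / "bug.csv"
    path.write_text(CSV_TEXT)
    return path


def run_once(directory, exe="duckdb"):
    """Run the query once in `directory`; return (exit code, raw stdout)."""
    # Assumption: the CLI supports `-csv` (CSV output) and `-c` (run a command).
    proc = subprocess.run(
        [exe, "-csv", "-c", QUERY],
        cwd=directory,
        capture_output=True,
    )
    return proc.returncode, proc.stdout


def tally(directory, runs=50, exe="duckdb"):
    """Count how often each distinct (exit code, stdout) pair occurs."""
    return collections.Counter(run_once(directory, exe) for _ in range(runs))


def main():
    if shutil.which("duckdb") is None:
        print("duckdb CLI not found on PATH; nothing to run")
        return
    with tempfile.TemporaryDirectory() as tmp:
        write_csv(tmp)
        for (code, out), count in tally(tmp).most_common():
            print(f"{count:3d}x exit={code} stdout={out!r}")


if __name__ == "__main__":
    main()
```

Each printed line is one distinct outcome with its count, so wrong answers, garbage bytes, and crashes show up as separate entries.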
OS:
macOS 13.7
DuckDB Version:
v1.1.2 f680b7d
DuckDB Client:
CLI
Hardware:
Apple M1
Full Name:
James Wilcox
Affiliation:
University of Washington
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have