Replies: 4 comments
-
#3014
-
The BOM in this file is not actually valid. A UTF-8 BOM has to be the first three bytes of the file (EF BB BF), but here the first byte of the file is the quote character. If we modify the file so the BOM is actually at the start, this works as expected:
SELECT id FROM '~/Downloads/people.csv';
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
You can also use normalize_names:
SELECT id FROM read_csv_auto('~/Downloads/people.csv', normalize_names=True);
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
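For anyone who wants to repair the file itself rather than rely on normalize_names, here is a minimal Python sketch of the "move the BOM to the start" fix described above. The function name move_bom_to_start and the sample bytes are made up for illustration (the real people.csv may have different columns), and it only looks for a misplaced BOM in the first line.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def move_bom_to_start(data: bytes) -> bytes:
    """Return data with a BOM found in the header line moved to byte 0."""
    if data.startswith(BOM):
        return data  # BOM is already where it belongs
    header, sep, rest = data.partition(b"\n")
    if BOM not in header:
        return data  # no BOM in the header line: leave the file alone
    # Drop the misplaced BOM from the header and put one at the very start.
    return BOM + header.replace(BOM, b"", 1) + sep + rest


# Hypothetical file contents: quote first, then the BOM, then the column name.
sample = b'"' + BOM + b'id","name"\n1,"a"\n2,"b"\n'
fixed = move_bom_to_start(sample)
print(fixed.startswith(BOM))
```

To apply it to a real file, read the bytes with open(path, "rb"), pass them through the function, and write the result back with open(path, "wb").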
-
Thanks! normalize_names works perfectly.
-
I think how DuckDB (read_csv) handles a BOM should be explicitly documented.
-
"The BOM (byte order mark) is a particular usage of the special Unicode character, U+FEFF BYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text"
people.csv
The problem is that there is a BOM character before id in the header row.
Pandas can solve this by specifying the engine.
https://en.wikipedia.org/wiki/Byte_order_mark
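To see the stray BOM character before id for yourself, a small Python check along these lines may help. This is only a sketch: the sample bytes are invented to mimic the situation described here (quote first, then the BOM), read_header is a hypothetical helper, and it uses the standard csv module rather than pandas.

```python
import csv
import io


def read_header(raw: bytes) -> list:
    """Decode raw CSV bytes as UTF-8 and return the first row."""
    text = raw.decode("utf-8")
    return next(csv.reader(io.StringIO(text, newline="")))


# Hypothetical file contents: the BOM bytes sit inside the first quoted field.
raw = b'"\xef\xbb\xbfid","name"\n1,"a"\n'
header = read_header(raw)

# The BOM decodes to U+FEFF and ends up glued to the first column name.
print([name.encode("unicode_escape") for name in header])

# Stripping U+FEFF gives the column names you would expect.
cleaned = [name.replace("\ufeff", "") for name in header]
print(cleaned)
```

Because the BOM is not at byte 0, a decoder that only skips a leading BOM would not remove it, which is why the character survives into the column name.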