Description
Expected Behavior
Gephi 0.9.2 should be able to import CSV files in which double quotes inside strings are escaped with backslashes (see below for an example). This worked in 0.9.1 but no longer does.
Also, an error message should be shown when an IOException occurs while reading a CSV file.
Current Behavior
Since 0.9.2, CSV files are only imported correctly when double quotes inside strings are escaped by doubling them (""). When any other escape character is used, such as a backslash, the import fails from the first line that contains an escaped quote.
No error message is shown. The rest of the file is simply ignored. This means that when importing a file with 100,000 edges you may end up with 10,000 and not know why.
Possible Solution
When importing a CSV file, let the user set the escape character, just as the field delimiter can currently be set, or auto-detect it.
Pass this parameter to CSVFormat’s withEscape function.
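To illustrate the requested behavior, here is a minimal sketch using Python's standard `csv` module as a stand-in. It is not Gephi's code and does not use Commons CSV; the helper names `detect_escape` and `read_rows` are hypothetical, and the auto-detection is a deliberately naive heuristic (it only looks for a backslash followed by a quote). With `strict=True` the parser raises an error on malformed input instead of silently dropping the rest of the file.

```python
import csv
import io
from typing import List, Optional


def detect_escape(sample: str, quote: str = '"') -> Optional[str]:
    """Naive guess (assumption): a backslash directly before a quote means
    backslash escaping; otherwise assume doubled quotes (no escape char)."""
    return "\\" if "\\" + quote in sample else None


def read_rows(text: str, delimiter: str = ";", escape="auto") -> List[List[str]]:
    """Parse CSV text with a user-chosen or auto-detected escape character.

    escape="auto" -> guess from the text
    escape=None   -> quotes are escaped by doubling them
    escape="\\"   -> quotes are escaped with a backslash
    """
    if escape == "auto":
        escape = detect_escape(text)
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar='"',
        escapechar=escape,
        doublequote=escape is None,  # doubled quotes only when no escape char is set
        strict=True,                 # raise csv.Error on malformed input
    )
    return list(reader)


backslash_style = '"Source";"Target"\n"Dwayne \\"The Rock\\" Johnson";"John Cena"\n'
doubled_style = '"Source";"Target"\n"Dwayne ""The Rock"" Johnson";"John Cena"\n'

# Both escape styles yield the same rows when the escape character is detected.
print(read_rows(backslash_style))
print(read_rows(doubled_style))

# Forcing the wrong style fails loudly instead of importing zero edges.
try:
    read_rows(backslash_style, escape=None)
except csv.Error as err:
    print("import failed:", err)
```

In this sketch the explicit `escape` argument plays the role of the proposed import-dialog setting, and `"auto"` the role of auto-detection.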
Steps to Reproduce
- Create a file with the following content:
"Source";"Target"
"Dwayne \"The Rock\" Johnson";"John Cena"
- Data Laboratory -> Import Spreadsheet -> Select File, import as Edges List
- No data is shown in the preview. After clicking 'Finish' Gephi reports a successful import of zero edges and zero nodes.
Context
In CSV files, some people escape quotes inside strings with backslashes (see above). Some people escape them with double quotes:
"Source";"Target"
"Dwayne ""The Rock"" Johnson";"John Cena"
Gephi 0.9.1 expected the former; Gephi 0.9.2 expects the latter (as does Excel). I haven't seen this change documented anywhere. It means that, for example, some scripts written for older versions of Gephi that produce CSV files for import into Gephi are now broken.
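As a possible stopgap for scripts that still produce the backslash-escaped form, the file can be rewritten with doubled quotes before import. The following is a sketch using Python's standard `csv` module; the function name `to_doubled_quotes` is hypothetical, and it assumes a semicolon delimiter and that every field should be quoted, as in the examples above.

```python
import csv
import io


def to_doubled_quotes(text: str, delimiter: str = ";") -> str:
    """Convert backslash-escaped CSV text to the doubled-quote form."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar='"',
        escapechar="\\",    # input: quotes escaped as \"
        doublequote=False,
        strict=True,        # fail loudly on malformed input
    )
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # quote every field, as in the sample files
        doublequote=True,       # output: quotes escaped as ""
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()


old_style = '"Source";"Target"\n"Dwayne \\"The Rock\\" Johnson";"John Cena"\n'
print(to_doubled_quotes(old_style), end="")
```

This only works around the format change on the producer side; it does not address the missing error message in the importer.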
Your Environment
- Version used: Gephi 0.9.2
- OS: I have tested this on macOS and Windows.
Relevant part of messages.log:
SEVERE: IOException reading next record: java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
SEVERE [null]: Last record repeated 4 more times.
INFO [DefaultProcessor]: # Nodes loaded: 0
INFO [DefaultProcessor]: # Edges loaded: 0