
CSV files are no longer imported correctly when double quotes inside strings are escaped with backslashes #1812

@bjoernross

Description

Expected Behavior

Gephi 0.9.2 should be able to import CSV files in which double quotes inside strings are escaped with backslashes (see below for an example). This worked in 0.9.1 but no longer does.

Also, an error message should be shown when an IOException occurs while reading a CSV file.
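For illustration, a minimal Java sketch of an import loop that records each failing line instead of stopping silently. LineParser, ImportReport and importLines are hypothetical names, not Gephi's API, and the toy parser only mimics the failure seen in the log. The sketch works line by line for simplicity; a real CSV reader may not be able to resume after a bad record (quoted fields can span lines), in which case reporting the first error together with the number of rows read so far would already make the truncation visible.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ImportReportSketch {

    /** Hypothetical parser for one CSV line; may fail with an IOException. */
    @FunctionalInterface
    interface LineParser {
        List<String> parse(String line) throws IOException;
    }

    /** Outcome of an import: the rows that were read and the lines that failed. */
    static final class ImportReport {
        final List<List<String>> rows = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Parses every line and records failures with their 1-based line number. */
    static ImportReport importLines(List<String> lines, LineParser parser) {
        ImportReport report = new ImportReport();
        for (int i = 0; i < lines.size(); i++) {
            try {
                report.rows.add(parser.parse(lines.get(i)));
            } catch (IOException e) {
                // Keep the failure so it can be shown to the user, not just logged.
                report.errors.add("line " + (i + 1) + ": " + e.getMessage());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Toy parser that rejects backslash-escaped quotes, mimicking the 0.9.2 behavior.
        LineParser strict = line -> {
            if (line.contains("\\\"")) {
                throw new IOException("invalid char between encapsulated token and delimiter");
            }
            return List.of(line.split(";"));
        };
        List<String> lines = List.of(
                "\"Source\";\"Target\"",
                "\"Dwayne \\\"The Rock\\\" Johnson\";\"John Cena\"",
                "\"A\";\"B\"");
        ImportReport report = importLines(lines, strict);
        System.out.println(report.rows.size() + " rows imported");
        // An importer UI would display these instead of printing them.
        report.errors.forEach(err -> System.out.println("ERROR " + err));
    }
}
```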

Current Behavior

Since 0.9.2, CSV files are imported correctly only when double quotes inside strings are escaped by doubling them. When a different escape character is used, the import fails starting at the first line that contains an escaped quote.

No error message is shown; the rest of the file is silently ignored. This means that when importing a file with 100,000 edges, you may end up with only 10,000 and not know why.

Possible Solution

When importing a CSV file, let the user set the escape character (just as the user can currently set the field delimiter), or auto-detect it.

Pass this setting to the withEscape method of Apache Commons CSV's CSVFormat.
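With Commons CSV itself, this would amount to building the format with something like CSVFormat.DEFAULT.withDelimiter(';').withEscape('\\'). To show the effect without that dependency, here is a minimal hand-written splitter in Java; EscapeAwareSplitter and split are hypothetical names, not Gephi's or Commons CSV's code, and the sketch handles single-line records only. Doubled quotes are always accepted inside quoted fields; an optional escape character additionally makes the next character literal. With no escape character, the backslash line fails with the same message that appears in messages.log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplitter {

    /**
     * Splits one CSV line into fields (hypothetical sketch, single-line records only).
     * Inside a quoted field, "" is read as one quote; if escape is non-null,
     * escape followed by any character yields that character literally.
     */
    static List<String> split(String line, char delimiter, Character escape) throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false; // true right after a quoted field has been closed
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (escape != null && c == escape && i + 1 < line.length()) {
                    current.append(line.charAt(++i)); // keep the escaped character
                } else if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                        wasQuoted = true;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
                wasQuoted = false;
            } else if (wasQuoted) {
                // Text after a closing quote but before the delimiter.
                throw new IOException("invalid char between encapsulated token and delimiter");
            } else if (c == '"' && current.length() == 0) {
                inQuotes = true; // opening quote
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IOException("unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String backslashed = "\"Dwayne \\\"The Rock\\\" Johnson\";\"John Cena\"";
        String doubled = "\"Dwayne \"\"The Rock\"\" Johnson\";\"John Cena\"";
        System.out.println(split(backslashed, ';', '\\')); // [Dwayne "The Rock" Johnson, John Cena]
        System.out.println(split(doubled, ';', null));     // [Dwayne "The Rock" Johnson, John Cena]
        try {
            split(backslashed, ';', null);
        } catch (IOException e) {
            System.out.println("without escape: " + e.getMessage());
        }
    }
}
```

Accepting doubled quotes even when an escape character is set means a single importer setting can read files of either style, which would keep files written for 0.9.1 and files exported from Excel working side by side.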

Steps to Reproduce

  1. Create a file with the following content:
 "Source";"Target"
 "Dwayne \"The Rock\" Johnson";"John Cena"
  2. Data Laboratory -> Import Spreadsheet -> Select File, import as Edges List
  3. No data is shown in the preview. After clicking 'Finish', Gephi reports a successful import of zero edges and zero nodes.

Context

In CSV files, some people escape quotes inside strings with backslashes (as above). Others escape them by doubling them:

 "Source";"Target"
 "Dwayne ""The Rock"" Johnson";"John Cena"

Gephi 0.9.1 expected the former; Gephi 0.9.2 expects the latter (as does Excel). I haven't seen this change documented anywhere. As a result, some scripts written for older versions of Gephi that generate CSV files for import into Gephi are now broken.
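For the auto-detection option mentioned under Possible Solution, one simple heuristic is to count both conventions in a sample of the file. The Java sketch below is a hypothetical illustration (EscapeDetector and guessEscape are invented names): it counts backslash-quote pairs and doubled quotes that are not just an empty field such as "", and returns a backslash only when backslash escapes are more common. It can be fooled, for example by a field that legitimately ends in a backslash, so a user override would still be needed.

```java
import java.util.List;

public class EscapeDetector {

    /**
     * Hypothetical heuristic: returns '\\' if backslash-escaped quotes outnumber
     * doubled quotes in the sample, otherwise null (doubled-quote style).
     */
    static Character guessEscape(List<String> sampleLines, char delimiter) {
        int backslashQuotes = 0;
        int doubledQuotes = 0;
        for (String line : sampleLines) {
            for (int i = 0; i + 1 < line.length(); i++) {
                char c = line.charAt(i);
                char next = line.charAt(i + 1);
                if (c == '\\' && next == '"') {
                    backslashQuotes++;
                    i++; // skip the escaped quote
                } else if (c == '"' && next == '"' && !isEmptyField(line, i, delimiter)) {
                    doubledQuotes++;
                    i++; // skip the second quote of the pair
                }
            }
        }
        return backslashQuotes > doubledQuotes ? Character.valueOf('\\') : null;
    }

    /** True if the quote pair at i, i+1 forms a whole empty field, e.g. ;""; */
    private static boolean isEmptyField(String line, int i, char delimiter) {
        boolean atStart = i == 0 || line.charAt(i - 1) == delimiter;
        boolean atEnd = i + 2 >= line.length() || line.charAt(i + 2) == delimiter;
        return atStart && atEnd;
    }

    public static void main(String[] args) {
        List<String> backslashed = List.of(
                "\"Source\";\"Target\"",
                "\"Dwayne \\\"The Rock\\\" Johnson\";\"John Cena\"");
        List<String> doubled = List.of(
                "\"Source\";\"Target\"",
                "\"Dwayne \"\"The Rock\"\" Johnson\";\"John Cena\"");
        System.out.println(guessEscape(backslashed, ';')); // \
        System.out.println(guessEscape(doubled, ';'));     // null
    }
}
```

Ties fall back to the doubled-quote convention, since that is what 0.9.2 and Excel already expect.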

Your Environment

  • Version used: Gephi 0.9.2
  • OS: I have tested this on macOS and Windows.

Relevant part of messages.log:

SEVERE: IOException reading next record: java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
SEVERE [null]: Last record repeated 4 more times.
INFO [DefaultProcessor]: # Nodes loaded: 0
INFO [DefaultProcessor]: # Edges loaded: 0
