[R-Forge #5358] fread quoted strings not always handled properly #489

@arunsrinivasan

Submitted by: James Sams; Assigned to: Nobody; R-Forge link

I have a file with three fields: two string fields and an integer field. In 99% of cases, the string fields aren't even quoted due to their simplicity. However, I have one line in a file that looks like:

233,"A ""EMBEDDED"" QUOTE FIELD",morechars

fread fails to read this line: it treats the second quote as closing the string field and then expects a separator:

# "Expected sep (',') but '"' ends field 2 on line 828 when reading data:". 

(Actual data not used due to confidentiality concerns.)

read.csv properly interprets this as three columns:

1) 233
2) A "EMBEDDED" QUOTE FIELD
3) morechars
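To make the expected interpretation concrete, here is a minimal sketch in Python of a line splitter that treats a doubled quote inside a quoted field as one literal quote. It is only an illustration of the rule, not fread's or read.csv's actual implementation; the function name `split_csv_line` is made up for this example.

```python
def split_csv_line(line, sep=",", quote='"'):
    """Split one CSV line; inside a quoted field, "" means a literal quote."""
    fields, buf, in_quotes, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    buf.append(quote)   # doubled quote -> one literal quote
                    i += 1              # skip the second quote of the pair
                else:
                    in_quotes = False   # a lone quote closes the field
            else:
                buf.append(ch)
        elif ch == quote and not buf:
            in_quotes = True            # opening quote at the start of a field
        elif ch == sep:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


print(split_csv_line('233,"A ""EMBEDDED"" QUOTE FIELD",morechars'))
```

A parser that skips the look-ahead for a second quote would close the field right after `A ` and then find `"` where it expects a separator, which matches the error reported above.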

IME, there are two ways that CSV-type files handle embedded quotes: with a backslash escape (\") and by doubling them up, as is done here (""). Well, at least two unambiguous ways. Note that it isn't uncommon to see such a field without the outer quotes. The reason, as I understand it, is that some programs only include the outer quotes if the field contains the designated field separator. Otherwise, these programs rely on the escaping mechanism (either backslash or doubling) to handle single or double quotes, etc. Of course, CSV files aren't standardized, so there may be other cases. Hopefully this is helpful information, though.
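For reference, the two conventions can be compared side by side with Python's standard `csv` module. This is just a sketch of how another parser exposes the choice, and says nothing about how fread should be configured. The backslash-escaped line is a hypothetical variant of the example above, not taken from the actual data.

```python
import csv
import io

# Same record written with the two escaping conventions.
doubled = '233,"A ""EMBEDDED"" QUOTE FIELD",morechars\n'
escaped = '233,"A \\"EMBEDDED\\" QUOTE FIELD",morechars\n'  # hypothetical variant

# Doubling is the csv module's default (doublequote=True).
row_doubled = next(csv.reader(io.StringIO(doubled)))

# Backslash escaping has to be requested explicitly.
row_escaped = next(
    csv.reader(io.StringIO(escaped), doublequote=False, escapechar="\\")
)

print(row_doubled)
print(row_escaped)
assert row_doubled == row_escaped
```

Both calls yield the same three fields, so the convention only changes how the quote is written in the file, not the parsed value.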

I see several other bug reports about fread's handling of quoted fields, but this seems to be a different issue than the others. Thus the separate report. Apologies if you consider it to be a duplicate report.
