
fread in version 1.9.5 fails on a CSV file that contains JSON with an embedded double quote #1164

@richardtessier

Running fread on this CSV:

json1, string1
"{""f1"":""value1"",""f2"":""double quote escaped with a backslash [ \"" ]""}", "string field"

results in the following error:

Error in fread("data/json.csv", verbose = TRUE, data.table = FALSE, stringsAsFactors = FALSE) : 
  Field 1 on line 2 starts with quote (") but then has a problem. It can contain balanced unescaped quoted subregions but if it does it can't contain embedded \n as well. Check for unbalanced unescaped quotes: "{""f1"":""value1"",""f2"":""double quote escaped with a backslash [ \"" ]""}", "string field"
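A self-contained reproduction might look like the sketch below. It assumes a temporary file in place of data/json.csv, and the names csv_lines and path are only illustrative. The two lines are written without a trailing newline to match "Count of eol: 1 (including 0 at the end)" in the verbose log. The fread call uses the same arguments as in the error message. It is wrapped in tryCatch so the error can be printed, and guarded by requireNamespace so the script still runs where data.table is not installed.

```r
# Reproduction sketch. Assumption: a temporary file stands in for
# data/json.csv; csv_lines and path are illustrative names.
# Single-quoted R strings let the CSV's double quotes appear unescaped;
# "\\" in R source becomes a single backslash in the file.
csv_lines <- c(
  'json1, string1',
  '"{""f1"":""value1"",""f2"":""double quote escaped with a backslash [ \\"" ]""}", "string field"'
)
path <- tempfile(fileext = ".csv")

# Write without a trailing newline, so the file contains exactly one eol.
cat(paste(csv_lines, collapse = "\n"), file = path)

# Same call as in the report; any error is captured and printed.
if (requireNamespace("data.table", quietly = TRUE)) {
  res <- tryCatch(
    data.table::fread(path, verbose = TRUE, data.table = FALSE,
                      stringsAsFactors = FALSE),
    error = function(e) e
  )
  print(res)
}
```

Under the usual CSV convention of doubling quotes inside a quoted field, field 1 should decode to {"f1":"value1","f2":"double quote escaped with a backslash [ \" ]"}. At the CSV level the backslash is an ordinary character. Only the JSON layer treats \" as an escaped quote.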

Verbose fread output:

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000000 GB.
Memory mapping ... ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 2 columns. Longest stretch was from line 1 to line 2
Starting data input on line 1 (either column names or first row of data). First 10 characters: json1, str
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 1 (including 0 at the end)
Count of sep: 2
nrow = MIN( nsep [2] / ncol [2] -1, neol [1] - nblank [0] ) = 1
Type codes (   first 5 rows): 40
Type codes: 40 (after applying colClasses and integer64)
Type codes: 40 (after applying drop or select (if supplied)
Allocating 2 column slots (2 - 0 dropped)

My sessionInfo():

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.5

loaded via a namespace (and not attached):
[1] chron_2.3-45 tools_3.1.2 
