I am unable to create a DataFrame with read_csv when the CSV file contains escaped quotes.
(Note: R’s read.csv works as expected.)
My code:
import pandas as pd
pd.read_csv('data.csv')
#error!
CParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3
data.csv:
SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv på hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
How can I read this csv and avoid this error?
My guess is that pandas uses regular expressions that cannot handle the ambiguity and trip on the third data row (line 4 of the file), or more specifically on \"Bergslagen\".
It does work, but you have to tell read_csv which character escapes the embedded quotes, using its escapechar parameter (here, a backslash).
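As a minimal sketch of that fix, the snippet below passes escapechar='\\' to read_csv. The rows are copied from data.csv in the question and inlined through io.StringIO (an assumption made only to keep the example self-contained); with a file on disk, the same keyword argument should apply to pd.read_csv('data.csv', escapechar='\\').

```python
import io

import pandas as pd

# Rows copied from data.csv in the question. A raw string keeps the
# backslashes in \"Bergslagen\" exactly as they appear in the file.
raw = r'''SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv på hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
'''

# escapechar tells the parser that a backslash makes the next character
# literal, so \" inside a quoted field no longer ends the field.
df = pd.read_csv(io.StringIO(raw), escapechar='\\')

print(df.shape)
print(df.loc[2, 'SEARCH_TERM'])
```

Without escapechar, the default parser does not treat the backslash as special, so the quote after it ends the quoted field early and the commas inside the search term are then read as field separators, which is consistent with the "Expected 2 fields in line 4, saw 3" error.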