I am trying to read a CSV file using pandas. The file has a column called Tags, which consists of user-provided tags such as –, “”, ”, 1950’s, and 16th-century. Since these are user-provided, many special characters have been entered by mistake as well. The issue is that I cannot open the file with pandas `read_csv`. It fails with the error: CParserError: Error tokenizing data. Can someone help me read the CSV file into pandas?
Okay. Starting from a badly formatted CSV we can’t read:
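As a stand-in for the real file, here is a small hypothetical sample (the column names `Year`, `Count` and `Score` and all the values are made up for illustration) where the Tags field contains unquoted commas. A minimal sketch of how that trips up `read_csv`:

```python
import io

import pandas as pd

# Hypothetical data: one Tags column followed by three clean columns.
# The last row has unquoted commas inside Tags, so it splits into too many fields.
raw = (
    "Tags,Year,Count,Score\n"
    "16th-century,1999,3,0.5\n"
    "–, “”, ”,1950’s,2001,7,1.5\n"
)

try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as exc:
    # The C parser reports a tokenizing error when a later row has extra fields.
    failed = True
    print("read_csv failed:", exc)
```

The parser expects four fields per row from the header and the first data row, so the extra commas in the last row make it give up.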
We can make a nicer version, taking advantage of the fact that the last three columns are well-behaved:
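One way to sketch that repair, assuming every row ends in exactly three clean, comma-free columns: split each line from the right on the last three commas, treat whatever is left as the Tags field, and write the rows back out fully quoted. The helper name `repair_csv` and the sample data are hypothetical.

```python
import csv
import io


def repair_csv(raw_text, n_trailing=3):
    """Return CSV text where everything before the last n_trailing commas is one field."""
    out = io.StringIO()
    # QUOTE_ALL wraps every field in quotes, so commas inside Tags become harmless.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # rsplit works from the right, so only the trailing columns are split off.
        writer.writerow(line.rsplit(",", n_trailing))
    return out.getvalue()


# Hypothetical sample with unquoted commas in the Tags field.
raw = (
    "Tags,Year,Count,Score\n"
    "16th-century,1999,3,0.5\n"
    "–, “”, ”,1950’s,2001,7,1.5\n"
)

fixed = repair_csv(raw)
print(fixed)
```

Every row of `fixed` then has exactly four quoted fields. The messy last row, for instance, becomes `"–, “”, ”,1950’s","2001","7","1.5"`. This only works because the trailing columns never contain commas themselves. If they did, there would be no reliable way to tell where Tags ends.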
which produces a file with a consistent number of fields on every row,
and then we can read it:
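For illustration, assuming the repaired text looks like the hypothetical quoted sample below (made-up column names and values), a plain `read_csv` call is enough, since quoted fields may safely contain commas:

```python
import io

import pandas as pd

# Hypothetical repaired data: every field is quoted, four fields per row.
fixed = (
    '"Tags","Year","Count","Score"\n'
    '"16th-century","1999","3","0.5"\n'
    '"–, “”, ”,1950’s","2001","7","1.5"\n'
)

df = pd.read_csv(io.StringIO(fixed))
print(df.shape)
print(df["Tags"].tolist())
```

With a real file you would pass its path to `read_csv` instead of the `StringIO` wrapper. The embedded commas and stray quote characters stay inside the Tags values rather than breaking the row apart.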
I’d look into fixing this problem at the source and getting the data in a tolerable format, though. Tricks like this shouldn’t be necessary, and the file could very easily have been impossible to repair.