I have a few CSV files that contain strings with line breaks in them. The files open just fine in Excel, but when I read them with csv.DictReader(), it appears to treat each line break inside a string as the start of a new row of data rather than as part of the string.
What can I do to get the second test below to pass just as the first test does?
#csv contents
this, is, a, test
1,2,u'thr\nee',4
5,6,7,8
result = []
text = """this, is, a, test
1,2,u'three',4
5,6,7,8"""
b = StringIO(text)
reader = csv.DictReader(b)
for row in reader:
    result.append(row)
self.assertEqual(2,len(result))
expected = [{'this': '1', ' test': '4', ' is': '2', ' a': "u'three'"}, {'this': '5', ' test': '8', ' is': '6', ' a': '7'}]
self.assertEqual(expected ,result)
#With a \n inside the string.
result = []
text = """this, is, a, test
1,2,u'thr\nee',4
5,6,7,8"""
b = StringIO(text)
reader = csv.DictReader(b)
for row in reader:
    result.append(row)
self.assertEqual(2,len(result))
#expected = [{'this': '1', ' test': '4', ' is': '2', ' a': "u'thr\nee'"}, {'this': '5', ' test': '8', ' is': '6', ' a': '7'}]
#self.assertEqual(expected,result)
Assuming your CSV content is properly quoted, specifying the appropriate quotechar when instantiating the reader should do it:
http://docs.python.org/release/2.6.7/library/csv.html#csv.Dialect.quotechar
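As a rough sketch of what "properly quoted" could look like, here is a variant of the test data in which the multi-line field is wrapped in single quotes from its very first character. The sample string is an assumption, not the original file: it drops the u prefix and the spaces after the commas, because as far as I can tell the reader only treats a quote character as an opening quote when it is the first character of the field. It is written for Python 3 (io.StringIO), so adapt it if you are on 2.x.

```python
import csv
import io

# Assumed sample data: the field containing the line break is fully
# enclosed in single quotes, starting at the first character of the field.
text = "this,is,a,test\n1,2,'thr\nee',4\n5,6,7,8\n"

# newline="" keeps the line endings untouched so the csv module sees them as-is.
# quotechar="'" tells the reader that single quotes delimit quoted fields.
reader = csv.DictReader(io.StringIO(text, newline=""), quotechar="'")
rows = list(reader)

for row in rows:
    print(row)
```

With this input the line break stays inside the 'a' field of the first data row instead of starting a new row.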
Otherwise, I assume you have Unix newlines ('\n') inside your content and DOS newlines ('\r\n') as line terminators. As of Python 2.6.7, the docs mention that the reader is hard-coded to recognise both as line terminators whatever you specify; I don't know whether that is also the case with your Python version. If it is, you'll have to preprocess (and possibly postprocess) your files manually, either to ensure appropriate quoting or to replace each lone '\n' with something else and then do the reverse after CSV parsing.
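If quoting is not an option, the replace-and-restore idea might look something like the sketch below. It assumes the file really uses '\r\n' between rows and a bare '\n' only inside fields, and that the placeholder character never occurs in the data. The function name and the placeholder are made up for illustration, and the code targets Python 3.

```python
import csv
import io
import re

# Hypothetical placeholder: the ASCII unit separator, assumed absent from the data.
PLACEHOLDER = "\x1f"

def read_rows_with_embedded_newlines(raw):
    """Parse CSV text whose rows end in \\r\\n and whose fields may contain bare \\n."""
    # Preprocess: hide every \n that is not part of a \r\n row terminator.
    protected = re.sub(r"(?<!\r)\n", PLACEHOLDER, raw)
    reader = csv.DictReader(io.StringIO(protected, newline=""))
    # Postprocess: put the line breaks back into the parsed values.
    return [
        {key: value.replace(PLACEHOLDER, "\n") for key, value in row.items()}
        for row in reader
    ]

raw = "this,is,a,test\r\n1,2,thr\nee,4\r\n5,6,7,8\r\n"
rows = read_rows_with_embedded_newlines(raw)
print(rows)
```

This only handles rows with exactly as many fields as the header; rows with missing or extra fields would need extra care in the postprocessing step.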