I am trying to process a CSV file in Python that has a ^M character in the middle of each row, which is treated as a newline. I can't open the file in any mode other than 'rU'.
If I do open the file in 'rU' mode, the ^M is read as a newline, so each row is split in two and I get twice the number of rows.
I want to remove that newline altogether. How?
Note that, as the docs say:
So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of handing the file itself to the reader, hand it the file's lines with each '\r' replaced by an empty string.
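A minimal sketch of such a filter, assuming Python 3, where opening the file with newline='\n' splits lines on '\n' only and leaves any '\r' untranslated. The names strip_cr and read_rows are made up for illustration and do not come from the csv module:

```python
import csv


def strip_cr(lines):
    # Yield each line with every carriage return removed,
    # wherever in the line it appears.
    for line in lines:
        yield line.replace('\r', '')


def read_rows(path):
    # newline='\n' keeps a stray carriage return inside its row
    # instead of treating it as a line break.
    with open(path, newline='\n') as f:
        # csv.reader accepts any iterable of strings, so it can
        # consume the filtered lines instead of the file itself.
        return list(csv.reader(strip_cr(f)))
```

The same generator can be passed to csv.DictReader in place of the file object.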
That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
First, if you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example:
But if you want to do it in Python, it's not that much more verbose, and you might find it more readable, so:
First, you can't really modify a file in place if you want to insert or delete from the middle. The usual solution is to write a new file, and then either move the new file over the old one (Unix only) or delete the old one and rename the new file into its place (cross-platform).
The cross-platform version:
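A minimal sketch of the delete-then-rename approach, assuming Python 3 and binary mode so that the only change to the data is the removal of '\r' bytes. The function name strip_cr_rewrite and the '.new' suffix are arbitrary choices for illustration:

```python
import os


def strip_cr_rewrite(path):
    new_path = path + '.new'  # hypothetical name for the temporary copy
    # Binary mode: lines are split on b'\n' and nothing is translated.
    with open(path, 'rb') as infile, open(new_path, 'wb') as outfile:
        outfile.writelines(line.replace(b'\r', b'') for line in infile)
    # Delete the original first, then rename the cleaned copy into place.
    os.remove(path)
    os.rename(new_path, path)
```

There is a short window between the remove and the rename in which the file does not exist under its original name, which is what makes this version clunky.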
The less-clunky, but Unix-only, version:
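A minimal sketch of the move-over approach, again assuming Python 3. It relies on os.rename silently replacing an existing destination, which holds on Unix but not on Windows. The function name strip_cr_atomic is made up for illustration, and creating the temporary file in the same directory as the original is an assumption made so the rename stays on one filesystem:

```python
import os
import tempfile


def strip_cr_atomic(path):
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False keeps the temporary file around after it is closed,
    # so it can be renamed over the original.
    with open(path, 'rb') as infile, \
            tempfile.NamedTemporaryFile('wb', dir=directory, delete=False) as outfile:
        outfile.writelines(line.replace(b'\r', b'') for line in infile)
    # On Unix, renaming onto an existing file replaces it in one step.
    os.rename(outfile.name, path)
```

On Python 3.3 and later, os.replace can be substituted for os.rename here if the overwrite also needs to work on Windows.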