I have a UTF-8 file in which some lines contain the U+2028 Line Separator character (http://www.fileformat.info/info/unicode/char/2028/index.htm). I don’t want it to be treated as a line break when I read lines from the file. Is there a way to exclude it from the line separators when I iterate over the file or call readlines()? (Besides reading the entire file into a string and then splitting on \n.) Thank you!
I can’t reproduce this behaviour in Python 2.5, 2.6 or 3.0 on Mac OS X; there, U+2028 is never treated as a line ending. Could you go into more detail about how you open and read the file when you see this?
That said, here is a subclass of the “file” class that might do what you want:
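For comparison, here is a minimal sketch of a different route to the same result. It is not the subclass mentioned above, just an illustration that assumes Python 3 and the standard `io.open` function. The names `iter_lines_lf` and `write_sample` are made up for this example. The idea is to pass `newline="\n"` so that only `\n` ends a line. Note that `str.splitlines()` does split on U+2028, so any reader built on top of it may show the behaviour described in the question.

```python
import io
import os
import tempfile


def iter_lines_lf(path, encoding="utf-8"):
    """Yield lines from a text file, treating only '\\n' as a line ending."""
    # newline="\n" makes the io layer end lines on '\n' only;
    # U+2028 (and a bare '\r') stay inside the line untouched.
    with io.open(path, encoding=encoding, newline="\n") as handle:
        for line in handle:
            yield line


def write_sample(text):
    """Hypothetical helper: write text to a temporary UTF-8 file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    # newline="" disables newline translation on write.
    with io.open(fd, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    sample = write_sample("first\u2028still first\nsecond\n")
    try:
        for line in iter_lines_lf(sample):
            print(repr(line))
    finally:
        os.remove(sample)
```

Because it is a generator, this reads the file lazily rather than loading it all into one string, which was the approach the question wanted to avoid.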