I have a CSV-like text file that has about 1000 lines. Between each record in the file is a long series of dashes. The records generally end with a \n, but sometimes there is an extra \n before the end of the record. Simplified example:
"1x", "1y", "Hi there"
-------------------------------
"2x", "2y", "Hello - I'm lost"
-------------------------------
"3x", "3y", "How ya
doing?"
-------------------------------
I want to replace the extra \n’s with spaces, i.e. concatenate the lines between the dashes. I thought I would be able to do this (Python 2.5):
import re
text = open("thefile.txt", "r").read()
better_text = re.sub(r'\n(?!\-)', ' ', text)
but that seems to replace every \n, not just the ones that are not followed by a dash. What am I doing wrong?
I am asking this question in an attempt to improve my own regex skills and understand the mistakes that I made. The end goal is to generate a text file in a format that is usable by a specific VBA for Word macro that generates a styled Word document which will then be digested by a Word-friendly CMS.
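Your look-ahead only protects the line breaks that come right before a separating line. The line breaks at the end of the separating lines are not followed by a dash, so they still get replaced. You need to exclude those as well. Try this: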
This regular expression uses a negative look-behind assertion to exclude any \n that’s preceded by a -.
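As a minimal sketch of that idea, assuming the look-behind is simply combined with the look-ahead from the question: the pattern `(?<!-)\n(?!-)` and the helper name `join_broken_records` are illustrative choices, not taken from the original answer.

```python
import re

# A newline that is neither preceded by a dash (end of a separator line)
# nor followed by a dash (end of a record, just before a separator line).
# Assumed pattern: look-behind from the answer plus look-ahead from the question.
EXTRA_NEWLINE = re.compile(r'(?<!-)\n(?!-)')


def join_broken_records(text):
    """Replace newlines inside a record with spaces; leave separator lines alone."""
    return EXTRA_NEWLINE.sub(' ', text)


sample = (
    '"1x", "1y", "Hi there"\n'
    '-------------------------------\n'
    '"3x", "3y", "How ya\n'
    'doing?"\n'
    '-------------------------------\n'
)
print(join_broken_records(sample))
```

One caveat with this approach: a line inside a record that happens to end with a dash, or a continuation line that starts with one, would not be joined, because the assertions only look at the single character next to the \n.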