I am having trouble replacing three commas with one comma in a text file of data.
I am processing a large text file to put it into comma delimited format so I can query it using a database.
I do the following at the command prompt and it works:
>>> import re
>>> line = 'one,,,two'
>>> line=re.sub(',+',',',line)
>>> print line
one,two
>>>
following below is my actual code:
with open("dmis8.txt", "r") as ifp:
with open("dmis7.txt", "w") as ofp:
for line in ifp:
#join lines by removing a line ending.
line=re.sub('(?m)(MM/ANGDEC)[\r\n]+$','',line)
#various replacements of text with nothing. This removes the text
line=re.sub('IDENTIFIER','',line)
line=re.sub('PART','50-1437',line)
line=re.sub('Eval','',line)
line=re.sub('Feat','',line)
line=re.sub('=','',line)
#line=re.sub('r"++++"','',line)
line=re.sub('r"----|"',' ',line)
line=re.sub('Nom','',line)
line=re.sub('Act',' ',line)
line=re.sub('Dev','',line)
line=re.sub('LwTol','',line)
line=re.sub('UpTol','',line)
line=re.sub(':','',line)
line=re.sub('(?m)(Trend)[\r\n]*$',' ',line)
#Remove spaces replace with semicolon
line=re.sub('[ \v\t\f]+', ',', line)
#no worky line=re.sub(r",,,",',',line)
line=re.sub(',+',',',line)
#line=line.replace(",+", ",")
#line=line.replace(",,,", ",")
ofp.write(line)
This is what i get from the code above:
There are several commas together. I don’t understand why they won’t get replaced down to one comma.
Never mind that I don’t see how the extra commas got there in the first place.
50-1437,d
2012/05/01
00/08/27
232_PD_1_DIA,PED_HL1_CR,,,12.482,12.478,-0.004,-0.021,0.020,----|++++
232_PD_2_DIA_TOP,PED_HL2_TOP,,12.482,12.483,0.001,-0.021,0.020,----|++++
232_PD_2_DIA,PED_HL2_CR,,12.482,12.477,-0.005,-0.021,0.020,----|++++
232_PD_2_DIA_BOT,PED_HL2_BOT,,12.482,12.470,-0.012,-0.021,0.020,--|--++++
raw data for reference:
PART IDENTIFIER : d
2012/05/01
00/08/27
232_PD_1_DIA Eval Feat = PED_HL1_CR MM/ANGDEC
Nom Act Dev LwTol UpTol Trend
12.482 12.478 -0.004 -0.021 0.020 ----|++++
232_PD_2_DIA_TOP Eval Feat = PED_HL2_TOP MM/ANGDEC
12.482 12.483 0.001 -0.021 0.020 ----|++++
232_PD_2_DIA Eval Feat = PED_HL2_CR MM/ANGDEC
12.482 12.477 -0.005 -0.021 0.020 ----|++++
Can someone kindly point what I am doing wrong?
thanks in advance…
Your regex is working fine. The problem is that it you concatenate the lines (by
write()ing them) after you scrub them with your regex.Instead, use
"".join()on all of your lines, runre.sub()on the whole thing, and thenwrite()it all to the file at once.