So here is my program with some new modifications:
datafile = open('C:\\text2.txt', 'r')
completedataset = open('C:\\bigfile.txt', 'r')
smallerdataset = open('C:\\smallerdataset.txt', 'w')
matchedLines = []
for line in datafile:
    splitline = line.split()
    for item in splitline:
        if not item.endswith("NOVA"):
            if item.startswith("JJJ") or item.startswith("KOS"):
                matchedLines.append( item )
counter = 1
for line in completedataset:
    print counter
    counter +=1
    for t in matchedLines:
        if t in line:
            smallerdataset.write(line)
datafile.close()
completedataset.close()
smallerdataset.close()
The problem I have now is that I want to search through the "bigfile" at a faster rate. I would like to limit the search of each line in bigfile to the string that occurs before the first ','.
I believe I want to use something like `index = aString.find(',')`, but I'm not having much luck restricting the search to that part of each line.
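You could change the test `if t in line:` so that it only searches the part of `line` before the first comma, for example to `if t in line[:line.find(',')]:`.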
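A minimal sketch of that change, assuming the prefix is cut with `str.find` as the question suggests. The helper name `line_matches` is hypothetical and not part of the original program:

```python
def line_matches(line, patterns):
    """Return True if any pattern occurs before the first comma in line."""
    # Assumes line contains a comma. If it does not, find() returns -1
    # and the slice drops the last character instead of cutting at a comma.
    prefix = line[:line.find(',')]
    return any(t in prefix for t in patterns)
```

Inside the loop over `completedataset` this would be used as `if line_matches(line, matchedLines): smallerdataset.write(line)`. Note one difference from the original inner loop: a line is written at most once, even if several patterns match it.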
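This may make the program faster if `line` is very long and the comma appears near the beginning. It may make the program slower if the `,` appears near the end of the `line`, since finding the comma and copying the prefix is then extra work on top of the search.

PS. Is every `line` guaranteed to have a comma in it? The above code acts a bit funky if there is no comma. For example, `'abc'.find(',')` returns -1, so a slice such as `'abc'[:'abc'.find(',')]` gives `'ab'`, silently dropping the last character. If you want to ignore lines without a comma, it might be better to test for the comma first and skip the line when none is found.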
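Here is one possible sketch of both points: what the `find`-based slice does when there is no comma, and a variant that skips such lines. It uses `str.partition` rather than `find`, which is a substitution on my part. The names `prefix_before_comma` and `filter_lines` are made up for illustration:

```python
# Without a comma, find() returns -1, so the slice drops the last character.
no_comma = 'abc'
funky = no_comma[:no_comma.find(',')]   # 'ab', not 'abc'


def prefix_before_comma(line):
    """Return the text before the first comma, or None if there is no comma."""
    head, sep, _rest = line.partition(',')
    # partition() returns an empty separator when ',' is not found.
    return head if sep else None


def filter_lines(lines, patterns):
    """Yield lines whose pre-comma prefix contains any of the patterns."""
    for line in lines:
        prefix = prefix_before_comma(line)
        if prefix is None:
            continue  # ignore lines that have no comma at all
        if any(t in prefix for t in patterns):
            yield line
```

With the files from the question, the second loop could then become `for line in filter_lines(completedataset, matchedLines): smallerdataset.write(line)`.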