I am trying to find entries with specific zip codes in a mailing list (CSV format).
I thought this should work, but it never finds anything even though I know the sought-after zip codes are there.
text = open("during1.txt","r")
a = list(range(93201,93399))
b = list(range(93529,93535))
c = list(range(93601,93899))
d = list(range(95301,95399))
KFCFzip = a+b+c+d
output = open("output.txt","w")
for line in text:
    array = line.strip().split(",")
    print(array[6][0:5])
    if array[6][0:5] in KFCFzip:
        #output.write(array)
        print("yes")
text.close()
output.close()
When I run the code, it finds no matches, but the print statement above the IF statement prints out values that look like they should be matches, and when I go to the Shell and type in something like
93701 in KFCFzip
It gives me back “True”, so it works to that extent. The file is just text separated by commas, so I can’t figure out why it can’t see them.
The data file has live data, so I would have to change it a bit before posting. I was wondering if anyone had any ideas that didn’t involve posting the data itself.
Because array[6][0:5] is a string. You should convert it to an integer before looking it up in the KFCFzip list.
Another problem with this solution is performance. Wrapping range in list() creates a full list of elements, so you are going to compare every “suspected” ZIP code with every possible ZIP code. The time complexity of this algorithm is O(n*m), where n = len(KFCFzip) and m is the number of lines in the file. A better way is to keep a list of range boundaries, one (start, end) pair per range, and test each ZIP code against those pairs; in this case you can dramatically increase the performance.
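A minimal sketch of the range-boundary idea, combined with the integer conversion. The names ZIP_RANGES and is_target_zip are hypothetical, and the bounds are assumed to be half-open, exactly like the range(start, end) calls in the question:

```python
# Hypothetical names for illustration; bounds mirror range(start, end),
# so the end value itself is excluded.
ZIP_RANGES = [
    (93201, 93399),
    (93529, 93535),
    (93601, 93899),
    (95301, 95399),
]

def is_target_zip(field):
    """Return True if the first five characters of field form a ZIP in any range."""
    try:
        # Convert the string prefix to an integer before comparing.
        zip_code = int(field.strip()[:5])
    except ValueError:
        # Empty or non-numeric field: treat as no match.
        return False
    # At most two comparisons per range instead of scanning a flat list.
    return any(start <= zip_code < end for start, end in ZIP_RANGES)

print(is_target_zip("93701-1234"))
print(is_target_zip("90210"))
```

Inside the loop from the question, the membership test would then become `if is_target_zip(array[6]):`.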
For instance, using your data you would have 198 + 6 + 298 + 98 = 600 elements (range excludes its end value), so for each matching line you would have to make 600/2 = 300 comparisons on average. But using my algorithm you’ll have only 8/2 = 4 comparisons, which is ~75 times fewer (read: faster).
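As a rough check of this arithmetic, the sizes can be computed directly. This sketch assumes the same four ranges as in the question, with the end value excluded as range does; the variable names are illustrative only:

```python
# Same half-open bounds as the range(start, end) calls in the question.
ZIP_RANGES = [(93201, 93399), (93529, 93535), (93601, 93899), (95301, 95399)]

# Number of elements in the flat list built from the four ranges.
list_size = sum(end - start for start, end in ZIP_RANGES)

# Worst case for the boundary approach: two comparisons per range.
boundary_checks = 2 * len(ZIP_RANGES)

print(list_size, boundary_checks)
```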