Python noob… please be gentle. In my current program, I have a list of 3 files which may or may not reside in my current directory. If they do reside in my directory, I want to be able to assign them values to be later used in other functions. If the file does not reside in the directory, it should not be assigned values as the file does not exist anyway. The code I have so far is below:
import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum

def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="":
            w2.writerow(row)
            hashes.add(row[hashcolumn])
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()

def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1
    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2
    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"

def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount (fname, os.path.splitext(fname)[0] + "2.csv")

CleanAndPrettify()
The problem I am running into is that the code runs through the list and stops at the first valid file it finds.
I’m not sure if I’m completely thinking of it in the wrong way but I thought I was doing it right.
Current output of this program with A.csv, B.csv, and C.csv present in the same directory:
Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
The desired output should be:
Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!
…and then continue on with the next portion of creating the .bak files.
The output of this program without any CSV files in the same directory:
UnboundLocalError: local variable 'hashcolumn' referenced before assignment
You have a couple problems in your code.
First, chkifexists is returning as soon as it finds an existing file, so it never checks any remaining names; also, if no files are found then hashcolumn and filepathNum are never set, giving you the UnboundLocalError.
Second, you are calling chkifexists in two places: from removedupes and from CleanAndPrettify. So removedupes will run for every existing file for every existing file, which is not what you want! In fact, since CleanAndPrettify has just verified the file exists, removedupes should just go with whatever is handed to it.
There are at least three ways to handle the case where no files are found: have chkifexists raise an exception; have a flag in CleanAndPrettify that tracks if files were found; or turn the results of chkifexists into a list which you can then check for emptiness.
In the modified code I moved the files into a dictionary with the name as the key and the value as a tuple of hashcolumn and filepathNum. chkifexists now accepts the filenames to look for as a dictionary, and yields the values when a file is found; if no files were found, a NoFilesFound exception will be raised.
Here’s the code:
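
A minimal sketch of the approach just described, not a tested drop-in. It is written for Python 3 (the question's code is Python 2, where the CSV files would be opened with 'rb' and 'wb' rather than newline=''). The FILES name, the somefiles parameter, the found flag, and the message texts are assumptions of this sketch, and the keys are sorted only so the files are handled in a predictable order.

```python
import csv
import os


class NoFilesFound(Exception):
    """Raised when none of the expected input files exist."""


# File name -> (hashcolumn, filepathNum); the numbers come from the question.
FILES = {
    'A.csv': (7, 5),
    'B.csv': (15, 5),
    'C.csv': (1, 0),
}


def chkifexists(somefiles):
    """Yield (fname, hashcolumn, filepathNum) for every file that exists."""
    found = False  # flag: has at least one file been seen?
    for fname, (hashcolumn, filepathNum) in sorted(somefiles.items()):
        if os.path.isfile(fname):
            found = True
            yield fname, hashcolumn, filepathNum
    if not found:
        raise NoFilesFound('none of these files exist: %s'
                           % ', '.join(sorted(somefiles)))


def removedupes(infile, outfile, hashcolumn):
    """Copy infile to outfile, skipping rows whose hash column repeats."""
    hashes = set()
    with open(infile, newline='') as r1, open(outfile, 'w', newline='') as w1:
        w2 = csv.writer(w1)
        for row in csv.reader(r1):
            value = row[hashcolumn]
            # Rows with an empty hash are always kept, as in the question.
            if value == '' or value not in hashes:
                w2.writerow(row)
                hashes.add(value)


def bakcount(origfile1, origfile2):
    """Keep the original as .bak, swap in the new file, report rows removed."""
    os.rename(origfile1, origfile1 + '.bak')
    with open(origfile1 + '.bak') as f:
        count1 = len(f.readlines())
    os.rename(origfile2, origfile1)
    with open(origfile1) as f:
        count2 = len(f.readlines())
    print('%d duplicate rows removed from %s!' % (count1 - count2, origfile1))


def CleanAndPrettify(somefiles=FILES):
    print('Removing duplicate rows from input files...')
    try:
        for fname, hashcolumn, filepathNum in chkifexists(somefiles):
            tmpname = os.path.splitext(fname)[0] + '2.csv'
            removedupes(fname, tmpname, hashcolumn)
            bakcount(fname, tmpname)
    except NoFilesFound:
        print('No input files found, nothing to do.')


if __name__ == '__main__':
    CleanAndPrettify()
```

Note that removedupes no longer calls chkifexists; it just works with the hashcolumn it is handed. When none of the three files is present, the except branch runs instead of the UnboundLocalError.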
Unable to test as I don’t have the A, B, and C .csv files, but hopefully this will get you pointed in the right direction. As you can see, the raise NoFilesFound option uses the flag method to keep track of files not being found; here is the list method:
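
A rough sketch of how the list variant could look, under a few assumptions: Python 3, a FILES dictionary holding the values from the question, and a hypothetical process callback standing in for the removedupes and bakcount steps so the example stays short. Here chkifexists only yields, and CleanAndPrettify turns the results into a list that it can check for emptiness.

```python
import os

# File name -> (hashcolumn, filepathNum); the numbers come from the question.
FILES = {
    'A.csv': (7, 5),
    'B.csv': (15, 5),
    'C.csv': (1, 0),
}


def chkifexists(somefiles):
    """Yield (fname, hashcolumn, filepathNum) for every file that exists."""
    for fname, (hashcolumn, filepathNum) in sorted(somefiles.items()):
        if os.path.isfile(fname):
            yield fname, hashcolumn, filepathNum


def CleanAndPrettify(somefiles, process):
    """Call process(fname, hashcolumn) for each existing file.

    process is a hypothetical placeholder for the removedupes/bakcount work.
    Returns the names of the files that were handled.
    """
    print('Removing duplicate rows from input files...')
    # Materialise the generator so an empty result can be detected.
    found = list(chkifexists(somefiles))
    if not found:
        print('No input files found, nothing to do.')
        return []
    for fname, hashcolumn, filepathNum in found:
        process(fname, hashcolumn)
    return [fname for fname, _, _ in found]
```

One difference from the exception version: building the list checks all the names up front, before any file is processed.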