I’m trying to process a bunch of CSV files in a given directory. Each time I run the script, it goes through each file in the directory (in case I’ve added new ones), then checks against the database to see whether the file has been processed and, if so, what line to start processing from.
Problem is, the script seems to skip any file I have listed in the database table, regardless of what the status is. I’m sure I’m missing something obvious, but can’t quite piece together where my tests are going wrong.
Here’s the structure of the table:
file_processed_id | file_type | file_name | file_line | file_lines_processed | file_lines_skipped | file_status
Here’s the pertinent code:
    for filename in os.listdir(path):
        status = check_process_status(filename,conn)
        if status != None:
            if status[7] == 'completed':
                pass
            else:
                start_line = status[3]
                file_to_processed = filename
                break
        else:
            start_line = 0
            file_to_be_processed = filename
And here’s the function checking the db:
    def check_process_status(f,conn):
        # retrieve process status of file
        cursor = conn.cursor()
        cursor.execute("""SELECT *
                          FROM files_processed
                          WHERE file_type = 'faca'
                          AND file_name = %s
                          """,(f,))
        row = cursor.fetchone()
        if row == None:
            return None # if no entry, returns null
        else:
            return row # returns row information
I’ve tested the db connection and everything, and if the file actually exists in the table, it returns the row information just fine. The thing I don’t get is why it’s skipping to the next file each time I run the script, no matter what the “file_status” field is set to.
Any thoughts?
Based on your comments, oughtn’t there be a `break` statement in the `else` clause after `file_to_be_processed = filename` too?
Also note that this variable is misnamed `file_to_processed` a few lines above.
Also note that `status[7]` will probably throw an exception, since there only appear to be 7 fields in your table. I’d guess it should be `status[6]`.
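Putting those three points together, here is a minimal sketch of how the selection loop might look. It is not your exact script: `pick_next_file` and `get_status` are hypothetical names I made up, the column positions assume `SELECT *` returns the seven fields in the order you listed, and `'in progress'` is just a made-up example of a non-completed status.

```python
# Column positions, assuming the row has the seven fields in the order
# given in the question (file_processed_id is index 0).
FILE_LINE = 3
FILE_STATUS = 6


def pick_next_file(filenames, get_status):
    """Return (filename, start_line) for the first file that still needs
    processing, or None if every file is already completed.

    get_status is any callable that returns the table row for a file
    name, or None when the file has no row yet (hypothetical helper).
    """
    for filename in filenames:
        status = get_status(filename)
        if status is None:
            # No row in the table: a new file, so start at line 0.
            return filename, 0
        if status[FILE_STATUS] != 'completed':
            # Partially processed: resume from the stored line.
            return filename, status[FILE_LINE]
        # Completed files fall through to the next iteration.
    return None


# Example with an in-memory stand-in for the table.
rows = {
    'a.csv': (1, 'faca', 'a.csv', 500, 500, 0, 'completed'),
    'b.csv': (2, 'faca', 'b.csv', 120, 118, 2, 'in progress'),
}
print(pick_next_file(['a.csv', 'b.csv', 'c.csv'], rows.get))
```

With your real function, the call would be something like `pick_next_file(os.listdir(path), lambda f: check_process_status(f, conn))`. Returning from the function plays the role of the `break` in both branches, and using a single return value avoids the two differently spelled variables.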