To set the context: I have a directory with 200-300 files, and the files vary in size (number of lines). I parse the files and export them to a csv file. I think the last time I ran it the csv file had over 340,000 rows. On top of that, the first 8 files are constantly being written to, so I sometimes lose data while parsing.
Now, each file is set up like this:
DateTime Message Action ActionDetails
I have code in place to go through all the files, parse them, and then output to a csv file:
for infile in listing:
    _path2 = _path + infile
    f = open(_path2, 'r')
    labels = ['date', 'message', 'action', 'details']
    reader = csv.DictReader(f, labels, delimiter=' ', restkey='rest')
    for line in reader:
        if line.get('rest'):
            line['details'] += ' %s' % (' '.join(line['rest']))
        out_file.write(','.join([infile,line['date'], line['message'], line['action'], line['details']]) + '\n')
    f.close()
out_file.close()
I was wondering what the “best” way to copy the first 8 files would be, so I don’t lose data while parsing. By best I mean the approach that takes the least time, as the total time to run the Python script at the moment is about 35-45 seconds.
I got a little bored. Try this on for size. I didn’t actually have a chance to check whether it parses and writes correctly, but other than that I believe it should run given some info. This problem is a good opportunity to use queueing. Let me know how fast it runs!
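A minimal sketch of one way to combine the two ideas, snapshotting the live files before reading them and feeding the parse jobs through a queue, might look like the following. The names `parse_file`, `worker`, and `export`, the `live_names` parameter, and the default of 4 worker threads are all assumptions made for illustration; they are not taken from the question. Threads mostly help by overlapping file I/O, so whether this beats the 35-45 seconds above has not been measured here.

```python
import csv
import os
import queue
import shutil
import tempfile
import threading

LABELS = ['date', 'message', 'action', 'details']


def parse_file(path, name):
    """Parse one space-delimited file into rows prefixed with the file name."""
    rows = []
    with open(path, 'r', newline='') as f:
        reader = csv.DictReader(f, LABELS, delimiter=' ', restkey='rest')
        for line in reader:
            details = line.get('details') or ''
            if line.get('rest'):
                # Extra space-separated tokens belong to the details column.
                details += ' ' + ' '.join(line['rest'])
            rows.append([name, line.get('date'), line.get('message'),
                         line.get('action'), details])
    return rows


def worker(jobs, results):
    """Take (path, name) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        try:
            results.put(parse_file(*job))
        except Exception as exc:  # hand the error to the main thread
            results.put(exc)


def export(src_dir, out_path, live_names=(), n_workers=4):
    """Parse every file in src_dir into one CSV; returns the file count.

    live_names (hypothetical parameter) lists the files that are still being
    written to; those are copied to a temporary directory and parsed from
    the copy, so the parser never reads a file while it is changing.
    """
    names = sorted(n for n in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, n)))
    jobs, results = queue.Queue(), queue.Queue()
    with tempfile.TemporaryDirectory() as snap_dir:
        for name in names:
            path = os.path.join(src_dir, name)
            if name in live_names:
                snap = os.path.join(snap_dir, name)
                shutil.copy2(path, snap)  # point-in-time copy of a live file
                path = snap
            jobs.put((path, name))
        threads = [threading.Thread(target=worker, args=(jobs, results))
                   for _ in range(n_workers)]
        for _ in threads:
            jobs.put(None)  # one stop sentinel per worker
        for t in threads:
            t.start()
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)  # quotes fields that contain commas
            for _ in names:
                rows = results.get()
                if isinstance(rows, Exception):
                    raise rows
                writer.writerows(rows)
        for t in threads:
            t.join()
    return len(names)
```

Usage would be something like `export(_path, 'out.csv', live_names=set(listing[:8]))`, assuming the first 8 entries of `listing` are the live files. Rows from different files come out in whatever order the workers finish, and a snapshot of a file that is mid-write may still end with a partial last line.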