I find myself using Python for a lot of file management scripts like the one below. While looking for examples on the net, I am surprised by how little logging and exception handling the examples feature. Every time I write a new script I intend not to end up with something like the one below, but if it deals with files then, no matter what, my paranoia takes over and the end result is nothing like the examples I see on the net. As I am a newbie, I would like to know whether this is normal. If not, how do you deal with the unknowns and the fear of deleting valuable info?
    import logging
    import msvcrt  # Windows only
    import os
    import shutil
    import sys
    import time
    import uuid

    # assumption: "log" is a standard logging logger configured elsewhere
    log = logging.getLogger(__name__)

    def flatten_dir(dirname):
        '''Flattens a given root directory by moving all files from its sub-directories and nested
        sub-directories into the root directory and then deletes all sub-directories and nested
        sub-directories. Creates a backup directory preserving the original structure of the root
        directory and restores this in case of errors.
        '''
        RESTORE_BACKUP = False
        log.info('processing directory "%s"' % dirname)
        backup_dirname = str(uuid.uuid4())
        try:
            shutil.copytree(dirname, backup_dirname)
            log.debug('directory "%s" backed up as directory "%s"' % (dirname,backup_dirname))
        except shutil.Error:
            log.error('shutil.Error: Error while trying to back up the directory')
            sys.stderr.write('the program is terminating with an error\n')
            sys.stderr.write('please consult the log file\n')
            sys.stderr.flush()
            time.sleep(0.25)
            print 'Press any key to quit this program.'
            msvcrt.getch()
            sys.exit()
        for root, dirs, files in os.walk(dirname, topdown=False):
            log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
            if root != dirname:
                for file in files:
                    full_filename = os.path.join(root, file)
                    try:
                        shutil.move(full_filename, dirname)
                        log.debug('"%s" copied to directory "%s"' % (file,dirname))
                    except shutil.Error:
                        RESTORE_BACKUP = True
                        log.error('file "%s" could not be copied to directory "%s"' % (file,dirname))
                        log.error('flagging directory "%s" for reset' % dirname)
                if not RESTORE_BACKUP:
                    try:
                        shutil.rmtree(root)
                        log.debug('directory "%s" deleted' % root)
                    except shutil.Error:
                        RESTORE_BACKUP = True
                        log.error('directory "%s" could not be deleted' % root)
                        log.error('flagging directory "%s" for reset' % dirname)
            if RESTORE_BACKUP:
                break
        if RESTORE_BACKUP:
            RESTORE_FAIL = False
            try:
                shutil.rmtree(dirname)
            except shutil.Error:
                log.error('modified directory "%s" could not be deleted' % dirname)
                log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
                RESTORE_FAIL = True
            if not RESTORE_FAIL:
                try:
                    os.renames(backup_dirname, dirname)
                    log.debug('back up of directory "%s" restored' % dirname)
                    print '>'
                    print '>******WARNING******'
                    print '>There was an error while trying to flatten directory "%s"' % dirname
                    print '>back up of directory "%s" restored' % dirname
                    print '>******WARNING******'
                    print '>'
                except WindowsError:
                    log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
                    log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname,dirname))
                    print '>'
                    print '>******WARNING******'
                    print '>There was an error while trying to flatten directory "%s"' % dirname
                    print '>back up of directory "%s" was NOT restored successfully' % dirname
                    print '>no information is lost'
                    print '>check the log file for information on manually restoring the directory'
                    print '>******WARNING******'
                    print '>'
        else:
            try:
                shutil.rmtree(backup_dirname)
                log.debug('back up of directory "%s" deleted' % dirname)
                log.info('directory "%s" successfully processed' % dirname)
                print '>directory "%s" successfully processed' % dirname
            except shutil.Error:
                log.error('backup directory "%s" could not be deleted' % backup_dirname)
                log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
                print '>'
                print '>******WARNING******'
                print '>directory "%s" successfully processed' % dirname
                print '>cleanup of backup directory "%s" failed' % backup_dirname
                print '>manual cleanup necessary'
                print '>******WARNING******'
                print '>'
Learning to let go (or how I learned to live with the bomb)…
Ask yourself this: what exactly are you afraid of, and how will you handle it if it happens? In the example that you provide, you want to avoid data loss. The way that you’ve handled it is by looking for every combination of conditions that you think is an error and wrapping each one in a large amount of logging. Things will still go wrong, and it’s not clear that a large amount of logging is a good way to deal with that. Sketching out what you are trying to achieve:
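As a rough sketch in Python 3 (the question's script is Python 2), and assuming the goal is only what the docstring in the question describes, the work itself comes down to a few lines. The name `flatten_in_place` is made up for illustration, and the sketch deliberately has no backup, no logging and no error handling:

```python
import os
import shutil


def flatten_in_place(dirname):
    """Move every file below dirname up into dirname, then drop the emptied sub-directories."""
    # Bottom-up walk: the deepest directories are visited (and removed) first.
    for root, _dirs, files in os.walk(dirname, topdown=False):
        if root == dirname:
            continue  # files already in the root stay where they are
        for name in files:
            # Raises if the move fails, e.g. if the name is already taken in dirname.
            shutil.move(os.path.join(root, name), dirname)
        os.rmdir(root)  # only succeeds if the directory is now empty
```

Everything else in the question's script is there to deal with these few lines going wrong.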
So what kind of things could go wrong in this process? Well, there are many ways in which the move file operations could barf due to the underlying file-system. Can we list them all and provide nice ways to deal with them? No… but in general you are going to deal with them all the same way. Sometimes an error is just an error regardless of what it is.
So in this case, if any error occurs you want to abort and undo any changes. The way that you’ve decided to do that is by creating a backup copy and restoring it when something goes wrong. But your most likely error is the file-system being full, in which case the backup and restore steps are likely to fail as well. OK, so it is a common enough problem: if you are worried about unknown errors at any point, how do you stop your restoration path from going wrong?
The general answer is to make sure that you do any intermediate work first, and then take a single troublesome (hopefully atomic) step. In your case you need to flip your recovery around. Instead of building a copy as a backup, build a copy of the result. If everything succeeds, you can then swap the new result in for the old original tree. Or, if you are really paranoid, you can leave that step for a human. The advantage here is that if something goes wrong you can just abort and throw away the partial state that you have constructed.
Your structure then becomes:
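One way to sketch that structure in Python 3, under a few assumptions: `build_flat_copy` and `swap_in` are hypothetical names, `staging` is assumed to be a not-yet-existing directory next to `src` (not inside it) on the same file-system, and a name clash is treated as an error rather than resolved. Nothing here catches exceptions; any failure simply propagates and `src` has not been touched.

```python
import os
import shutil


def build_flat_copy(src, staging):
    """Copy every file under src into a new, flat directory called staging.

    src is only read. Any problem raises and leaves src exactly as it was.
    """
    os.mkdir(staging)  # fails if staging already exists, so nothing is reused
    for root, _dirs, files in os.walk(src):
        for name in files:
            target = os.path.join(staging, name)
            if os.path.exists(target):
                # Same file name in two branches: refuse instead of overwriting.
                raise FileExistsError("name clash in flattened result: %s" % name)
            shutil.copy2(os.path.join(root, name), target)


def swap_in(src, staging):
    """The one risky step: put the finished staging directory in place of src."""
    retired = src + ".old"  # hypothetical name for the retired original
    os.rename(src, retired)
    os.rename(staging, src)
    return retired  # left for a human (or a later step) to delete
```

If `build_flat_copy` fails, deleting `staging` is the whole recovery. `swap_in` is two renames rather than one atomic operation, so the really paranoid variant is to stop after `build_flat_copy` and do the swap by hand.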
By the way, there is a bug in your current script that this pseudo-code makes more obvious: if you have files with identical names in different branches, they will overwrite each other in the new flattened version.
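Whether a clash ends in an overwrite or in an exception depends on the copy or move call used, so it can help to look for clashes before touching anything. A minimal Python 3 sketch, where `find_name_clashes` is a hypothetical helper and names are compared exactly (on a case-insensitive file-system, names that differ only by case would also clash, which this does not detect):

```python
import os
from collections import defaultdict


def find_name_clashes(dirname):
    """Map each file name that occurs more than once under dirname to its paths."""
    seen = defaultdict(list)
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            seen[name].append(os.path.join(root, name))
    # Keep only the names that would land on top of each other when flattened.
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

An empty result means the tree can be flattened without two files competing for the same name.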
The second point about this pseudo-code is that all of the error handling is in one place (i.e. wrap the directory creation and the recursive copy inside a single try block and catch all the errors after it). This solves your original issue about the large ratio of logging and error-checking to actual work code.
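A sketch of what that single try block could look like in Python 3. The names are illustrative: `run_all_or_nothing` is hypothetical, `build` stands for whatever callable creates and fills the result directory, and `staging` is the path of that directory.

```python
import logging
import shutil

log = logging.getLogger(__name__)


def run_all_or_nothing(build, staging):
    """Run build(staging); on any failure discard staging and return False."""
    try:
        build(staging)
    except Exception:
        # The only error handling in the script: one log entry (with traceback),
        # then throw away whatever partial result exists.
        log.exception('could not build "%s"; the original is untouched', staging)
        shutil.rmtree(staging, ignore_errors=True)
        return False
    return True
```

The work code itself then contains no logging or `try` statements at all. It just raises when something goes wrong.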