I am using the following function to get the size of every file from the target directory down:
    import datetime
    import os

    def get_files(target):
        # Get file size and modified time for all files from the target directory and down.
        # Initialize files list
        filelist = []
        # Walk the directory structure
        for root, dirs, files in os.walk(target):
            # Do not walk into directories that are mount points
            dirs[:] = [d for d in dirs if not os.path.ismount(os.path.join(root, d))]
            for name in files:
                # Construct absolute path for files
                filename = os.path.join(root, name)
                # Test the path to account for broken symlinks
                if os.path.exists(filename):
                    # File size information in bytes
                    size = float(os.path.getsize(filename))
                    # Get the modified time of the file
                    mtime = os.path.getmtime(filename)
                    # Create a tuple of filename, size, and modified time
                    construct = filename, size, str(datetime.datetime.fromtimestamp(mtime))
                    # Add the tuple to the master filelist
                    filelist.append(construct)
        return filelist
How can I modify this to include a second list containing directories and the total size of the directories? I am trying to include this operation in one function to hopefully be more efficient than having to perform a second walk in a separate function to get the directory information and size.
The idea is to be able to report back with a sorted list of the top twenty largest files, and a second sorted list of the top ten largest directories.
Thanks for any suggestions you guys have.
I output the directories in a dictionary instead of a list, but see if you like it:
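A minimal sketch of how this could look, not a definitive implementation. The function name `get_files_and_dirs` is hypothetical, and it assumes that a directory's "total size" means the combined size of every file beneath it, including files in subdirectories (mount points are still skipped, as in the question). Each file's size is charged to its own directory and to every ancestor up to the target, so a single walk is enough:

```python
import datetime
import os
from collections import defaultdict


def get_files_and_dirs(target):
    """Return (filelist, dirdict) from a single walk of target.

    filelist: list of (filename, size, modified-time string) tuples.
    dirdict:  maps each directory path to the total size in bytes of all
              files beneath it (assumption: subdirectories are included).
    """
    # Normalize so that walking up with dirname() is guaranteed to reach target
    target = os.path.abspath(target)
    filelist = []
    dirdict = defaultdict(float)
    for root, dirs, files in os.walk(target):
        # Do not walk into directories that are mount points
        dirs[:] = [d for d in dirs if not os.path.ismount(os.path.join(root, d))]
        for name in files:
            filename = os.path.join(root, name)
            # Skip broken symlinks
            if not os.path.exists(filename):
                continue
            size = float(os.path.getsize(filename))
            mtime = os.path.getmtime(filename)
            filelist.append((filename, size, str(datetime.datetime.fromtimestamp(mtime))))
            # Charge the size to this directory and each ancestor up to target
            current = root
            while True:
                dirdict[current] += size
                if current == target:
                    break
                parent = os.path.dirname(current)
                if parent == current:
                    # Reached the filesystem root; stop as a safety net
                    break
                current = parent
    return filelist, dict(dirdict)
```

Note that directories containing no files at any depth never receive an entry in `dirdict` in this sketch. If only the files directly inside each directory should count, the `while` loop can be replaced with a single `dirdict[root] += size`.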
If you want the dirdict as a list of tuples, just do this:
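One way to do the conversion, sketched with made-up sample data (the paths and sizes below are hypothetical stand-ins for a real `dirdict`). `dict.items()` yields `(path, size)` pairs, and `heapq.nlargest` then picks the biggest entries without sorting the whole list:

```python
import heapq
from operator import itemgetter

# Hypothetical sample data standing in for the real dirdict
dirdict = {"/data/tmp": 100.0, "/data": 300.0, "/data/logs": 200.0}

# Dictionary -> list of (directory, size) tuples
dirlist = list(dirdict.items())

# Ten largest directories, biggest first (sorted on the size field)
top_dirs = heapq.nlargest(10, dirlist, key=itemgetter(1))
# top_dirs == [("/data", 300.0), ("/data/logs", 200.0), ("/data/tmp", 100.0)]
```

The same call works for the file list from the question, since the size is also the second field of each tuple there: `heapq.nlargest(20, filelist, key=itemgetter(1))` gives the twenty largest files.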