I have a large image dataset (around 3000 files). My problem is simple, I want to copy randomly chosen image files to another destination. I use random.sample to select five hundred images and store their names in a list.
I now want to copy files from the src folder to a destination folder if their name exists in the list (and was hence randomly chosen).
The following code however copies ALL files in the folder, whether or not their names appear in the randomly selected list. help
import os.path
import os
import glob
import random
import shutil
dirfiles = os.listdir("/media/Data/Leaves/Leaves")
myfiles = []
myfiles.append(random.sample(dirfiles,500))
print myfiles
final_list=myfiles[0]
print final_list
count=0
for elem in final_list:
print elem
count= count+1
print count
src = '/home/mjanja/Desktop/Leaves'
dst = '/home/mjanja/Desktop/Positive Leaves'
for filename in final_list:
for file in glob.glob( os.path.join(src,filename)):
shutil.copy(file,dst)
print "Copied file!!" +infile
Your use of glob.glob in this case is where the danger occurs. That returns an iterator of all files that fit the pattern you provide. You’re building up a list of 500 specific files, but then matching by pattern … depending on what characters are in your filename this could give you extremely surprising results, as the pattern might very well match more files than your original 500.
You are also doing some steps unnecessarily, and could wrap it all up in a function: