When doing a directory listing and reading the files within it, at what point does the performance of yield start to deteriorate, compared to returning a list of all the files in the directory?
Here I’m assuming one has enough RAM to return the (potentially huge) list.
PS I’m having problems inlining code in a comment, so I’ll put some examples in here.
import glob

def list_dirs_list():
    # list version: builds and returns all matches at once
    return glob.glob('/some/path/*')

def list_dirs_iter():
    # iterator version: returns a lazy iterator over matches
    return glob.iglob('/some/path/*')
Behind the scenes both calls to glob use os.listdir so it would seem they are equivalent performance-wise. But this Python doc seems to imply glob.iglob is faster.
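A quick sketch to check that the two interfaces really do produce the same entries, using a throwaway temp directory rather than the real path (the setup files here are hypothetical):

```python
import glob
import os
import tempfile

# Create a small sample directory for the demo.
d = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    open(os.path.join(d, name), "w").close()

pattern = os.path.join(d, "*")
eager = glob.glob(pattern)   # list, fully built before returning
lazy = glob.iglob(pattern)   # iterator, yields entries on demand

print(isinstance(lazy, list))         # False: iglob hands back an iterator
print(sorted(eager) == sorted(lazy))  # True: same entries either way
```

Whether the lazy version saves anything depends on whether the underlying listing is itself lazy, which is the crux of the answer below.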
It depends on how you’re doing the directory listing. Most mechanisms in Python pull the entire directory listing into a list up front; if you do it that way, then even a single yield is a waste. If you use opendir(3) directly, then it’s probably a random number, according to XKCD’s definition of “random”.
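If the goal is a listing that never materialises the whole directory in memory, os.scandir (Python 3.5+) streams entries from the OS, so a generator built on it stays lazy end to end. A minimal sketch (the temp-directory setup is just for the demo):

```python
import os
import tempfile

def iter_files(path):
    # Truly lazy: os.scandir streams directory entries from the OS,
    # so yielding here never builds the full listing as a list.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path

# Demo on a small temporary directory.
d = tempfile.mkdtemp()
for name in ("x.log", "y.log"):
    open(os.path.join(d, name), "w").close()

for p in iter_files(d):
    print(p)
```

With this approach, yield genuinely pays off for huge directories, because memory use stays constant regardless of how many entries there are.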