Since I need to do many directory traversals with some complex filtering, I decided to create a wrapper around os.walk.
It looks something like this:
    from os import path, walk

    def fwalk(root, pred_dir, pred_files, walk_function=walk):
        """Wrapper function around the standard os.walk, that filters out
        the directories visited using a filtering predicate
        """
        for base, dirs, files in walk_function(root):
            # ignore also the root directory when not needed, which is
            # actually more important than the subdirectories
            dirs = [d for d in dirs if pred_dir(path.join(base, d))]
            files = [f for f in files if pred_files(path.join(base, f))]
            if _ignore_dirs_predicate(base) and (dirs or files):
                yield base, dirs, files
Basically it behaves like os.walk, but takes two predicates to make it a bit nicer to compose in higher-level functions.
For example, this will only go through the Python modules:
    ISA_PY = lambda f: f[-3:] == '.py'
    # I can make it a class or maybe even a module if it's better
    def walk_py(src):
        # should not be in the list
        return fwalk(src, _ignore_dirs_predicate, ISA_PY)
It also takes a walk function, which can be, for example, just a dummy walk used for testing:
    def dummy_walk(_):
        test_dir = [
            ('/root/', ['d1', '.git'], []),
            ('/root/d1', [], ['setup.py']),
            ('/root/test', [], ['test1.py']),
            ('/root/.git', [], [])
        ]
        # ignores its parameter and returns an iterator over the fake tree
        return iter(test_dir)
The problem now is that I find it very hard to trust this function: apart from some unit testing with the dummy walk, it is quite hard to make sure it's correct.
Any suggestions on how I can improve this and make it nicer?
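You need to modify dirs in place, for example with a slice assignment of the form dirs[:] = [...], in order to avoid recursive traversal of the removed directories. os.walk only skips a subdirectory if it is removed from the very list it yielded; rebinding the name dirs to a new list has no effect on the traversal.
This will remove the need to check _ignore_dirs_predicate(base) (and remove the NameError caused by the use of _ignore_dirs_predicate instead of pred_dir).
You should also rewrite ISA_PY to use str.endswith().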
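As a rough sketch of the two suggestions above (not necessarily the exact code the original post had in mind), the in-place filtering and the endswith-based predicate could look like the following. The names is_py and not_hidden are made up for illustration, and leaving the root directory itself unchecked is an assumption about the desired behaviour.

```python
import os
from os import path


def fwalk(root, pred_dir, pred_files, walk_function=os.walk):
    """Walk root, keeping only directories and files accepted by the predicates."""
    for base, dirs, files in walk_function(root):
        # Slice assignment mutates the list object that os.walk holds, so
        # rejected directories are never descended into (top-down walk only).
        dirs[:] = [d for d in dirs if pred_dir(path.join(base, d))]
        files = [f for f in files if pred_files(path.join(base, f))]
        if dirs or files:
            yield base, dirs, files


def is_py(filename):
    # Hypothetical replacement for ISA_PY, based on str.endswith().
    return filename.endswith('.py')


def not_hidden(dirname):
    # Hypothetical directory predicate: reject dot-directories such as .git.
    return not path.basename(dirname).startswith('.')
```

Note that a fake walk_function which iterates over a fixed list, like dummy_walk, cannot show the pruning, because nothing reads the modified dirs list afterwards. To gain confidence in that part, it is probably worth running fwalk(root, not_hidden, is_py) against a small tree created with tempfile.TemporaryDirectory and checking that nothing under .git is ever yielded.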