The Archive Base

Editorial Team
Asked: June 13, 2026


I need to walk through a folder with approximately ten thousand files. My old VBScript is very slow at this. Since I have started using Ruby and Python in the meantime, I benchmarked the three scripting languages to see which would be the best fit for this job.

The results of the tests below, run on a subset of 4500 files on a network share, are:

Python: 106 seconds
Ruby: 5 seconds
Vbscript: 124 seconds

That VBScript would be the slowest was no surprise, but I can’t explain the difference between Ruby and Python. Is my test for Python not optimal? Is there a faster way to do this in Python?

The check for thumbs.db is only there for the benchmark; in reality there are more checks to do.

I needed something that checks every file on the path and doesn’t produce much output, so that printing doesn’t distort the timing. The results vary a little from run to run, but not by much.

#python2.7.0
import os

def recurse(path):
  for (path, dirs, files) in os.walk(path):
    for file in files:
      if file.lower() == "thumbs.db":
        print (path+'/'+file)

if __name__ == '__main__':
  import timeit
  path = '//server/share/folder/'
  print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1))

'vbscript5.7
set oFso = CreateObject("Scripting.FileSystemObject")
const path = "\\server\share\folder"
start = Timer
myLCfilename="thumbs.db"

sub recurse(folder)
  for each file in folder.Files
    if lCase(file.name) = myLCfilename then
      wscript.echo file
    end if
  next
  for each subfolder in folder.SubFolders
    call Recurse(subfolder)
  next
end Sub

set folder = oFso.getFolder(path)
recurse(folder)
wscript.echo Timer-start

#ruby1.9.3
require 'benchmark'

def recursive(path, bench)
  bench.report(path) do
    Dir["#{path}/**/**"].each{|file| puts file if File.basename(file).downcase == "thumbs.db"}
  end
end

path = '//server/share/folder/'
Benchmark.bm {|bench| recursive(path, bench)}

EDIT: Since I suspected that printing caused a delay, I tested the scripts both printing all 4500 files and printing none. The difference remains: Ruby 5 s vs. Python 107 s in the first case, and Ruby 4.5 s vs. Python 107 s in the second.

EDIT2: Based on the answers and comments here, a Python version that can run faster in some cases by skipping folders:

import os

def recurse(path):
  for (path, dirs, files) in os.walk(path):
    for file in files:
      if file.lower() == "thumbs.db":
        print (path+'/'+file)

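def recurse2(path):
    for (path, dirs, files) in os.walk(path):
        # Prune in place with slice assignment; calling dirs.remove() while
        # iterating over dirs can skip entries. Note that ('comics',) is a
        # tuple, whereas ('comics') is just a string.
        dirs[:] = [d for d in dirs if d not in ('comics',)]
        for file in files:
            if file.lower() == "thumbs.db":
                print (path+'/'+file)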


if __name__ == '__main__':
  import timeit
  path = 'f:/'
  print(timeit.timeit('recurse("'+path+'")', setup="from __main__ import recurse", number=1))
  # 6.20102692
  print(timeit.timeit('recurse2("'+path+'")', setup="from __main__ import recurse2", number=1))
  # 2.73848228
  # ruby 5.7
1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 12:56 pm

    Ruby’s Dir is implemented in C (in the file dir.c, according to its documentation), whereas the Python equivalent, os.walk, is implemented in Python.

    It’s not surprising that Python is slower than C, but the approach used in Python gives a little more flexibility: for example, you could skip entire subtrees named '.svn', '.git' or '.hg' while traversing a directory hierarchy.

    Most of the time, the Python implementation is fast enough.
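An aside that is not part of the original answer: on Python 2.7, which the question uses, os.walk is built on os.listdir plus a per-entry os.path.isdir check, and each of those checks can cost a round trip on a network share. Newer interpreters provide os.scandir (and os.walk uses it from Python 3.5 on), which can often report whether an entry is a directory without that extra call. The sketch below assumes Python 3.6 or later; scan_tree is a hypothetical helper name, and no timings were measured for it here.

```python
import os

def scan_tree(top):
    """Yield the path of every non-directory entry under `top`."""
    stack = [top]
    while stack:
        current = stack.pop()
        # os.scandir returns DirEntry objects that cache type information,
        # so is_dir() usually needs no additional system call.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path
```

It could be used in place of the os.walk loop, for example: `[p for p in scan_tree(path) if os.path.basename(p).lower() == "thumbs.db"]`.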

    Update: Skipping files or subdirectories doesn’t affect the traversal rate at all, but the overall time taken to process a directory tree could certainly be reduced, because you avoid traversing potentially large subtrees of the main tree. The time saved is of course proportional to how much you skip. In your case, which looks like folders of images, it’s unlikely you would save much time (unless the images were under revision control, in which case skipping the subtrees owned by the revision control system might have some impact).

    Additional update: Skipping folders is done by changing the dirs value in place:

    for root, dirs, files in os.walk(path):
        for skip in ('.hg', '.git', '.svn', '.bzr'):
            if skip in dirs:
                dirs.remove(skip)
        # Now process other stuff at this level, i.e.
        # in directory "root". The skipped folders
        # won't be recursed into.
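For completeness, here is a minimal self-contained sketch that combines the in-place pruning with the thumbs.db check from the question. The names find_files and SKIP_DIRS are hypothetical, chosen only for this illustration, and the skip list simply reuses the directory names mentioned in this answer.

```python
import os

# Directory names to prune (the same ones used as examples in the answer).
SKIP_DIRS = {'.hg', '.git', '.svn', '.bzr'}

def find_files(top, target='thumbs.db', skip=SKIP_DIRS):
    """Return paths under `top` whose file name equals `target`, ignoring case.

    Directories whose names are in `skip` are pruned and never entered.
    """
    matches = []
    for root, dirs, files in os.walk(top):
        # Slice assignment changes the very list object os.walk is using,
        # so the pruned directories are not recursed into.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            if name.lower() == target:
                matches.append(os.path.join(root, name))
    return matches
```

Pruning only works with the default top-down traversal; with `topdown=False` the subdirectories have already been visited by the time `dirs` is handed to the loop.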
    