The Archive Base

Editorial Team
Asked: May 13, 2026 (2026-05-13T12:54:37+00:00)

Attempt #2:

People don’t seem to be understanding what I’m trying to do. Let me see if I can state it more clearly:

1) Reading a list of files is much faster than walking a directory.

2) So let’s have a function that walks a directory and writes the resulting list to a file. Now, in the future, if we want to get all the files in that directory we can just read this file instead of walking the dir. I call this file the index.

3) Obviously, as the filesystem changes the index file gets out of sync. To overcome this, we have a separate program that hooks into the OS in order to monitor changes to the filesystem. It writes those changes to a file called the monitor log. Immediately after we read the index file for a particular directory, we use the monitor log to apply the various changes to the index so that it reflects the current state of the directory.

Because reading files is so much cheaper than walking a directory, this should be much faster than walking for all calls after the first.
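Steps 1 and 2 can be sketched roughly as follows. This is only a minimal illustration, not a finished design: the function names (build_index, read_index, list_files) and the one-path-per-line index format are assumptions made for the example, and the index is kept outside the walked directory so it does not list itself.

```python
import os

def build_index(dir_path, index_path):
    """Walk dir_path once and write one relative file path per line."""
    paths = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), dir_path))
    paths.sort()
    with open(index_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")
    return paths

def read_index(index_path):
    """Read the cached listing back without touching the directory tree."""
    with open(index_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def list_files(dir_path, index_path):
    """Use the index when it exists; otherwise walk once and create it."""
    if os.path.exists(index_path):
        return read_index(index_path)
    return build_index(dir_path, index_path)
```

Note that the second call returns whatever is in the index, even if the directory has changed since. That stale result is exactly the problem step 3 is meant to solve.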

Original post:

I want a function that will recursively get all the files in any given directory and filter them according to various parameters. And I want it to be fast, as in an order of magnitude faster than simply walking the dir. And I’d prefer to do it in Python. Cross-platform is preferable, but Windows is most important.

Here’s my idea for how to go about this:

I have a function called all_files:

def all_files(dir_path, *params):  # filter parameters not specified yet
    ...

The first time I call this function it will use os.walk to build a list of all the files, along with info about the files such as whether they are hidden, a symbolic link, etc. I’ll write this data to a file called “.index” in the directory. On subsequent calls to all_files, the .index file will be detected, and I will read that file rather than walking the dir.
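One way the per-file info could be stored is one JSON record per line. This is a sketch under assumptions: the record fields ("path", "hidden", "symlink"), the JSON-lines format, and the helper names are invented for illustration. The hidden check is best-effort: it treats dot-prefixed names as hidden and also looks at the Windows hidden attribute where os.lstat exposes st_file_attributes.

```python
import json
import os
import stat

def is_hidden(path):
    """Best-effort check: dot-prefixed name, or the Windows hidden attribute."""
    name = os.path.basename(path)
    # st_file_attributes only exists on Windows; default to 0 elsewhere.
    attrs = getattr(os.lstat(path), "st_file_attributes", 0)
    return name.startswith(".") or bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)

def collect_records(dir_path):
    """Walk the tree once and gather a small record for every file."""
    records = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            full = os.path.join(root, name)
            records.append({
                "path": os.path.relpath(full, dir_path),
                "hidden": is_hidden(full),
                "symlink": os.path.islink(full),
            })
    return records

def write_index(records, index_path):
    """Write one JSON object per line (hypothetical index format)."""
    with open(index_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def read_index(index_path):
    """Load the records back without walking the directory."""
    with open(index_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```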

This leaves the problem of the index getting out of sync as files are added and removed. For that I’ll have a second program that runs on startup, detects all changes to the entire filesystem, and writes them to a file called “mod_log.txt”. It detects changes via Windows signals, like the method described here. This file will contain one event per line, with each event consisting of the path affected, the type of event (create, delete, etc.), and a timestamp. The .index file will have a timestamp as well for the time it was last updated. After I read the .index file in all_files I will tail mod_log.txt and find any events that happened after the timestamp in the .index file. I will then take these recent events, keep only those that apply to the current directory, and update the .index accordingly.
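The update step might look something like the sketch below. The log line layout (timestamp, event type, and absolute path separated by tabs), the event names "create" and "delete", and the function name apply_events are all assumptions for illustration; the post does not fix a format. Only creates and deletes are handled, paths are compared case-sensitively (a simplification on Windows), and the caller is free to pass just the tail of the log rather than every line.

```python
import os

def apply_events(indexed, index_time, log_lines, dir_path):
    """Replay log events newer than index_time that fall under dir_path.

    Returns the updated set of absolute paths and the newest timestamp applied.
    """
    files = set(indexed)
    # Trailing separator so "/proj" does not also match "/project".
    prefix = os.path.join(os.path.abspath(dir_path), "")
    newest = index_time
    for line in log_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        stamp, event, path = line.split("\t", 2)
        stamp = float(stamp)
        # Skip events already reflected in the index or outside this directory.
        if stamp <= index_time or not path.startswith(prefix):
            continue
        if event == "create":
            files.add(path)
        elif event == "delete":
            files.discard(path)
        newest = max(newest, stamp)
    return files, newest
```

The returned timestamp would then be written back as the new timestamp of the index, so the same events are not replayed on the next call.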

Finally, I’ll take the list of all files, filter it according to various parameters, and return the result.
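The final filtering pass is cheap once the list is in memory. A possible sketch, assuming each entry is a dict with "path", "hidden", and "symlink" keys and that the parameters are a filename pattern plus two flags (all of these are hypothetical; the post leaves the parameters open):

```python
import fnmatch

def filter_files(records, pattern="*", include_hidden=False, include_symlinks=True):
    """Return the paths of records that pass the flags and match the pattern."""
    result = []
    for rec in records:
        if rec["hidden"] and not include_hidden:
            continue
        if rec["symlink"] and not include_symlinks:
            continue
        # Match against the file name only, whichever separator the path uses.
        name = rec["path"].replace("\\", "/").rsplit("/", 1)[-1]
        if fnmatch.fnmatch(name, pattern):
            result.append(rec["path"])
    return result
```

For example, filter_files(records, "*.py") would return only the visible Python files from the cached list.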

What do you think of my approach? Is there a better way to do this?

Edit:

Check this code out. I’m seeing a drastic speedup from reading a cached list over a recursive walk.

import os
from os.path import join, exists
import cProfile, pstats

dir_name = "temp_dir"
index_path = ".index"

def create_test_files():
    # Create 10 subdirectories of 100 empty files each and
    # record every file path in the index as we go.
    os.mkdir(dir_name)
    with open(index_path, 'w') as index_file:
        for i in range(10):
            print("creating dir:", i)
            sub_dir = join(dir_name, str(i))
            os.mkdir(sub_dir)
            for j in range(100):
                file_path = join(sub_dir, str(j))
                open(file_path, 'w').close()
                index_file.write(file_path + "\n")

#  0.238 seconds
def test_walk():
    # Recursive walk of the whole test tree.
    for info in os.walk(dir_name):
        pass

#  0.001 seconds
def test_read():
    # Read the cached list of paths instead of walking.
    with open(index_path) as index_file:
        index_file.readlines()

if not exists(dir_name):
    create_test_files()

def profile(s):
    # Run the statement under cProfile and print the top 10 entries.
    cProfile.run(s, 'profile_results.txt')
    p = pstats.Stats('profile_results.txt')
    p.strip_dirs().sort_stats('cumulative').print_stats(10)

profile("test_walk()")
profile("test_read()")
1 Answer

    Editorial Team
    Added an answer on May 13, 2026 at 12:54 pm

    The best answer came from Michał Marczyk toward the bottom of the comment list on the initial question. He pointed out that what I’m describing is very close to the UNIX locate program. I found a Windows version here: http://locate32.net/index.php. It solved my problem.

    Edit: Actually the Everything search engine looks even better. Apparently NTFS keeps a journal of changes to the filesystem (the USN change journal), and Everything uses that to keep its database up to date.

© 2021 The Archive Base. All Rights Reserved