The Archive Base

Editorial Team
Asked: May 17, 2026


I have a database where each entry has a file path and a last modified field:

1284581625555  C:\docs\text1.txt
1284581646992  C:\docs\text2.txt
1284581654886  C:\docs\text3.txt
1284581662927  C:\docs\subfolder\text4.txt
1284581671986  C:\docs\subfolder\text5.txt
...

Each entry also has a summary of the file contents, and the entries were created by recursively walking down a certain folder (in this case C:\docs) and adding all visited files. Now I’d like to update the database, i.e.

  • Add newly created files
  • Remove deleted files
  • Update modified files

Obviously, I have to walk down the root folder again to see what has changed. But what is the most efficient way to do so?

There are two approaches I can think of:

  • First traverse the database, remove all deleted entries and update all modified entries. For this, each time you have to create a file object from the stored path string and call file.exists() or file.isModified(). Then recursively walk down the root folder and add files which aren’t in the database yet.
  • First walk down the file tree and remember in a list what has been added/deleted/modified — this requires having stored a complete snapshot of the previous state of the file tree. Then traverse the database and add/delete/modify entries, based on the previously created list.

Which approach is better? Are there any others?

EDIT: Creating the summary is very expensive (full text extraction), and traversing the database is also somewhat expensive, since it is file-based.

1 Answer

    Editorial Team
    Added an answer on May 17, 2026 at 12:57 am

    I would think that the easiest way to do this would be to delete the database file and recreate it from scratch. Depending on how expensive it is to create the “summary”, this could well be the fastest method, since you don’t need to compare or edit anything.

    If the summary creation is “hard” and the database fits in memory, the easiest way to go would probably be to load the database into a dict (keyed on the filename, with data indicating whether or not the file has been “seen”) and do the os.walk again, updating the dict as necessary. Then iterate the dict, writing all entries that have been seen.
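
A minimal sketch of that dict-based pass, assuming the database has already been loaded into a dict that maps each path to a `(mtime_ms, summary)` tuple. The names `sync_index` and `summarize` are hypothetical; `summarize` stands in for the expensive text extraction, and loading and writing the database is left out because its format isn't described here.

```python
import os

def sync_index(root, index, summarize):
    """Update `index` (path -> (mtime_ms, summary)) to match the tree under `root`.

    `summarize` is a hypothetical callable for the expensive summary step;
    it is only called for files that are new or whose mtime has changed.
    """
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Milliseconds since the epoch, matching the sample entries.
                mtime_ms = os.stat(path).st_mtime_ns // 1_000_000
            except OSError:
                continue  # file vanished between listing and stat
            seen.add(path)
            entry = index.get(path)
            if entry is None or entry[0] != mtime_ms:
                # New or modified file: (re)build its summary.
                index[path] = (mtime_ms, summarize(path))
    # Anything not seen during the walk no longer exists on disk.
    for path in list(index):
        if path not in seen:
            del index[path]
    return index
```

Unchanged files cost one `stat` call each and never reach `summarize`, so the expensive step runs only where it is needed. Writing the updated dict back out in a single pass also avoids random access into the file-based database.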

    (BTW, the last modified field isn’t necessarily useful: you have to check the file’s modified time anyway, so you might as well compare it to the database’s timestamp.)
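
One way to read that remark is to compare each file's modification time against the modification time of the database file itself. The sketch below does that; `changed_since` and `db_path` are hypothetical names, and it assumes the database is a single file that is rewritten on every update.

```python
import os

def changed_since(root, db_path):
    """Yield files under `root` modified after the database file was last written.

    Hypothetical helper: assumes the database lives in one file at `db_path`.
    """
    db_mtime = os.path.getmtime(db_path)
    db_abs = os.path.abspath(db_path)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.abspath(path) == db_abs:
                continue  # skip the database file if it sits inside the tree
            try:
                if os.path.getmtime(path) > db_mtime:
                    yield path
            except OSError:
                continue  # file vanished between listing and stat
```

This shortcut has limits: it does not detect deleted files, and it misses files copied into the tree with an older, preserved timestamp, so a per-entry check is still needed for additions and removals.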
