Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7892023
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 3, 20262026-06-03T06:45:56+00:00 2026-06-03T06:45:56+00:00

Good evening. I am looking at developing some code that will collect EXIF data

  • 0

Good evening. I am looking at developing some code that will collect EXIF data from JPEG images and store it in a MySQL database using Python v2.x The stumbling block lies in the fact that the JPEGs are scattered in a number of subdirectories and further subdirectories in a root folder so for example 200 JPEGs may be stored in root > subroot > subsubroot1 as well as a further 100 in root > subroot > subroot2. Once all images are identified, they will be scanned and their respective EXIF data abstracted before being added to a MySQL table.

At the moment I am just at the planning stage but I am just wondering, what would be the most efficient and pythonic way to carry out the recursive searching? I am looking to scan the root directory and append any new identified subdirectories to a list and then scan all subdirectory paths in the list for further subdirectories until I have a total list of all directories. This just seems to be a clumsy way though IMHO and a bit repetitive so I assume there may be a more OOP manner of carrying out this function.

Similarly, I am only looking to add new info to my MySQL table and so what would be the most efficient way to establish if an entry already exists? The filename both in the table and the JPEG file name will be its MD5 hash values. I was considering scanning through the table at the beginning of the code and placing all filenames in a set and so, before scanning a new JPEG, if an entry already exists in the set, there would be no need to extract the EXIF and move onto the next picture. Is this an efficient method however or would it be better to scan through the MySQL table when a new image is encountered? I anticipate the set method may be the most efficient however the table may potentially contain tens of millions of entries eventually and so to add the filenames for these entries into a set (volatile memory) may not be the best idea.

Thanks folks.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-03T06:45:57+00:00Added an answer on June 3, 2026 at 6:45 am

    I would just write a function that scanned a directory for all files; if it’s a jpeg, add the full path name of the jpeg to the list of results. If it’s a directory, then immediately call the function with the newly discovered directory as an argument. If it’s another type of file, do nothing. This is a classic recursive divide-and-conquer strategy. It will break if there are loops in your directory path, for instance with symlinks — if this is a danger for you, then you’ll have to make sure you don’t traverse the same directory twice by finding the “real” non-symlinked path of each directory and recording it.

    How to avoid duplicate entries is a trickier problem and you have to consider whether you are tolerant of two differently-named files with the exact same contents (and also consider the edge cases of symlinked or multiply-hard-linked files), how new files appear in the directories you are scanning and whether you have any control over that process. One idea to speed it up would be to use os.path.getmtime(). Record the moment you start the directory traversal process. Next time around, have your recursive traversal process ignore any jpeg files with an mtime older than your recorded time. This can’t be your only method of keeping track because files modified between the start and end times of your process may or may not be recorded, so you will still have to check the database for those records (for instance using the full path, a hash on the file info or a hash on the data itself, depending on what kind of duplication you’re intolerant of) but used as a heuristic it should speed up the process greatly.

    You could theoretically load all filenames (probably paths and not filenames) from the database into memory to speed up comparison, but if there’s any danger of the table becoming very large it would be better to leave that info in the database. For instance, you could create a hash from the filename, and then simply add that to the database with a UNIQUE constraint — the database will reject any duplicate entries, you can catch the exception and go on your way. This won’t be slow if you use the aforementioned heuristic checking file mtime.

    Be sure you account for the possibility of files that may be only modified and not newly created, if that’s important for your application.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Good evening, In my app that I'm currently developing, I have a class that
Good evening, I am developing a set of Java classes so that a container
Good evening. I'm developing an android application that uses Facebook SDK. I've no problem
Good evening all! i have an Ajax call that loads 5 'echo's' from a
Good evening,someone asked me to help her with fixing some code for asp.net, she
Good evening, I wrote the code below in C to read and print from
Good evening all, I've been working on an MD5 tool in C# that takes
Good evening everyone. I am puzzling over the oddity that the jQuery Marquee plugin
Good evening, I am working on a program where some application config info is
Good evening, I have been trying to wrap my head about this, below, code

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.