The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T19:17:42+00:00)

I have a network storage device that contains a few hundred thousand mp3 files,

I have a network storage device that contains a few hundred thousand mp3 files, organized in an [artist]/[album] hierarchy. I need to identify newly added artist folders and/or newly added album folders programmatically on demand (not by monitoring, but by request).

Our dev server is Windows-based, the production server will be FreeBSD. A cross-platform solution is optimal because the production server may not always be *nix, and I’d like to spend as little time on reconciling the (unavoidable) differences between the dev and production server as possible.

I have a working proof-of-concept that is Windows platform-dependent: using a Scripting.FileSystemObject COM object, I iterate through all top-level (artist) directories and check the size of each directory. If the size has changed, the directory is explored further to find new album folders. As the directories are iterated, each path and directory size is collected into an array, which I serialize and write to a file for next time. This array is used on a subsequent call both to identify changed artist directories (new album added) and to identify completely new artist directories.

This feels convoluted, and as I mentioned it is platform-dependent. To boil it down, my goals are:

  • Identify new top-tier directories
  • Identify new second-tier directories
  • Identify new loose files within the top-tier directories

Execution time is not a concern here, and security is not an obstacle: this is an internal-only project using only intranet assets, so we can do whatever has to be done to facilitate the desired end result.

Here’s my working proof-of-concept:

    // read the cached list of artist folders
    // (the cache file does not exist yet on the very first run)
    $folder_list_cache_file = 'seartistfolderlist.pctf';
    $folder_list_cache = array();
    if (is_readable($folder_list_cache_file)) {
        $cached = unserialize(file_get_contents($folder_list_cache_file));
        if (is_array($cached))
            $folder_list_cache = $cached;
    }

    // container arrays
    $found_artist_folders = array();
    $newly_found_artist_folders = array();
    $changed_artist_folders = array();

    $filesystem = new COM('Scripting.FileSystemObject');

    $dir = "//network_path_to_folders/";
    if ($handle = opendir($dir)) {
        // loop the directories
        while (false !== ($file = readdir($handle))) {
            // skip non-entities
            if ($file == '.' || $file == '..')
                continue;

            // make a key-friendly version of the artist name, skip invalids
            // ie 10000-maniacs
            $file_t = trim(post_slug($file));
            if (strlen($file_t) < 1)
                continue;

            // build the full path
            $pth = $dir.$file;

            // skip loose top-level files
            if (!is_dir($pth))
                continue;

            // attempt to get the size of the directory
            $size = 'ERR';
            try {
                $f = $filesystem->getfolder($pth);
                $size = $f->Size();
            } catch (Exception $e) {
                /* failed to get size */
            }

            // if the artist is not known, they are newly added
            if (!array_key_exists($file_t, $folder_list_cache)) {
                $newly_found_artist_folders[$file_t] = $file;
            } elseif ($size != $folder_list_cache[$file_t]['size']) {
                // if the artist is known but the size is different, a new album is added
                $changed_artist_folders[] = $file;
            }

            // build a list of everything, along with file size to write into the cache file
            $found_artist_folders[$file_t] = array (
                'path'=>$file,
                'size'=>$size
            );
        }
        closedir($handle);
    }

    // write the list to a file for next time
    $fh = fopen($folder_list_cache_file, 'w') or die("can't open file");
    fwrite($fh, serialize($found_artist_folders));
    fclose($fh);

    // deal with discovered additions and changes....

Another thing to mention: because these are MP3s, the sizes I’m dealing with are big. So big, in fact, that I have to watch out for PHP’s integer size limit. The drive is currently at 90% utilization of 1.7TB (yes, SATA in RAID); a new set of multi-TB drives will be added soon, only to be filled up in short order.

EDIT

I did not mention the database because I thought it would be a needless detail, but there IS a database. This script is seeking new additions to the digital portion of our music library; at the end of the code where it says “deal with discovered additions and changes”, it is reading ID3 tags and doing Amazon lookups, then adding the new stuff to a database table. Someone will come along and review the new additions and screen the data, then it will be added to the “official” database of albums available for play. Many of the songs we’re dealing with are by local artists, so the ID3 and Amazon lookups don’t give the track titles, album name, etc. In that case, human intervention is critical to fill in the missing data.

1 Answer

    Editorial Team
    Added an answer on May 23, 2026 at 7:17 pm

    Simplest thing for the BSD-side is a find script that simply looks for inodes with a ctime greater than the last time it ran.

    Leave a sentinel file somewhere to ‘store’ the last run time, which you can do with a simple

    touch /tmp/find_sentinel
    

    and then

    find /top/of/mp3/tree -cnewer /tmp/find_sentinel
    

    which will produce a list of files and directories that have changed since the find_sentinel file was touched. Running this via cron will get you regular updates, and the script doing the find can then digest the returned file data into your database for processing.
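The two commands above could be wrapped into one on-demand step, roughly as sketched below. The function name scan_new and the paths passed to it are hypothetical. The sketch assumes a find that supports -mindepth, -maxdepth and -cnewer (FreeBSD and GNU find both do; they are not POSIX), and that the network share reports ctime sensibly, which is worth checking on the real mount. The depth limits restrict the output to artist folders, album folders and loose files inside artist folders.

```shell
#!/bin/sh
# Sketch only: list entries changed since the previous run, then advance
# the sentinel. scan_new and both path arguments are hypothetical names.

scan_new() {
    tree=$1
    sentinel=$2
    next="$sentinel.next"

    # Stamp the start of this run first, so anything added while the scan
    # is still running is picked up by the next run instead of being lost.
    touch "$next"

    if [ -e "$sentinel" ]; then
        # Depth 1 = artist folders; depth 2 = album folders and loose files.
        # -cnewer: entry's ctime is newer than the sentinel's modification time.
        find "$tree" -mindepth 1 -maxdepth 2 -cnewer "$sentinel" -print
    else
        # First run: there is no baseline yet, so report everything.
        find "$tree" -mindepth 1 -maxdepth 2 -print
    fi

    # The stamp taken at the start becomes the baseline for the next run.
    mv "$next" "$sentinel"
}
```

A call such as `scan_new /top/of/mp3/tree /tmp/find_sentinel` prints one path per line. An existing artist folder shows up in the list when an album is added to it, because adding an entry changes the parent directory's ctime.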

    You could accomplish something similar on the Windows-side with Cygwin, which’d provide an identical ‘find’ app.
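If timestamps turn out to be unreliable on one of the platforms, a different technique that uses the same tools is to compare directory listings rather than ctimes: save a sorted listing on each run and print only the lines that were not in the previous one. This is a sketch under assumptions, not a tested production script. The name snapshot_diff and the snapshot file location are hypothetical, and the line-based comparison assumes no file or folder name contains a newline.

```shell
#!/bin/sh
# Sketch only: report paths that exist now but were absent from the
# previous snapshot. snapshot_diff and the snapshot path are hypothetical.

snapshot_diff() {
    tree=$1
    snapshot=$2
    current="$snapshot.current"

    # Artist folders (depth 1) plus album folders and loose files (depth 2),
    # as paths relative to the tree, sorted byte-wise so comm sees a
    # consistent order on every platform.
    ( cd "$tree" && find . -mindepth 1 -maxdepth 2 -print ) |
        LC_ALL=C sort > "$current"

    # No snapshot yet: compare against an empty list, so everything is new.
    [ -e "$snapshot" ] || : > "$snapshot"

    # comm -13 suppresses lines unique to the old list and lines common to
    # both, leaving only the lines that appear in the new list alone.
    LC_ALL=C comm -13 "$snapshot" "$current"

    # The current listing becomes the baseline for the next run.
    mv "$current" "$snapshot"
}
```

Called as `snapshot_diff /top/of/mp3/tree /tmp/mp3_snapshot`, this covers new artist folders, new album folders and new loose files without relying on folder sizes or timestamps. Unlike the ctime approach, it does not notice changes to entries that already existed.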

© 2021 The Archive Base. All Rights Reserved