I have a network storage device that contains a few hundred thousand mp3 files,


Asked: May 23, 2026

I have a network storage device that contains a few hundred thousand mp3 files, organized in an [artist]/[album] hierarchy. I need to identify newly added artist folders and/or newly added album folders programmatically, on demand (not by continuous monitoring, but by request).

Our dev server is Windows-based; the production server will be FreeBSD. A cross-platform solution is ideal because the production server may not always be *nix, and I'd like to spend as little time as possible reconciling the (unavoidable) differences between the dev and production servers.

I have a working proof-of-concept that is Windows-dependent: using a Scripting.FileSystemObject COM object, I iterate through all top-level (artist) directories and check the size of each one. If the size has changed, the directory is explored further to find new album folders. As the directories are iterated, each path and size is collected into an array, which I serialize to a file for next time. On a subsequent call, this array is used both to identify changed artist directories (a new album was added) and to identify completely new artist directories.

This feels convoluted, and, as I mentioned, it is platform-dependent. To boil it down, my goals are:

- Identify new top-tier (artist) directories
- Identify new second-tier (album) directories
- Identify new loose files within the top-tier directories
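
One COM-free way to meet these three goals, sketched here under stated assumptions rather than offered as the definitive approach: instead of inferring change from folder sizes, keep a snapshot of every entry one and two levels deep and diff it against the previous run. `list_library()` and `find_new_entries()` are hypothetical names, and the snapshot path is a placeholder. The sketch uses only built-in PHP functions (`scandir`, `is_dir`, `serialize`), so it should behave the same on Windows and FreeBSD. `serialize()` is used instead of JSON because filenames in a music library are not guaranteed to be valid UTF-8.

```php
<?php
// Hypothetical helper: list every entry one and two levels below $root as a
// relative path: "Artist", "Artist/Album", "Artist/loose.mp3".
// Loose files directly under $root are skipped, as in the COM version.
function list_library(string $root): array
{
    $entries = [];
    foreach (scandir($root) ?: [] as $artist) {
        if ($artist === '.' || $artist === '..' || !is_dir("$root/$artist")) {
            continue;
        }
        $entries[] = $artist;
        foreach (scandir("$root/$artist") ?: [] as $child) {
            if ($child !== '.' && $child !== '..') {
                $entries[] = "$artist/$child";
            }
        }
    }
    return $entries;
}

// Hypothetical helper: diff the current listing against the snapshot saved
// by the previous run, save the new snapshot, and sort the additions into
// the three categories from the goals list.
function find_new_entries(string $root, string $snapshotFile): array
{
    $previous = [];
    if (is_file($snapshotFile)) {
        $data = unserialize(file_get_contents($snapshotFile), ['allowed_classes' => false]);
        $previous = is_array($data) ? $data : [];
    }
    $current = list_library($root);
    file_put_contents($snapshotFile, serialize($current));

    $new = ['artists' => [], 'albums' => [], 'files' => []];
    foreach (array_diff($current, $previous) as $entry) {
        if (strpos($entry, '/') === false) {
            $new['artists'][] = $entry;   // new top-tier directory
        } elseif (is_dir("$root/$entry")) {
            $new['albums'][] = $entry;    // new second-tier directory
        } else {
            $new['files'][] = $entry;     // new loose file in an artist folder
        }
    }
    return $new;
}
```

Usage would look like `find_new_entries('//server/share/mp3', 'library_snapshot.ser')` (both paths hypothetical; pass the root without a trailing slash). On the very first run there is no snapshot, so everything is reported as new; running it once to seed the snapshot avoids that.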

Execution time is not a concern here, and security is not an obstacle: this is an internal-only project using only intranet assets, so we can do whatever has to be done to facilitate the desired end result.

Here’s my working proof-of-concept:

    // read the cached list of artist folders (there is none on the first run,
    // so check for the file instead of letting fopen()/fread() fail)
    $folder_list_cache_file = 'seartistfolderlist.pctf';
    $folder_list_cache = array();
    if (is_file($folder_list_cache_file)) {
        $folder_list_cache = unserialize(file_get_contents($folder_list_cache_file));
        if (!is_array($folder_list_cache))
            $folder_list_cache = array();
    }

    // container arrays
    $found_artist_folders = array();
    $newly_found_artist_folders = array();
    $changed_artist_folders = array();

    $filesystem = new COM('Scripting.FileSystemObject');

    $dir = "//network_path_to_folders/";
    if ($handle = opendir($dir)) {
        // loop the directories
        while (false !== ($file = readdir($handle))) {
            // skip non-entities
            if ($file == '.' || $file == '..')
                continue;

            // make a key-friendly version of the artist name, e.g. 10000-maniacs
            // (post_slug() is a helper defined elsewhere); skip names that slug to nothing
            $file_t = trim(post_slug($file));
            if (strlen($file_t) < 1)
                continue;

            // build the full path
            $pth = $dir.$file;

            // skip loose top-level files
            if (!is_dir($pth))
                continue;

            // attempt to get the size of the directory
            $size = 'ERR';
            try {
                $f = $filesystem->getfolder($pth);
                $size = $f->Size();
            } catch (Exception $e) {
                /* failed to get size */
            }

            // if the artist is not known, they are newly added
            if (!array_key_exists($file_t, $folder_list_cache)) {
                $newly_found_artist_folders[$file_t] = $file;
            } elseif ($size != $folder_list_cache[$file_t]['size']) {
                // the artist is known but the folder size changed: something was added
                // (e.g. a new album) or an existing file was modified
                $changed_artist_folders[] = $file;
            }

            // build a list of everything, along with file size to write into the cache file
            $found_artist_folders[$file_t] = array (
                'path'=>$file,
                'size'=>$size
            );
        }
        closedir($handle);
    }

    // write the list to a file for next time
    $fh = fopen($folder_list_cache_file, 'w') or die("can't open file");
    fwrite($fh, serialize($found_artist_folders));
    fclose($fh);

    // deal with discovered additions and changes....

Another thing to mention: because these are MP3s, the sizes I'm dealing with are big. So big, in fact, that I have to watch out for PHP's integer size limit. The drive is currently at 90% utilization of 1.7 TB (yes, SATA in RAID); a new set of multi-TB drives will be added soon, only to be filled up in short order.
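
If the size-based check is kept, the COM call can be replaced with plain PHP. A minimal sketch, assuming a hypothetical `directory_size()` helper used in place of `$filesystem->getfolder($pth)->Size()`, built on SPL's `RecursiveDirectoryIterator`. Whether PHP integers are 64-bit depends on the build: `PHP_INT_SIZE` reports 8 on a 64-bit build and 4 on a 32-bit one, so it is worth checking on both the dev and production machines.

```php
<?php
// Sketch: a COM-free stand-in for Scripting.FileSystemObject's Folder.Size,
// summing the sizes of all regular files below $path.
function directory_size(string $path)
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subfolders
    );
    $total = 0;
    foreach ($files as $file) {
        if ($file->isFile()) {
            $total += $file->getSize();
        }
    }
    // With 64-bit integers this stays an int up to PHP_INT_MAX (about
    // 9.2e18 bytes); if a sum ever overflows, PHP returns a float instead
    // of wrapping around.
    return $total;
}
```

Individual MP3s are far below 2 GB, so per-file sizes are safe even on a 32-bit build; only the running total can exceed the integer range.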

EDIT

I did not mention the database because I thought it would be a needless detail, but there IS a database. This script looks for new additions to the digital portion of our music library; at the end of the code, where it says "deal with discovered additions and changes", it reads ID3 tags, does Amazon lookups, and then adds the new items to a database table. Someone will then review and screen the new additions before they are added to the "official" database of albums available for play. Many of the songs we're dealing with are by local artists, so the ID3 and Amazon lookups don't return the track titles, album name, etc. In those cases, human intervention is critical to fill in the missing data.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T19:17:42+00:00

The simplest thing on the BSD side is a find script that looks for inodes with a ctime newer than the last time it ran.

Leave a sentinel file somewhere to ‘store’ the last run time, which you can do with a simple

    touch /tmp/find_sentinel

and then

    find /top/of/mp3/tree -cnewer /tmp/find_sentinel

which will produce a list of the files and directories that have changed since the find_sentinel file was touched. Running this via cron will get you regular updates, and the script doing the find can then digest the returned file data into your database for processing.

You could accomplish something similar on the Windows side with Cygwin, which provides a compatible 'find' utility.
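
The same idea can also be approximated in plain PHP, without `find` or Cygwin. A minimal sketch, assuming a hypothetical `changed_since()` helper and the same sentinel-file convention: it compares each entry's ctime with the sentinel's modification time, roughly what `-cnewer` does. It uses `>=` rather than a strict comparison because PHP timestamps have one-second resolution, at the cost of possibly reporting an entry twice. One platform caveat: PHP reports the inode change time as ctime on Unix but the creation time on Windows, so results on the dev box can differ from FreeBSD.

```php
<?php
// Sketch of the find -cnewer idea in plain PHP: return every path below
// $root whose ctime is at or after the sentinel file's mtime.
function changed_since(string $root, string $sentinel): array
{
    clearstatcache(); // read the sentinel's mtime fresh, not from PHP's stat cache
    $since = is_file($sentinel) ? filemtime($sentinel) : 0;
    $entries = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,     // report directories as well as files
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subfolders
    );
    $changed = [];
    foreach ($entries as $info) {
        if ($info->getCTime() >= $since) {
            $changed[] = $info->getPathname();
        }
    }
    return $changed;
}
```

To mimic the sentinel workflow, record `$start = time();` before calling `changed_since()`, process the results, and then call `touch($sentinel, $start)`, so anything added while the scan was running is picked up on the next run.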
