The Archive Base

Editorial Team
Asked: June 5, 2026

I have a highly structured hierarchical directory containing multiple files that need to be moved into a flat structure and renamed at the same time. The original path and name must be logged along with the new path and name, and eventually loaded into a database. Each renamed file must also get a unique, unguessable (i.e. encrypted or hashed) file name. When the renamed files are moved into the new directory structure, I also want to limit the number of files in each directory: each directory would be named with a sequential number, and files would be placed in it until a maximum count was reached (e.g. 255) before rolling over into a new directory named with the next sequential number.

Is there a tool or piece of software that does this? I did some initial research and found nothing that meets all of the following criteria:

  • batch rename & copy into alternative (flatter) structure
  • hash / encrypt filename and ensure uniqueness
  • sequentially name folders and limit file count
  • log each file’s original name and path, and new (encrypted) name and path
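
The fill-then-roll-over layout described above comes down to integer division on a running file index. A minimal Bash sketch, assuming zero-based file indexes, 1-based directory numbers and hypothetical helper names (`dir_for_index`, `padded_dir_name`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a zero-based file index to a 1-based,
# sequential directory number, putting at most MAX files in each
# directory before rolling over to the next one.
dir_for_index() {
  local index=$1 max=$2
  echo $(( index / max + 1 ))
}

# Hypothetical helper: zero-pad a directory number to a fixed width
# so the directory names sort in order (001, 002, ...).
padded_dir_name() {
  local number=$1 width=$2
  printf "%0${width}d\n" "$number"
}

# With 255 files per directory, indexes 0..254 land in 001 and
# index 255 starts 002.
for i in 0 254 255 510; do
  echo "file #$i -> $(padded_dir_name "$(dir_for_index "$i" 255)" 3)"
done
```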

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 8:33 pm

    I have several Bash scripts I have used in the past to migrate hand-made file repositories to hashed repositories that are accessed and managed from a web application (mostly PHP apps). In these repositories, file names are hashed (to avoid collisions between files with the same content or name) and files are distributed evenly (deterministically or randomly) to keep the number of files per directory low for performance reasons. The following is one fully working example:

    #!/bin/bash
    
    MAXFILESPERDIR=500
    TARGETROOTDIR="./newrepository"
    RANDOMDISTRIBUTION=1
    
    if [ -d "$1" ]; then
      LOGFILE=$(basename "$0").$(date +"_%Y%m%d_%H%M").$$.log
      SQLFILE=$(basename "$0").$(date +"_%Y%m%d_%H%M").$$.sql
      SOURCEDIR="$1"
      # Count files by printing one character per file, so names that
      # contain newlines are not over-counted (GNU find).
      TOTALSOURCEFILES=$(find "$SOURCEDIR" -type f -printf '.' | wc -c)
      # Round up: plain integer division rounds down, which overfills the
      # directories and gives 0 (a later division by zero) for small trees.
      TOTALTARGETDIRS=$(( (TOTALSOURCEFILES + MAXFILESPERDIR - 1) / MAXFILESPERDIR ))
      if [ "$TOTALTARGETDIRS" -eq 0 ] ; then
        echo "No files found in $SOURCEDIR"
        exit 1
      fi
      PADLENTARGETDIRS=${#TOTALTARGETDIRS}
      PADLENTARGETFILE=${#TOTALSOURCEFILES}
      echo "We will create $TOTALTARGETDIRS directories to hold $MAXFILESPERDIR files per directory."
      if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
        echo "We will rename and distribute each file randomly."
      else
        echo "We will rename and distribute each file uniformly."
      fi
      echo "Do you want to continue?"
      select choice in yes no ; do
        if [ "$choice" == "yes" ] ; then
          COUNTER=1
          # -print0 with read -d '' copes with any character in file names.
          find "$SOURCEDIR" -type f -print0 | while IFS= read -r -d '' SOURCEFILE ; do
            # SHA-1 of the file content (stdin keeps sha1sum from escaping odd names).
            CHECKSUMFILE=$(sha1sum < "$SOURCEFILE" | cut -d " " -f 1)
            DETERMINISTICNONCE=$(printf "%0${PADLENTARGETFILE}d" "$COUNTER")
            if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
              DIRNUMBER=$(( RANDOM % TOTALTARGETDIRS + 1 ))   # random directory
            else
              DIRNUMBER=$(( COUNTER % TOTALTARGETDIRS + 1 ))  # round-robin directory
            fi
            PROBABILISTICNONCE=$(printf "%0${PADLENTARGETDIRS}d" "$DIRNUMBER")
            TARGETNAME="$PROBABILISTICNONCE$CHECKSUMFILE$DETERMINISTICNONCE"
            TARGETFILE="$TARGETROOTDIR/$PROBABILISTICNONCE/$TARGETNAME"
            # %y = last modification time (%z would be the inode change time).
            FILEDATE=$(stat -c %y "$SOURCEFILE" | cut -d "." -f 1)
            FILESIZE=$(stat -c %s "$SOURCEFILE")
            echo "Source file $SOURCEFILE" >> "$LOGFILE"
            echo "Target file $TARGETFILE" >> "$LOGFILE"
            echo "INSERT INTO files (Filename, Location, Checksum, CDate, Size) VALUES ('$TARGETNAME', '$PROBABILISTICNONCE', '$CHECKSUMFILE', '$FILEDATE', $FILESIZE);" >> "$SQLFILE"
            mkdir -p "$TARGETROOTDIR/$PROBABILISTICNONCE"
            cp -v "$SOURCEFILE" "$TARGETFILE"
            COUNTER=$(( COUNTER + 1 ))
          done
          echo "Done."
          echo
          break
        fi
        if [ "$choice" == "no" ] ; then
          echo
          echo "Operation cancelled"
          echo
          break
        fi
      done
    else
      echo
      echo "Missing source directory"
      echo
    fi
    

    Run it from the directory where you want the new repository to be created, passing the source directory as the only argument. You can configure it by modifying the first variables: MAXFILESPERDIR defines how many files to store per directory, TARGETROOTDIR is the root directory under which the numbered directories are created (the layout uses only two levels, and the first one is just this single root), and RANDOMDISTRIBUTION defines whether the files are distributed randomly (which can look uneven, especially for small runs, and can leave a directory with more than MAXFILESPERDIR files) or deterministically (just counting).
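
    The directory count is where rounding matters. Plain integer division such as `TOTALSOURCEFILES / MAXFILESPERDIR` rounds down: a tree with fewer than MAXFILESPERDIR files yields zero directories (so any later `% TOTALTARGETDIRS` divides by zero), and larger trees overfill every directory. Rounding up avoids both. A short sketch, with `ceil_div` a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer ceiling division, i.e. the smallest number
# of directories that can hold TOTAL files with at most MAX per directory.
ceil_div() {
  local total=$1 max=$2
  echo $(( (total + max - 1) / max ))
}

# Floor division rounds down: 1499 files at 500 per directory gives 2
# directories (about 750 files each), and 42 files gives 0.
echo "floor(1499/500) = $(( 1499 / 500 ))"
echo "ceil(1499/500)  = $(ceil_div 1499 500)"
echo "ceil(42/500)    = $(ceil_div 42 500)"
```

    With the rounded-up count, a round-robin assignment (counter modulo the number of directories) never puts more than MAXFILESPERDIR files in one directory; a random assignment still can.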

    How it works (in case this is not exactly what you are looking for, you may still get some ideas from it):

    1. Count the source files.
    2. Calculate how many target directories to create.
    3. Ask for confirmation.
    4. For each file:
      • Calculate the SHA1 hash of the file content.
      • Create a deterministic nonce (a zero-padded counter).
      • Create a probabilistic nonce, i.e. the directory number: random if RANDOMDISTRIBUTION is 1, otherwise the counter modulo the number of directories.
      • Get the size and modification date.
      • Concatenate the probabilistic nonce, the hash, and the counter to get the new file name (the probabilistic nonce is also the directory name).
      • Log the source and target full paths.
      • Create and log an SQL INSERT query.
      • Create the target directory (if it does not exist).
      • Copy the file. (You can move it instead if you want, but I'm playing it safe.)
    5. Finish.
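
    The per-file naming steps above can be condensed into one function that derives the target path. This is a minimal sketch, not the author's script: `target_path` is a hypothetical name, the directory nonce is passed in already chosen, and GNU coreutils `sha1sum` is assumed, as in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<root>/<dir nonce>/<dir nonce><sha1><counter>"
# for one source file.
target_path() {
  local root=$1 file=$2 dir_nonce=$3 counter=$4 counter_width=$5
  local checksum counter_nonce
  # SHA-1 of the file content; reading from stdin keeps sha1sum's
  # output as "<hash>  -" whatever the file name is.
  checksum=$(sha1sum < "$file" | cut -d " " -f 1)
  # Zero-padded counter: unique per file within one run.
  counter_nonce=$(printf "%0${counter_width}d" "$counter")
  echo "$root/$dir_nonce/$dir_nonce$checksum$counter_nonce"
}

# Usage example on a throwaway file containing "abc".
tmp=$(mktemp)
printf 'abc' > "$tmp"
target_path ./newrepository "$tmp" 03 7 4
# prints ./newrepository/03/03a9993e364706816aba3e25717850c26c9cd0d89d0007
rm -f "$tmp"
```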

    If you set RANDOMDISTRIBUTION to 1 and run the script several times, you'll get duplicates of your source files, because each file gets a different target file name and path on each run. If RANDOMDISTRIBUTION is set to anything else, every run renames the files the same way, as long as the file set is unchanged (if you add or remove files, they will get different names and paths).
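
    The difference between the two modes comes down to how the directory number is chosen. A sketch with hypothetical helper names, mirroring the arithmetic the script uses; it assumes find(1) lists the files in the same order on each run, which is typical for an unchanged tree but not guaranteed:

```shell
#!/usr/bin/env bash
# Deterministic mode (hypothetical helper): the directory depends only on
# the file's position in the find(1) output, so an unchanged file set listed
# in the same order gets the same layout on every run.
deterministic_dir() {
  local counter=$1 dirs=$2
  echo $(( counter % dirs + 1 ))
}

# Random mode (hypothetical helper): bash's $RANDOM (0..32767) makes the
# directory, and therefore the target path, differ between runs.
random_dir() {
  local dirs=$1
  echo $(( RANDOM % dirs + 1 ))
}

# Counters 1..6 over 3 directories cycle through 2 3 1 2 3 1.
for counter in 1 2 3 4 5 6; do
  printf '%s ' "$(deterministic_dir "$counter" 3)"
done
echo
random_dir 3
```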

    The point of combining a random value, the hash, and a counter is to handle duplicates (identical files won't collide, thanks to the counter) while still distributing the files randomly (over long enough runs, roughly evenly).
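
    One way to see the counter doing its job is to generate names for two files with identical content and check the list for repeats. A minimal sketch; `make_name` is a hypothetical helper, and the same `sort | uniq -d` check could be run over the generated target names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: generated name = <dir nonce><sha1 of content><counter>.
make_name() {
  local dir_nonce=$1 file=$2 counter=$3 width=$4
  local checksum
  checksum=$(sha1sum < "$file" | cut -d " " -f 1)
  printf "%s%s%0${width}d\n" "$dir_nonce" "$checksum" "$counter"
}

# Two files with identical content share the same SHA-1 ...
a=$(mktemp); b=$(mktemp)
printf 'same content\n' > "$a"
cp "$a" "$b"
name_a=$(make_name 1 "$a" 1 3)
name_b=$(make_name 1 "$b" 2 3)
rm -f "$a" "$b"

# ... but the counter suffix keeps the generated names distinct.
# "sort | uniq -d" prints only repeated lines, so empty output means
# every generated name is unique.
dups=$(printf '%s\n' "$name_a" "$name_b" | sort | uniq -d)
if [ -z "$dups" ]; then
  echo "all names unique"
else
  echo "duplicate names: $dups"
fi
```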

    Also, the generated file name starts with its directory name, so if you know the file name and the length of the directory names, you can derive the directory name (useful if you don't store it in your database table).
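
    Recovering the directory is then a substring operation. A sketch with hypothetical helper names; it assumes the directory width is the zero-padded width the script used (the number of digits in the directory count):

```shell
#!/usr/bin/env bash
# Hypothetical helper: the directory name is the first DIR_WIDTH characters
# of the generated file name (the zero-padded directory nonce).
dir_from_name() {
  local name=$1 dir_width=$2
  echo "${name:0:dir_width}"
}

# Hypothetical helper: rebuild the full path under the repository root.
path_from_name() {
  local root=$1 name=$2 dir_width=$3
  echo "$root/$(dir_from_name "$name" "$dir_width")/$name"
}

name="03a9993e364706816aba3e25717850c26c9cd0d89d0007"
path_from_name ./newrepository "$name" 2
# prints ./newrepository/03/03a9993e364706816aba3e25717850c26c9cd0d89d0007
```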

    Finally, this is a one-time migration script; it was not written to be run repeatedly over the same set of files.

Related Questions

I have a highly nested OBJECT TYPE structure that allows me to hold data
I have a folder structure at server side. Inside the folders there are files
I have an xml file in a flat structure. We do not control the
We have a highly specialized DAL which sits over our DB. Our apps need
I have a need to store a structure where N parents will have 1
I have an application that reads the structure of an existing PostgreSQL 9.1 database,
I'm looking for a data structure for string(UTF-8) indices that is highly optimized for
I have intranet highly user interactive ASP.net applications, which now I am going to
I have a web-based application which is very highly reliant on jquery / javascript,
Have a SomeLib.pro file that contains: CONFIG += debug TEMPLATE = lib TARGET =

© 2021 The Archive Base. All Rights Reserved