I have a highly structured hierarchical directory containing multiple files that need to be


Asked: June 5, 2026 (2026-06-05T20:33:39+00:00)


I have a highly structured hierarchical directory containing multiple files that need to be moved into a flat structure and renamed at the same time. The original path and name must be logged along with the new path and name, and eventually loaded into a database. Each renamed file must also get a unique, unguessable (i.e. encrypted or hashed) file name. When the renamed files are moved into the new directory structure, I also want to limit the number of files in each directory: each directory would be created with a sequential number for its name, and files would be loaded into it until a maximum number of files is reached (e.g. 255) before rolling over into a new directory named with the next sequential number.

Is there a tool or piece of software that does this? I did some initial research, but found nothing that meets the following criteria:

- batch rename and copy into an alternative (flatter) structure
- hash or encrypt the file name and ensure uniqueness
- name folders sequentially and limit the file count in each
- log each file’s original name and path, and its new (encrypted) name and path


1 Answer

Editorial Team · Answer 1 · 2026-06-05T20:33:40+00:00

I have several Bash scripts that I have used in the past to migrate hand-made file repositories to hashed repositories that are accessed and managed from a web application (mostly PHP apps). In these repositories, file names are hashed (to avoid collisions between files with the same name or content) and files are distributed evenly (either deterministically or randomly) to keep the files-per-directory count low for performance reasons. The following is one fully working example:

#!/bin/bash

# Configuration
MAXFILESPERDIR=500               # files to store per directory
TARGETROOTDIR="./newrepository"  # root directory of the new repository
RANDOMDISTRIBUTION=1             # 1 = random distribution; anything else = by counter

if [ -d "$1" ]; then
  SOURCEDIR="$1"
  RUNID="$(basename "$0").$(date +"_%Y%m%d_%H%M").$$"
  LOGFILE="$RUNID.log"
  SQLFILE="$RUNID.sql"
  TOTALSOURCEFILES=$(find "$SOURCEDIR" -type f | wc -l)
  if [ "$TOTALSOURCEFILES" -eq 0 ]; then
    echo "No files found in $SOURCEDIR"
    exit 1
  fi
  # Round up so a partly filled last directory is counted and the modulo below never divides by zero
  TOTALTARGETDIRS=$(( (TOTALSOURCEFILES + MAXFILESPERDIR - 1) / MAXFILESPERDIR ))
  PADLENTARGETDIRS=${#TOTALTARGETDIRS}
  PADLENTARGETFILE=${#TOTALSOURCEFILES}
  echo "We will create up to $TOTALTARGETDIRS directories to hold about $MAXFILESPERDIR files per directory."
  if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
    echo "We will rename and distribute each file randomly."
  else
    echo "We will rename and distribute each file uniformly."
  fi
  echo "Do you want to continue?"
  select choice in yes no ; do
    if [ "$choice" == "yes" ] ; then
      COUNTER=1
      # IFS= and -r keep leading spaces and backslashes in file names intact
      find "$SOURCEDIR" -type f | while IFS= read -r SOURCEFILE ; do
        # SHA-1 of the file content
        CHECKSUMFILE=$(sha1sum "$SOURCEFILE" | cut -d " " -f 1)
        # SHA-1 of the source path; computed but not used in the target name
        CHECKSUMNAME=$(printf '%s\n' "$SOURCEFILE" | sha1sum | cut -d " " -f 1)
        # Zero-padded counter: keeps names unique even for identical content
        DETERMINISTICNONCE=$(printf "%0${PADLENTARGETFILE}d" "$COUNTER")
        # Target directory number in 1..TOTALTARGETDIRS
        if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
          DIRNUMBER=$(( RANDOM % TOTALTARGETDIRS + 1 ))
        else
          DIRNUMBER=$(( COUNTER % TOTALTARGETDIRS + 1 ))
        fi
        PROBABILISTICNONCE=$(printf "%0${PADLENTARGETDIRS}d" "$DIRNUMBER")
        # %z is the last status change time (ctime); use %y for the modification time
        FILEDATE=$(stat -c %z "$SOURCEFILE" | cut -d "." -f 1)
        FILESIZE=$(stat -c %s "$SOURCEFILE")
        TARGETNAME="$PROBABILISTICNONCE$CHECKSUMFILE$DETERMINISTICNONCE"
        TARGETDIR="$TARGETROOTDIR/$PROBABILISTICNONCE"
        echo "Source file $SOURCEFILE" >> "$LOGFILE"
        echo "Target file $TARGETDIR/$TARGETNAME" >> "$LOGFILE"
        echo "INSERT INTO files (Filename, Location, Checksum, CDate, Size) VALUES ('$TARGETNAME', '$PROBABILISTICNONCE', '$CHECKSUMFILE', '$FILEDATE', $FILESIZE);" >> "$SQLFILE"
        mkdir -p "$TARGETDIR"
        cp -v "$SOURCEFILE" "$TARGETDIR/$TARGETNAME"
        COUNTER=$(( COUNTER + 1 ))
      done
      echo "Done."
      echo
      break
    fi
    if [ "$choice" == "no" ] ; then
      echo
      echo "Operation cancelled"
      echo
      break
    fi
  done
else
  echo
  echo "Missing source directory"
  echo
fi

Run it from the directory where the new repository should be created, passing the source directory as the first argument. You can configure it by editing the variables at the top: MAXFILESPERDIR defines how many files to store per directory; TARGETROOTDIR is the name of the first-level directory to create (the script uses only two levels, and the first one is really a single root); and RANDOMDISTRIBUTION defines whether the files are distributed randomly (which may look uneven, especially for small runs) or deterministically (just counting).
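The question asks for directories that are filled one after another up to a hard limit, whereas both modes above spread files across all directories. A minimal sketch of a sequential variant follows, assuming a 1-based file counter like the script's COUNTER; the helper name seq_dir_for and the padding width of 3 are made up for illustration and are not part of the script.

```shell
#!/bin/bash

# Hypothetical helper: map a 1-based file counter to a zero-padded,
# sequentially numbered directory that holds at most max_files files.
seq_dir_for() {
  local counter=$1 max_files=$2 pad=$3
  printf "%0${pad}d" $(( (counter - 1) / max_files + 1 ))
}

# With a limit of 255, files 1..255 go to the first directory,
# file 256 starts the second one, and so on.
for n in 1 255 256 511; do
  echo "file $n -> directory $(seq_dir_for "$n" 255 3)"
done
```

In the script, the result of such a helper could take the place of the directory number computed from RANDOM or from the modulo of the counter.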

How it works (just in case this is not exactly what you are looking for, but you can get some ideas from it):

Count the source files.
Calculate how many target directories to create.
Ask for confirmation.
For each file:
- Calculate the SHA1 hash of the file content.
- Create a deterministic nonce (a zero-padded file counter).
- Create a probabilistic nonce (a random directory number if RANDOMDISTRIBUTION is 1, otherwise one derived from the counter).
- Get the size and the last status-change date (stat -c %z).
- Combine the directory number, the hash, and the counter to get the new file name (the directory number is also the target path).
- Log the source and target full paths.
- Create and log a SQL insert query.
- Create the target directory (if it does not exist).
- Copy the file. (You can move it if you want, but I’m playing it safe.)
Finish.
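The naming step can be tried in isolation before touching any real files. The sketch below assumes GNU sha1sum is available; build_target_name is a hypothetical helper, and the directory number, counter, and padding widths passed to it are arbitrary sample values.

```shell
#!/bin/bash

# Hypothetical helper: build a target file name in the form
# <zero-padded directory number><SHA-1 of content><zero-padded counter>.
build_target_name() {
  local file=$1 dir_number=$2 counter=$3 dir_pad=$4 file_pad=$5
  local checksum
  checksum=$(sha1sum "$file" | cut -d " " -f 1)
  printf "%0${dir_pad}d%s%0${file_pad}d" "$dir_number" "$checksum" "$counter"
}

# Dry run on a throwaway file: print the name, copy nothing.
sample=$(mktemp)
echo "hello" > "$sample"
build_target_name "$sample" 7 42 2 4
echo
rm -f "$sample"
```

The printed name starts with 07, ends with 0042, and has the 40-character hash in between.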

If you set RANDOMDISTRIBUTION to 1 and run the script several times, you’ll get duplicates of your source files, because each file gets a different target file name and path on every run. If RANDOMDISTRIBUTION is set to anything else, the files are renamed the same way every time you run the script, as long as the file set is unchanged; if you add or remove files, they will get different names and paths.

The objective of using a random value + hash + counter is to be sure we can handle duplicates (won’t collide thanks to the counter) while still distributing the files randomly (for long enough runs, this will distribute the files evenly).
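The role of the counter can be checked with two files that have identical content. This is a small illustration, assuming GNU sha1sum; the file names and the three-digit padding are arbitrary.

```shell
#!/bin/bash

# Two files with identical content share a SHA-1, so the hash alone would collide.
workdir=$(mktemp -d)
echo "same content" > "$workdir/a.txt"
echo "same content" > "$workdir/b.txt"

hash_a=$(sha1sum "$workdir/a.txt" | cut -d " " -f 1)
hash_b=$(sha1sum "$workdir/b.txt" | cut -d " " -f 1)

# Appending a per-file counter (1, 2, ...) makes the final names differ.
name_a="${hash_a}$(printf "%03d" 1)"
name_b="${hash_b}$(printf "%03d" 2)"

[ "$hash_a" = "$hash_b" ] && echo "content hashes are equal"
[ "$name_a" != "$name_b" ] && echo "final names differ"
rm -rf "$workdir"
```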

Also, the prefix of the generated file name is the name of its directory, so if you have the file name and the length of the directory name, you can work out the directory name (just in case you don’t store that in your database table).
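Recovering the directory from a stored file name is a plain substring operation. The sketch below assumes the padding width is known (2 here); dir_from_name and the sample name are illustrative only.

```shell
#!/bin/bash

# Hypothetical helper: recover the directory name from a stored file name,
# given how many digits the directory numbers were padded to.
dir_from_name() {
  local name=$1 dir_pad=$2
  printf "%s" "${name:0:dir_pad}"
}

stored="07da39a3ee5e6b4b0d3255bfef95601890afd807090042"
echo "directory: $(dir_from_name "$stored" 2)"
```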

Finally, this is a one-time migration script; it was not written to be run regularly over the same set of files.
