I’ve got a directory with a few thousand files in it, named things like:


Asked: June 12, 2026 (2026-06-12T08:51:47+00:00)

I’ve got a directory with a few thousand files in it, named things like:

filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.

Most of the files with bracketed numbers are duplicates of the original, but in some cases they’re not.

How can I keep my original files, delete the duplicates, but not lose the files that are different?

I know that I could rm *\).ext, but that obviously doesn’t make sure that files match the original.

I’m using OS X, so I have a md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).

And I don’t know what to write in the script. I was thinking of storing things in an array with this:

awk '{a[$NF]++} a[$NF]>1{sub(/).*/,""); sub(/.*(/,""); system("rm " $0);}'

But that always seems to delete my original.

What am I doing wrong? How do I do it right?

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-06-12T08:51:49+00:00

Your awk script deletes the originals because of sort order: a space sorts before a period, so "filename (1).ext" is listed ahead of "filename.ext". The first file seen for each hash is therefore a numbered copy, not the original, and every later file with the same hash (including the original) is treated as the duplicate and removed.
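To see the ordering for yourself, here is a small check using two sample names from the question. It assumes byte-order collation (LC_ALL=C); other locales may order punctuation differently.

```shell
# Sort two sample names in byte order: space (0x20) comes before period (0x2E),
# so the numbered copy is listed ahead of the original.
sorted=$(printf '%s\n' 'filename.ext' 'filename (1).ext' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```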

Not only does rm *\).ext skip the comparison with the original, it also deletes numbered files that have no original at all.

I wouldn’t approach it from that direction. Rather than checking every numbered file to see whether it matches an original, go through your list of originals and delete the numbered files that belong to each one. Without any content check, that looks like this:

$ for file in *[^\)].ext; do echo "-- Found: $file"; rm -v "${file%.ext}"\ \(*\).ext; done

You can expand this to check MD5 digests along the way, so that only true duplicates are removed. It’s more code, so I’ll break it into multiple lines, in a script:

#!/bin/bash

shopt -s nullglob              # Expand a glob to nothing if it matches no files

for file in *[^\)].ext; do
  md5=$(md5 -q "$file")        # The -q option gives you only the message digest
  echo "-- Found: $file ($md5)"
  for duplicate in "${file%.ext}"\ \(*\).ext; do   # Quoted, so names with spaces stay intact
     if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
        rm -v "$duplicate"
     fi
  done
done
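The -q option belongs to the BSD/macOS md5 tool; Linux ships md5sum instead, which prints the digest first. If the script has to run on both, a small wrapper along these lines might help. The name digest is hypothetical, and the sketch assumes at least one of the two tools is installed.

```shell
# Hypothetical helper: print only the MD5 digest of the file given as $1.
digest() {
  if command -v md5sum >/dev/null 2>&1; then
    # Reading from stdin makes md5sum print "<digest>  -"; keep the first field.
    md5sum < "$1" | awk '{print $1}'
  else
    # BSD/macOS md5: -q prints the digest alone.
    md5 -q "$1"
  fi
}
```

In the script above, md5=$(digest "$file") would then replace md5=$(md5 -q "$file").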

As an alternative, you can do this more simply, and with less CPU overhead than calculating MD5 digests. Unix and Linux have a tool called cmp, which compares two files byte by byte and reports the result in its exit status; with the -s option it prints nothing at all. So:

#!/bin/bash

shopt -s nullglob

for file in *[^\)].ext; do
  for duplicate in "${file%.ext}"\ \(*\).ext; do
    if cmp -s "$file" "$duplicate"; then
      rm -v "$duplicate"       # Remove the numbered copy, never the original
    fi
  done
done
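With a few thousand files at stake, it may be worth listing what would be removed before deleting anything. The following is a sketch of a dry-run variant, not a drop-in replacement: the function name dedupe_numbered and its arguments are made up for illustration. It avoids nullglob by testing that each glob result exists, so it should also run under a plain POSIX sh.

```shell
# Hypothetical helper: list (or delete) numbered copies that are byte-identical
# to their original.
# Usage: dedupe_numbered DIR EXT [delete]
dedupe_numbered() {
  dir=$1 ext=$2 mode=${3:-list}
  for file in "$dir"/*."$ext"; do
    [ -f "$file" ] || continue          # glob matched nothing
    case $file in
      *\)."$ext") continue ;;           # skip numbered copies; originals drive the loop
    esac
    base=${file%."$ext"}
    for duplicate in "$base"\ \(*\)."$ext"; do
      [ -f "$duplicate" ] || continue   # this original has no numbered copies
      if cmp -s "$file" "$duplicate"; then
        if [ "$mode" = delete ]; then
          rm -- "$duplicate"
        else
          printf 'would remove: %s\n' "$duplicate"
        fi
      fi
    done
  done
}
```

Run dedupe_numbered . ext first to review the list, then dedupe_numbered . ext delete once it looks right. Numbered files that differ from their original, or that have no original, are left alone.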
