I have a script that looks for files on a mass filer device. Every day, it searches about 250,000 files and creates a simple report that counts how many files we received, broken down by sender ($1) and by date ($11).
For efficiency, I use -exec with a plus sign in my find expression so that file names are passed to ls in batches.
find . -exec ls '{}' + | awk -F~ '{print $1"\t"$11}' | sort | uniq -c
It runs fast, but every result shows up twice, like so:
9632 ./Reynosa 20120607
9632 Reynosa 20120607
20328 ./Reynosa 20120608
20328 Reynosa 20120608
15354 ./Reynosa 20120609
15354 Reynosa 20120609
What am I doing wrong? Should I use basename to strip out the ./ or is there a better way?
Try ls -d instead of ls. According to the ls manpage, -d lists directories themselves, not their contents.
Your current command lists the entries of the directories as well, which are then listed again by find, thus resulting in duplicates.
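A minimal sketch of the fix, run against a throwaway directory so it can be tried safely. The file names below are made up for illustration (11 fields separated by ~, sender first, date last). Two things go beyond the suggestion above and are my own assumptions about what you want: -type f keeps find from handing "." itself to ls, and the awk sub() call strips the leading ./ so you do not need basename.

```shell
# Throwaway directory with hypothetical file names:
# 11 '~'-separated fields, sender in field 1, date in field 11.
tmp=$(mktemp -d)
touch "$tmp/Reynosa~a~b~c~d~e~f~g~h~i~20120607" \
      "$tmp/Reynosa~a~b~c~d~e~f~g~h~j~20120607" \
      "$tmp/Reynosa~a~b~c~d~e~f~g~h~i~20120608"

# -type f : only regular files reach ls, so "." is never listed.
# ls -d   : if a directory were passed anyway, ls would print its name
#           instead of expanding its contents.
# sub()   : removes the "./" prefix that find puts on every path.
report=$(cd "$tmp" && find . -type f -exec ls -d '{}' + |
    awk -F'~' '{ sub(/^\.\//, "", $1); print $1 "\t" $11 }' |
    sort | uniq -c)

printf '%s\n' "$report"
rm -rf "$tmp"
```

On the sample files this prints two lines: a count of 2 for Reynosa on 20120607 and a count of 1 for Reynosa on 20120608. If your files sit in subdirectories, the sub() call only removes the leading ./ and the rest of the path stays in the sender column, so you would need to adjust it.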