I have millions of files in a folder (nested). I need to scan a


Asked: June 15, 2026 (2026-06-15T16:01:04+00:00)

I have millions of files in a nested folder. I need to scan those files for a value and print the lines containing it (say LINE_TXT). I used to run sed on each file, but that took 45 minutes. My earlier solution was something like this:

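# Collect the matching files, then read each one line by line.
# Note: word splitting on $FILES breaks on paths containing whitespace.
FILES=$(find "$1" -type f -name 'filename.txt')
for f in $FILES
do
    while IFS= read -r LINE
    do
        if [[ "$LINE" == *LINE_TXT* ]]; then
            echo "$LINE"
        fi
    done < "$f"
done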

I figured out that a pipemill is the best way to achieve this. My current solution is something like this:

# Create a named pipe, fill it in the background, and filter it line by line.
mkfifo mypipe
find "$1" -type f -name 'filename.txt' | xargs cat > mypipe &
while IFS= read -r LINE
do
    if [[ "$LINE" == *LINE_TXT* ]]; then
        echo "$LINE"
    fi
done < mypipe

Run time is around 1 minute. Can I improve on this further?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T16:01:05+00:00

Seems to me that less script overhead would make things faster.

fgrep -r -h 'LINE_TXT' "$1"

Just let grep do its own recursion through your directories with -r. If you don’t want the filename included in its output, add the -h option. You can pipe the output through whatever you need for post-processing.

If you want to search only specific filenames, grep has options that work alongside -r: --include and --exclude, described on its man page. For example:

fgrep -h -r --include 'filename.txt' 'LINE_TXT' "$1"
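For reuse in a script, the --include form can be wrapped in a small function. This is only a sketch: the name grep_named and its argument order are made up here, and it assumes a grep that supports -r, -h and --include (GNU and BSD grep do). It uses grep -F, the same fixed-string matching that fgrep performs.

```shell
#!/usr/bin/env bash
# grep_named is a hypothetical helper name used only for illustration.
# Usage: grep_named DIR FILENAME PATTERN
grep_named() {
    local dir=$1 name=$2 pattern=$3
    # -F: treat PATTERN as a fixed string, not a regular expression
    # -r: recurse into DIR
    # -h: do not prefix matching lines with the file name
    # --include: only search files whose name matches the given glob
    grep -F -r -h --include="$name" -- "$pattern" "$dir"
}
```

It could then be called as grep_named "$1" 'filename.txt' 'LINE_TXT'. The order of the printed lines follows the order in which grep walks the tree, so sort the output if a stable order matters.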

While the find command is excellent, and invaluable in certain situations, using options built into a single tool like grep incurs less overhead. The find command doesn’t look inside files, so it still has to launch grep, here once for each file. If you DID want to use find, it might look something like this:

find "$1" -name 'filename.txt' -exec fgrep 'LINE_TXT' {} \;

This has the benefit of giving you access to find’s directory searching capabilities, but if all you want to do is look for a particularly named file in your directory tree, grep’s -r --include is probably sufficient and is likely to run faster.
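If find is needed anyway, one possible middle ground is to end -exec with + instead of \;, which hands many file names to each grep invocation rather than starting one process per file. A minimal sketch, assuming a find that supports the {} + form and a grep that supports -h; the function name scan_tree is made up for illustration:

```shell
#!/usr/bin/env bash
# scan_tree is a hypothetical helper name, not something from the question.
# Usage: scan_tree DIR PATTERN
scan_tree() {
    local dir=$1 pattern=$2
    # '{} +' batches file names, so grep is started once per batch of files
    # instead of once per file.
    # -h suppresses the file name prefix that grep adds when it is given
    # more than one file; -F matches PATTERN as a fixed string.
    find "$dir" -type f -name 'filename.txt' -exec grep -F -h -- "$pattern" {} +
}
```

It could be called as scan_tree "$1" 'LINE_TXT'. Whether this is faster than grep -r --include on a given tree is something to measure with time rather than assume.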
