I’ve got a somewhat weird case: a FOR /F loop is incredibly slow when I use findstr as the command it iterates over.
It’s worth mentioning that the file I’m processing (old-file.xml) contains about 200 000 lines.
This part (counting the lines) is blazing fast, but it becomes slower if I remove | find /c ":"
rem find total number of lines in xml-file
findstr /n ^^ old-file.xml | find /c ":" > "temp-count.txt"
set /p lines=< "temp-count.txt"
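If the only goal is the line count, a common idiom is FIND with an empty search string: FIND treats no line as containing "", so /v "" selects every line and /c counts them. A minimal sketch, assuming old-file.xml is in the current directory; it captures the count directly instead of going through temp-count.txt:

```bat
@echo off
rem Count all lines of old-file.xml without a temporary file.
rem Feeding the file through "<" makes FIND print only the number,
rem without the "---------- OLD-FILE.XML: " header it prints for a file argument.
for /f %%n in ('find /c /v "" ^< "old-file.xml"') do set "lines=%%n"
echo Total lines: %lines%
```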
The slow code looks like this, and I can’t use the pipe trick above. The slow part seems to be the FOR itself, since I don’t see any progress in the title bar until after 10 minutes.
setlocal DisableDelayedExpansion
rem start replacing wrong dates with correct date
for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do (
rem cache the value of each line in a variable
set read-line=%%j
set line=%%i
rem restore delayed expansion
setlocal EnableDelayedExpansion
rem write progress in title bar
title Processing line: !line!/%lines%
rem remove leading line number
rem set read-line=!read-line:*:=!
for /f "usebackq" %%i in ("%tmpfile%") do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line!>>"Updated-file.xml"
rem end local
endlocal
)
EDIT:
Further investigation showed that even this single line, which should only display the current line number on each iteration, takes about 10 minutes on my 8 MB file of 200 000 lines. That’s just the time before it starts displaying any lines.
for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do echo %%i
So it seems like findstr is writing output that is hidden from the user but visible to the FOR loop. How can I prevent that while still getting the same results?
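One way to narrow this down is to separate the two costs: time findstr on its own with its output redirected to a file, then time a FOR /F loop over that finished file. A minimal sketch, assuming old-file.xml is in the current directory; the file name numbered.tmp and the variable count are hypothetical:

```bat
@echo off
setlocal
rem Step 1: run findstr alone, redirected to a file (no pipe, no FOR /F)
echo findstr start: %time%
findstr /n ^^ old-file.xml > numbered.tmp
echo findstr end:   %time%

rem Step 2: loop over the finished file instead of over the command.
rem /n prefixes every line with its number, so no line is empty or starts
rem with the default eol character ";" and FOR /F skips none of them.
rem "delims=" keeps each whole line in %%a.
set "count=0"
for /f "usebackq delims=" %%a in ("numbered.tmp") do set /a count+=1
echo loop end:      %time% (%count% lines)

del numbered.tmp
```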
EDIT 2: Solution
The solution proposed by Aacini, with my final revisions.
This is a snippet from a much bigger script. The wrong dates are retrieved in another loop, and the total number of lines is also retrieved in another loop.
setlocal enabledelayedexpansion
rem this part is for snippet only, dates are generated from another loop in final script
echo 2069-04-29 > dates-tmp.txt
echo 2069-04-30 >> dates-tmp.txt
findstr /n ^^ Super-Large-File.xml > out.tmp
set tmpfile=dates-tmp.txt
set correctdate=2011-11-25
set wrong-dates=
rem hardcoded total number of lines
set lines=186442
for /F %%i in (%tmpfile%) do (
set wrong-dates=!wrong-dates! %%i
)
rem process each line in out.tmp and loop them through :ProcessLines
call :ProcessLines < out.tmp
rem when the call has processed every line of out.tmp, skip past the subroutine
goto ProcessLinesEnd
:ProcessLines
for /L %%l in (1,1,%lines%) do (
set /P read-line=
rem write progress in title bar
title Processing line: %%l/%lines%
for %%i in (%wrong-dates%) do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line:*:=!>>"out2.tmp"
)
rem return to the caller (the line after the call)
goto :eof
:ProcessLinesEnd
echo this should not be printed until call has ended
:exit
exit /b
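Two further changes may help the snippet above. They are offered as an untested sketch, not a measured result: compute lines from out.tmp instead of hardcoding it, and redirect the whole subroutine’s output once, so out2.tmp is opened a single time instead of once per line by >>. File names, dates and variable names are the snippet’s own; the wrong-dates values are still assumed sample data.

```bat
@echo off
setlocal EnableDelayedExpansion
rem Sample data, as in the snippet (real dates come from another loop)
set "correctdate=2011-11-25"
set "wrong-dates=2069-04-29 2069-04-30"

rem Number every line so that empty lines are preserved
findstr /n ^^ Super-Large-File.xml > out.tmp

rem Count the lines instead of hardcoding them
for /f %%n in ('find /c /v "" ^< out.tmp') do set "lines=%%n"

rem Open out2.tmp once for the whole subroutine instead of once per line
call :ProcessLines < out.tmp > out2.tmp
del out.tmp
exit /b

:ProcessLines
for /L %%l in (1,1,%lines%) do (
    rem Clear first: SET /P keeps the old value if it reads nothing
    set "read-line="
    set /p "read-line="
    title Processing line: %%l/%lines%
    for %%i in (%wrong-dates%) do (
        set "read-line=!read-line:%%i=%correctdate%!"
    )
    rem Strip the "N:" prefix that findstr /n added
    echo(!read-line:*:=!
)
exit /b
```

Because stdout is redirected, set /p must be used with an empty prompt, as above; a non-empty prompt would be written into out2.tmp. Note also that SET /P is commonly documented to read at most 1021 bytes per line, so longer XML lines would be truncated.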
Two points here:
1- The setlocal EnableDelayedExpansion command is executed for every line of the file. This means that the complete environment must be copied to a new local memory area about 200 000 times, which may cause several problems.
2- I suggest you start with the most basic part. How long does findstr take to execute? Run findstr /n ^^ old-file.xml alone and check this before trying to fix any other part. If this step is fast, add one more step and test again, until you discover the cause of the slowdown. I suggest you use neither pipes nor for /f over the execution of findstr, but run for /f over the file generated by a previous redirection.
EDIT: A faster solution
There is another way to do this. You may redirect the findstr output (through a file) into a Batch subroutine, so the lines can be read with the SET /P command. This method lets you process the lines entirely with delayed expansion instead of the command-line substitution of FOR /F, so the pair of setlocal EnableDelayedExpansion and endlocal commands is no longer necessary. However, if you still want to display the line number, you must calculate it yourself. It is also faster to load the wrong dates into a variable once than to process %tmpfile% for every line of the big file.
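A minimal sketch of the goto-based variant described above, reconstructed from the description rather than copied from the original answer’s code. It assumes old-file.xml and Updated-file.xml as in the question; the sentinel line :EOF and the file findstr.txt follow the text, while the counter n and the sample wrong-dates values are hypothetical.

```bat
@echo off
setlocal EnableDelayedExpansion
set "correctdate=2011-11-25"
rem Load the wrong dates into one variable once (sample values, assumed)
set "wrong-dates=2069-04-29 2069-04-30"

rem Number every line, then append a sentinel; every real line starts
rem with "N:", so the line ":EOF" cannot come from the file itself
findstr /n ^^ old-file.xml > findstr.txt
echo :EOF>> findstr.txt

set "n=0"
rem Redirect the whole subroutine once instead of using >> on every line
call :ProcessLines < findstr.txt > Updated-file.xml
del findstr.txt
exit /b

:ProcessLines
set "read-line="
set /p "read-line="
rem Stop at the sentinel, or if the input ends without one
if not defined read-line exit /b
if "!read-line!" == ":EOF" exit /b
rem SET /P does not supply a line number, so count it again
set /a n+=1
title Processing line: !n!
for %%d in (%wrong-dates%) do set "read-line=!read-line:%%d=%correctdate%!"
rem Strip the "N:" prefix added by findstr /n
echo(!read-line:*:=!
goto ProcessLines
```

Each goto makes cmd search the script for the label again, which is one reason a for /L loop with a known line count tends to be faster.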
SECOND EDIT: An even faster modification
The previous method can be sped up slightly if the loop is done with a for /L command instead of a goto. This modification also omits the :EOF comparison and the line-number calculation, so the time saved may be significant once it is repeated 200 000 times. If you use this method, don’t forget to delete the echo :EOF>> findstr.txt line in the first part.