Is there any way, in a batch file, to find a certain text or word in a .txt file?
For example:
- The quick brown fox jump \mark{1} over the lazy dog
- The quick \mark{10} brown fox jump over the lazy dog
- The quick brown fox jump over the lazy \mark{100} dog
- The quick brown fox jump over the lazy dog \mark{1000}
- The \mark{1} quick brown fox jump over the lazy dog
- The quick brown fox jump over the lazy dog \mark{100}
- The quick brown fox \mark{30} jump over the lazy dog
As you can see from the example above, I want to search for the "\mark{Number here}" word. If the same word occurs more than once, as on the first and fifth lines, I only want the "\mark{1}" from the first line to be displayed; the repeat on the fifth line should be disregarded.
So the results printed to a txt file would be:
- \mark{1}
- \mark{10}
- \mark{100}
- \mark{1000}
- \mark{30}
This should be relatively easy if you download a tool like sed for Windows (or perhaps grep for Windows). The GNU project offers both sed and grep for Windows for free.
It should also be relatively easy using the regex capabilities of VBScript, JScript, or PowerShell.
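To give a feel for how little work the regex route involves, here is a minimal sketch. It uses Python purely as a stand-in for the scripting tools named above and is not the batch solution; the function name `unique_marks` and the `sample` list are made up for illustration, and the pattern assumes a mark is always `\mark{` followed by digits and `}`.

```python
import re

# Assumption: a mark is the literal text \mark{ followed by digits and }.
MARK = re.compile(r"\\mark\{\d+\}")

def unique_marks(lines):
    """Return each distinct mark once, in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        # findall picks up every mark on the line, not only the first.
        for mark in MARK.findall(line):
            if mark not in seen:
                seen.add(mark)
                result.append(mark)
    return result

# The seven example lines from the question.
sample = [
    r"The quick brown fox jump \mark{1} over the lazy dog",
    r"The quick \mark{10} brown fox jump over the lazy dog",
    r"The quick brown fox jump over the lazy \mark{100} dog",
    r"The quick brown fox jump over the lazy dog \mark{1000}",
    r"The \mark{1} quick brown fox jump over the lazy dog",
    r"The quick brown fox jump over the lazy dog \mark{100}",
    r"The quick brown fox \mark{30} jump over the lazy dog",
]

for mark in unique_marks(sample):
    print(mark)
```

To work on a real file, the lines could be read with `open(...)` and the result written out the same way; this match is case sensitive, unlike the batch solution.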
But I thought I would take a stab at using native batch. FINDSTR has primitive regex support, but it cannot extract the matching text, so the batch solution is fairly complex.
The solution below can find multiple marks on one line. It is also able to count the number of appearances for each distinct mark. The SET search and replace is case insensitive, so I was forced to make this entire solution case insensitive.
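The behaviour described here (several marks per line, a count per distinct mark, case-insensitive matching) can be sketched as follows. This is again Python used as an illustration of the intended logic, not the batch code itself; `count_marks` is a hypothetical name, and lowercasing each match is my assumption about how marks that differ only in case should be merged.

```python
import re
from collections import Counter

# Assumption: a mark is \mark{digits}; IGNORECASE mirrors the
# case-insensitive behaviour of the batch solution.
MARK = re.compile(r"\\mark\{\d+\}", re.IGNORECASE)

def count_marks(lines):
    """Count appearances of each distinct mark, ignoring case."""
    counts = Counter()
    for line in lines:
        # Every mark on the line is counted, not only the first one.
        for mark in MARK.findall(line):
            # Normalise case so \MARK{1} and \mark{1} share one entry.
            counts[mark.lower()] += 1
    return counts
```

Calling `count_marks` on the lines of a file gives a mapping from each mark to the number of times it appears, with keys in order of first appearance.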
The solution can only handle lines of roughly 8191 bytes or less.
The performance should be good even for very large files as long as the number of lines containing marks is relatively small.
Here is the test.txt file that I used. It has a number of problem test cases that make a batch solution difficult.
And here are my results