I’m parsing the source code of many web pages, an entire large website with thousands of pages. Now I want to use Perl to find the number of occurrences of a keyword.
To fetch the web pages I use curl and pipe the output to “grep -c”, which doesn’t work (grep -c counts matching lines, not individual occurrences), so I want to use Perl instead. Can Perl be used on its own to crawl the pages?
For example:
grep parsed RawJSpiderOutput.txt | awk '{print $2}' | xargs -I replaceStr curl -s "replaceStr?myPara=en" | perl -lne '$c++ while /myKeywordToSearchFor/g; END { print $c + 0 }'
Explanation: The text file above contains both usable and unusable URLs. With “grep parsed” I select the lines with usable URLs. With awk I select the second column, which contains the bare usable URL. So far, so good. Now for the question: with curl I fetch the source of each page (appending a parameter, too) and pipe it to Perl to count the occurrences of “myKeywordToSearchFor”. I would like to do all of this in Perl only, if that is possible.
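As a rough illustration, the same pipeline could be written as a single Perl script. This is a minimal, untested sketch that assumes the file format described above (the URL is the second whitespace-separated field of every line containing “parsed”) and appends the same “?myPara=en” parameter as the original command. It swaps curl for the core module HTTP::Tiny (HTTPS URLs additionally need IO::Socket::SSL and Net::SSLeay installed); the script name count_keyword.pl and the helper names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;   # core module since Perl 5.14

# Equivalent of: grep parsed | awk '{print $2}'
# Returns the second whitespace-separated field of every line containing "parsed".
sub extract_urls {
    my @lines = @_;
    my @urls;
    for my $line (@lines) {
        next unless $line =~ /parsed/;
        my @fields = split ' ', $line;   # awk-style split on runs of whitespace
        push @urls, $fields[1] if defined $fields[1];
    }
    return @urls;
}

# Count every non-overlapping occurrence of $keyword in $text
# (not the number of matching lines, which is what grep -c reports).
# \Q...\E treats the keyword as a literal string, not a regex.
sub count_occurrences {
    my ($text, $keyword) = @_;
    my $count = 0;
    $count++ while $text =~ /\Q$keyword\E/g;
    return $count;
}

# Read the spider output, fetch each usable URL and print the total count.
sub main {
    my ($file, $keyword) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;

    my $http  = HTTP::Tiny->new;
    my $total = 0;
    for my $url (extract_urls(@lines)) {
        # Same parameter the original curl command appends.
        my $res = $http->get("$url?myPara=en");
        unless ($res->{success}) {
            warn "Failed to fetch $url: $res->{status} $res->{reason}\n";
            next;
        }
        $total += count_occurrences($res->{content}, $keyword);
    }
    print "$total\n";
}

# Only run when arguments are given, e.g.:
#   perl count_keyword.pl RawJSpiderOutput.txt myKeywordToSearchFor
main(@ARGV) if @ARGV;
```

Fetching the pages sequentially keeps the sketch simple; for thousands of pages a parallel fetcher would be faster, but that is beyond this example.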
Thanks!
This uses Perl only (untested):