I just need some clarity on whether this concept is possible, or whether I have misunderstood what crawlers are capable of.
Say I have a list of 100 websites/blogs, and every day my program (I am assuming it's a crawler) crawls through them. If a page matches one of some specified phrases like “miami heat” or “lebron james”, the program downloads that page, converts it to a PDF with full text and images, and saves that PDF.
So my questions are:
- This type of thing is possible, right? Please note that I don't want just text snippets; I am hoping to get the entire page, as if it were printed out on a piece of paper.
- These types of programs are called crawlers, right?
- I am planning to build on the code from http://phpcrawl.cuab.de/about.html
This is perfectly possible. Since you are going to use phpcrawl to crawl the web pages, use wkhtmltopdf to convert each matching page's HTML to a PDF as is, with its text and images.
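To illustrate the match-then-convert step, here is a minimal sketch in Python rather than PHP. It assumes the wkhtmltopdf binary is installed and on your PATH, and it only handles a single URL; discovering the pages to check is left to the crawler (phpcrawl in your case). The helper names (matches_phrases, safe_name, pdf_command, save_if_match) and the "pdfs" output folder are made up for this example and are not part of either tool.

```python
import re
import subprocess
import urllib.request
from pathlib import Path

# Phrases taken from the question; replace with your own list.
PHRASES = ["miami heat", "lebron james"]


def matches_phrases(html, phrases=PHRASES):
    """Return True if any phrase occurs in the page source, ignoring case.

    This is a plain substring check on the raw HTML, so it can also match
    text inside tags or attributes. Good enough for a sketch.
    """
    text = html.lower()
    return any(phrase.lower() in text for phrase in phrases)


def safe_name(url):
    """Turn a URL into a file-system friendly name (hypothetical scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")


def pdf_command(url, out_path):
    """Build the basic wkhtmltopdf call: wkhtmltopdf <url> <output.pdf>."""
    return ["wkhtmltopdf", url, str(out_path)]


def save_if_match(url, out_dir="pdfs"):
    """Fetch one URL and render it to PDF if a phrase matches.

    Returns the PDF path, or None when no phrase was found.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    if not matches_phrases(html):
        return None
    Path(out_dir).mkdir(exist_ok=True)
    out_path = Path(out_dir) / (safe_name(url) + ".pdf")
    # wkhtmltopdf loads the page itself, so images and CSS are included.
    subprocess.run(pdf_command(url, out_path), check=True)
    return out_path
```

You would call save_if_match(url) for every URL the crawler hands you, and run the whole thing once a day from cron or a similar scheduler. Passing the URL (not the downloaded HTML) to wkhtmltopdf is a deliberate choice here, so that relative image and stylesheet links resolve the same way they do in a browser.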