I am writing PHP code that uses a regex to get all the links from a page, and I need to extend it to get the links from an entire website.
I guess the extracted URLs should be checked again, and so on, so that the script visits every URL on the site, not only the one page I give it.
I know that anything is possible, but how would I go about this? Thank you for your guidance.
So, your regex grabs all the links. You cycle through a loop of those links, fetch each one with cURL, run the returned HTML through your regex, wash, rinse, repeat.
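Here is a minimal sketch of those two building blocks, assuming PHP 8 with the cURL extension enabled. The names `fetch_page()` and `extract_links()` are made up for illustration. The regex only looks at quoted `href` attributes on `<a>` tags, so it will miss some links that an HTML parser such as `DOMDocument` would catch.

```php
<?php
// Hypothetical helpers for the "fetch a page, regex out its links" step.

/**
 * Fetch a page with cURL and return its body, or null on failure.
 */
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an assumed value
    ]);
    $body = curl_exec($ch);             // string on success, false on failure
    // On PHP 8 the handle is freed automatically when $ch goes out of scope.
    return is_string($body) ? $body : null;
}

/**
 * Return the unique href values of <a> tags, in order of first appearance,
 * skipping in-page fragments and mailto:/javascript: links.
 */
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    $links = [];
    foreach ($m[1] as $href) {
        $href = trim($href);
        if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript):/i', $href)) {
            continue;
        }
        $links[] = $href;
    }
    return array_values(array_unique($links)); // drop duplicates, reindex 0..n-1
}
```

The links come back exactly as written in the page, so relative ones like `/about` still need to be turned into full URLs before you feed them back into `fetch_page()`.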
Might want to make sure to put some sort of URL depth counter in there, lest you end up parsing The Internet.
Might also want to make sure you don’t re-check links you’ve already followed, lest you end up at the end of Infinite Recursion Street.
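A breadth-first version of that loop handles both of those points: every queued URL carries its depth, and a `$visited` array keyed by URL stops the same page from being queued twice. This is a sketch under stated assumptions, not a complete crawler. `crawl()`, `resolve_link()` and the `$fetch` callback are hypothetical names, only absolute and root-relative links are followed, URLs are not normalized (so `/a` and `/a/` count as different pages), and links to other hosts are dropped so the crawl stays on one site. Taking the fetcher as a callable lets you plug in a cURL-based function for real runs and a fake one for testing.

```php
<?php
// Breadth-first crawl with a depth limit and a visited set.
// $fetch is a hypothetical callable: fn(string $url): ?string (HTML or null).
function crawl(string $startUrl, int $maxDepth, callable $fetch): array
{
    $host    = parse_url($startUrl, PHP_URL_HOST);
    $visited = [$startUrl => true];   // every URL ever queued
    $queue   = [[$startUrl, 0]];      // [url, depth] pairs, FIFO
    $pages   = [];                    // URLs fetched successfully, in crawl order

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue;                 // fetch failed; skip this page
        }
        $pages[] = $url;
        if ($depth >= $maxDepth) {
            continue;                 // depth limit reached: don't follow its links
        }
        // Capture href values up to a quote or '#', so "#top" links never match.
        preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\'#]+)/i', $html, $m);
        foreach ($m[1] as $href) {
            $next = resolve_link(trim($href), $url);
            if ($next === null || parse_url($next, PHP_URL_HOST) !== $host) {
                continue;             // unsupported link form, or another site
            }
            if (!isset($visited[$next])) {
                $visited[$next] = true;
                $queue[] = [$next, $depth + 1];
            }
        }
    }
    return $pages;
}

// Turn an href into an absolute URL; returns null for forms this sketch skips.
function resolve_link(string $href, string $base): ?string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                  // already absolute
    }
    $p = parse_url($base);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null;
    }
    if (strncmp($href, '//', 2) === 0) {
        return $p['scheme'] . ':' . $href;             // protocol-relative
    }
    if ($href !== '' && $href[0] === '/') {
        $port = isset($p['port']) ? ':' . $p['port'] : '';
        return $p['scheme'] . '://' . $p['host'] . $port . $href; // root-relative
    }
    return null; // page-relative ("about.html"), mailto:, javascript:, etc.
}
```

With a cURL-based `$fetch`, a call like `crawl('https://example.com/', 2, $fetch)` returns the start page plus every same-site page reachable in at most two clicks, each fetched once.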
Might also want to look at threading, lest it take 100,000 years.
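Standard PHP builds don't offer threads out of the box, so for a crawler a common substitute is cURL's multi interface (`curl_multi_*`), which runs many transfers concurrently inside a single process. The sketch below assumes PHP 8 with the cURL extension; `fetch_all()` is a hypothetical helper, and treating any non-2xx response as a failure is a choice made for this example.

```php
<?php
// Fetch several URLs concurrently with cURL's multi interface.
// Returns [url => body], with null for failed transfers or non-2xx responses.
function fetch_all(array $urls, int $timeout = 10): array
{
    $results = [];
    if ($urls === []) {
        return $results;              // nothing to do
    }
    $mh = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // block until there is activity (max 1 s)
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect each body, keeping only 2xx responses.
    foreach ($handles as $url => $ch) {
        $code = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
        $results[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

In a crawler you would hand each batch of newly discovered URLs to `fetch_all()` rather than requesting them one at a time.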