I would like to check multiple websites for HTML changes using PHP, with some sort of database storage. Is there another way to detect a change besides comparing MD5 sums?
Also, when a change is detected, is there any way to find out exactly what was changed?
Thanks in advance SO!
You can store the Last-Modified header for each page when you crawl it for the first time. The next time you crawl, you just have to read the Last-Modified header again and compare it with the stored value; if the two differ, the page has changed.
If the website doesn't send this header, you can fall back to comparing an MD5 hash of the page content.
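Here is a rough sketch of that check in PHP. The function names `extractLastModified` and `pageHasChanged` and the shape of the `$stored` array are made up for illustration; `$stored` stands for whatever row you saved in your database on the previous crawl.

```php
<?php
// Hypothetical helper: find the Last-Modified value in a list of raw
// response header lines (the format returned by get_headers($url)).
// If the request was redirected, the last matching line wins.
function extractLastModified(array $headerLines): ?string
{
    $value = null;
    foreach ($headerLines as $line) {
        if (stripos($line, 'Last-Modified:') === 0) {
            $value = trim(substr($line, strlen('Last-Modified:')));
        }
    }
    return $value;
}

// Hypothetical helper: decide whether a page changed since the last crawl.
// $stored is assumed to look like ['last_modified' => ?string, 'hash' => string].
function pageHasChanged(array $stored, ?string $lastModified, string $body): bool
{
    // Prefer the header when both the old and the new value are available.
    if ($lastModified !== null && $stored['last_modified'] !== null) {
        return $lastModified !== $stored['last_modified'];
    }
    // No usable header: fall back to comparing an MD5 hash of the body.
    return md5($body) !== $stored['hash'];
}
```

In a crawler you would probably call it along the lines of `pageHasChanged($row, extractLastModified(get_headers($url)), file_get_contents($url))`, then save the new header value and hash back to the database. Keep in mind that some servers send a Last-Modified value that does not reflect the real content, so storing the hash as well is a reasonable safety net.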
Finding out what exactly changed can be done with any diff package. For example: http://www.raymondhill.net/finediff/viewdiff-ex.php
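If you just want to see the idea without pulling in a library, a line-level diff can be sketched with a longest-common-subsequence table. This is not how FineDiff works internally (it is a separate, much simpler technique), and `lineDiff` is a name invented for this example. It uses memory proportional to (old lines × new lines), so it is only suitable for fairly small pages.

```php
<?php
// Hypothetical helper: line-level diff of two versions of a page.
// Returns a list of [op, line] pairs where op is ' ' (unchanged),
// '-' (only in the old version) or '+' (only in the new version).
function lineDiff(string $old, string $new): array
{
    $a = explode("\n", $old);
    $b = explode("\n", $new);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = length of the longest common subsequence
    // of the old lines from $i onward and the new lines from $j onward.
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table from the start, emitting one operation per line.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $ops[] = [' ', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $a[$i]];
            $i++;
        } else {
            $ops[] = ['+', $b[$j]];
            $j++;
        }
    }
    // Whatever is left over on either side was removed or added.
    while ($i < $n) {
        $ops[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $b[$j++]];
    }
    return $ops;
}
```

You would pass it the HTML stored from the previous crawl and the freshly fetched HTML, then print or store the lines marked `-` and `+`. For character-level or word-level detail, a dedicated diff package is still the better choice.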