Hi guys
As you know, checking web page content for changes is a little different from checking static pages or personal files on our machines, because the content of dynamic web pages changes on each request. So if we use checksums to identify changes, we'll fail! A very simple example is a site owner who uses Google Ads on his website; on each request the ads are different from the previous ones. If we instead cache for a fixed period of time, we'll also fail, because some web pages aren't updated for years while others are updated every minute (if not every second).
So what is a better approach to solve this issue? (Thanks)
UPDATE
Another option is to use the Last-Modified HTTP header! But is this a robust approach?
Browsers do this automatically with the help of the several caching mechanisms that HTTP provides. The two mechanisms most obviously useful for determining whether a page has changed are Entity Tags (the ETag header) and the Last-Modified HTTP header. These mechanisms allow the browser to make conditional requests to a web site, e.g. fetch a page only if it has been changed.
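To make the idea of a conditional request concrete, here is a minimal sketch in Python using only the standard library. The function names `conditional_headers` and `fetch_if_changed` are made up for illustration, and the sketch assumes you stored the ETag and Last-Modified values from an earlier response. If the server supports validators it answers 304 Not Modified with no body; otherwise it just sends the full page again.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for a conditional GET from stored validators."""
    headers = {}
    if etag:
        # Ask the server to send the page only if its ETag differs.
        headers["If-None-Match"] = etag
    if last_modified:
        # Ask the server to send the page only if it changed after this date.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Fetch url, reporting whether the server said the page has changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return {
                "changed": True,
                "body": response.read(),
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304 Not Modified: the cached copy is still valid.
            return {
                "changed": False,
                "body": None,
                "etag": etag,
                "last_modified": last_modified,
            }
        raise
```

On the first visit you would call `fetch_if_changed(url)` with no validators, keep the returned `etag` and `last_modified`, and pass them back in on the next visit.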
Quoting RFC 2616 on HTTP 1.1:
The key point here is that the ETag is a cache validator. If a browser has a cached version of a page (called a resource in the RFC), it can use the ETag to determine whether the cached page is still valid, i.e. whether the page hasn't changed on the server.
And about the modification date:
The key point here is that the server may know when a page has been modified, and may then inform the client.
If you open an HTTP monitor (such as Fiddler for Windows) and watch your browser communicate with web sites, you'll see these mechanisms in use first-hand when the browser makes conditional requests.
To specifically address your question about the Last-Modified header: this header by itself won't work for the majority of pages you'll find, since dynamically generated pages often omit it or don't set it meaningfully. But in combination with the ETag it can get you started.
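As a rough sketch of combining the two validators, the hypothetical helper below compares the headers saved from an earlier response with those of a fresh one. It prefers the ETag, falls back to Last-Modified, and returns None when neither header is available, in which case you still need some other strategy. It assumes the headers are plain dicts keyed exactly as "ETag" and "Last-Modified", and that dates are in the standard HTTP date format.

```python
from email.utils import parsedate_to_datetime


def page_changed(cached, current):
    """Return True or False if a validator settles it, None if neither header helps."""
    old_etag, new_etag = cached.get("ETag"), current.get("ETag")
    if old_etag and new_etag:
        # Different ETags mean the server considers the page changed.
        return old_etag != new_etag

    old_date, new_date = cached.get("Last-Modified"), current.get("Last-Modified")
    if old_date and new_date:
        # Fall back to comparing modification dates.
        return parsedate_to_datetime(new_date) > parsedate_to_datetime(old_date)

    # No usable validator on one side or the other.
    return None
```

Note that a server may change the ETag even when only the ads changed, so this tells you what the server reports, not whether the content you care about is different.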