The Archive Base

Editorial Team
Asked: May 28, 2026

How can I bring Google-like recrawling to my application (web or console)?

How can I bring Google-like recrawling to my application (web or console)? I need to recrawl only those pages that were updated after a particular date.

The LastModified property of the System.Net.HttpWebResponse gives only the current date of the server. For example, if I download a page with HttpWebRequest on 27 January 2012 and check the LastModified date, it shows the server's time when the page was served, which in this case is simply 27 January 2012.

Can anyone suggest another method?

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 6:35 pm

    The first thing to point out is that what you’re trying to do is very difficult, and a great deal of research-level work tries to address it (I list a few papers further down). There is no way to know whether a site has changed without crawling it, although you can take shortcuts, such as checking the Content-Length response header (for example with a HEAD request) without downloading the rest of the page. This saves your system some traffic, but it won’t really solve your problem: a page can change without its length changing, and not every server reports a length.
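
    The Content-Length shortcut mentioned above can be sketched roughly as follows. This is only an illustration in Python, not part of the original answer; the names fetch_head_headers and length_changed are hypothetical, and the check is deliberately conservative: whenever the length is missing or unreadable, it treats the page as changed.

```python
import urllib.request
from typing import Mapping, Optional


def fetch_head_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request and return the response headers with lower-cased names."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def length_changed(previous_length: Optional[int], headers: Mapping[str, str]) -> bool:
    """Return True when the page should be downloaded again.

    Assumption: a different Content-Length means different content.
    The reverse does not hold, so an equal length is only a weak hint.
    """
    raw = headers.get("content-length")
    if previous_length is None or raw is None:
        return True  # nothing to compare against, so re-download
    try:
        return int(raw) != previous_length
    except ValueError:
        return True  # unparsable header, so re-download
```

    A crawler would store the length seen on the previous visit and call length_changed(stored_length, fetch_head_headers(url)) before deciding whether to issue a full GET.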

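    Second, since you’re concerned about content, the Last-Modified header field will not be very useful to you; I would even go so far as to say that it will not be useful at all. As you have seen, dynamically generated pages often report the time the response was served rather than the time the content last changed.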
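
    If the header cannot be trusted, one content-based alternative (my suggestion here, not something the papers below prescribe) is to store a digest of each downloaded page and compare it on the next visit. A minimal sketch in Python, assuming the page body is already decoded to text; content_fingerprint and has_changed are hypothetical names, and collapsing whitespace is just one possible normalization:

```python
import hashlib
import re
from typing import Optional

_WHITESPACE = re.compile(r"\s+")


def content_fingerprint(body: str) -> str:
    """Return a SHA-256 digest of the body after collapsing runs of whitespace."""
    normalized = _WHITESPACE.sub(" ", body).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: Optional[str], body: str) -> bool:
    """Return True when the body differs from the one seen on the last crawl."""
    return previous_fingerprint != content_fingerprint(body)
```

    This still requires downloading the page, and pages that embed timestamps, counters, or rotating ads will look changed on every visit unless those parts are stripped before hashing.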

    Third, your requirements are somewhat in conflict: you want Google-like crawling, yet you want to crawl only the pages whose content has been updated, and that is not exactly how Google does things. Google’s crawling focuses on providing the freshest content for the most frequently searched and visited websites. For example, Google has very little interest in frequently crawling a website that updates its content twice a day but has 10 visitors a day; it is more interested in crawling a website that gets 10 million visitors a day, even if its content updates less frequently. It may also be true that websites that update their content frequently tend to have a lot of visitors, but from Google’s perspective that is not exactly relevant.


    If you have to discover new websites (coverage) and at the same time want the latest content of the sites you already know about (freshness), then you have conflicting goals, which is true of most crawlers, even Google’s. With a fixed crawl budget, more coverage usually means less freshness, and more freshness means less coverage. If you’re interested in balancing both, I suggest you read the following articles:

    • Web Crawler: An Overview
    • After that, I would recommend reading Adaptive On-Line Page Importance Computation
    • And finally: Scaling to 6 Billion Pages and Beyond

    In summary, the idea is that you have to crawl a website several times (maybe several hundred times) to build up a good record of its change history. Once you have a good set of historical measurements, you use a predictive model to estimate when the website will change again, and you schedule a crawl for some time after the expected change.
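
    To make the scheduling idea concrete, here is a deliberately naive sketch in Python. It is not the model from any of the papers above: it simply assumes the average interval between detected changes is a usable predictor and schedules the next crawl one such interval after the last visit. All names (Observation, estimate_change_interval, next_crawl_time) and the one-day default are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (time of crawl, whether the content differed from the previous crawl)
Observation = Tuple[datetime, bool]


def estimate_change_interval(
    history: List[Observation], default: timedelta = timedelta(days=1)
) -> timedelta:
    """Estimate the average time between changes from past crawls."""
    if len(history) < 2:
        return default  # not enough history to estimate anything
    # The first observation has no predecessor to compare against.
    changes = sum(1 for _, changed in history[1:] if changed)
    observed_span = history[-1][0] - history[0][0]
    if changes == 0:
        # Never saw a change: back off to twice the span observed so far.
        return max(observed_span * 2, default)
    return observed_span / changes


def next_crawl_time(
    history: List[Observation], default: timedelta = timedelta(days=1)
) -> datetime:
    """Schedule the next crawl one estimated interval after the last crawl."""
    if not history:
        raise ValueError("history must contain at least one crawl")
    return history[-1][0] + estimate_change_interval(history, default)
```

    Note that this estimator can only see changes at the granularity of your own crawl frequency; if a page changes several times between two visits, it is counted as a single change, which is one reason the literature uses more careful models.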

© 2021 The Archive Base. All Rights Reserved