The Archive Base


Editorial Team
Asked: June 17, 2026


I have a web crawler and the whole web to crawl.
What should my strategy be? What kind of classification algorithms should I use?

To clarify: I have a web crawler; I did not mean crawling the web manually.


1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 9:37 am

    You can classify each page you crawl as a restaurant page or not (a binary classifier), using supervised learning.

    You can use the bag-of-words model for this: each word is a “feature”, and its presence (and number of occurrences) in the page determines the value of that feature.
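As a rough illustration of the bag-of-words idea, here is a minimal sketch using only the Python standard library. The function name `bag_of_words`, the tokenizing regex, and the sample page text are assumptions made for this example, not part of any particular library.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map each lowercase word in `text` to its number of occurrences."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

# Hypothetical page text, made up for this example.
page = "Best pizza in town. Book a table and enjoy our pizza menu."
features = bag_of_words(page)

print(features["pizza"])  # 2
print(features["table"])  # 1
print(features["hotel"])  # 0 (a Counter reports absent words as zero)
```

Word order is thrown away on purpose; only which words occur, and how often, is kept.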

    You will first need to manually label a set of pages, marking each one as a restaurant page or not. This labeled data is your training set.
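One possible way to turn a hand-labeled set of pages into something a classifier can consume is sketched below. The page texts, the 1/0 label encoding, and the helper names `tokenize` and `vectorize` are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical hand-labeled pages: (page text, 1 = restaurant, 0 = not).
labeled_pages = [
    ("Book a table and see our dinner menu", 1),
    ("Fresh pasta, wine list and opening hours", 1),
    ("Download the latest driver for your printer", 0),
    ("Quarterly earnings report and investor news", 0),
]

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

# Vocabulary: one column index per distinct word seen in the training pages.
vocabulary = {}
for text, _ in labeled_pages:
    for word in tokenize(text):
        vocabulary.setdefault(word, len(vocabulary))

def vectorize(text):
    """Turn a page into a list of word counts, one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(text)).items():
        if word in vocabulary:  # words unseen in training are ignored
            vector[vocabulary[word]] = count
    return vector

X = [vectorize(text) for text, _ in labeled_pages]  # feature vectors
y = [label for _, label in labeled_pages]           # matching labels
```

Even with four short pages the vocabulary already has a couple of dozen columns, which hints at how quickly the feature space grows on real crawled pages.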

    Note that the bag-of-words model tends to have a huge feature space, so you will need a classifier that is not sensitive to non-informative features.
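One family of classifiers often used in this situation is a regularized linear model such as a linear SVM. The sketch below trains one with plain subgradient descent on the hinge loss plus an L2 penalty. It is a toy under stated assumptions, not a production implementation: the function names, the hyperparameter values, the -1/+1 label encoding, and the three-column data (restaurant-word count, tech-word count, filler-word count) are all made up for illustration.

```python
def train_linear_svm(X, y, learning_rate=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b for a linear SVM (hinge loss + L2 penalty).

    X is a list of dense feature vectors, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):  # fixed order keeps the run deterministic
            margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
            # L2 penalty: shrink every weight a little on every step.
            w = [w_j * (1.0 - learning_rate * lam) for w_j in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward this example.
                w = [w_j + learning_rate * y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += learning_rate * y_i
    return w, b

def predict(w, b, x):
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy data: [restaurant-word count, tech-word count, filler-word count].
X = [[2, 0, 1], [1, 0, 3], [0, 2, 1], [0, 1, 3]]
y = [1, 1, -1, -1]  # +1 = restaurant page, -1 = not

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The filler-word column appears equally in both classes, so the penalty tends to keep its weight small compared with the two informative columns. On real data you would normally shuffle the examples each epoch and tune `lam` rather than hard-code it.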

    You can later use cross-validation to estimate how good your model is.
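A minimal sketch of k-fold cross-validation follows, assuming the model is supplied as a pair of hypothetical callables `train_fn` and `predict_fn`. The majority-class baseline and the toy labels are stand-ins so the example runs on its own; in practice you would plug in your real classifier and shuffle the data before splitting.

```python
from collections import Counter

def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_val_accuracy(train_fn, predict_fn, X, y, k=5):
    """Average held-out accuracy of a model over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(X), k):
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Stand-in model: always predict the most common training label.
def train_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = [[i] for i in range(10)]        # dummy features
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # toy labels
accuracy = cross_val_accuracy(train_majority, predict_majority, X, y, k=5)
print(accuracy)  # 0.8
```

Each page is used for testing exactly once and for training in the other folds, so the average estimates how the model behaves on pages it has not seen.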

    Here are some suggestions I found useful when classifying data using the bag of words model:

    • SVM tends to be very useful and to yield very good results with the bag-of-words model. I did not see a significant difference between the performance of the linear kernel and the Gaussian kernel.
    • Use stemming and filter out stop words; you don’t need the noise that inflected forms and stop words add.
    • Use bi-grams: they are very informative and, at least for me, tend to increase the accuracy of the classifier significantly.
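The preprocessing suggestions above can be sketched roughly as follows. The stop-word list is a tiny invented sample, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer (such as a Porter stemmer), so treat this as an illustration of the pipeline rather than a recommended implementation.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "our", "is", "to", "for"}

def crude_stem(word):
    """Strip a couple of common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_features(text):
    """Count stemmed unigrams and bi-grams after removing stop words."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

features = extract_features("Booking tables for the tasting menus")
print(sorted(features))
# ['book', 'book table', 'menu', 'table', 'table tast', 'tast', 'tast menu']
```

Note that the bi-grams here are built after stop words are dropped, so "tables for the tasting" contributes the pair "table tast". Whether to form bi-grams before or after filtering is a design choice worth testing on your own data.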

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base
