The Archive Base

Editorial Team
Asked: June 12, 2026

I am trying to build a recommender system that recommends web pages to users based on their actions (Google searches, clicks, and explicit ratings of web pages). To give you an idea, Google News does something like this: it displays news articles from the web on a particular topic. In technical terms that is clustering, but my aim is similar. It will be content-based recommendation driven by the user's actions.

So my questions are:

  1. How can I trawl the internet to find related web pages?
  2. What algorithm should I use to extract data from a web page? Are textual analysis and word frequency the only way to do it?
  3. Lastly, what platform is best suited to this problem? I have heard of Apache Mahout, which comes with some reusable algorithms. Does it sound like a good fit?
1 Answer

Editorial Team
Added an answer on June 12, 2026 at 2:39 pm

    As Thomas Jungblut said, one could write several books on your questions 😉
    I will try to give you a list of brief pointers, but be aware that there is no ready-to-use, off-the-shelf solution.

    1. Crawling the internet: There are plenty of toolkits for this, such as Scrapy for Python, crawler4j and Heritrix for Java, or WWW::Robot for Perl. For extracting the actual content from web pages, have a look at boilerpipe.

      http://scrapy.org/

      http://crawler.archive.org/

      http://code.google.com/p/crawler4j/

      https://metacpan.org/module/WWW::Robot

      http://code.google.com/p/boilerpipe/

    2. First of all, you can often use collaborative filtering instead of content-based approaches. But if you want good coverage, especially in the long tail, there is no way around analyzing the text. One thing to look at is topic modelling, e.g. Latent Dirichlet Allocation (LDA). Several LDA approaches are implemented in Mallet, Apache Mahout, and Vowpal Wabbit.
      For indexing, search, and text processing, have a look at Lucene. It is an awesome, mature piece of software.

      http://mallet.cs.umass.edu/

      http://mahout.apache.org/

      http://hunch.net/~vw/

      http://lucene.apache.org/

    3. Besides Apache Mahout, which also contains LDA (see above), clustering, and text processing, there are other toolkits available if you want to focus on collaborative filtering: LensKit, which is also implemented in Java, and MyMediaLite (disclaimer: I am the main author), which is implemented in C# but also has a Java port.

      http://lenskit.grouplens.org/

      http://ismll.de/mymedialite

      https://github.com/jcnewell/MyMediaLiteJava
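
Point 1 of the answer recommends existing crawlers rather than writing your own. To show what such a toolkit does at its core, here is a minimal breadth-first crawler using only the Python standard library. It is a sketch under stated assumptions, not the API of Scrapy, crawler4j, or Heritrix: `fetch` is a hypothetical hook you would back with a real HTTP client, and the sketch ignores robots.txt, politeness delays, and domain filtering, all of which those toolkits handle. It stores raw HTML only; pulling out the main article text is the job of a tool like boilerpipe. The `SITE` pages are made up so the demo runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments so one page is not queued twice.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl from `seed`; returns URL -> raw HTML for each fetched page.

    `fetch` (hypothetical) returns a page's HTML, or None if the request failed.
    """
    seen = {seed}
    queue = deque([seed])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "web" so the demo needs no network access.
SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="b#top">B</a>',
    "http://example.com/a": '<a href="http://example.com/">home</a>',
    "http://example.com/b": "<p>no links</p>",
}

if __name__ == "__main__":
    print(sorted(crawl("http://example.com/", SITE.get)))
    # ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

The queue gives breadth-first order, which tends to stay close to the seed pages; a focused crawler for related pages would instead prioritise links whose pages score well against the user's interests.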
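
Question 2 asks whether word frequency is the only way to analyze page text. The simplest content-based baseline is exactly that: represent each page as a TF-IDF vector and rank the pages a user has not rated by cosine similarity to a profile built from the pages they did rate. The sketch below implements that baseline, not LDA; a topic model such as the LDA implementations in Mallet or Mahout would replace the TF-IDF vectors with per-page topic distributions. The page ids, texts, and ratings are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> List[str]:
    # Lowercased alphanumeric tokens; a real system would also drop stop words and stem.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Vector]:
    """TF-IDF vector per page: term frequency times log(N / document frequency)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df: Counter = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # count each term once per document
    n = len(docs)
    vectors: Dict[str, Vector] = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        vectors[doc_id] = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()}
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_pages(docs: Dict[str, str], ratings: Dict[str, float], k: int = 3) -> List[str]:
    """Rank unrated pages by similarity to a rating-weighted profile of the rated pages."""
    vectors = tfidf_vectors(docs)
    profile: Counter = Counter()
    for doc_id, rating in ratings.items():
        for term, weight in vectors[doc_id].items():
            profile[term] += rating * weight
    candidates = [d for d in docs if d not in ratings]
    scored = sorted(((cosine(profile, vectors[d]), d) for d in candidates), reverse=True)
    return [d for _, d in scored[:k]]


# Hypothetical pages (e.g. text extracted by a crawler) and one explicit rating.
DOCS = {
    "p1": "python web crawler scrapy tutorial",
    "p2": "scrapy spider crawler for python",
    "p3": "chocolate cake baking recipe",
    "p4": "baking bread recipe with yeast",
}

if __name__ == "__main__":
    print(recommend_pages(DOCS, {"p1": 5}, k=1))  # ['p2']
```

Implicit signals such as clicks could be folded into the same profile as small positive weights; that weighting is a design choice, not something the question or answer specifies.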
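
Point 3 mentions toolkits for collaborative filtering (LensKit, MyMediaLite). Since the question says users can rate pages explicitly, here is a minimal item-based collaborative filtering sketch: cosine similarity between items' rating vectors, then a similarity-weighted average of the user's own ratings to score pages they have not rated. It is a stand-alone illustration, not the API of LensKit, MyMediaLite, or Mahout, and the users, pages, and ratings are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {page -> explicit rating}
Sims = Dict[str, Dict[str, float]]  # page -> {other page -> similarity}


def item_similarities(ratings: Ratings) -> Sims:
    """Cosine similarity between item rating vectors (missing ratings count as 0)."""
    by_item: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating
    norms = {item: math.sqrt(sum(r * r for r in users.values())) for item, users in by_item.items()}
    sims: Sims = defaultdict(dict)
    items = sorted(by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = by_item[a].keys() & by_item[b].keys()
            if not common or norms[a] == 0 or norms[b] == 0:
                continue  # no co-raters: leave the pair unrelated
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[a][b] = sims[b][a] = dot / (norms[a] * norms[b])
    return sims


def predict(ratings: Ratings, sims: Sims, user: str, item: str) -> float:
    """Similarity-weighted average of the user's ratings on items similar to `item`."""
    num = den = 0.0
    for rated_item, rating in ratings[user].items():
        s = sims.get(item, {}).get(rated_item, 0.0)
        if s > 0:
            num += s * rating
            den += s
    return num / den if den else 0.0


def recommend_items(ratings: Ratings, user: str, k: int = 3) -> List[str]:
    sims = item_similarities(ratings)
    unseen = {i for items in ratings.values() for i in items} - set(ratings[user])
    scored = sorted(((predict(ratings, sims, user, i), i) for i in unseen), reverse=True)
    return [item for _, item in scored[:k]]


# Hypothetical explicit ratings on a 1-5 scale.
RATINGS: Ratings = {
    "u1": {"page_a": 5, "page_b": 2},
    "u2": {"page_a": 5, "page_b": 4, "page_c": 5},
    "u3": {"page_b": 1, "page_d": 5},
}

if __name__ == "__main__":
    print(recommend_items(RATINGS, "u1"))  # ['page_c', 'page_d']
```

Item-based similarities can be precomputed offline and reused for every user, which is one reason this variant is popular for larger catalogues; the libraries above add refinements such as rating normalisation and neighbourhood size limits.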

Related Questions

I'm trying to build a method which returns the shortest path from one node to
Trying to build sslsniff on a RHEL 5.2 system here. When compiling sslsniff on
Trying to build a micro-CMS (of sorts), which needs to dish out content i.e.
I'm trying to figure out the best build system for LaTeX. Currently, I use
I'm trying to build an authentication system in my application, but I'm having some
I am trying to build a system that will be able to process a
At the moment I'm trying to build a login system with a very
I am trying to build a jQuery EasyUI datagrid or treegrid out of a large
I was trying Build For Archiving application (from Titanium Mobile) with Xcode 4.4, but
I'm trying to build an App Engine connected Android application and am having some problems

© 2021 The Archive Base. All Rights Reserved