The Archive Base

Editorial Team
Asked: May 15, 2026

Advice on how to create and visualize a link map between blogs


I would appreciate any advice on how to create and visualize a link map between blogs, so as to reflect the “social network” between them.

Here is how I am thinking of doing it:

  1. Start with one (or more) blog home pages and collect all the links on each page.
  2. Remove all the internal links (that is, if I start from http://www.website.com, I want to remove all links of the form “www.website.com/***”), but store all the external links.
  3. Go to each of these links (assuming I haven’t visited it already) and repeat step 1.
  4. Continue until the crawl is (let’s say) X jumps away from the first page.
  5. Plot the data collected.
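Steps 1 to 4 amount to a breadth-first crawl with a depth limit. Below is a minimal base-R sketch under some loud assumptions: `fetch.links()` and `link.table` are hypothetical stand-ins for downloading and parsing a page (so the example runs offline), the example.org URLs are made up, `max.depth` plays the role of X, and a link counts as internal when its host matches the host of the page it was found on.

```r
# Hypothetical stand-in for downloading a page: maps each URL to the links
# found on it. A real crawler would fetch and parse the page here instead.
link.table <- list(
  "http://a.example.org/" = c("http://a.example.org/about",
                              "http://b.example.org/",
                              "http://c.example.org/"),
  "http://b.example.org/" = c("http://c.example.org/",
                              "http://b.example.org/archive"),
  "http://c.example.org/" = c("http://a.example.org/",
                              "http://d.example.org/"),
  "http://d.example.org/" = character(0)
)

fetch.links <- function(u) {
  links <- link.table[[u]]          # NULL for pages not in the table
  if (is.null(links)) character(0) else links
}

# Host part of a URL, e.g. "http://a.example.org/x" -> "a.example.org"
root.domain <- function(u) sub("^[a-zA-Z]+://([^/]+).*$", "\\1", u)

# Breadth-first crawl: each page is fetched at most once, only external
# links are kept (step 2), and the loop runs max.depth rounds, so pages
# first reached in the last round are recorded as targets but not fetched.
crawl <- function(seeds, max.depth) {
  visited  <- character(0)
  frontier <- seeds
  edges    <- data.frame(from = character(0), to = character(0),
                         stringsAsFactors = FALSE)
  for (depth in seq_len(max.depth)) {
    next.frontier <- character(0)
    for (u in setdiff(frontier, visited)) {
      visited  <- c(visited, u)
      links    <- unique(fetch.links(u))
      external <- links[root.domain(links) != root.domain(u)]
      if (length(external) > 0) {
        edges <- rbind(edges, data.frame(from = u, to = external,
                                         stringsAsFactors = FALSE))
      }
      next.frontier <- c(next.frontier, external)
    }
    frontier <- unique(setdiff(next.frontier, visited))
  }
  edges
}

edges <- crawl("http://a.example.org/", max.depth = 2)
print(edges)
```

In a real crawl, `fetch.links()` would be swapped for a function that downloads the page and extracts its links; the returned `from`/`to` data frame is one convenient shape to hand to the plotting step.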

I imagine that to do this in R one would use RCurl/XML (thanks, Shane, for your answer here), combined with something like igraph.
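For the plotting step with igraph, one possible approach (a sketch, not the only way) is to reduce every URL to its host so that each blog becomes a single node, count repeated blog-to-blog links as an edge weight, and pass the result to `igraph::graph_from_data_frame()`. The page-level links below are made-up example data, and the plotting part only runs if igraph is installed.

```r
# Made-up page-level links, standing in for what a crawler would collect
page.edges <- data.frame(
  from = c("http://a.example.org/post1", "http://a.example.org/post2",
           "http://b.example.org/", "http://c.example.org/2010/06/x"),
  to   = c("http://b.example.org/", "http://b.example.org/about",
           "http://c.example.org/", "http://a.example.org/"),
  stringsAsFactors = FALSE
)

# Reduce each URL to its host so that each blog becomes one node
host <- function(u) sub("^[a-zA-Z]+://([^/]+).*$", "\\1", u)
blog.edges <- data.frame(from = host(page.edges$from),
                         to   = host(page.edges$to),
                         stringsAsFactors = FALSE)

# Count repeated blog-to-blog links and keep the count as an edge weight
blog.edges <- aggregate(list(weight = rep(1, nrow(blog.edges))),
                        by = blog.edges[c("from", "to")], FUN = sum)
print(blog.edges)

# Draw the directed blog network, with thicker edges for more links
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(blog.edges, directed = TRUE)
  plot(g, edge.width = igraph::E(g)$weight, vertex.label.cex = 0.8)
}
```

Collapsing to hosts is a design choice: it matches the “one node per blog” picture in the question, at the cost of treating blogs that share a host (for example several blogs on one hosting domain) as a single node.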

But since I don’t have experience with either of them, is anyone here willing to correct me if I have missed an important step, or to share a useful snippet of code for this task?

P.S. My motivation for this question is that in a week I am giving a talk at useR 2010 on “blogging and R”, and I thought this might be a nice way both to give the audience something fun and to motivate them to try something like this themselves.

Thanks a lot!

Tal


1 Answer

    Editorial Team
    Added an answer on May 15, 2026 at 8:33 pm

    NB: This example is a very BASIC way of getting the links and therefore would need to be tweaked in order to be more robust. 🙂

    I don’t know how useful this code is, but hopefully it gives you an idea of the direction to go in (just copy and paste it into R; it’s a self-contained example once you’ve installed the RCurl and XML packages):

    library(RCurl)
    library(XML)
    
    # Download a page and return the sorted href values of all its <a> tags
    get.links.on.page <- function(u) {
      doc <- getURL(u, followlocation = TRUE)
      html <- htmlParse(doc, asText = TRUE)
      # Look the attribute up by name: the first attribute of a tag
      # is not necessarily its href
      urls <- xpathSApply(html, "//html//body//a[@href]", xmlGetAttr, "href")
      urls <- sort(as.character(unlist(urls)))
      return(urls)
    }
    
    # A naive way of doing it. Python has 'urlparse', which is supposed to be rather good at this
    get.root.domain <- function(u) {
      root <- unlist(strsplit(u, "/"))[3]
      return(root)
    }
    
    # A naive method to filter out duplicated, invalid and self-referencing URLs
    filter.links <- function(seed, urls) {
      urls <- unique(urls)
      # keep only absolute http:// or https:// links
      urls <- urls[grepl("^https?://", urls)]
      seed.root <- get.root.domain(seed)
      # !grepl() instead of urls[-grep(...)], which would return an
      # empty vector whenever no URL contains the seed's domain
      urls <- urls[!grepl(seed.root, urls, fixed = TRUE)]
      return(urls)
    }
    
    # Pass each URL to this function; a page that fails to download
    # contributes no links instead of stopping the whole crawl
    main.fn <- function(seed) {
      raw.urls <- tryCatch(get.links.on.page(seed),
                           error = function(e) character(0))
      filtered.urls <- filter.links(seed, raw.urls)
      return(filtered.urls)
    }
    
    ### example  ###
    seed <- "http://www.r-bloggers.com/blogs-list/"
    urls <- main.fn(seed)
    
    # crawl the first 3 links and get the URLs for each, put in a named list
    x <- lapply(head(urls, 3), main.fn)
    names(x) <- head(urls, 3)
    x
    

    If you copy and paste it into R, and then look at x, I think it’ll make sense.
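To get from `x` to something a graph library can plot, the named list can be flattened into a two-column edge list with one row per (page, link) pair. A minimal sketch, using a small made-up `x` of the same shape (names are the crawled pages, values are the external links found on each) so that it runs without network access:

```r
# Made-up stand-in for the named list `x` built with lapply(..., main.fn)
x <- list(
  "http://a.example.org/" = c("http://b.example.org/", "http://c.example.org/"),
  "http://b.example.org/" = character(0),
  "http://c.example.org/" = "http://a.example.org/"
)

# One row per (source page, linked URL); pages without links add no rows
edges <- data.frame(
  from = rep(names(x), times = lengths(x)),
  to   = as.character(unlist(x, use.names = FALSE)),
  stringsAsFactors = FALSE
)
print(edges)
```

A from/to data frame like this is the input shape that `igraph::graph_from_data_frame()` accepts.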

    Either way, good luck mate!
    Tony Breyal


Related Questions

I would like to know is there any difference in performance between these two
I'm investigating Adobe CQ5 and would like any advice on how to integrate its
I would like to inspect any code changes after doing a git pull .
I would like to hide any text matching a pattern from any HTML page,
Instead of app-id@appspot.com or any@app-id.appspot.com I would like to use any@own-domain.tld. can this be
I have installed Visual Studio 2012 RC. I would like to find any example
I have two vectors, v1 and v2. I would like to find any of
I would like to know if there exists any kind of library or workaround
I would like to know if there is any kind of tool to monitor
I would like to know if there are any good resources (online or books)

© 2021 The Archive Base. All Rights Reserved