The Archive Base

Editorial Team
Asked: June 4, 2026

Reading an HTML table from an https page with readHTMLTable (XML package)

There are good answers on SO about how to use readHTMLTable from the XML package, and I have used it successfully with regular http pages. However, I cannot solve my problem with https pages.

I am trying to read a table from this website (see the url string):

library(RTidyHTML)
library(XML)
url <- "https://ned.nih.gov/search/ViewDetails.aspx?NIHID=0010121048"
h = htmlParse(url)
tables <- readHTMLTable(url)

But I get this error: File https://ned.nih.gov/search/Vi…does not exist.

I tried to get past the https problem with the first two lines below, which I found by searching for a solution (for example here: http://tonybreyal.wordpress.com/2012/01/13/r-a-quick-scrape-of-top-grossing-films-from-boxofficemojo-com/).

This trick lets me see more of the page, but every attempt to extract the table still fails. Any advice is appreciated. I need table fields such as Organization, Organizational Title, and Manager.

 #attempt to get past the https problem 
 raw <- getURL(url, followlocation = TRUE, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
 head(raw)
[1] "\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; 
...
 h = htmlParse(raw)
Error in htmlParse(raw) : File ...
tables <- readHTMLTable(raw)
Error in htmlParse(doc) : File ...
1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 8:55 am

    The new package httr provides a wrapper around RCurl to make it easier to scrape all kinds of pages.

    Still, this page gave me a fair amount of trouble. The following works, but no doubt there are easier ways of doing it.

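    library("httr")
    library("XML")
    
    # Define the certificate file (the CA bundle shipped with RCurl)
    cafile <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
    
    # Read the page over https, verifying against the CA bundle
    page <- GET(
      "https://ned.nih.gov/", 
      path = "search/ViewDetails.aspx", 
      query = list(NIHID = "0010121048"),
      config(cainfo = cafile)
    )
    
    # Use a regex to extract the desired table from the page text.
    # content(page, as = "text") replaces text_content(), which older
    # httr releases provided and later ones removed.
    x <- content(page, as = "text")
    tab <- sub('.*(<table class="grid".*?>.*</table>).*', '\\1', x)
    
    # Parse the table
    readHTMLTable(tab)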
    

    The results:

    $ctl00_ContentPlaceHolder_dvPerson
                    V1                                      V2
    1      Legal Name:                    Dr Francis S Collins
    2  Preferred Name:                      Dr Francis Collins
    3          E-mail:                 francis.collins@nih.gov
    4        Location: BG 1 RM 1261 CENTER DRBETHESDA MD 20814
    5       Mail Stop:
    6           Phone:                            301-496-2433
    7             Fax:
    8              IC:             OD (Office of the Director)
    9    Organization:            Office of the Director (HNA)
    10 Classification:                                Employee
    11            TTY:
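
To make the regex step above concrete, the following minimal sketch runs the same `sub()` pattern on a made-up one-table page. The `html` string and its contents are assumptions for illustration only, not the real NIH markup.

```r
# Hypothetical miniature page; the real NIH markup is not reproduced here.
html <- paste0(
  '<html><body><div>header</div>',
  '<table class="grid" id="demo"><tr><td>Phone:</td><td>555-0100</td></tr></table>',
  '<p>footer</p></body></html>'
)

# Keep only the capture group: everything from the opening
# <table class="grid" ...> tag through the last </table> in the string.
tab <- sub('.*(<table class="grid".*?>.*</table>).*', '\\1', html)

# sub() returns its input unchanged when nothing matches, so check explicitly.
matched <- !identical(tab, html)
print(tab)
```

Because the `.*` before `</table>` is greedy, the capture runs to the last `</table>` in the text, so on a page with several tables the result may contain more than the grid table.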
    

    Get httr here: http://cran.r-project.org/web/packages/httr/index.html


    EDIT: Useful page with FAQ about the RCurl package: http://www.omegahat.org/RCurl/FAQ.html
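
As a possible alternative to the regex, the page text can be parsed as a string and the table selected with XPath. This is only a sketch under assumptions: the `html` value below is a hypothetical stand-in for the downloaded text, and only the `class="grid"` attribute is taken from the code above. Passing `asText = TRUE` tells `htmlParse` to treat its argument as markup rather than as a file name, which may be what the "File ... does not exist" error in the question comes from.

```r
library(XML)

# Hypothetical stand-in for the downloaded page text (not the real page).
html <- paste0(
  '<html><body>',
  '<table class="grid" id="person">',
  '<tr><td>Phone:</td><td>555-0100</td></tr>',
  '<tr><td>Organization:</td><td>Example Office</td></tr>',
  '</table>',
  '</body></html>'
)

# asText = TRUE: the string is HTML content, not a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# Select the table node by its class attribute and convert it to a data frame.
nodes <- getNodeSet(doc, '//table[@class="grid"]')
person <- readHTMLTable(nodes[[1]], header = FALSE, stringsAsFactors = FALSE)
print(person)
```

With real data, `html` would be the text returned by the download step, and `nodes` should be checked for length zero before indexing in case the page layout differs.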
