The Archive Base


Editorial Team
Asked: May 25, 2026


I regularly extract tables from Wikipedia. Excel's web import does not work properly for Wikipedia, because it treats the whole page as a single table. In a Google spreadsheet, I can enter this:

=ImportHtml("http://en.wikipedia.org/wiki/Upper_Peninsula_of_Michigan","table",3)

This function downloads the third table on that page, which lists all the counties of the Upper Peninsula (UP) of Michigan.

Is there something similar in R, or could it be created with a user-defined function?


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 2:48 pm

    The function readHTMLTable in package XML is ideal for this.

    Try the following:

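    library(XML)

    # Parse every <table> on the page into a list of data frames.
    doc <- readHTMLTable(
             doc="http://en.wikipedia.org/wiki/Upper_Peninsula_of_Michigan")

    # The sixth element is the county table (land area and population density).
    doc[[6]]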
    
                V1         V2                 V3                              V4
    1       County Population Land Area (sq mi) Population Density (per sq mi)
    2        Alger      9,862                918                            10.7
    3       Baraga      8,735                904                             9.7
    4     Chippewa     38,413               1561                            24.7
    5        Delta     38,520               1170                            32.9
    6    Dickinson     27,427                766                            35.8
    7      Gogebic     17,370               1102                            15.8
    8     Houghton     36,016               1012                            35.6
    9         Iron     13,138               1166                            11.3
    10    Keweenaw      2,301                541                             4.3
    11        Luce      7,024                903                             7.8
    12    Mackinac     11,943               1022                            11.7
    13   Marquette     64,634               1821                            35.5
    14   Menominee     25,109               1043                            24.3
    15   Ontonagon      7,818               1312                             6.0
    16 Schoolcraft      8,903               1178                             7.6
    17       TOTAL    317,258             16,420                            19.3
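
In the output above, the real column headers ended up in row 1 and the numbers are still text with thousands separators. A minimal sketch of one way to tidy such a result in base R follows. The data frame `raw` is a hand-typed stand-in for the first rows of `doc[[6]]`, and `clean_table` is a hypothetical helper name, not part of the XML package.

```r
# Hand-typed stand-in for the first rows of doc[[6]]:
# header text sits in row 1 and every column is character.
raw <- data.frame(
  V1 = c("County", "Alger", "Baraga", "Chippewa"),
  V2 = c("Population", "9,862", "8,735", "38,413"),
  V3 = c("Land Area (sq mi)", "918", "904", "1561"),
  V4 = c("Population Density (per sq mi)", "10.7", "9.7", "24.7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: promote row 1 to column names and parse the numbers.
clean_table <- function(df) {
  df[] <- lapply(df, as.character)            # guard against factor columns
  names(df) <- as.character(unlist(df[1, ]))  # first row becomes the header
  df <- df[-1, , drop = FALSE]                # drop the header row from the data
  rownames(df) <- NULL
  # Every column except the first (the county name) is numeric once the
  # thousands separators are removed.
  df[-1] <- lapply(df[-1], function(x) as.numeric(gsub(",", "", x)))
  df
}

counties <- clean_table(raw)
```

With the real data, `clean_table(doc[[6]])` should behave the same way, assuming the table still has the layout shown above.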
    

    readHTMLTable returns a list of data frames, one for each table element on the HTML page. Use names() to see which table each list element holds:

    > names(doc)
     [1] "NULL"                                                                               
     [2] "toc"                                                                                
     [3] "Election results of the 2008 Presidential Election by County in the Upper Peninsula"
     [4] "NULL"                                                                               
     [5] "Cities and Villages of the Upper Peninsula"                                         
     [6] "Upper Peninsula Land Area and Population Density by County"                         
     [7] "19th Century Population by Census Year of the Upper Peninsula by County"            
     [8] "20th & 21st Centuries Population by Census Year of the Upper Peninsula by County"   
     [9] "NULL"                                                                               
    [10] "NULL"                                                                               
    [11] "NULL"                                                                               
    [12] "NULL"                                                                               
    [13] "NULL"                                                                               
    [14] "NULL"                                                                               
    [15] "NULL"                                                                               
    [16] "NULL" 
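
Because positions such as `[[6]]` can shift when the page is edited, looking a table up by its name may be more robust. The sketch below is base R only. The list `tables` is a small stand-in for the result of readHTMLTable, with unnamed tables labelled "NULL" as in the output above, and `find_table` is a hypothetical helper name.

```r
# Stand-in for the list returned by readHTMLTable(): one named table
# surrounded by tables that carry the name "NULL".
tables <- list(
  data.frame(x = 1),
  data.frame(County = c("Alger", "Baraga"), stringsAsFactors = FALSE),
  data.frame(y = 2)
)
names(tables) <- c(
  "NULL",
  "Upper Peninsula Land Area and Population Density by County",
  "NULL"
)

# Hypothetical helper: return the first table whose name matches `pattern`
# (a case-insensitive regular expression), or fail with a clear message.
find_table <- function(tables, pattern) {
  hits <- grep(pattern, names(tables), ignore.case = TRUE)
  if (length(hits) == 0) {
    stop("no table name matches: ", pattern)
  }
  tables[[hits[1]]]
}

density <- find_table(tables, "land area")
```

With the real result, the equivalent call would be `find_table(doc, "land area")`.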
    