The Archive Base

Editorial Team
Asked: May 28, 2026

Is there an easy way to extract data from specific HTML tables using Mathematica?

Is there an easy way to extract data from specific HTML tables using Mathematica? Import seems to be pretty powerful, and Mathematica appears to be capable of handling formats such as XML pretty well.

Here’s an example: http://en.wikipedia.org/wiki/Unemployment_by_country

1 Answer

    Editorial Team
    Added an answer on May 28, 2026 at 4:34 am

    For general examples of this, see these how-tos:

    • How to | Clean Up Data Imported from a ZIP File
    • How to | Clean Up Data Imported from a Website

    For this specific example, just import the page:

    tmp = Import["http://en.wikipedia.org/wiki/Unemployment_by_country", "Data"]
    

    Cleaning up this import is fairly straightforward. The table has three columns, and the second one is numeric, so use that shape to extract its rows from the rest of the imported data:

    tmp1 = Cases[tmp, {_, _?NumberQ, _}, \[Infinity]]
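
    As a minimal sketch of what that pattern does, here it is applied to a small nested list. The list toy and its contents are made up for illustration and are not taken from the Wikipedia page. Only three-element lists whose second element is a number are returned, at whatever depth they occur:

    toy = {"Unemployment", {{"Header", "x"}, {{"Aland", 2.5, "2010 [1]"}, {"Bland", "n/a", "2009"}}}, {"Cland", 7, "2011"}};
    Cases[toy, {_, _?NumberQ, _}, Infinity]
    (* {{"Aland", 2.5, "2010 [1]"}, {"Cland", 7, "2011"}} *)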
    

    You will presumably want to remove the square-bracketed reference markers from the third column:

    tmp1[[All, 3]] = Flatten[If[StringQ[#], 
    StringCases[#, x__ ~~ Whitespace ~~ "[" ~~ __ :> x], #] & /@ tmp1[[All, 3]]]
    
    Grid[tmp1, Frame -> All]
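
    An alternative sketch for the same cleanup, assuming you only want to delete the bracketed markers themselves rather than everything after the first bracket. The helper name stripRefs and the sample strings are hypothetical. It uses StringReplace, so each input yields exactly one output and non-strings pass through unchanged:

    stripRefs[s_String] := StringTrim[StringReplace[s, "[" ~~ Shortest[__] ~~ "]" -> ""]]
    stripRefs[x_] := x (* leave non-strings untouched *)
    stripRefs /@ {"CIA World Factbook 2010 [12]", "2009 est.", 2011}
    (* {"CIA World Factbook 2010", "2009 est.", 2011} *)

    With that helper, the column could be cleaned with tmp1[[All, 3]] = stripRefs /@ tmp1[[All, 3]].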
    

    Note that you can also add the header back if you want it in your table, which you probably do:

    Grid[Join[{{"Country / Region", "Unemployment rate (%)", 
       "Source / date of information"}}, tmp1], Frame -> All]
    

    Purists might object to the last step, but when you are scraping data you generally just want to get the job done, and each site has to be handled case by case. Some manual inspection and flexibility gets you the fastest overall result.

    Edit

    If you want the flags, you can also get them from CountryData. Some further cleanup is needed first, otherwise many lookups will miss. The cleanup removes the reference to the “sovereign country” in parentheses, e.g. “Guam ( United States )” -> “Guam”.

    tmp2 = Flatten[
      If[StringMatchQ[#, __ ~~ "(" ~~ __], 
         StringCases[#, 
          z__ ~~ Shortest["(" ~~ __ ~~ ")" ~~ EndOfString] :> 
           StringTrim@z], StringTrim[#]] & /@ tmp1[[All, 1]]]
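
    The same name cleanup can be sketched with StringReplace, which avoids the If/Flatten wrapper. The helper name stripSovereign and the sample names below are hypothetical and only illustrate the pattern:

    names = {"Guam ( United States )", " France ", "Aruba (Netherlands)"};
    stripSovereign[s_String] := StringTrim[StringReplace[s, "(" ~~ Shortest[__] ~~ ")" ~~ EndOfString -> ""]]
    stripSovereign /@ names
    (* {"Guam", "France", "Aruba"} *)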
    

    This will still produce some output that CountryData does not recognize.

    flags = CountryData[#, "Flag"] & /@ tmp2;
    Cases[flags, _CountryData]
    

    That leaves 6 misses out of 190. Replace those misses with empty cells in the output:

    flags = If[Head[#] === CountryData, {""}, {#}] & /@ flags; (*much faster than rule replacement*)
    tmp2 = Join[flags, tmp1, 2];
    Grid[tmp2, Frame -> All]
    

    Note that this takes a while to render.

    You can obviously style the Grid as desired using Grid options and also resize the images if needed.

© 2021 The Archive Base. All Rights Reserved