The Archive Base

Editorial Team
Asked: June 9, 2026

I am looking for a way to get certain info from HTML in Linux

I am looking for a way to extract certain information from an HTML file in a Linux shell environment.

This is the bit I’m interested in:

<table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
  <tr valign="top">
    <th>Tests</th>
    <th>Failures</th>
    <th>Success Rate</th>
    <th>Average Time</th>
    <th>Min Time</th>
    <th>Max Time</th>
  </tr>
  <tr valign="top" class="Failure">
    <td>103</td>
    <td>24</td>
    <td>76.70%</td>
    <td>71 ms</td>
    <td>0 ms</td>
    <td>829 ms</td>
  </tr>
</table>

I want to extract these values from the HTML above and either store them in shell variables or echo them as key-value pairs, for example:

Tests         : 103
Failures      : 24
Success Rate  : 76.70%
and so on.
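For the shell-variable part of the question, one option (a sketch under assumptions, not the only way) is to let Python print `NAME=value` lines, quoting each value with `shlex.quote` (Python 3.3+), and have the shell `eval` them. `PAIRS` is hypothetical input standing in for whatever the HTML parser extracts, and turning a heading like `Success Rate` into `Success_Rate` is an assumed naming convention, since shell variable names cannot contain spaces.

```python
import re
import shlex

# Hypothetical input: the heading/value pairs from the example table above.
PAIRS = [
    ("Tests", "103"),
    ("Failures", "24"),
    ("Success Rate", "76.70%"),
    ("Average Time", "71 ms"),
    ("Min Time", "0 ms"),
    ("Max Time", "829 ms"),
]


def shell_name(heading):
    """Turn a column heading into a valid shell variable name ('Success Rate' -> 'Success_Rate')."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", heading.strip())
    # Shell variable names must not start with a digit.
    if name[:1].isdigit():
        name = "_" + name
    return name


def to_shell_assignments(pairs):
    """Return one NAME=value line per pair, with the value quoted for a POSIX shell."""
    return ["%s=%s" % (shell_name(key), shlex.quote(value)) for key, value in pairs]


if __name__ == "__main__":
    for line in to_shell_assignments(PAIRS):
        print(line)
```

From a shell script this could be used as `eval "$(python3 to_shell.py)"` followed by `echo "$Success_Rate"` (the file name `to_shell.py` is hypothetical). Quoting every value is what makes the `eval` safe for values such as `71 ms` that contain spaces.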

At the moment, what I could do is write a Java program that uses a SAX parser or an HTML parser such as jsoup to extract this info.

But using Java here seems like overkill, since I would have to ship a runnable JAR alongside the “wrapper” script I want to execute.

I’m sure there must be “shell” (scripting) languages out there that can do the same, e.g. Perl, Python, Bash, etc.

My problem is that I have zero experience with these. Can somebody help me resolve this “fairly easy” issue?

Quick update:

I forgot to mention that the .html document contains more tables and more rows; sorry about that (early morning).
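Since the real report holds several tables and rows, and installing packages is awkward without root access (as the question notes), here is a sketch that swaps BeautifulSoup for Python 3's standard-library `html.parser.HTMLParser`. It rests on assumptions: every table of interest carries `class="details"`, each table's heading row consists only of `<th>` cells, tables are not nested, and `SAMPLE` is simply the table from the question used as fallback input.

```python
import sys
from html.parser import HTMLParser

# Hypothetical fallback input: the table from the question.
SAMPLE = """
<table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
  <tr valign="top">
    <th>Tests</th><th>Failures</th><th>Success Rate</th>
    <th>Average Time</th><th>Min Time</th><th>Max Time</th>
  </tr>
  <tr valign="top" class="Failure">
    <td>103</td><td>24</td><td>76.70%</td>
    <td>71 ms</td><td>0 ms</td><td>829 ms</td>
  </tr>
</table>
"""


class DetailsTableParser(HTMLParser):
    """Collect every <table class="details"> as a list of {heading: value} rows.

    Assumes tables are not nested and each heading row holds only <th> cells.
    """

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of row dicts per matching table
        self._in_table = False
        self._headings = []
        self._row = None       # (text, tag) cells of the current <tr>
        self._cell = None      # text fragments of the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if "details" in classes:
                self._in_table = True
                self._headings = []
                self.tables.append([])
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("th", "td") and self._cell is not None:
            self._row.append(("".join(self._cell).strip(), tag))
            self._cell = None
        elif tag == "tr" and self._row:
            if all(kind == "th" for _, kind in self._row):
                # A row of only <th> cells supplies the headings for later rows.
                self._headings = [text for text, _ in self._row]
            else:
                values = [text for text, _ in self._row]
                self.tables[-1].append(dict(zip(self._headings, values)))
            self._row = None
        elif tag == "table":
            self._in_table = False


if __name__ == "__main__":
    # Read the report named on the command line, or fall back to the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            html = f.read()
    else:
        html = SAMPLE
    parser = DetailsTableParser()
    parser.feed(html)
    parser.close()
    for rows in parser.tables:
        for row in rows:
            for heading, value in row.items():
                print("{0:<16}: {1}".format(heading, value))
            print()
```

It could be run as `python3 details.py report.html` (both file names are hypothetical); it prints one aligned block per data row, with a blank line between rows. Pairing values with the headings of their own table means tables with different columns do not get mixed up.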

Update #2:

I tried to install BeautifulSoup like this, since I don’t have root access:

$ wget http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/beautifulsoup4-4.1.0.tar.gz
$ tar -zxvf beautifulsoup4-4.1.0.tar.gz
$ cp -r beautifulsoup4-4.1.0/bs4 .
$ vi htmlParse.py   # paste in the code from Tichodromas' answer (this is what I pasted: http://pastebin.com/4Je11Y9q)

Running the file gives this error:

$ python htmlParse.py
Traceback (most recent call last):
  File "htmlParse.py", line 1, in ?
    from bs4 import BeautifulSoup
  File "/home/gdd/setup/py/bs4/__init__.py", line 29
    from .builder import builder_registry
         ^
SyntaxError: invalid syntax
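The `in ?` in this traceback is how Python 2.4 and earlier label module-level code (Python 2.5 and later print `in <module>`), and the explicit relative import `from .builder import ...` that the SyntaxError points at was only introduced in Python 2.5. So the interpreter behind `python` is likely too old for bs4. A quick check is sketched below; the minimum of 2.6 is an assumption, so compare it against the requirements of the bs4 release you actually download. The script itself sticks to syntax that an old interpreter will still accept.

```python
import sys

# Assumed minimum: explicit relative imports need Python 2.5+, and bs4 itself
# may require something newer; adjust MINIMUM to the bs4 release in use.
MINIMUM = (2, 6)


def is_new_enough(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python %d.%d" % tuple(sys.version_info[:2]))
    if not is_new_enough():
        print("Too old for bs4; try a newer interpreter if one is installed.")
```

If a newer interpreter such as `python3` happens to be installed on the machine, running the script with it is the simplest way around the error.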

Update #3 :

Running Tichodromas’ answer, I get this error:

Traceback (most recent call last):
  File "test.py", line 27, in ?
    headings = [th.get_text() for th in table.find("tr").find_all("th")]
TypeError: 'NoneType' object is not callable
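One plausible cause (an assumption, since the import line of test.py isn't shown, though bs4 itself could not be imported in Update #2) is that this code ran against BeautifulSoup 3. BeautifulSoup 3 has no `find_all`; its tags treat an unknown attribute such as `.find_all` as a search for a child tag of that name, which returns `None`, and calling `None` raises exactly this TypeError. The hypothetical `ToyTag` class below is not BeautifulSoup's code; it only reproduces that mechanism. In BeautifulSoup 3 the method is spelled `findAll`.

```python
class ToyTag:
    """Hypothetical stand-in for a BeautifulSoup 3 tag, for illustration only."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        """Return the first direct child with the given tag name, or None."""
        for child in self.children:
            if child.name == name:
                return child
        return None

    def findAll(self, name):
        """BeautifulSoup 3 spelling: all direct children with the given name."""
        return [child for child in self.children if child.name == name]

    def __getattr__(self, attr):
        # Only called when normal lookup fails (e.g. for "find_all"): the
        # unknown attribute is treated as a search for a child tag.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


row = ToyTag("tr", [ToyTag("th"), ToyTag("th")])
print(len(row.findAll("th")))   # prints 2: the BeautifulSoup 3 name works
try:
    row.find_all("th")          # row.find_all evaluates to None here...
except TypeError as exc:
    print(exc)                  # ...so this prints: 'NoneType' object is not callable
```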

Any ideas?

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 4:10 am

    A Python solution using BeautifulSoup 4 (Edit: with proper skipping of the header row. Edit3: using class="details" to select the table):

    from pprint import pprint

    from bs4 import BeautifulSoup

    html = """
    <table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
      <tr valign="top">
        <th>Tests</th>
        <th>Failures</th>
        <th>Success Rate</th>
        <th>Average Time</th>
        <th>Min Time</th>
        <th>Max Time</th>
      </tr>
      <tr valign="top" class="Failure">
        <td>103</td>
        <td>24</td>
        <td>76.70%</td>
        <td>71 ms</td>
        <td>0 ms</td>
        <td>829 ms</td>
      </tr>
    </table>"""

    # Name the parser explicitly; recent bs4 versions warn when they have to guess.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "details"})

    # The first tr contains the field names.
    headings = [th.get_text() for th in table.find("tr").find_all("th")]

    # Every later tr is one data row: pair each td's text with its heading.
    datasets = []
    for row in table.find_all("tr")[1:]:
        dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
        datasets.append(dataset)

    pprint(datasets)


    The result looks like this:

    [[('Tests', '103'),
      ('Failures', '24'),
      ('Success Rate', '76.70%'),
      ('Average Time', '71 ms'),
      ('Min Time', '0 ms'),
      ('Max Time', '829 ms')]]
    

    Edit2: To produce the desired output, use something like this:

    for dataset in datasets:
        for heading, value in dataset:
            print("{0:<16}: {1}".format(heading, value))
    

    Result:

    Tests           : 103
    Failures        : 24
    Success Rate    : 76.70%
    Average Time    : 71 ms
    Min Time        : 0 ms
    Max Time        : 829 ms
    
Related Questions

I'm looking for a way to get from the GPS it's longitude and latitude
I'm looking for a way to get a static methods list for a certain
I'm looking for a way to select elements that contain a certain element and
I am looking for a way to extract/get values from an array and assign
I am looking for a better way of doing this: SELECT * FROM $tbl_name
I'm looking for a way to accomplish a certain task and that is, going
Looking for a way to get path of file (and then process it in
Looking for a way to get the seconds until @post.created_at date time with timezone.
Am looking for a way to get image of the first page in pdf
I'm looking for a way to get the values of params being set, whether
