The Archive Base

Editorial Team
Asked: June 14, 2026


I am trying to learn about web scraping and Python (and programming, for that matter) and have found the BeautifulSoup library, which seems to offer a lot of possibilities.

I am trying to find out how best to pull the pertinent information from this page:

http://www.aidn.org.au/Industry-ViewCompany.asp?CID=3113

I can go into more detail, but basically I want the company name, its description, the contact details, the various company details and statistics, etc.

At this stage I am looking at how to cleanly isolate this data and scrape it, with a view to putting it all in a CSV or something similar later.

I am confused about how to use BS to grab the different table data. There are lots of tr and td tags, and I am not sure how to anchor on to anything unique.

The best I have come up with is the following code as a start:

from bs4 import BeautifulSoup
import urllib2

html = urllib2.urlopen("http://www.aidn.org.au/Industry-ViewCompany.asp?CID=3113")
soup = BeautifulSoup(html)
soupie = soup.prettify()
print soupie

and then from there use regex etc. to pull data from the cleaned-up text.

But there must be a better way to do this using the BS tree? Or is this site formatted in a way that BS won’t provide much more help?

Not looking for a full solution as that is a big ask and I want to learn, but any code snippets to get me on my way would be much appreciated.

Update

Thanks to @ZeroPiraeus below I am starting to understand how to parse through the tables. Here is the output from his code:

=== Personnel ===
bodytext    Ms Gail Morgan CEO
bodytext    Phone: +61.3. 9464 4455 Fax: +61.3. 9464 4422
bodytext    Lisa Mayoh Sales Manager
bodytext    Phone: +61.3. 9464 4455 Fax: +61.3. 9464 4422 Email: bob@aerospacematerials.com.au

=== Company Details ===
bodytext    ACN: 007 350 807 ABN: 71 007 350 807 Australian Owned Annual Turnover: $5M - $10M Number of Employees: 6-10 QA: ISO9001-2008, AS9120B, Export Percentage: 5 % Industry Categories: AerospaceLand (Vehicles, etc)LogisticsMarineProcurement Company Email: lisa@aerospacematerials.com.au Company Website: http://www.aerospacematerials.com.au Office: 2/6 Ovata Drive Tullamarine VIC 3043 Post: PO Box 188 TullamarineVIC 3043 Phone: +61.3. 9464 4455 Fax: +61.3. 9464 4422
paraheading ACN:
bodytext    007 350 807
paraheading ABN:
bodytext    71 007 350 807
paraheading 
bodytext    Australian Owned
paraheading Annual Turnover:
bodytext    $5M - $10M
paraheading Number of Employees:
bodytext    6-10
paraheading QA:
bodytext    ISO9001-2008, AS9120B,
paraheading Export Percentage:
bodytext    5 %
paraheading Industry Categories:
bodytext    AerospaceLand (Vehicles, etc)LogisticsMarineProcurement
paraheading Company Email:
bodytext    lisa@aerospacematerials.com.au
paraheading Company Website:
bodytext    http://www.aerospacematerials.com.au
paraheading Office:
bodytext    2/6 Ovata Drive Tullamarine VIC 3043
paraheading Post:
bodytext    PO Box 188 TullamarineVIC 3043
paraheading Phone:
bodytext    +61.3. 9464 4455
paraheading Fax:
bodytext    +61.3. 9464 4422

My next question is: what is the best way to put this data into a CSV suitable for importing into a spreadsheet? For example, having things like ‘ABN’, ‘ACN’, ‘Company Website’, etc. as column headings and then the corresponding data as rows.

Thanks for any help.

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 5:57 am

    Your code will depend on exactly what you want and how you want to store it, but this snippet should give you an idea of how to get the relevant information out of the page:

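    import requests

    from bs4 import BeautifulSoup

    url = "http://www.aidn.org.au/Industry-ViewCompany.asp?CID=3113"
    html = requests.get(url).text
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, "html.parser")

    # Each section of the page starts with a "Feature-Heading" cell.
    for feature_heading in soup.find_all("td", {"class": "Feature-Heading"}):
        # print(...) with a single argument works on both Python 2 and 3.
        print("\n=== %s ===" % feature_heading.text)
        # The section's content sits in the cell next to the heading.
        details = feature_heading.find_next_sibling("td")
        for item in details.find_all("td", {"class": ["bodytext", "paraheading"]}):
            # Print the cell's class, then its text with whitespace collapsed.
            print("\t".join([item["class"][0], " ".join(item.text.split())]))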
    

    I find requests a more pleasant library to work with than urllib2, but of course that’s up to you.
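The same traversal can be tried offline against a small piece of inline HTML. This is only a sketch: the markup in `SAMPLE_HTML` is hypothetical (only the class names `Feature-Heading`, `paraheading` and `bodytext` come from the snippet above, and the real page may nest its tables differently), and `extract_sections` is a helper name made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the snippet above.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="Feature-Heading">Company Details</td>
    <td>
      <table>
        <tr><td class="paraheading">ACN:</td><td class="bodytext">007 350 807</td></tr>
        <tr><td class="paraheading">Annual Turnover:</td><td class="bodytext">$5M -   $10M</td></tr>
      </table>
    </td>
  </tr>
</table>
"""


def extract_sections(html):
    """Return {section name: {label: value}} for each Feature-Heading cell."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for heading in soup.find_all("td", class_="Feature-Heading"):
        # The section's content is assumed to be in the next <td> sibling.
        details = heading.find_next_sibling("td")
        if details is None:
            continue
        fields = {}
        for label in details.find_all("td", class_="paraheading"):
            value = label.find_next_sibling("td", class_="bodytext")
            if value is None:
                continue  # a label with no value cell next to it
            # Collapse whitespace and drop the trailing colon from the label.
            key = " ".join(label.get_text().split()).rstrip(":")
            fields[key] = " ".join(value.get_text().split())
        sections[" ".join(heading.get_text().split())] = fields
    return sections


if __name__ == "__main__":
    print(extract_sections(SAMPLE_HTML))
```

Returning nested dictionaries rather than printing keeps the parsing step separate from whatever storage format is chosen later.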

    EDIT:

    In response to your follow-up question, here’s something you could use to write a CSV file from the scraped data:

    import csv
    import requests

    from bs4 import BeautifulSoup

    columns = ["ACN", "ABN", "Annual Turnover", "QA"]
    urls = ["http://www.aidn.org.au/Industry-ViewCompany.asp?CID=3113", ] # ... etc.

    # On Python 3, newline="" lets the csv module control the line endings.
    with open("data.csv", "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, columns)
        writer.writeheader()
        for url in urls:
            soup = BeautifulSoup(requests.get(url).text, "html.parser")
            row = {}
            for heading in soup.find_all("td", {"class": "paraheading"}):
                # Normalise e.g. "ACN:" to "ACN" so it matches the column names.
                key = " ".join(heading.text.split()).rstrip(":")
                if key in columns:
                    next_td = heading.find_next_sibling("td", {"class": "bodytext"})
                    if next_td is not None:  # skip headings with no value cell
                        row[key] = " ".join(next_td.text.split())
            writer.writerow(row)
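To see how `csv.DictWriter` lines dictionaries up with column headings without touching the network, here is a minimal sketch that writes to an in-memory buffer. The names `COLUMNS` and `rows_to_csv` are made up for illustration, the first row reuses values from the output in the question, and the second row is invented to show what happens when a field is missing.

```python
import csv
import io

COLUMNS = ["ACN", "ABN", "Company Website"]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialise a list of dicts to CSV text, one row per company."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # keys that are not columns are dropped
        lineterminator="\n",    # the csv default is "\r\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {
            "ACN": "007 350 807",
            "ABN": "71 007 350 807",
            "Company Website": "http://www.aerospacematerials.com.au",
            "Fax": "+61.3. 9464 4422",  # not a column, so it is ignored
        },
        {"ACN": "000 000 000"},  # invented row with missing fields
    ]
    print(rows_to_csv(rows))
```

With `extrasaction="ignore"` the scraper can collect every heading it finds and leave the choice of columns to the writer, instead of filtering keys while parsing.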
    