The Archive Base

Editorial Team
Asked: May 19, 2026 (2026-05-19T01:49:02+00:00)

I’m a first-time poster here, trying to pick up some Python skills; please be kind to me 🙂

While I’m not a complete stranger to programming concepts (I’ve worked with PHP before), the transition to Python has turned out to be somewhat difficult for me. I think this is mostly because I lack a basic understanding of common “design patterns” (?) and the like.

That said, here is the problem. Part of my current project involves writing a simple scraper with Beautiful Soup. The data to be processed is structured roughly like the sample below.

<table>
    <tr>
        <td class="date">2011-01-01</td>
    </tr>
    <tr class="item">
        <td class="headline">Headline</td>
        <td class="link"><a href="#">Link</a></td>
    </tr>
    <tr class="item">
        <td class="headline">Headline</td>
        <td class="link"><a href="#">Link</a></td>
    </tr>
    <tr>
        <td class="date">2011-01-02</td>
    </tr>
    <tr class="item">
        <td class="headline">Headline</td>
        <td class="link"><a href="#">Link</a></td>
    </tr>
    <tr class="item">
        <td class="headline">Headline</td>
        <td class="link"><a href="#">Link</a></td>
    </tr>
</table>

The main issue is that I simply can’t get my head around how to 1) keep track of the current date (tr -> td class="date") while 2) looping over the items in the subsequent rows (tr class="item" -> td class="headline" and tr class="item" -> td class="link") and 3) storing the processed data in an array.

Additionally, all data will be inserted into a database where each entry must contain the following information:

  • date
  • headline
  • link

Note that the database CRUD operations are not part of the problem; I only mention this to better illustrate what I’m trying to accomplish here 🙂

Now, there are many different ways to skin a cat. So while a solution to the issue at hand is indeed very welcome, I’d be extremely grateful if someone would care to elaborate on the actual logic and strategy you would make use of in order to “attack” this kind of problem 🙂

Last but not least, sorry for such a noobish question.


1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 1:49 am

    The basic problem is that this table is marked up for looks, not for semantic structure. Properly done, each date and its related items should share a parent. Unfortunately, they don’t, so we’ll have to make do.
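To make that point concrete, here is a small sketch of what the data would look like if each date really did own its items. The values and the names `by_date` and `entries` are made up for illustration; nothing here comes from the real page.

```python
# Hypothetical nested form: each date is the parent of its own items.
by_date = {
    "2011-01-01": [("First headline", "/a"), ("Second headline", "/b")],
    "2011-01-02": [("Third headline", "/c")],
}

# With this shape, flattening is a plain nested loop.
# There is no "current date" state to keep track of.
entries = [
    (date, headline, link)
    for date, items in by_date.items()
    for headline, link in items
]

for entry in entries:
    print(entry)
```

Because the table in the question is flat, that parent/child relationship has to be rebuilt while walking the rows.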

    The basic strategy is to iterate through each row in the table:

    • If the first table cell (td) has class 'date', we read the date value and update last_seen_date.
    • Otherwise, we extract a headline and a link, then save (last_seen_date, headline, link) to the database.
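Stripped of any HTML parsing, the two rules above amount to a "carry the last value forward" loop. The sketch below assumes the rows have already been reduced to plain tuples; the `rows` data and the `attach_dates` helper are hypothetical and only illustrate the logic.

```python
# Hypothetical pre-parsed rows: a 1-tuple is a date row,
# a 2-tuple is an item row of (headline, link).
rows = [
    ("2011-01-01",),
    ("First headline", "/a"),
    ("Second headline", "/b"),
    ("2011-01-02",),
    ("Third headline", "/c"),
]


def attach_dates(rows):
    """Yield (date, headline, link), carrying the latest date forward."""
    last_seen_date = None
    for row in rows:
        if len(row) == 1:
            # Date row: remember it for the item rows that follow.
            last_seen_date = row[0]
        else:
            # Item row: pair it with the most recent date.
            headline, link = row
            yield (last_seen_date, headline, link)


entries = list(attach_dates(rows))
```

The only state needed is a single variable that is overwritten whenever a date row appears; every item row simply reads whatever it currently holds.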

    from bs4 import BeautifulSoup   # Beautiful Soup 4 (the beautifulsoup4 package)
    
    fname = r'c:\mydir\beautifulSoup.html'
    with open(fname, 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')
    
    items = []
    last_seen_date = None
    for el in soup.find_all('tr'):
        daterow = el.find('td', {'class': 'date'})
        if daterow is None:     # not a date row - get headline and link
            headline = el.find('td', {'class': 'headline'}).get_text()
            link = el.find('a').get('href')
            items.append((last_seen_date, headline, link))
        else:                   # date row - remember it for the items that follow
            last_seen_date = daterow.get_text()
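For anyone who wants to try the same strategy without a file on disk or a third-party install, here is a self-contained sketch that swaps Beautiful Soup for the standard library's `html.parser`. This is a different parser from the one used above, not a Beautiful Soup feature; the `RowCollector` class, the `SAMPLE` string, and the headline and href values are all made up for illustration.

```python
from html.parser import HTMLParser

# Inline sample shaped like the table in the question (values are invented).
SAMPLE = """
<table>
    <tr><td class="date">2011-01-01</td></tr>
    <tr class="item">
        <td class="headline">First headline</td>
        <td class="link"><a href="/a">Link</a></td>
    </tr>
    <tr class="item">
        <td class="headline">Second headline</td>
        <td class="link"><a href="/b">Link</a></td>
    </tr>
    <tr><td class="date">2011-01-02</td></tr>
    <tr class="item">
        <td class="headline">Third headline</td>
        <td class="link"><a href="/c">Link</a></td>
    </tr>
</table>
"""


class RowCollector(HTMLParser):
    """Hypothetical helper: collects (date, headline, link) tuples."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.last_seen_date = None  # carried forward from the latest date row
        self.current_cell = None    # class of the <td> we are inside, if any
        self.headline = None
        self.link = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # New row: forget the previous row's headline and link.
            self.headline = self.link = None
        elif tag == "td":
            self.current_cell = attrs.get("class")
        elif tag == "a" and self.current_cell == "link":
            self.link = attrs.get("href")

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current_cell == "date":
            self.last_seen_date = text
        elif self.current_cell == "headline":
            self.headline = text

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_cell = None
        elif tag == "tr" and self.headline is not None:
            # Item row finished: pair it with the most recent date.
            self.items.append((self.last_seen_date, self.headline, self.link))


parser = RowCollector()
parser.feed(SAMPLE)
items = parser.items
```

The resulting `items` list has one (date, headline, link) tuple per item row, which is the shape each database entry needs.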
    
© 2021 The Archive Base. All Rights Reserved