The Archive Base

Editorial Team
Asked: June 14, 2026

I'm having trouble parsing an HTML page using Beautiful Soup 3 and Python 2.6.

The HTML content is this:

content='<div class="egV2_EventReportCardLeftBlockShortWidth">
<span class="egV2_EventReportCardTitle">When</span>
<span class="egV2_EventReportCardBody">
<meta itemprop="startDate" content="2012-11-23T10:00:00.0000000">
<span class='egV2_archivedDateEnded'>STARTS</span>Fri 23 Nov,10:00AM<br/>
<meta itemprop="endDate" content="2012-12-03T18:00:00.0000000">
<span class='egV2_archivedDateEnded'>ENDS</span>Mon 03 Dec,6:00PM</span>
<span class="egV2_EventReportCardBody"></span>
<div class="egV2_div_cal" onclick=" showExportEvent()">
<div class="egV2_div_cal_outerFix">
<div class="egV2_div_cal_InnerAdjust"> Cal </div>
</div></div></div>'

I want to get the string 'Fri 23 Nov,10:00AM' out of the middle and into a variable, so that I can concatenate it with other values and send it back to a PHP page.

To read this content, I use the following code (the content above comes from reading this HTML page: http://everguide.com.au/melbourne/event/2012-nov-23/life-with-bird-spring-warehouse-sale/):

import urllib2
from BeautifulSoup import BeautifulSoup

URL = "http://everguide.com.au/melbourne/event/2012-nov-23/life-with-bird-spring-warehouse-sale/"

# Fetch the page and parse it with Beautiful Soup 3.
html = urllib2.urlopen(URL).read()
soup = BeautifulSoup(html.decode('utf-8'))

# Event name: all text under the element with itemprop="name".
for node in soup.findAll(itemprop="name"):
    n = ''.join(node.findAll(text=True))
# Date block: all text under the div, including the text of nested spans.
for node in soup.findAll("div", {"class": "egV2_EventReportCardLeftBlockShortWidth"}):
    d = ''.join(node.findAll(text=True))
print n, "|", d

Which returns:

[(ssh user)]# python testscrape.py

LIFE with BIRD Spring Warehouse Sale | 
When
<span class="egV2_EventReportCardDateTitle">STARTS</span>
STARTSFri 23 Nov,10:00AMENDSMon 03 Dec,6:00PM
<span class="egV2_EventReportCardDateTitle">ENDS</span>



 Cal 



[(ssh user)]# 

(And it includes all those line breaks, etc.)

So you can see there at the end, I'm grouping both of those stripped strings into one printout, with a separator character in the middle so PHP can read the string back as one and then break it apart.

The problem is that the Python code can read that page and store the text, but the result includes all that rubbish (tags, line breaks and so on), which confuses the PHP app.
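The whitespace and separator handling described above can be sketched independently of the parsing step. This is a minimal illustration, not the asker's code: the helper names `collapse_whitespace` and `build_payload` and the sample strings are assumptions made up for the example, and it should run unchanged on Python 2.6 or Python 3.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(text.split())


def build_payload(name, when, sep="|"):
    """Join the cleaned fields with a separator the PHP side can split on."""
    return sep.join(collapse_whitespace(part) for part in (name, when))


# Sample values standing in for the scraped text (hypothetical input).
raw_name = "\nLIFE with BIRD Spring Warehouse Sale \n"
raw_when = "\n Fri 23 Nov,10:00AM\n\n"

payload = build_payload(raw_name, raw_when)
print(payload)  # LIFE with BIRD Spring Warehouse Sale|Fri 23 Nov,10:00AM
```

Cleaning each field before joining keeps stray newlines out of the string, so the receiving side only has to split on the separator.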

I really just want returned:

Fri 23 Nov,10:00AM

Is it because I'm using the findAll(text=True) method?

How can I drill down and get just the text in that div, without the contents of the span tags too?

Any help would be greatly appreciated, thank you.

Rick – Melbourne.


1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 10:34 pm

    Why not try something like this:

    In [95]: soup = BeautifulSoup(content)
    
    In [96]: soup.find("span", {"class": "egV2_archivedDateEnded"})
    Out[96]: <span class="egV2_archivedDateEnded">STARTS</span>
    
    In [97]: soup.find("span", {"class": "egV2_archivedDateEnded"}).next
    Out[97]: u'STARTS'
    
    In [98]: soup.find("span", {"class": "egV2_archivedDateEnded"}).next.next
    Out[98]: u'Fri 23 Nov,10:00AM'
    

    or even

    In [99]: soup.find("span", {"class": "egV2_archivedDateEnded"}).nextSibling
    Out[99]: u'Fri 23 Nov,10:00AM'
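For readers who cannot install Beautiful Soup 3, the same idea (take the text node that directly follows the marker span) can be sketched with only the standard library. Note the swap: this uses Python 3's `html.parser`, not Beautiful Soup and not Python 2.6, and the class name `TextAfterSpan` and the shortened sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TextAfterSpan(HTMLParser):
    """Collect the text that directly follows each </span> of a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_target = False     # currently inside a matching <span>
        self.capture_next = False  # a matching </span> was just closed
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Any new tag ends the "directly after the span" window.
        self.capture_next = False
        if tag == "span" and dict(attrs).get("class") == self.css_class:
            self.in_target = True

    def handle_endtag(self, tag):
        if tag == "span" and self.in_target:
            self.in_target = False
            self.capture_next = True
        else:
            self.capture_next = False

    def handle_data(self, data):
        # Keep only non-blank text that follows the marker span.
        if self.capture_next and data.strip():
            self.found.append(data.strip())
            self.capture_next = False


# Shortened copy of the markup from the question.
sample = (
    '<span class="egV2_EventReportCardBody">'
    '<meta itemprop="startDate" content="2012-11-23T10:00:00.0000000">'
    "<span class='egV2_archivedDateEnded'>STARTS</span>Fri 23 Nov,10:00AM<br/>"
    '<meta itemprop="endDate" content="2012-12-03T18:00:00.0000000">'
    "<span class='egV2_archivedDateEnded'>ENDS</span>Mon 03 Dec,6:00PM</span>"
)

parser = TextAfterSpan("egV2_archivedDateEnded")
parser.feed(sample)
parser.close()

start = parser.found[0]
print(start)  # Fri 23 Nov,10:00AM
```

The first entry in `found` is the start date and the second is the end date, which mirrors what `nextSibling` returns for each marker span in the Beautiful Soup version.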
    