I’m relatively new to python so things like this are not coming easy to


Asked: June 15, 2026


I’m relatively new to python so things like this are not coming easy to me.

I just want to loop through a web page’s content and print each occurrence to the console window for now, but I obviously have my loop wrong.

import sys
import re
import urllib2
import urlparse

crawling = tocrawl.pop()
response = urllib2.urlopen(crawling)

msg = response.read()
endDiv = msg.find('</div>')
while endDiv != -1:
    endDiv = msg.find('</div>')
    startPos = msg.find('class="facultyname">', endDiv)
    if startPos != -1:
        nextPos = msg.find('.php">', startPos)
        endPos = msg.find('</a>', nextPos)
    if endPos != -1:
        name = msg[nextPos+6:endPos]
        print name, "   ",

    startPos = msg.find('function escramble()')
    if startPos != -1:
        nextPos = msg.find('b=', startPos)
        endPos = msg.find('c', nextPos)
    if endPos != -1:
        email = msg[nextPos+3:endPos-1]
        email = email[:-13] + '@email.com'
        print email

    endDiv = msg.find('</div>', endPos)

I’m already grabbing the first occurrence; I just want to loop until the end of the page and collect the rest.

Sample HTML:

<div id="main-text">

   <p class="title">Research Scientists</p>


   <div class="space">&nbsp;</div>
   <img src="photos/icons/bastolaicon.jpg" class="faculty" width="53" height="71" alt="Bastola Photo" />

   <div class="facultyname">
     <strong><a href="people/bastola.php">person1</a>
     <br /><em>Post-Doctoral Scientist</em></strong>
     <br />
   </div>

   <div class="facultybody">
     Rm. 218A
     <br /><em><script type="text/javascript">

       <!--
       function escramble(){
       var a,b,c,d,e,f,g,h,i
       a='<a href=\"mai'
       b='person1'
       c='\">'
       a+='lto:'
       b+='@'
       e='</a>'
       f=''
       b+='email.com'
       g='<img src=\"'
       h=''
       i='\" alt="Email us." border="0">'

       if (f) d=f
       else if (h) d=g+h+i
       else d=b

       document.write(a+b+c+d+e)
       }
       escramble()
       //-->

       </script></em>

   </div>

   <div class="space">&nbsp;</div>

   <img src="photos/icons/person2icon.jpg" class="faculty" width="53" height="71" alt="person2 Photo" />

   <div class="facultyname">
     <strong><a href="people/person2.shtml">person2</a>
     <br /><em>Assistant Research Scientist</em></strong>
     <br />
   </div>

   <div class="facultybody">
     Rm. 227
     <br />(850) 645-1253
     <br /><em><script type="text/javascript">

       <!--
       function escramble(){
       var a,b,c,d,e,f,g,h,i
       a='<a href=\"mai'
       b='person2'
       c='\">'
       a+='lto:'
       b+='@'
       e='</a>'
       f=''
       b+='email.com'
       g='<img src=\"'
       h=''
       i='\" alt="Email us." border="0">'

       if (f) d=f
       else if (h) d=g+h+i
       else d=b

       document.write(a+b+c+d+e)
       }
       escramble()
       //-->

       </script></em>

   </div>

   <div class="spacer">&nbsp;</div>


1 Answer

Editorial Team · Answer 1 · 2026-06-15T15:37:55+00:00

Quick and dirty approach that works for your sample data (here html holds the page source, i.e. what your code calls msg):

>>> res = re.findall(r"b\+?='(.*?)'", html)
>>> res
['person1', '@', 'email.com', 'person2', '@', 'email.com']
>>> emails = [''.join(group) for group in zip(*[iter(res)]*3)]
>>> emails
['person1@email.com', 'person2@email.com']
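
In case the zip(*[iter(res)]*3) trick looks like magic: it repeats one shared iterator three times, so zip pulls three consecutive items into each tuple. Here is a minimal sketch of the same thing spelled out. The html string is a hypothetical stand-in that keeps only the b= and b+= lines from your sample, and the names it and groups are mine.

```python
import re

# Hypothetical stand-in for the page source; only the b=/b+= lines matter here.
html = """
b='person1'
b+='@'
b+='email.com'
b='person2'
b+='@'
b+='email.com'
"""

# Same pattern as above: capture whatever is assigned or appended to b.
res = re.findall(r"b\+?='(.*?)'", html)

# One iterator passed three times: each tuple takes the next three items,
# so the flat list is chunked into groups of 3 (user, '@', domain).
it = iter(res)
groups = list(zip(it, it, it))

emails = [''.join(group) for group in groups]
print(emails)  # ['person1@email.com', 'person2@email.com']
```

Note that this relies on every escramble() block contributing exactly three pieces to b; a block with a different count would shift all the groups after it.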

And since this is already horrible then let’s really kludge it:

>>> names = [name.split('>', 1)[1] for name in re.findall(r'href="people(.*?)</a>', html)]
>>> names
['person1', 'person2']
>>> zip(names, emails)
[('person1', 'person1@email.com'), ('person2', 'person2@email.com')]

Note: this only works on your sample data. HTML is fickle, so don’t expect this to be robust or easily manageable.
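
As for why your own loop keeps printing the first person: msg.find('</div>') at the top of the loop and msg.find('function escramble()') are both called without a start position, so every pass searches from the beginning of the page again. If you would rather stay with str.find, a rough sketch of carrying the position forward could look like the following. The msg string is a trimmed, hypothetical stand-in for your page, extract_people is a name I made up, and the '@email.com' suffix is taken from your code.

```python
# Trimmed, hypothetical stand-in for the page source in the question.
msg = """
<div class="facultyname">
  <strong><a href="people/bastola.php">person1</a>
</div>
<script>
b='person1'
</script>
<div class="facultyname">
  <strong><a href="people/person2.shtml">person2</a>
</div>
<script>
b='person2'
</script>
"""

def extract_people(msg):
    """Collect (name, email) pairs, resuming each search where the last ended."""
    people = []
    pos = 0
    while True:
        start = msg.find('class="facultyname">', pos)
        if start == -1:
            break  # no more faculty blocks on the page
        # Each find starts from the previous match; -1 is passed along
        # so a missing piece stops the chain instead of wrapping around.
        link = msg.find('<a href=', start)
        gt = msg.find('>', link) if link != -1 else -1
        end = msg.find('</a>', gt) if gt != -1 else -1
        b = msg.find("b='", end) if end != -1 else -1
        quote = msg.find("'", b + 3) if b != -1 else -1
        if quote == -1:
            break  # malformed or truncated block, stop here
        name = msg[gt + 1:end]
        user = msg[b + 3:quote]
        people.append((name, user + '@email.com'))
        pos = quote  # the next pass continues after this record
    return people

print(extract_people(msg))
# [('person1', 'person1@email.com'), ('person2', 'person2@email.com')]
```

The only real change from your version is that every find gets a start argument and pos moves forward at the end of each pass. It is just as fragile as the regex version if the markup changes.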
