This is the HTML I have:
p_tags = '''<p class='foo-body'> <font class='test-proof'>Full name</font> Foobar<br /> <font class='test-proof'>Born</font> July 7, 1923, foo, bar<br /> <font class='test-proof'>Current age</font> 27 years 226 days<br /> <font class='test-proof'>Major teams</font> <span style='white-space: nowrap'>Japan,</span> <span style='white-space: nowrap'>Jakarta,</span> <span style='white-space: nowrap'>bazz,</span> <span style='white-space: nowrap'>foo,</span> <span style='white-space: nowrap'>foobazz</span><br /> <font class='test-proof'>Also</font> bar<br /> <font class='test-proof'>foo style</font> hand <br /> <font class='test-proof'>bar style</font> ball<br /> <font class='test-proof'>foo position</font> bak<br /> <br class='bar' /> </p>'''
This is my Python code, using Beautiful Soup:
def get_info(p_tags):
    '''Returns brief information.'''
    head_list = []
    detail_list = []
    # This works fine
    for head in p_tags.findAll('font', 'test-proof'):
        head_list.append(head.contents[0])
    # Some problem with this?
    for index in xrange(2, 30, 4):
        detail_list.append(p_tags.contents[index])
    return dict([(l, detail_list[head_list.index(l)]) for l in head_list])
I get the proper head_list from the HTML but the detail_list is not working.
head_list = [u'Full name', u'Born', u'Current age', u'Major teams', u'Also', u'foo style', u'bar style', u'foo position']
I want something like this:
{
    'Full name': 'Foobar',
    'Born': 'July 7, 1923, foo, bar',
    'Current age': '27 years 226 days',
    'Major teams': 'Japan, Jakarta, bazz, foo, foobazz',
    'Also': 'bar',
    'foo style': 'hand',
    'bar style': 'ball',
    'foo position': 'bak'
}
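To make the target mapping concrete, here is a rough sketch of the idea I am after: treat every `<font class='test-proof'>` as a header and collect all text that follows it, up to the next header, as its detail. This is not Beautiful Soup code; it assumes Python 3 and uses the standard library's `html.parser` so that it runs on its own. The names `InfoParser` and `sample` are made up for illustration, and the sample is a shortened version of my HTML.

```python
from html.parser import HTMLParser


class InfoParser(HTMLParser):
    """Collects {header: detail chunks} keyed by <font class='test-proof'> text."""

    def __init__(self):
        super().__init__()
        self.info = {}          # header text -> list of detail text chunks
        self.in_header = False  # True while inside a header <font> tag
        self.current = None     # header whose detail text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == 'font' and dict(attrs).get('class') == 'test-proof':
            self.in_header = True
            self.current = ''

    def handle_endtag(self, tag):
        if tag == 'font' and self.in_header:
            self.in_header = False
            self.info[self.current] = []

    def handle_data(self, data):
        if self.in_header:
            self.current += data                   # still reading the header
        elif self.current is not None:
            self.info[self.current].append(data)   # text after a header


def get_info(html):
    """Returns a dict mapping each header to its whitespace-normalised detail."""
    parser = InfoParser()
    parser.feed(html)
    parser.close()
    # Join the chunks and collapse runs of whitespace into single spaces.
    return {head: ' '.join(''.join(chunks).split())
            for head, chunks in parser.info.items()}


# Shortened, hypothetical sample in the same shape as the HTML above.
sample = ("<p class='foo-body'> <font class='test-proof'>Full name</font> "
          "Foobar<br /> <font class='test-proof'>Major teams</font> "
          "<span style='white-space: nowrap'>Japan,</span> "
          "<span style='white-space: nowrap'>Jakarta</span><br /> </p>")

print(get_info(sample))
# {'Full name': 'Foobar', 'Major teams': 'Japan, Jakarta'}
```

The point of the sketch is that the detail for a header is not at a fixed offset: "Major teams" spans several `<span>` children, so stepping through the contents by a constant index cannot line up with the headers.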
Any help would be appreciated. Thanks in advance.
Sorry for the unnecessarily complex code, I badly need a big dose of caffeine 😉