OK I have been trying to parse a html tag which in it contains

Question

Editorial Team

Asked: May 28, 2026 (2026-05-28T05:53:32+00:00)

I have been trying to parse an HTML tag that contains other tags and text.

For example, if I had this HTML (yes, I know using <b> and <i> is bad, but it makes for a simple example):

<p> <b> 1 </b> Apple <b> 2 </b> <i> Orange </i> <b> 3 </b> Pineapple </p>

It could render something like this

1 Apple 2 Orange 3 Pineapple

How can I get a mapping like this?

{"1": "Apple", "2": "<i> Orange </i>", "3": "Pineapple"}

I have tried BeautifulSoup's tag.next, but it doesn't return the tags; instead it stops.

I have tried tag.find(text = True, recursive = False), but it doesn't return anything except a \n.

I have also tried looping over tags.findAll("b"):

for i in tags.findAll("b"):
    print i.text
    print tags.find(i).text

I have searched for ways to parse tags within tags and found nothing that fits: some suggest regexes (sounds like trouble) and some say it can't be done (not really helpful).

I think what I need to work out is how to get the HTML between two tags. I tried iterating through .nextSibling, but it eventually gave me a unicode space, so I couldn't continue iterating.

Anyone have experience with this?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T05:53:33+00:00

To accumulate elements (tags and text) before and after each <b> tag in <p>:

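#!/usr/bin/env python
from collections import defaultdict
import pprint

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path

# The sample markup from the question
html = "<p> <b> 1 </b> Apple <b> 2 </b> <i> Orange </i> <b> 3 </b> Pineapple </p>"

d = defaultdict(list)  # <b> counter -> elements that follow that <b>
soup = BeautifulSoup(html)
i = 0  # bucket 0 collects anything before the first <b>
for el in soup.p.contents:
    if getattr(el, 'name', None) == 'b':
        i += 1  # switch to next <b> element
    else:
        d[i].append(el)

pprint.pprint(dict(d))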

It expresses the intent correctly but it is not as readable and efficient as it could be.

Output

{0: [u' '],
 1: [u' Apple '],
 2: [u' ', <i> Orange </i>, u' '],
 3: [u' Pineapple ']}
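
For comparison, here is a minimal sketch that produces the exact mapping asked for in the question, keyed by the text inside each <b> tag. It deliberately swaps BeautifulSoup for Python 3's standard-library html.parser, so it is not the approach shown above. The names BoldKeyParser and parse_pairs are made up for illustration, and the sketch assumes the <b> tags are not nested and that the enclosing <p> should be left out of the values.

```python
from html.parser import HTMLParser


class BoldKeyParser(HTMLParser):
    """Collect the markup that follows each <b> tag, keyed by the <b> text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.result = {}     # key text -> list of markup pieces
        self.in_b = False    # True while inside a <b> element
        self.key = None      # current key; None before the first <b>
        self.key_parts = []  # text pieces seen inside the current <b>

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_b = True
            self.key_parts = []
        elif not self.in_b and self.key is not None and tag != "p":
            # Keep the tag exactly as it was written in the source
            self.result[self.key].append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_b = False
            self.key = "".join(self.key_parts).strip()
            self.result[self.key] = []
        elif not self.in_b and self.key is not None and tag != "p":
            self.result[self.key].append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_b:
            self.key_parts.append(data)
        elif self.key is not None:
            self.result[self.key].append(data)


def parse_pairs(html):
    """Return {<b> text: following markup} with outer whitespace stripped."""
    parser = BoldKeyParser()
    parser.feed(html)
    parser.close()
    return {k: "".join(v).strip() for k, v in parser.result.items()}


html = "<p> <b> 1 </b> Apple <b> 2 </b> <i> Orange </i> <b> 3 </b> Pineapple </p>"
pairs = parse_pairs(html)
print(pairs)
```

Anything before the first <b> is dropped here, which corresponds to bucket 0 in the output above. If two <b> tags carry the same text, the later one overwrites the earlier one in this sketch.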
