I have been trying to parse an HTML tag that contains other tags and text.
For example, if I had this HTML (yes, I know using <b> and <i> is bad, but it makes for a simple example):
<p> <b> 1 </b> Apple <b> 2 </b> <i> Orange </i> <b> 3 </b> Pineapple </p>
It would render something like this:
1 Apple 2 Orange 3 Pineapple
How can I get a relation like this?
{"1": "Apple", "2": "<i> Orange </i>", "3": "Pineapple"}
I have tried BeautifulSoup's tag.next, but it doesn't return the nested tags; it stops instead.
I have tried tag.find(text=True, recursive=False), but it returns nothing except a \n.
I have also tried:
    b = tags.findAll("b")
    for i in b:
        print i.text
        print tags.find(i).text
I have searched for how to parse tags within tags, and nothing that came up really fit. Some answers suggest regexes (which sounds like trouble), and some say it can't be done (not really helpful).
I think what I need to figure out is how to get the HTML between two tags. I tried iterating through .nextSibling, but it eventually gave me a unicode space, so I couldn't continue iterating.
Anyone have experience with this?
To accumulate elements (tags and text) before and after each <b> tag in <p>:
It expresses the intent correctly, but it is not as readable and efficient as it could be.
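As a rough sketch of that accumulation (not the original code from this answer), the children of <p> can be walked once, starting a new group at every <b> and appending everything else to the current group. This assumes Python 3 and Beautiful Soup 4 (the bs4 package) rather than the older API used in the question; the helper name group_by_b is made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<p> <b> 1 </b> Apple <b> 2 </b> <i> Orange </i> <b> 3 </b> Pineapple </p>"


def group_by_b(p):
    """Map the text of each <b> to the markup that follows it, up to the next <b>.

    Hypothetical helper: assumes every key is a direct <b> child of `p`.
    """
    groups = {}
    key = None
    for child in p.children:
        if isinstance(child, Tag) and child.name == "b":
            # A <b> starts a new group; its stripped text becomes the key.
            key = child.get_text(strip=True)
            groups[key] = []
        elif key is not None:
            # Text nodes and other tags (such as <i>) are kept as markup.
            groups[key].append(str(child))
        # Anything before the first <b> is ignored.
    # Join the pieces of each group and trim the surrounding whitespace,
    # which also discards the whitespace-only text nodes between tags.
    return {k: "".join(parts).strip() for k, parts in groups.items()}


soup = BeautifulSoup(html, "html.parser")
relation = group_by_b(soup.p)
print(relation)
```

Whitespace-only text nodes between tags are the likely cause of the lone space seen when stepping through .nextSibling; here they are simply concatenated with their neighbours and trimmed at the end, so the loop never has to stop on them.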
Output