Sorry, another Python newbie question. I have a string:
my_string = "<p>this is some \n fun</p>And this is \n some more fun!"
I would like:
my_string = "<p>this is some fun</p>And this is \n some more fun!"
In other words, how do I get rid of ‘\n’ only when it occurs inside an HTML element (between an opening tag such as <p> and its closing tag)?
I have:
my_string = re.sub('<(.*?)>(.*?)\n(.*?)</(.*?)>', 'replace with what???', my_string)
That obviously won’t work, but I’m stuck.
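A regex-only sketch along the lines of the re.sub attempt: instead of a replacement string, pass a function as the replacement, so each matched element can have its newlines stripped while text outside elements is left alone. The pattern <(\w+)[^>]*>.*?</\1> uses the backreference \1 so the closing tag must match the opening tag's name, and re.DOTALL lets "." cross newlines. This assumes simple markup (no element nested inside another element with the same tag name); regular expressions are not a real HTML parser, so treat this as a sketch for strings like the one above.

```python
import re

my_string = "<p>this is some \n fun</p>And this is \n some more fun!"

# Matches an element such as <p ...>...</p>; \1 requires the same tag name
# in the closing tag, and the non-greedy .*? stops at the first such close.
ELEMENT = re.compile(r"<(\w+)[^>]*>.*?</\1>", re.DOTALL)


def strip_newlines(match):
    # Collapse each "\n" and any whitespace around it to a single space,
    # but only inside the matched element.
    return re.sub(r"\s*\n\s*", " ", match.group(0))


# re.sub calls strip_newlines for every match; unmatched text is kept as-is.
result = ELEMENT.sub(strip_newlines, my_string)
print(repr(result))
```

Collapsing the surrounding spaces (not just deleting "\n") is what turns "some \n fun" into "some fun" with a single space, matching the desired output.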
You should try using BeautifulSoup (bs4), which lets you parse HTML and XML tags and pages. It can pull out the newline inside the p tag. If the tag contains more than one child, tag.string returns None, so instead use a for loop and gather the children (using the tag.children property).
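A minimal sketch of the BeautifulSoup approach described above, assuming bs4 is installed and using Python's built-in "html.parser" backend. It loops over every tag, walks tag.children, and rewrites only the text nodes that live inside a tag; text at the top level of the fragment has no enclosing tag, so it keeps its "\n". This is an illustration of the technique, not the answer's original code.

```python
import re

from bs4 import BeautifulSoup, NavigableString

my_string = "<p>this is some \n fun</p>And this is \n some more fun!"

# "html.parser" is the parser bundled with Python, so no extra package is needed.
soup = BeautifulSoup(my_string, "html.parser")

# find_all(True) returns every tag in the fragment. Top-level text is not a
# child of any tag, so it is never visited here.
for tag in soup.find_all(True):
    # Copy the children into a list because nodes are replaced while looping.
    for child in list(tag.children):
        # Exact type check skips comments, CDATA, etc. (NavigableString subclasses).
        if type(child) is NavigableString and "\n" in child:
            # Collapse the newline and surrounding spaces to a single space.
            child.replace_with(re.sub(r"\s*\n\s*", " ", child))

result = str(soup)
print(repr(result))
```

Because this works on the parsed tree rather than on raw text, nested tags are handled naturally: each text node is checked against the tag that directly contains it.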
This might not work exactly the way you want (web pages vary), but the approach can be adapted to your needs.