Consider a string:
str1 = """abcd<aaa>some thing <#^&*some more!#$@ </aaa>
abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa>
<another tag> some text </another tag>
<aaa>sfafaff#%%%^^</aaa> """
Now, in the above string, how can we replace the special characters and white spaces that appear between the tags <aaa> and </aaa>?
The replacing character should be ‘_’.
Here is a possible solution. It is a little complex, so I’ll explain it step by step.
We are going to use a module called re, for regular expressions.
We will work on the string str1 from the question.
First, let’s get all the content inside the tags:
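A minimal sketch of this step. The non-greedy pattern <aaa>(.+?)</aaa> is an assumption, guessed from the ? mentioned in the note at the end; the name inside_tags is the one used in the following steps.

```python
import re

# The string from the question, written as a triple-quoted literal
# so that it can span several lines.
str1 = """abcd<aaa>some thing <#^&*some more!#$@ </aaa>
abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa>
<another tag> some text </another tag>
<aaa>sfafaff#%%%^^</aaa> """

# (.+?) captures as little text as possible, so each match stops
# at the first closing </aaa> it meets.
inside_tags = re.findall(r"<aaa>(.+?)</aaa>", str1)

for content in inside_tags:
    print(repr(content))
```

Note that . does not match a newline by default, so this only finds tags whose content sits on a single line, as it does in this string.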
Now, let’s iterate through each item of inside_tags and replace the special characters.
So, cleaned_contents now holds the contents inside the tags, but with the special characters replaced. Next, let’s zip (join in a tuple) each content inside a tag with its “cleaned” content.
And finally, we search for the tag contents in the string and replace them with the new cleaned content.
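The three steps above could be sketched roughly as follows. The character class [^\w ] is the one named in the note below. The tag pattern, the use of str.replace for the last step, and the variable name result are assumptions.

```python
import re

str1 = """abcd<aaa>some thing <#^&*some more!#$@ </aaa>
abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa>
<another tag> some text </another tag>
<aaa>sfafaff#%%%^^</aaa> """

# Assumed tag pattern: capture the text between <aaa> and </aaa>, non-greedily.
inside_tags = re.findall(r"<aaa>(.+?)</aaa>", str1)

# Replace every character that is neither a word character nor a space with "_".
# Using r"\W" instead would turn the spaces into "_" as well.
cleaned_contents = [re.sub(r"[^\w ]", "_", content) for content in inside_tags]

# Pair each original content with its cleaned version,
# then swap one for the other in the string.
result = str1
for original, cleaned in zip(inside_tags, cleaned_contents):
    result = result.replace(original, cleaned)

print(result)
```

With [^\w ] the spaces inside the tags are kept. Since the question also asks for white spaces to become ‘_’, \W may be the better choice there. Also keep in mind that str.replace changes every occurrence of the original text, so a fragment that also appears outside the tags would be replaced too.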
NOTE: If you don’t understand something (there is a bunch of weird stuff here, like ?, [^\w ] and zip), post your comment below and I’ll explain it.
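As a quick illustration of those three pieces, here is a small sketch on made-up sample data. The sample string and all the variable names are hypothetical, chosen only for this demonstration.

```python
import re

sample = "<aaa>one</aaa> and <aaa>two</aaa>"

# Without ?, .+ is greedy and runs to the LAST </aaa>, giving a single match.
greedy = re.findall(r"<aaa>(.+)</aaa>", sample)
# With ?, .+? is non-greedy and stops at the FIRST </aaa>, one match per tag.
lazy = re.findall(r"<aaa>(.+?)</aaa>", sample)

# [^\w ] matches one character that is neither a word character
# (letter, digit, underscore) nor a space.
cleaned = re.sub(r"[^\w ]", "_", "a b!c#")

# zip pairs up the items of two lists position by position.
pairs = list(zip(["x!", "y#"], ["x_", "y_"]))

print(greedy)   # ['one</aaa> and <aaa>two']
print(lazy)     # ['one', 'two']
print(cleaned)  # a b_c_
print(pairs)    # [('x!', 'x_'), ('y#', 'y_')]
```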