I have a small problem. I’m trying to get the text that is outside of HTML elements.
Example input:
I want this text I want this text I want this text <I don't want this text/>
I want this text I wan this text <I don't>want this</text>
Does anybody know how to do this with a regex? I thought I could do it by deleting the elements and their text. Does anybody know another solution to this problem? Please help me.
I agree that anything non-trivial should be done with an HTML parser (the HTML Agility Pack is excellent if you use .NET), but for a small requirement like this it's more than likely overkill.
Then again, an HTML parser knows more about the quirks and edge cases that HTML is full of. Be sure to test well before using a regex.
Here you go
It also correctly ignores
and not just the tags
In C# this becomes
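A minimal sketch of what that could look like, using the two example lines from the question. The pattern and the names `HtmlTextExtractor` and `StripElements` are illustrative assumptions, not necessarily the regex this answer refers to. The pattern tries three alternatives in order: a self-closing tag, an opening tag together with its content and the next closing tag, and finally any leftover lone tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTextExtractor
{
    // Hypothetical pattern (an assumption, not the original answer's regex).
    // Alternatives are tried left to right:
    //   1. <.../>           a self-closing tag
    //   2. <...>...</...>   an opening tag, its content, and the next closing tag
    //   3. <...>            any leftover lone tag
    // Singleline lets '.' match newlines, so content may span several lines.
    private static readonly Regex ElementPattern = new Regex(
        @"<[^>]*/>|<[^>]*>.*?</[^>]*>|<[^>]*>",
        RegexOptions.Singleline);

    // Removes everything matched by the pattern and trims surrounding whitespace.
    public static string StripElements(string input)
    {
        return ElementPattern.Replace(input, string.Empty).Trim();
    }

    public static void Main()
    {
        // Both calls print: I want this text
        Console.WriteLine(StripElements("I want this text <I don't want this text/>"));
        Console.WriteLine(StripElements("I want this text <I don't>want this</text>"));
    }
}
```

Note that the second alternative simply pairs an opening tag with the next closing tag it finds, so nested elements or void tags such as `<br>` followed by a later closing tag can make it remove too much. That is the kind of edge case where a real parser is the safer choice.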