I would like to extract static text from between HTML tags:
<p>
text here
<span> text here <b>too</b></span>
</p>
I have this regular expression so far:
(<|&lt;)[\s\/\?]*(\w+)(?<attributes>.*?)[\s\/\?]*(>|&gt;)(\n|.)*?<\/\2>
I don’t want to use an HTML parser. Any help would be appreciated. Thanks!
Parsing HTML with regexes is usually a bad idea, but that’s not exactly what you’re trying to do here. All you really want is to strip out the HTML tags. In your example, you try to match the tags and parse out the attributes. But you don’t need to do this.
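To illustrate what "stripping the tags" means for the sample in the question, here is a minimal Python sketch. Python is my choice, not something the question specifies, and `strip_tags` and `TAG_RE` are hypothetical names. It assumes no `>` ever appears inside an attribute value, comment, or script block; if that does not hold, a regex like this will misfire.

```python
import re

# Matches anything that looks like an opening or closing tag,
# e.g. <p>, </span>, or <a href="x">. Assumes no '>' inside attribute values.
TAG_RE = re.compile(r"</?[^>]+>")


def strip_tags(html: str) -> str:
    """Remove tag-like substrings and keep the text between them."""
    return TAG_RE.sub("", html)


sample = "<p>\ntext here\n<span> text here <b>too</b></span>\n</p>"
print(strip_tags(sample))
```

Whitespace and line breaks between the tags are left untouched, so you may want to trim or collapse them afterwards.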
If the following assumptions hold:
<p> delimits paragraphs)
Then all you need to do is strip the pattern
</?[^>]+>
Escaped, in vim, this is: