I’m looking for a regex that matches the names of all HTML tags used in a multi-line text. It should pick out ‘b’, ‘p’ and ‘script’ from the following line:
<b> <p class='normalText'> <script type='text/javascript'>
Is there such a thing? My starting point is that it should begin at a ‘<’ and read until it hits a space or a ‘>’, but it should not include the leading ‘<’, since I only want to match the tag name itself. Thoughts?
It’s virtually impossible to regex HTML once you start considering all the special cases and malformed HTML that browsers sometimes happily parse anyway. That said, I thought it might be fun to get the names without using capture groups, and thus I present you with the following solution:
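A minimal sketch of such a capture-group-free pattern, assuming Python's `re` module and a lookbehind for the ‘<’ (not necessarily the exact expression intended here). The names `TAG_NAME` and `tag_names` are made up for illustration:

```python
import re

# Hypothetical pattern: the lookbehind (?<=<) asserts that a '<' comes
# immediately before the match without making it part of the match,
# so only the tag name itself is returned.
TAG_NAME = re.compile(r"(?<=<)[A-Za-z][A-Za-z0-9]*")

def tag_names(text):
    """Return the names of opening tags in the order they appear."""
    # With no capture groups in the pattern, findall returns whole matches.
    return TAG_NAME.findall(text)

sample = "<b> <p class='normalText'> <script type='text/javascript'>"
print(tag_names(sample))  # ['b', 'p', 'script']
```

Closing tags such as `</b>` are skipped because the character after ‘<’ is ‘/’ rather than a letter. The sketch will still be fooled by a bare ‘<’ in ordinary text (for example `a<b`), by tags inside comments or scripts, and it cuts hyphenated names short.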
For the record, I hold little faith in it being at all useful in any but the most trivial of cases.