I need to grab inline script tags inside HTML pages.
The regex will eventually be driven from C#.
For now I am using Expresso for testing.
The following is the best I have so far:
.*<script.*\r\n(.*\r\n)*\s*</script>
i.e.
.*<script : catch the script tag
.*\r\n : catch anything till the end of the line
(.*\r\n)* : catch the other lines of the script
\s*</script> : catch the closing script tag, with any indentation before it
It grabs ALL the stuff between the first opening tag and the last closing tag, including HTML and other script tags.
Two scripts on the same line will break your regex. Try it on the source of the page with your question.
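To make the failure concrete, here is a minimal sketch in C# using System.Text.RegularExpressions. The input string and the class name are hypothetical, and the greedy pattern is a simplified stand-in for the one in the question, not a copy of it. It compares a greedy `.*` with a lazy `.*?` on a line that holds two scripts.

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyVsLazy
{
    static void Main()
    {
        // Hypothetical input: two inline scripts on the same line.
        string html = "<script>a();</script><p>html</p><script>b();</script>";

        // Greedy: .* runs to the LAST </script>, swallowing the markup in between.
        Match greedy = Regex.Match(html, @"<script.*</script>");
        Console.WriteLine(greedy.Value.Length == html.Length); // True: one match spans the whole line

        // Lazy: .*? stops at the FIRST </script>; Singleline lets '.' cross newlines.
        var lazy = new Regex(@"<script\b[^>]*>(.*?)</script>",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in lazy.Matches(html))
        {
            Console.WriteLine(m.Groups[1].Value); // a(); then b();
        }
    }
}
```

The lazy version copes with this particular input, but it is still only pattern matching: commented-out markup, for example, can fool it.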
Parsing HTML with regex is not a very good idea (there is a link in the comment to your question which answers why the <center> cannot hold); use an HTML parser instead.
HtmlAgilityPack can select the <script> nodes directly. Isn't this simpler than regex?
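A minimal sketch of such a selection, assuming the HtmlAgilityPack NuGet package is referenced. The sample HTML string and the class name are hypothetical. The XPath `//script[not(@src)]` is one way to keep only inline scripts; plain `//script` would return external ones too.

```csharp
using System;
using HtmlAgilityPack;

class InlineScripts
{
    static void Main()
    {
        // Hypothetical sample page; two scripts share one line on purpose.
        string html =
            "<html><body>" +
            "<script>var a = 1;</script><script src=\"ext.js\"></script>" +
            "<p>text</p>\r\n" +
            "<script type=\"text/javascript\">\r\n  var b = 2;\r\n</script>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Inline scripts only: <script> elements without a src attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script[not(@src)]");
        if (scripts == null)
        {
            return;
        }

        foreach (HtmlNode script in scripts)
        {
            // InnerHtml holds the script body between the opening and closing tags.
            Console.WriteLine(script.InnerHtml.Trim());
        }
    }
}
```

For a real page, the document could be loaded from a file or stream with `doc.Load(...)` instead of from a string.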