I have been going through Regex tutorials for hours now and I can’t seem to grasp it very well. I would like a regex statement that extracts an html title only if the title is exceptionally long (1000+ characters). I’ve managed to work out the following to select the entire title:
<title>(.*?)</title>
I have no idea where to begin adding the length portion. Any assistance would be greatly appreciated!
would do that (unless the title contains newlines – in that case it depends on the regex engine how to handle that).
This also presupposes that there is only one <title> tag in the string you’re looking at (which is probably the case in an HTML file, so you should be OK, given the general warning that regexes are a brittle tool when dealing with HTML).
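For what it's worth, here is a minimal sketch of how the length requirement could be tried out in Python. It assumes a bounded quantifier, `.{1000,}`, in place of `.*`, and uses `re.DOTALL` so that titles containing newlines still match; the names `LONG_TITLE` and `extract_long_title` are made up for illustration, and other regex engines may need a different flag for the newline case.

```python
import re

# Assumed pattern: {1000,} demands at least 1000 characters between the tags.
# re.DOTALL lets "." match newlines; re.IGNORECASE also accepts <TITLE>.
LONG_TITLE = re.compile(r"<title>(.{1000,}?)</title>", re.DOTALL | re.IGNORECASE)


def extract_long_title(html):
    """Return the title text if it has 1000+ characters, otherwise None."""
    match = LONG_TITLE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    long_page = "<html><head><title>" + "x" * 1000 + "</title></head></html>"
    short_page = "<html><head><title>Short title</title></head></html>"
    print(extract_long_title(long_page) is not None)   # long title is extracted
    print(extract_long_title(short_page) is None)      # short title is skipped
```

Like the pattern it is based on, this only behaves sensibly when the input has a single <title> element; with several of them, the match could run from the first opening tag to a later closing tag.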