I’m trying to write a search-and-replace regex that detects whether the HTML returned by a web request is complete. In some cases the server has returned incomplete HTML (only half of the page), so I want to detect that on the client and request the page again.
I was thinking the regex could look for the presence of <html[^>]*> and then the absence of </html>. The replacement would then swap the whole HTML for a special marker string.
I can’t just check for the absence of </html>, because the returned data might be a plain-text file (which never contains </html>), and I can’t check MIME types.
Any ideas? I just can’t wrap my head around the look-behinds this would require. I’m not trying to parse HTML, just searching for bits of text, which is what regexes are for, right?
EDIT:
The regexes will be run by C#, but I write them in a regex editor. I can only use a search-and-replace regex to solve this, nothing else.
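A minimal C# sketch of the search-and-replace described above, assuming .NET's System.Text.RegularExpressions. The marker text and the names TruncatedHtml and MarkIfIncomplete are hypothetical, chosen for illustration. The pattern anchors at the start of the input, uses a lookahead (not a lookbehind) to require an <html...> tag with no later </html>, and then consumes the entire input so the whole response is replaced. The inline (?is) flags turn on case-insensitivity and "dot matches newlines", so the same pattern string can be pasted into a regex editor without setting options separately.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TruncatedHtml
{
    // Hypothetical marker; use whatever string the client checks for.
    public const string Marker = "##INCOMPLETE_HTML##";

    // (?is): i = ignore case, s = "." also matches newlines.
    // \A anchors at the very start of the input.
    // The lookahead succeeds only if some <html ...> opening tag is NOT
    // followed anywhere later by </html>.
    // The trailing .* then consumes the whole input, so Replace swaps out
    // the entire document, not just the part after <html>.
    private static readonly Regex Incomplete =
        new Regex(@"(?is)\A(?=.*<html[^>]*>(?!.*</html>)).*");

    public static string MarkIfIncomplete(string response) =>
        Incomplete.Replace(response, Marker);

    public static void Main()
    {
        Console.WriteLine(MarkIfIncomplete("<html>\n<body>half a pa"));
        Console.WriteLine(MarkIfIncomplete("<html>\n<body></body>\n</html>"));
        Console.WriteLine(MarkIfIncomplete("plain text, no tags"));
    }
}
```

Complete HTML and plain text pass through unchanged; only a response with an unclosed <html> tag becomes the marker. In .NET replacement strings "$" starts a substitution, so if you change the marker and it contains a dollar sign, write it as "$$".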
Oded is correct: you cannot parse HTML with a regex. But you can certainly check whether some (multiline) string contains <html> not followed by </html>.
If you are sure that whatever your web request returns will be consistent and won't contain oddities such as html tags inside comments, then a pattern that matches the opening <html...> tag followed by a negative lookahead for </html> will find such a string, provided you set the “dot matches newlines” option. How to set that option depends on the regex implementation; in .NET it is RegexOptions.Singleline, or the inline modifier (?s) at the start of the pattern.
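A minimal C# sketch of the detection described above, assuming .NET's System.Text.RegularExpressions; the class and method names are hypothetical. It shows why the "dot matches newlines" option matters: without RegexOptions.Singleline, the .* inside the lookahead stops at the first line break, so a </html> on a later line would be missed and a complete page would be reported as truncated.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCompleteness
{
    // An <html ...> opening tag (attributes allowed) that is not followed
    // anywhere later by </html>. Singleline lets ".*" cross line breaks;
    // IgnoreCase also catches <HTML> and </HTML>.
    private static readonly Regex OpenWithoutClose = new Regex(
        @"<html[^>]*>(?!.*</html>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static bool LooksTruncated(string response) =>
        OpenWithoutClose.IsMatch(response);

    public static void Main()
    {
        string complete = "<html>\n<body>ok</body>\n</html>";
        string truncated = "<html>\n<body>half of the pa";
        Console.WriteLine(LooksTruncated(complete));           // False
        Console.WriteLine(LooksTruncated(truncated));          // True
        Console.WriteLine(LooksTruncated("just a text file")); // False
    }
}
```

A plain-text response never matches because it contains no <html tag at all, which covers the case where the absence of </html> alone would be a false alarm.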