For most websites I can parse the title easily with a regex such as “(.)” or “\s(.+?)\s*”. However, some sites format it a bit differently, for example http://www.youtube.com (see below), and the expressions above do not work there. Any help catching this kind of format, and any other HTML formats?
Thanks
-Tim.
<title>
YouTube - Broadcast Yourself.
If you want to include the line break in the regular expression, in most cases you only need to use \n inside the expression. That said, which language/interpreter are you using? Some of them don't allow multiline expressions. If they are permitted, something like (.|\n|\r)* would suffice.
If your language or interpreter does not support multiline regular expressions, you can always replace the newline characters with spaces and then pass the resulting string to the regular expression parser. Again, that depends on your programming environment.
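Since the language wasn't mentioned, here is a rough sketch in Python of both ideas: letting the pattern cross line breaks, and replacing the newlines before matching. The <title>...</title> pattern, the sample string and the function names are my own assumptions for illustration, not something taken from the question.

```python
import re

# Hypothetical sample page: the title text sits on its own line,
# similar to the YouTube markup quoted in the question.
html = "<html><head><title>\n  YouTube - Broadcast Yourself.\n</title></head></html>"

# Assumed pattern: capture whatever sits between <title> and </title>,
# trimming surrounding whitespace.
TITLE_PATTERN = r"<title[^>]*>\s*(.*?)\s*</title>"


def title_dotall(page):
    """Let '.' match newlines too (re.DOTALL), so the title may span lines."""
    m = re.search(TITLE_PATTERN, page, re.DOTALL | re.IGNORECASE)
    return m.group(1) if m else None


def title_alternation(page):
    """Same idea without a flag: (.|\\n|\\r) matches any character or a line break."""
    m = re.search(r"<title[^>]*>((?:.|\n|\r)*?)</title>", page, re.IGNORECASE)
    return m.group(1).strip() if m else None


def title_flattened(page):
    """Fallback: replace newlines with spaces first, then use a single-line match."""
    flat = page.replace("\r", " ").replace("\n", " ")
    m = re.search(TITLE_PATTERN, flat, re.IGNORECASE)
    return m.group(1) if m else None


print(title_dotall(html))
print(title_alternation(html))
print(title_flattened(html))
```

The lazy quantifiers (.*? and *?) stop at the first closing tag instead of running on to a later one. For arbitrary real-world HTML a proper HTML parser is usually more robust than any regex, but for a simple title grab the above is often enough.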
Hope this helps!