I’m looking for a method that will allow me to get the title of a webpage and store it as a string.
However, all the solutions I have found so far involve downloading the full source of the page, which isn’t really practical for a large number of webpages.
The only alternative I can see is to limit the download so that it either fetches only a set number of characters or stops once it reaches the title tag, but wouldn’t that still be quite large?
Thanks
As the <title> tag is part of the HTML itself, there is no way to find “just the title” without downloading the file. You should be able to download a portion of the file until you’ve read in the <title> tag or the </head> tag and then stop, but you’ll still need to download (at least a portion of) the file.

This can be accomplished with HttpWebRequest/HttpWebResponse by reading data from the response stream until we’ve either read in a <title></title> block or the </head> tag. I added the </head> check because, in valid HTML, the title block must appear within the head block, so with this check we will never parse the entire file in any case (unless there is no head block, of course). The following should be able to accomplish this task:
UPDATE: Updated the original source example to use a compiled Regex and a using statement for the Stream, for better efficiency and maintainability.
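For readers outside .NET, here is a minimal Python sketch of the same idea, using urllib.request in place of HttpWebRequest/HttpWebResponse. It is an illustration under stated assumptions, not the answer's C# code. The names read_title and fetch_title, the 1 KB chunk size and the 64 KB safety cap are hypothetical choices. The page is assumed to be UTF-8, and the regex match is a heuristic rather than a real HTML parser. Like the answer, it reads the response stream in chunks and stops at the first complete <title></title> block or at </head>. The patterns are compiled once, and the connection lives inside a `with` block, which plays the role of the C# `using` statement.

```python
import html
import io
import re
import urllib.request
from typing import BinaryIO, Optional

# Compiled once at import time and reused for every page (the Python analogue
# of a compiled .NET Regex). DOTALL lets a title span several lines and
# IGNORECASE accepts <TITLE> as well as <title>.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)
HEAD_END_RE = re.compile(rb"</head\s*>", re.IGNORECASE)


def _clean(raw: bytes) -> str:
    # Assumption: the page is UTF-8; a real page may declare another charset.
    text = html.unescape(raw.decode("utf-8", errors="replace"))
    return " ".join(text.split())  # collapse newlines and runs of spaces


def read_title(stream: BinaryIO, chunk_size: int = 1024,
               max_bytes: int = 64 * 1024) -> Optional[str]:
    """Return the page title from `stream`, reading as little as possible.

    Reading stops at the first complete <title>...</title> block, at </head>
    (in valid HTML the title must sit inside <head>), at end of stream, or
    once `max_bytes` have been read, whichever comes first.
    """
    buffer = b""
    while len(buffer) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        buffer += chunk
        head_end = HEAD_END_RE.search(buffer)
        # Once </head> has been seen, only look for the title before it.
        search_area = buffer[:head_end.start()] if head_end else buffer
        match = TITLE_RE.search(search_area)
        if match:
            return _clean(match.group(1))
        if head_end:
            return None  # <head> closed without a title
    return None


def fetch_title(url: str, timeout: float = 10.0) -> Optional[str]:
    # The `with` block closes the connection as soon as read_title returns,
    # much like the C# `using` statement, so this code stops reading the body early.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return read_title(response)


if __name__ == "__main__":
    page = io.BytesIO(b"<html><head><title>Example</title></head><body>...</body></html>")
    print(read_title(page))  # prints: Example
```

Calling fetch_title("https://example.com/") would open the connection, read roughly one chunk at a time, and close it as soon as the title or </head> turns up. Some of the body may already be buffered by the network stack, so "stop reading" is not quite "stop downloading". The max_bytes cap guards against pages with no head block, the case the answer calls out.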