I’m trying to scrape an HTML page for its title using a regular expression.


Asked: June 9, 2026 (2026-06-09T20:34:49+00:00)


I’m trying to scrape an HTML page for its title using a regular expression. Here’s what I’m trying:

\<title\>\A\Z\</title\>

Any suggestions?


1 Answer

Editorial Team · Answer 1 · 2026-06-09T20:34:51+00:00

<title>(.*?)</title>

The parentheses around .*? form a capture group, which lets you refer to the text they matched. Your regular expression library will almost certainly have a way to return the contents of capture groups. Group 0 is the whole match, and the remaining groups are numbered by the position of their opening parenthesis, so here you want group 1 (there’s only one pair of parentheses).
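As a minimal sketch of the group numbering, here is how it looks with Python’s re module. The sample page string and the variable names are made up for illustration; other libraries expose groups in a similar way.

```python
import re

# Hypothetical sample page, used only for this illustration.
html = "<html><head><title>Example Domain</title></head><body></body></html>"

# search() scans the string for the first place the pattern matches.
match = re.search(r"<title>(.*?)</title>", html)

whole = match.group(0)  # group 0: the entire match, tags included
title = match.group(1)  # group 1: only the text inside the parentheses

print(whole)
print(title)
```

In real code, check that search() did not return None before calling group(), since a page may have no title at all.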

In some libraries, you need:

.*?<title>(.*?)</title>.*

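because some matching functions only succeed if the pattern matches the entire string. Keep in mind that in most flavors . does not match a newline unless you enable a single-line (dotall) option, which matters when the page spans several lines.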
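Python’s re.fullmatch is one example of a function that requires the whole string to match, so it can serve as a rough illustration of the difference. The sample page below is invented, and re.DOTALL is needed here only because the page contains newlines.

```python
import re

# Hypothetical multi-line page, used only for this illustration.
page = "<html>\n<head><title>Example Domain</title></head>\n<body></body>\n</html>"

# The bare pattern cannot cover the whole page, so fullmatch() fails.
bare = re.fullmatch(r"<title>(.*?)</title>", page)

# Padding with .*? and .* lets the pattern span the whole string.
# DOTALL makes . match newlines as well.
padded = re.fullmatch(r".*?<title>(.*?)</title>.*", page, re.DOTALL)

# Without DOTALL the padded pattern still fails on a multi-line page.
padded_no_dotall = re.fullmatch(r".*?<title>(.*?)</title>.*", page)

title = padded.group(1)
print(title)
```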

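As for the escapes in your attempt:

\A matches only at the very start of the string
\Z matches only at the end of the string (some flavors also allow it just before a final newline)
\< matches at the start of a word, in flavors that support it (GNU grep, for example)
\> matches at the end of a word, in the same flavors

None of these consume any characters, and a literal < or > does not need escaping. Because \A can only match at position 0, placing it after the opening tag means your pattern can never match.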
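A quick way to check this is to try the original pattern in Python. One caveat: Python’s re module has no \< or \> word anchors and treats them as literal angle brackets, so this sketch only demonstrates the effect of \A and \Z.

```python
import re

# The pattern from the question. In Python, \< and \> are literal < and >.
pattern = r"\<title\>\A\Z\</title\>"

# Even an empty title fails: \A cannot match after "<title>" has been consumed.
empty_title = re.search(pattern, "<title></title>")
real_title = re.search(pattern, "<title>Hello</title>")

# On their own, \A\Z together match only the empty string.
matches_empty = re.search(r"\A\Z", "") is not None
matches_text = re.search(r"\A\Z", "Hello") is not None

print(empty_title, real_title, matches_empty, matches_text)
```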

Be aware that this is not foolproof. A page can defeat your regular expression with markup like:

<html>
  <head>
    <script>
      // <title>HAHA YOU GOT THE WRONG TITLE</title>
    </script>
    <title>The Actual title</title>
  </head><body></body>
</html>
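To see the failure concretely, here is a small Python sketch that runs the pattern against a page shaped like the one above. The variable names are my own.

```python
import re

# A page with a decoy title inside a script comment.
page = """<html>
  <head>
    <script>
      // <title>HAHA YOU GOT THE WRONG TITLE</title>
    </script>
    <title>The Actual title</title>
  </head><body></body>
</html>"""

# search() returns the first match in the text, which is the decoy.
found = re.search(r"<title>(.*?)</title>", page).group(1)
print(found)
```

The regex has no notion of document structure, so it cannot tell that the first occurrence sits inside a script.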

You could try to guard against this by making your regex more complicated, so that it skips script blocks before matching the title. However, that doesn’t really work, because the fake title could just as well be inside an HTML comment or a /* JavaScript */ comment.

Thus, it is better to use an actual HTML parser. You can search Google to find many of these.

If you are using Ruby, you can use the Nokogiri gem – http://nokogiri.org/.
For Java – http://htmlparser.sourceforge.net/.
For Python – http://docs.python.org/library/htmlparser.html (in Python 3 the module is named html.parser).
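As a rough sketch of the parser approach, here is one way to pull the title out with Python 3’s standard html.parser module. The TitleParser class and extract_title function are names I made up for this example, and a real page may call for a more forgiving library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it encounters."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first real <title> tag is of interest.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


tricky_page = """<html>
  <head>
    <script>
      // <title>HAHA YOU GOT THE WRONG TITLE</title>
    </script>
    <!-- <title>Another decoy</title> -->
    <title>The Actual title</title>
  </head><body></body>
</html>"""

print(extract_title(tricky_page))
```

The parser treats the contents of script elements and comments as raw text rather than markup, so the decoy titles never reach handle_starttag.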
