I’m nearly done with a trackback system for my website, but have one last niggling regular expression I just can’t get right.
What I’m after is an excerpt of the referring page, where I’m defining the most relevant excerpt as:
The first paragraph (marked by <p></p> tags) that follows either an <h1></h1>, <h2></h2> or <h3></h3> in the HTML Source of the page.
For instance, I can successfully fetch the <title></title> tag for the HTML as follows:
    Regex reTITLE = new Regex( @"(?<=<title.*>)([\s\S]*)(?=</title>)",
        RegexOptions.IgnoreCase );
    Match match = reTITLE.Match( strHTMLSource );
    if (match.Success)
    {
        strReferringPageTitle = match.Value.Trim( );
    }
My question: what regular expression can I use to fetch the excerpt described in the first part of my post?
PS: I love StackOverflow and this community — great job, Joel & Co.!
1 Answer
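One possible reading of the excerpt rule in the question can be sketched as a single pattern: match a complete h1, h2 or h3 element, skip lazily to the next <p> tag, and capture its contents in a named group. This is only a sketch under some assumptions: the names reEXCERPT, GetExcerpt and ExcerptDemo and the sample HTML are made up for illustration, and the page is assumed to have reasonably well-formed, non-nested headings and paragraphs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcerptDemo
{
    // Hypothetical names (reEXCERPT, GetExcerpt); they are not from the question.
    // Singleline lets '.' match newlines, so the pattern can span several lines of HTML.
    private static readonly Regex reEXCERPT = new Regex(
        @"<h(?<level>[1-3])\b[^>]*>.*?</h\k<level>\s*>" +  // a complete h1, h2 or h3 element
        @".*?" +                                            // skip lazily to the next paragraph
        @"<p\b[^>]*>(?<excerpt>.*?)</p\s*>",                // capture that paragraph's contents
        RegexOptions.IgnoreCase | RegexOptions.Singleline );

    // Returns the trimmed paragraph contents, or null when nothing matches.
    public static string GetExcerpt( string strHTMLSource )
    {
        Match match = reEXCERPT.Match( strHTMLSource );
        return match.Success ? match.Groups["excerpt"].Value.Trim( ) : null;
    }

    public static void Main( )
    {
        // Made-up sample page: the paragraph before the heading should be ignored.
        string strHTMLSource =
            "<html><body><p>Navigation</p>" +
            "<h2 class=\"post\">Post title</h2>" +
            "<div><p>First paragraph after the heading.</p><p>Second.</p></div>" +
            "</body></html>";

        Console.WriteLine( GetExcerpt( strHTMLSource ) );
        // prints "First paragraph after the heading."
    }
}
```

The captured text still contains any inline markup inside the paragraph (links, emphasis and so on), so it may need a second pass to strip tags. Regular expressions are also fragile against real-world HTML; if the referring pages turn out to be messy, a proper HTML parser is likely to be the more robust route.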