I really can't work out how best to do this. I can write fairly simple regular expressions, but the more complex ones really stump me.
The following appears in specific HTML documents:
```html
<span id="label">
  <span>
    <a href="http://variableLink">Joe Bloggs</a>
    now using
  </span>
  <span>
    '
    <a href="/variableLink/">Important Data</a>
    '
  </span>
  <span>
    on
    <a href="/variableLink">Important data 2</a>
  </span>
</span>
```
I need to extract the two "important data" values, and I could spend hours working out the regex to do it. (I'm using the .NET 3.5 Regex library from C#.)
As often stated before, regular expressions are usually not the right tool for parsing HTML, XML, and friends; consider using an HTML or XML parsing library instead. If you really want to or have to use regular expressions, the following will match the content of the tags in many cases, but it can still fail in some.
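For the parser route: the fragment in the question happens to be well-formed XML, so LINQ to XML (`System.Xml.Linq`, which ships with .NET 3.5) can pull out the link text without any regex. This is a minimal sketch under that assumption; the class name `LinkExtractor` and method `Extract` are hypothetical. Real-world HTML is often not valid XML (unclosed `<br>`, entities such as `&nbsp;`), and then `XElement.Parse` throws, so a dedicated HTML parser such as the third-party HTML Agility Pack is the safer choice there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class LinkExtractor
{
    // Returns the text of every <a> element whose href does not start with "http://".
    // Assumption: the input is well-formed XML; XElement.Parse throws an XmlException otherwise.
    public static List<string> Extract(string fragment)
    {
        XElement root = XElement.Parse(fragment);
        return root.Descendants("a")
            // A missing href is treated as an empty string, so it is kept.
            .Where(a => !((string)a.Attribute("href") ?? "")
                .StartsWith("http://", StringComparison.OrdinalIgnoreCase))
            .Select(a => a.Value)
            .ToList();
    }

    static void Main()
    {
        // The fragment from the question (quotes doubled inside the verbatim string).
        string html = @"<span id=""label"">
<span>
<a href=""http://variableLink"">Joe Bloggs</a>
now using
</span>
<span>
'
<a href=""/variableLink/"">Important Data</a>
'
</span>
<span>
on
<a href=""/variableLink"">Important data 2</a>
</span>
</span>";

        // Prints "Important Data", then "Important data 2".
        foreach (string data in Extract(html))
            Console.WriteLine(data);
    }
}
```

Filtering on the `href` attribute rather than on the surrounding text keeps the query independent of the `'` and `on` noise between the tags.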
This expression matches all links whose href does not start with http://, which is the only obvious difference I can see between the links.
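A sketch of one expression of this kind with `System.Text.RegularExpressions`, using a negative lookahead `(?!http://)` to skip the absolute link and a named group to capture the text. It assumes that `href` is the first attribute of the `<a>` tag, that its value is double-quoted, and that the link text contains no nested tags; any of those breaking is exactly the kind of case where a regex fails. The names `LinkExtractor`, `Extract`, and `RelativeLink` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class LinkExtractor
{
    // Matches <a href="...">text</a> where the href does NOT begin with http://.
    // (?!http://) is a negative lookahead; (?<data>[^<]*) captures the link text.
    // Assumptions: href is the first attribute, double-quoted, and the text has no nested tags.
    static readonly Regex RelativeLink = new Regex(
        @"<a\s+href=""(?!http://)[^""]*""\s*>(?<data>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in RelativeLink.Matches(html))
            results.Add(m.Groups["data"].Value);
        return results;
    }

    static void Main()
    {
        string html = @"<span id=""label"">
<span>
<a href=""http://variableLink"">Joe Bloggs</a>
now using
</span>
<span>
'
<a href=""/variableLink/"">Important Data</a>
'
</span>
<span>
on
<a href=""/variableLink"">Important data 2</a>
</span>
</span>";

        // Prints "Important Data", then "Important data 2".
        foreach (string data in Extract(html))
            Console.WriteLine(data);
    }
}
```

Constructing the `Regex` once as a static field avoids re-parsing the pattern on every call when many documents are processed.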