I need a regular expression to help me make a match in my string. This is the line that contains the info that I need:
<td width="140" height="18"><a href="users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202" class="folderNav"><strong>087690898</strong></a></td>
What I need to pull out of it is the href address, "users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202", and the value stored between the two strong tags, 087690898. So I just need to identify lines that look like this.
So I have figured it out to this point:
(Match any char or digit) (Match < a href=") (Match any char or digit) (Match class="folderNav">)
From that, I created this as my regular expression:
[a-z](< a href=")[a-z](class="folderNav">)
Once I have identified this string, I can parse it and pull out the values I need, but it is identifying the string that I am having an issue with. I am new to regular expressions and not sure exactly how to do this. I know my regular expression is flawed. I am using C#.
Also, I know you shouldn’t use Regex on HTML, but for this, I don’t mind a quick and dirty solution.
Although the purists will condemn me to eternal damnation for breaking the regex/HTML rule, here’s what you need:
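A minimal sketch of such a pattern in C#, assuming the group names `address` and `value` (illustrative choices, as is the `FolderLinkMatcher` class) and assuming the link always has exactly the attributes, in exactly the order, shown in your sample line:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FolderLinkMatcher
{
    // Hypothetical group names: "address" captures the href, "value" the <strong> text.
    // In a verbatim string (@"..."), a literal double quote is written as "".
    private static readonly Regex Pattern = new Regex(
        @"<a href=""(?<address>[^""]*)"" class=""folderNav""><strong>(?<value>[^<]*)</strong></a>");

    // Returns true and fills the out parameters when the line contains a folder link.
    public static bool TryParse(string line, out string address, out string value)
    {
        Match match = Pattern.Match(line);
        address = match.Success ? match.Groups["address"].Value : string.Empty;
        value = match.Success ? match.Groups["value"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string line = "<td width=\"140\" height=\"18\"><a href=\"users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202\" class=\"folderNav\"><strong>087690898</strong></a></td>";

        if (TryParse(line, out string address, out string value))
        {
            Console.WriteLine(address); // users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202
            Console.WriteLine(value);   // 087690898
        }
    }
}
```

Because the pattern is not anchored, `Regex.Match` finds the link anywhere in the line, so the surrounding `<td>` markup does not have to be matched.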
The `(?<name>expression)` parts are called “named matched subexpressions”; you may read more about them by following the link to MSDN.
In the code above, we’re using named subexpressions for matching your address and your value. In each case, we allow any character to be matched, except for the expected terminator. In the case of the `href` address, the attribute value ends just before the `"`; thus, we match `[^"]*`. In the case of the `<strong>` value, the element text ends just before the `<` (of the closing tag); thus, we match `[^<]*`. The rest of the regex pattern is verbatim.