Hey all, what would the regex be for the following:
<br/><span class=""synopsis-view-synopsis"">America's justice system comes under indictment in director <a href='/people/1035' class='actor' style='font-weight:bold'>Norman Jewison</a>'s trenchant film starring <a href='/people/1028' class='actor' style='font-weight:bold'>Al Pacino</a> as upstanding attorney Arthur Kirkland. A hard-line -- and tainted -- judge (<a href='/people/1034' class='actor' style='font-weight:bold'>John Forsythe</a>) stands accused of rape, and Kirkland (<a href='/people/1028' class='actor' style='font-weight:bold'>Al Pacino</a>) has to defend him. Kirkland has a history with the judge, who jailed one of the lawyer's clients on a technicality. When the judge confesses his guilt, Kirkland faces an ethical and legal quandary. </span>
I've tried this:
regex = New System.Text.RegularExpressions.Regex("(?<=""synopsis-view-synopsis""\>)([^<\/span><]+)")
But that only seems to get the first part of the description: "Americ".
Any help would be great! :o)
David
I don’t see any need for lookaheads or lookbehinds here; just match the whole <span> element and use a capturing group to extract its content. Assuming there will never be any <span> elements nested inside the one you’re matching, capturing lazily up to the first </span> should be all you need.

Also, [^<\/span><]+ doesn’t do what you probably think it does. What you’ve got there is a character class that matches any one character except <, /, s, p, a, n, or >. That’s why your match stops at "Americ": the next character is "a", which is in the excluded set. You may have been trying for this: (?:(?!</span>).)+ which matches one character at a time, after the lookahead confirms that the character isn’t the beginning of the sequence </span>. It’s a valid technique, but (as with the lookarounds) I don’t think you need anything so fancy here.
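Here's a rough sketch of both points. It's written in Python rather than VB.NET so it runs standalone. The constructs it uses (lookbehind, a lazy .*? inside a capturing group, a negative lookahead, and re.DOTALL, which corresponds to .NET's RegexOptions.Singleline) should behave the same way for this input, but treat it as an illustration, not a drop-in replacement. The sample string is a shortened version of the question's HTML. It assumes the real markup uses plain double quotes, and that the doubled quotes come from VB string escaping. The DOTALL flag is my own addition, in case the synopsis ever spans several lines.

```python
import re

# Shortened sample from the question. Assumption: the real markup uses plain
# double quotes; the doubled "" seen in the question is VB string escaping.
html = (
    '<br/><span class="synopsis-view-synopsis">America\'s justice system comes under '
    "indictment in director <a href='/people/1035' class='actor' "
    "style='font-weight:bold'>Norman Jewison</a>'s trenchant film. </span>"
)

# The question's pattern: [^<\/span><]+ is a character class that excludes the
# single characters < / s p a n >, so the match stops just before the "a"
# at the end of "Americ".
original = re.search(r'(?<="synopsis-view-synopsis">)([^<\/span><]+)', html)
print(original.group(1))  # Americ

# Suggested approach: match the whole <span> element and capture its content
# lazily, stopping at the first </span>. DOTALL (an assumption, not in the
# answer) lets "." also match newlines, like RegexOptions.Singleline in .NET.
whole = re.search(r'<span class="synopsis-view-synopsis">(.*?)</span>', html, re.DOTALL)
synopsis = whole.group(1)
print(synopsis)

# The "valid but fancier" alternative: consume one character at a time, but only
# after a negative lookahead confirms it doesn't start the sequence </span>.
tempered = re.search(
    r'<span class="synopsis-view-synopsis">((?:(?!</span>).)+)</span>', html, re.DOTALL
)
print(tempered.group(1) == synopsis)  # True
```

Both working patterns capture the same text. Note that the capture still contains the nested <a> markup, so getting plain text would take a separate step.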