I'm getting some HTML data from a remote server, and before displaying it in the application's UI I need to make some changes, e.g. delete counters, replace links, etc. Removing a tag together with its contents or changing a specific link is not a big deal, but I'm stuck on more advanced processing. I need to replace (delete) a few HTML tag attributes (not the tag itself; there are plenty of examples of that on the internet). For example: delete all onmouseover handlers from buttons. I know that XPath would be a perfect fit for this problem, but I don't know it at all, and although my data is XHTML-compliant, it's stored in a string variable and not queryable :(. So I'm trying to use regular expressions to solve this, with no success so far. I guess there's a mistake in the pattern…
    public string Processing(string Source, string Tag, string Attribute)
    {
        return System.Text.RegularExpressions.Regex.Replace(
            Source,
            string.Format(@"<{0}(\s+({1}=""([^""]*)""|\w+=(""[^""]*""|\S+)))+>", Tag, Attribute),
            string.Empty);
    }
    ...
    string before = @"<input type=""text"" name=""Input"" id=""Input"" onMouseOver=""some js to be eliminated"">";
    string after = Processing(before, "input", "onMouseOver");
    // expected: <input type="text" name="Input" id="Input">
That’s an interesting approach, but like bobince said, you can only process one attribute per match. This regex will match everything up to the attribute you’re interested in:
Then you use ‘$1’ as your replacement string to plug back in everything but the attribute.
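A minimal sketch of this single-attribute approach might look like the following. The pattern is an illustration of a regex of that shape rather than a definitive one, and the names `AttributeStripper` and `RemoveAttribute` are hypothetical. It assumes attribute values are double-quoted, single-quoted, or contain no whitespace, and that the attribute name does not also appear inside another attribute's value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Removes the first occurrence of one attribute from each matching start tag.
    public static string RemoveAttribute(string source, string tag, string attribute)
    {
        // Group 1 captures everything from "<tag" up to (not including)
        // the whitespace that precedes the target attribute.
        string pattern = string.Format(
            @"(<{0}\b[^<>]*?)\s+{1}\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
            Regex.Escape(tag), Regex.Escape(attribute));

        // "$1" puts back everything that was matched before the attribute.
        return Regex.Replace(source, pattern, "$1", RegexOptions.IgnoreCase);
    }
}
```

Calling `AttributeStripper.RemoveAttribute(before, "input", "onMouseOver")` on the sample string from the question should leave the `type`, `name` and `id` attributes in place and drop only the handler.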
This approach requires you to make a separate pass over the string for each of your target tag/attribute pairs, and at the beginning of each pass you have to create and compile the regex. Not very efficient, but if the string isn’t too large it should be okay. A much bigger problem is that it won’t catch duplicate attributes; if there are two ‘onmouseover’ attributes on a button, you’ll only catch the first one.
If I were doing this in C# I would probably use the regex to match the target tag, then use a MatchEvaluator to remove all of the target attributes at once. But seriously, if the string really is well-formed XML, there's no excuse for not using XML-specific tools to process it; this is what XML was invented for.
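The MatchEvaluator idea could be sketched roughly as below: an outer regex finds each start tag, and a callback strips every target attribute inside it, duplicates included. `TagCleaner` and `RemoveAttributes` are hypothetical names, both patterns are assumptions made for illustration, and the sketch still assumes an attribute name never appears inside another attribute's quoted value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagCleaner
{
    // Removes every occurrence of the listed attributes from each <tag ...> start tag.
    public static string RemoveAttributes(string source, string tag, params string[] attributes)
    {
        // Outer pattern: one whole start tag; quoted values may contain '>'.
        var tagPattern = new Regex(
            string.Format(@"<{0}\b(?:""[^""]*""|'[^']*'|[^'"">])*>", Regex.Escape(tag)),
            RegexOptions.IgnoreCase);

        // Inner pattern: any one of the target attributes plus its leading whitespace.
        string names = string.Join("|", Array.ConvertAll(attributes, a => Regex.Escape(a)));
        var attrPattern = new Regex(
            string.Format(@"\s+(?:{0})\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)", names),
            RegexOptions.IgnoreCase);

        // The lambda is the MatchEvaluator: it runs once per matched tag
        // and removes all target attributes found in that tag's text.
        return tagPattern.Replace(source, m => attrPattern.Replace(m.Value, string.Empty));
    }
}
```

Both regexes are built once per call, so several attributes on the same tag type are handled in a single pass over the string.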