I am not very good with regex, but I am learning.
I would like to remove some HTML tags by class name. This is what I have so far:
<div class="footer".*?>(.*?)</div>
The first .*? is there because the tag might contain other attributes, and the second because the div might contain other HTML.
What am I doing wrong? I have tried a lot of patterns without success.
Update
The content inside the div can span multiple lines, and I am using Perl regex.
You will also want to allow for other attributes before class in the div tag.
Also, make the match case-insensitive. Depending on how the pattern is quoted, you may need to escape things like the quotes, or the slash in the closing tag. What context are you doing this in?
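To make those suggestions concrete, here is a minimal sketch in Python (chosen only for illustration; the pattern itself should carry over to Perl, where the /i and /s modifiers play the role of re.IGNORECASE and re.DOTALL). The sample HTML is made up. The /s (DOTALL) part matters for the multi-line case from the update, because by default . does not match a newline.

```python
import re

# Hypothetical sample input: attributes on both sides of class,
# an upper-case tag name, and a body that spans several lines.
html = '''<p>keep me</p>
<DIV id="f1" class="footer" style="color:red">
  <span>Copyright</span>
</DIV>
<p>keep me too</p>'''

# [^>]* allows other attributes before and after class without
# running past the end of the opening tag.
# IGNORECASE matches DIV as well as div; DOTALL lets . match newlines.
pattern = re.compile(
    r'<div\b[^>]*\sclass="footer"[^>]*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

cleaned = pattern.sub('', html)
print(cleaned)
```

This only handles a footer div that contains no other div elements and whose class attribute is exactly "footer" in double quotes.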
Also note that HTML parsing with regular expressions can be very nasty, depending on the input. A good point is brought up in an answer below – suppose you have a structure like:
Trying to build a regex for that is a recipe for disaster. Your best bet is to load the document into a DOM, and perform manipulations on that.
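A small sketch of how this goes wrong, assuming the problematic structure is a footer div that itself contains another div (the markup below is a made-up example, and Python is used only for illustration):

```python
import re

# Hypothetical structure: the footer div wraps a nested div.
html = '<div><div class="footer"><div>Hi!</div></div></div>'

pattern = re.compile(
    r'<div\b[^>]*\sclass="footer"[^>]*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

# The lazy (.*?) stops at the FIRST </div>, which closes the inner div,
# so the footer's own closing tag is left behind in the output.
broken = pattern.sub('', html)
print(broken)  # <div></div></div>
```

The result has one opening tag and two closing tags. Making the quantifier greedy does not help either, since it would then swallow everything up to the last closing div in the document.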
Pseudocode that should map closely to XML::DOM:
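As one possible rendering of that idea, here is a minimal sketch using Python's standard xml.dom.minidom rather than XML::DOM itself. The method names (getElementsByTagName, getAttribute, removeChild) follow the W3C DOM conventions, so the Perl version should look similar, but treat the mapping as an assumption and check the XML::DOM documentation. The input document is hypothetical and must be well-formed, since minidom is an XML parser and will reject tag-soup HTML.

```python
from xml.dom import minidom

# Hypothetical well-formed input with a nested div inside the footer.
source = (
    '<body><div id="main">'
    '<div class="footer"><div>Hi!</div></div>'
    '</div></body>'
)

doc = minidom.parseString(source)

# Copy the node list first, because the tree is modified while looping.
for div in list(doc.getElementsByTagName('div')):
    classes = div.getAttribute('class').split()
    # Remove the element, and everything nested inside it, from its parent.
    if 'footer' in classes and div.parentNode is not None:
        div.parentNode.removeChild(div)

result = doc.documentElement.toxml()
print(result)
```

Because the parser tracks nesting, the nested div is removed together with its footer parent, and no stray closing tags are left behind.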
Here is a Perl library, HTML::DOM, and another, XML::DOM.
.NET has built-in libraries to handle DOM parsing.