I am trying to extract content from a webpage that has the following markup:
<div id="div">
content
content
content
content
</div>
The regex I currently have is
Pattern div = Pattern.compile("<div id=\"div\">(.*?)</div>");
This works when the content is on a single line, but when the div contains newlines the pattern no longer matches what is inside the tag.
Any help would be appreciated (I am using Java, by the way).
Personally, I would strongly discourage you from using regular expressions in this case. It is well documented as being a bad idea to attempt to suck information out of an HTML document with regular expressions. Take a look at a proper HTML parser instead!
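That said, if a regex is all you want for a quick one-off, the reason your pattern fails across lines is that `.` does not match line terminators by default in Java. Passing `Pattern.DOTALL` changes that. Below is a minimal sketch, assuming the markup looks exactly like your example; the class name `DivExtractor` and the method `extractDiv` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivExtractor {
    // DOTALL makes '.' match line terminators as well,
    // so the lazy group (.*?) can span several lines.
    private static final Pattern DIV =
            Pattern.compile("<div id=\"div\">(.*?)</div>", Pattern.DOTALL);

    /** Returns the trimmed text between the tags, or null if there is no match. */
    static String extractDiv(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<div id=\"div\">\ncontent\ncontent\n</div>";
        System.out.println(extractDiv(html));
    }
}
```

Note that the lazy match stops at the first `</div>` it sees, so this breaks as soon as the div contains another div, and it is also sensitive to attribute order and whitespace in the opening tag. Those are exactly the cases where a real HTML parser pays off.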