How can I extract the content between tags with several line breaks?
I’m new to regex and would like to know how to handle an unknown number of line breaks in the text I want to match.
Task: Extract content between <div class="test"> and the first closing </div> tag.
Original source:
<div class="test">optional text<br/>
content<br/>
<br/>
content<br/>
...
content<br/><a href="/url/">Hyperlink</a></div></div></div>
I’ve worked out the regex below:
/<div class=\"test\">(.*?)<br\/>(.*?)<\/div>/
I just wonder how to make it match across several line breaks.
I know the DOM is an option, but I am not familiar with it.
You could use
preg_match_all('/<div class="test">(.*?)<\/div>/si', $html, $matches);
The s modifier makes . match newlines as well, so the pattern can span any number of line breaks. But remember that this will match up to the first closing </div> within the HTML. I.e. if the HTML looks like
<div class="test">...aaa...<div>...bbb...</div>...ccc...</div>
then you would get ...aaa...<div>...bbb... as the result in $matches. So in the end, using a DOM parser would indeed be a better solution.
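For the DOM route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The $html string is a made-up sample modelled on the question's markup, and $results is just an illustrative variable name. Note that the serialised output may normalise the markup (for example, <br/> can come back as <br>), so treat this as a starting point rather than a drop-in replacement for the regex.

```php
<?php
// Hypothetical sample input, modelled on the HTML in the question.
$html = '<div><div><div class="test">optional text<br/>
content<br/>
<br/>
content<br/><a href="/url/">Hyperlink</a></div></div></div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely strictly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$results = [];
// Select every <div> whose class attribute is exactly "test".
foreach ($xpath->query('//div[@class="test"]') as $div) {
    // Rebuild the inner HTML by serialising each child node in turn.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $results[] = $inner;
}

print_r($results);
```

Unlike the regex, this keeps working when the target div contains nested divs, because the parser pairs each opening tag with its own closing tag.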