Using regex in PHP, I am trying to get the content of the title attribute, as shown below.
preg_match('/<abbr class="dtstart" title="([^"]*)"/i', $file_string, $starts);
$starts_out = $starts[1];
preg_match('/<abbr class="dtend" title="([^"]*)"/i', $file_string, $ends);
$ends_out = $ends[1];
Here is the exact part of the HTML that I want to match. For this block, I get the data correctly.
<div id="eventDetailInfo">
<h2>When</h2>
<div class="p">
<div>From:
<abbr class="dtstart" title="2012-08-24T17:00:00">Friday, August 24th, 2012</abbr></div>
<div>Until:
<abbr class="dtend" title="2012-08-26">Saturday, August 25th, 2012</abbr></div>
</div>
</div>
However, some articles have no "Until" entry, and in that case the regex matches the first dtend found in the rest of the page instead (the related articles section).
My question is: how do I restrict the regex to match only the block above, so that if no
<div>Until:
<abbr class="dtend" title="2012-08-26">Saturday, August 25th, 2012</abbr></div>
is found, the result stays blank?
This is the rest of the page's code, which the regex unfortunately matches.
<div class="evdate">
<em>When:</em>
<abbr class="dtstart" title="2012-07-03T21:00:00">July 3rd</abbr>
to
<abbr class="dtend" title="2012-07-13">July 12th</abbr>*
</div>
<div class="evtime"><em>Time:
</em>
21:00
</div>
</div>
While I’ve shown you how to do this with a quick regex, I clearly advised you against using a regex for this sort of thing. As you can see for yourself, it can get out of hand rather quickly.
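For reference, here is a minimal sketch of what such a quick regex could look like. It is not necessarily the pattern given earlier. It assumes the "Until" div, when present, immediately follows the "From" div inside the eventDetailInfo block, and the function name extract_event_dates is hypothetical. The "Until" part is wrapped in an optional group, so the end date stays blank rather than being picked up from the related articles further down.

```php
<?php
// Sketch only: assumes the markup shown in the question, with the optional
// "Until" div directly after the "From" div. Any change in the page layout
// will break this pattern.
function extract_event_dates(string $html): array
{
    $pattern = '~<div id="eventDetailInfo">.*?'
        . '<abbr class="dtstart" title="([^"]*)"[^>]*>[^<]*</abbr>\s*</div>'
        // Optional group: only matches a dtend that directly follows the From div.
        . '(?:\s*<div>\s*Until:\s*<abbr class="dtend" title="([^"]*)")?~is';

    if (!preg_match($pattern, $html, $m)) {
        return ['start' => '', 'end' => ''];
    }

    // $m[2] is absent when the optional group did not take part in the match.
    return ['start' => $m[1], 'end' => $m[2] ?? ''];
}

// Hypothetical sample: main block without "Until", then a related article.
$file_string = <<<'HTML'
<div id="eventDetailInfo">
<h2>When</h2>
<div class="p">
<div>From:
<abbr class="dtstart" title="2012-08-24T17:00:00">Friday, August 24th, 2012</abbr></div>
</div>
</div>
<div class="evdate">
<abbr class="dtstart" title="2012-07-03T21:00:00">July 3rd</abbr>
to
<abbr class="dtend" title="2012-07-13">July 12th</abbr>
</div>
HTML;

$dates = extract_event_dates($file_string);
$starts_out = $dates['start'];
$ends_out = $dates['end']; // stays blank because the main block has no "Until"
```

Even in this form the pattern depends on the exact whitespace and tag order of the page, which is the main reason a parser is the safer choice.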
As pointed out by others (here and there), you should be using an HTML parser for this.
I’d advise you to use Simple HTML DOM, since it’s very easy to work with, and their documentation is pretty good too.
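To illustrate the parser approach without any third-party dependency, the sketch below uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. The idea is the same with either library: look for the abbr elements only inside the eventDetailInfo div. The function name event_dates_from_dom and the sample markup are assumptions made for this example.

```php
<?php
// Sketch only: swaps in PHP's bundled DOM extension for Simple HTML DOM.
function event_dates_from_dom(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach (['dtstart', 'dtend'] as $class) {
        // Restrict the search to descendants of the main event block.
        $nodes = $xpath->query("//div[@id='eventDetailInfo']//abbr[@class='$class']");
        $result[$class] = $nodes->length > 0
            ? $nodes->item(0)->getAttribute('title')
            : ''; // blank when the block has no such element
    }
    return $result;
}

// Hypothetical sample: main block without "Until", then a related article.
$sample_html = <<<'HTML'
<div id="eventDetailInfo">
<h2>When</h2>
<div class="p">
<div>From:
<abbr class="dtstart" title="2012-08-24T17:00:00">Friday, August 24th, 2012</abbr></div>
</div>
</div>
<div class="evdate">
<abbr class="dtstart" title="2012-07-03T21:00:00">July 3rd</abbr>
to
<abbr class="dtend" title="2012-07-13">July 12th</abbr>
</div>
HTML;

$event = event_dates_from_dom($sample_html);
$starts_out = $event['dtstart'];
$ends_out = $event['dtend']; // blank: the related article's dtend is outside the block
```

Because the query is scoped to the element with id eventDetailInfo, a dtend in the related articles section can never be returned by mistake, regardless of whitespace or attribute order.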