So I have many large text paragraphs to parse.
The end goal is to split each paragraph into its individual postings, so I can insert them into MySQL.
Here’s a very short example of one of the paragraphs in a string:
<?php
$longstring = '
(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
Lots of text entered here under the first line.<br>And most of it is html, since it is for displaying in a web browser.<br></br></br>
(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
Forgot to put one more thing in the notes.........<br>blah blah blah
(<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
Groceries list:<br>Watermelons<br>Floss<br><br>email doctor
';
?>
Yep, I have a freaky project of parsing these strings for each entry.
Yes, I agree this is not a cool task. The original developer allowed new text to be appended to the existing text. Not a bad idea on some occasions, but for me it is.
I need help writing a regex for this beast and getting the results into a foreach loop so I can start cleaning it up.
Here’s how far I got:
<?php
if (preg_match_all('/\(<b>.*?<hr>/', $longstring, $matches)) {
    print_r($matches);
}
/* output:
Array
(
[0] => Array
(
[0] => (<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
[1] => (<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
[2] => (<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
)
)
*/
?>
So, I’m actually doing pretty well with looping through the header line of each entry. I’m kinda proud I figured that out. (Regex is my nemesis.)
So now I’m stuck figuring out how to include the actual text below each iteration.
Anyone have an idea on how I can adjust the preg_match_all to account for the text below each “header”?
Try this
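What follows is a minimal sketch of one way to do it, not a definitive solution. It assumes every entry starts with a header shaped exactly like `(<b>Name</b>) at <b class="datetimeGMT">...</b><hr>` and that this header shape never appears inside an entry's own text. The pattern captures the name, the timestamp, and then lazily grabs everything up to the next header (or the end of the string) using a lookahead. The `s` modifier lets `.` match newlines, so the body can span several lines. The names `$pattern`, `$posts`, and the array keys `author`, `posted_at`, and `body` are made up for illustration.

```php
<?php
// Sample data copied from the question.
$longstring = '
(<b>John Smith</b>) at <b class="datetimeGMT">2011-01-10 22:13:01 GMT</b><hr>
Lots of text entered here under the first line.<br>And most of it is html, since it is for displaying in a web browser.<br></br></br>
(<b>Alan Slappy</b>) at <b class="datetimeGMT">2011-01-11 13:12:00 GMT</b><hr>
Forgot to put one more thing in the notes.........<br>blah blah blah
(<b>Joe Mama</b>) at <b class="datetimeGMT">2011-01-13 10:15:00 GMT</b><hr>
Groceries list:<br>Watermelons<br>Floss<br><br>email doctor
';

// Group 1: author name, group 2: timestamp, group 3: body text.
// The lookahead stops the lazy body match at the next header or at the
// very end of the string (\z), without consuming the next header.
$pattern = '/\(<b>(.*?)<\/b>\) at <b class="datetimeGMT">(.*?)<\/b><hr>'
         . '(.*?)(?=\(<b>[^<]*<\/b>\) at <b class="datetimeGMT">|\z)/s';

// PREG_SET_ORDER gives one array per entry, which is easy to foreach over.
preg_match_all($pattern, $longstring, $matches, PREG_SET_ORDER);

$posts = [];
foreach ($matches as $m) {
    $posts[] = [
        'author'    => $m[1],
        'posted_at' => $m[2],
        'body'      => trim($m[3]), // drop the surrounding newlines
    ];
}

print_r($posts);
```

For the sample string this should give three entries, each with its own author, timestamp, and body, ready to be cleaned up before the insert. If a name can contain a `<` character, the `[^<]*` in the lookahead would need loosening.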