I need help building a regular expression that separates text from tables.
I have some text like this:
```
text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
<table class="table2">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
```
I need a regular expression that separates the text from the tables.
Currently I have this regular expression:
```php
preg_match_all( "/(.*)(<table(?s).*?\/table>)(.*)/si", $value[ 'TEXT' ], $matches );
```
This expression works fine for text like:
```
text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
```
It splits it into
```
text text text
text text text
<div> text text text </div>
```
and
```
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
```
But for this text:
```
text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
<table class="table2">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
```
My regular expression does not work. It returns an array with:
```
[0] =>"text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text",
[1]=>"<table class="table2">
<tr>
<td>
</td>
</tr>
</table>",
[2]=>"text text text
text text text
text text text"
```
How do I build the right regular expression?
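The pattern fails because the leading `(.*)` is greedy: with the `s` modifier it first runs to the end of the input and then backtracks only as far as the last `<table`, so everything before the second table, including the first table, lands in the first group. A minimal regex-only sketch, assuming tables are never nested and every `<table` is closed, splits on a lazy table pattern and keeps the tables via `PREG_SPLIT_DELIM_CAPTURE` instead of matching text, table and text in a single `preg_match_all()` call. The helper name `splitTextAndTables()` and the `$text` sample (standing in for `$value['TEXT']`) are illustrative:

```php
<?php
// Sketch only: assumes tables are never nested and every "<table" has a
// matching "</table>". splitTextAndTables() is a hypothetical helper name.
function splitTextAndTables(string $html): array
{
    // The lazy ".*?" stops at the FIRST "</table>" after each "<table", and the
    // capturing group makes preg_split() return the tables themselves too.
    $parts = preg_split('~(<table\b.*?</table>)~si', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return [];
    }

    $chunks = [];
    foreach ($parts as $i => $part) {
        // With exactly one capturing group, odd indexes are the captured tables.
        $isTable = ($i % 2 === 1);
        // Skip text chunks that are only whitespace, e.g. between two tables.
        if (!$isTable && trim($part) === '') {
            continue;
        }
        $chunks[] = ['type' => $isTable ? 'table' : 'text', 'content' => trim($part)];
    }
    return $chunks;
}

// Hypothetical sample input standing in for $value['TEXT'].
$text = <<<'HTML'
text text text
<div> text text text </div>
<table class="table1">
<tr><td></td></tr>
</table>
text text text
<table class="table2">
<tr><td></td></tr>
</table>
text text text
HTML;

foreach (splitTextAndTables($text) as $chunk) {
    echo $chunk['type'], ":\n", $chunk['content'], "\n\n";
}
```

Because text and tables come back in document order, the result can be walked in one loop. A nested table would end this pattern at the inner `</table>`, which is where an HTML parser becomes the safer choice.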
It should be somewhere around this:
This code loads your HTML, finds and removes the tables, finds all the text nodes, and fills an array with their content. You should read more about PHP DOM to fine-tune it to your needs.
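A minimal sketch of those steps, assuming PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`) and tables that are not nested, might look like this. The function name `extractTablesAndText()` and the returned array shape are illustrative choices:

```php
<?php
// Sketch of the DOM approach: load the HTML, pull the tables out, then collect
// the remaining text nodes. extractTablesAndText() is a hypothetical name.
// Assumes the dom extension is enabled and tables are not nested.
function extractTablesAndText(string $html): array
{
    $doc = new DOMDocument();
    // The input is a fragment, not a full document; keep libxml's warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Save each table's markup, then detach it so its text is not collected below.
    $tables = [];
    foreach (iterator_to_array($xpath->query('//table')) as $table) {
        $tables[] = $doc->saveHTML($table);
        $table->parentNode->removeChild($table);
    }

    // Every remaining non-blank text node is text outside the tables.
    $texts = [];
    foreach ($xpath->query('//text()') as $node) {
        $content = trim($node->nodeValue);
        if ($content !== '') {
            $texts[] = $content;
        }
    }

    return ['tables' => $tables, 'texts' => $texts];
}

// Hypothetical sample input.
$html = <<<'HTML'
text text text
<div> text text text </div>
<table class="table1">
<tr><td></td></tr>
</table>
text text text
<table class="table2">
<tr><td></td></tr>
</table>
text text text
HTML;

print_r(extractTablesAndText($html));
```

Compared with a regex, the parser copes with comments and irregular markup, but it normalizes the HTML (loose text may be wrapped in `<p>` elements, for example), so `saveHTML()` may not return each table exactly as it appeared in the input.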