I am using PHP/Regex to parse some data for an application. The pages I

Question

Asked: May 27, 2026 (2026-05-27T08:46:45+00:00)

I am using PHP/Regex to parse some data for an application. The pages I am parsing have table formats that include a header followed by a bunch of items. What I am trying to do is get the header for each table, along with all of the items so that I can label each item as part of that group (defined by the header).

I currently have it set up with an expression matching each header, and then everything up to the next header. I then use a loop on the header count to match the additional data from the second match in the first expression.

So basically:

preg_match_all ('#table-header.*?>(.*?)<\/td>(.*?)table-header#s', $url, $gr, PREG_PATTERN_ORDER);

for ($i = 0; $i < count($gr[0]); $i++) {
  preg_match_all('#type_id.*?<b>(.*?)</b> ... #s', $gr[2][$i], $info, PREG_PATTERN_ORDER);
  $group = trim($gr[1][$i]);

  for ($ii = 0; $ii < count($info[0]); $ii++) {
    $name = trim($info[1][$ii]);
    ...
  }
}

My issue is that it skips every other group. I can only assume this is because the expression matches from one table-header to the next, consuming the second one, so the next match starts at the table-header after that rather than at the table-header that ended the first match. How can I get it to start the next match at the end point of the previous match? Unfortunately the pages do not have enough unique items near the beginning/end points to use something different to match. The HTML looks similar to this:

<td align='center' class='table-header' colspan='18' valign='top'>
    Header
</td>

...items...

<td align='center' class='table-header' colspan='18' valign='top'>
    Header 2
</td>

I tried using the colspan as the start of my expression, and grabbing everything up to the next table-header, but it just breaks.

Thanks for any suggestions.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T08:46:45+00:00

You should have a look at this class instead:
http://simplehtmldom.sourceforge.net/
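
For illustration, here is a minimal sketch of two ways to keep every group. It is not the library linked above: to stay self-contained it uses PHP's bundled DOM extension (DOMDocument and DOMXPath) as a stand-in for a dedicated HTML parser, next to a regex variant that ends on a lookahead so the closing table-header is not consumed. The sample markup, the `item` class, and the function names `groupsByRegex` and `groupsByDom` are assumptions made up for the example, since the real item markup is elided in the question.

```php
<?php
// Hypothetical sample modelled on the question's snippet.
// The <table>/<tr> wrappers and the 'item' cells are assumptions.
$html = <<<HTML
<table>
<tr><td align='center' class='table-header' colspan='18' valign='top'>
    Header
</td></tr>
<tr><td class='item'>Item A</td></tr>
<tr><td class='item'>Item B</td></tr>
<tr><td align='center' class='table-header' colspan='18' valign='top'>
    Header 2
</td></tr>
<tr><td class='item'>Item C</td></tr>
</table>
HTML;

// Regex variant: the lookahead (?=...) only checks that the next header
// (or the end of the input) follows, without consuming it, so the next
// match can begin at that same header.
function groupsByRegex(string $html): array
{
    preg_match_all(
        "#class='table-header'[^>]*>(.*?)</td>(.*?)(?=class='table-header'|\\z)#s",
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $groups = [];
    foreach ($matches as $set) {
        // Assumed item pattern; replace with the real per-item expression.
        preg_match_all("#class='item'[^>]*>(.*?)</td>#s", $set[2], $items);
        $groups[trim($set[1])] = array_map('trim', $items[1]);
    }
    return $groups;
}

// DOM variant: walk all cells in document order; a header cell opens a
// new group and every following cell is filed under it.
function groupsByDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $groups = [];
    $current = null;
    foreach ($xpath->query('//td') as $td) {
        $text = trim($td->textContent);
        if ($td->getAttribute('class') === 'table-header') {
            $current = $text;
            $groups[$current] = [];
        } elseif ($current !== null) {
            $groups[$current][] = $text;
        }
    }
    return $groups;
}
```

Calling `print_r(groupsByDom($html));` shows the items keyed by their header. The DOM variant does not depend on unique text near the group boundaries, which is the constraint the question runs into; the regex variant is the smaller change to the existing code.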
