I have a large set of HTML files that I need to parse the

Asked: May 27, 2026 (2026-05-27T10:22:51+00:00)

I have a large set of HTML files from which I need to extract every <? ... ?> block. The pattern has to keep <?xml declarations in mind, and also the fact that an opening <?php tag doesn’t need an ending tag: EOF counts as the end too.

My regular expression knowledge is admittedly lacking. This is my attempt so far: /<\?[^(\?>)]*\?>/

Example HTML:

<? 
function trans($value) {
  // Make sure it does not translate the function call itself
}
?>
<!-- PHP 

code -->
<div id='test' <?= $extraDiv ?>>
<?= trans("hello"); ?>
<? if ($something == 'hello'): ?>
<? if ($something == 'hello'): ?>
<p>Hello</p>
<? endif; ?>
<?php

// Some multiline PHP stuff
echo trans("You are \"great'"); // I threw some quotes in to toughen the test
echo trans("Will it still work with two");
echo trans('and single quotes');
echo trans("multiline

stuff
");

echo trans("from array('test')",array('test'));

$counter ++;

?>

<p>Smart <?= $this->translation ?> time</p>
<p>Smart <?=$translation ?> time</p>
<p>Smart <?= $_POST['translation'] ?> time</p>

</div>

<?
trans("This php tag has no end");

Hoped for Array:

[0] => "<? 
function trans($value) {
  // Make sure it does not translate the function call itself
}
?>",
[1] => "<?= $extraDiv ?>",
[2] => etc...

1 Answer

Editorial Team · Answer 1 · 2026-05-27T10:22:52+00:00

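No, that isn’t how character classes work. A character class matches a single character from a set, so [^(\?>)] means "any one character other than (, ?, > or )", not "anything up to the sequence ?>". Luckily you don’t need a character class here: putting a ? after the * makes the quantifier non-greedy, so .*? stops at the first ?> it finds. I’ll also add the s modifier to the end so that . can match newlines too, which it can’t by default.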

/<\?(.*?)\?>/s
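
As a rough sketch of how the pattern above might be applied, the snippet below runs it through preg_match_all on a trimmed copy of the question's sample. The second pattern is not part of the answer; it is an assumed extension that treats end of input (\z) as a terminator and skips <?xml declarations with a negative lookahead, on the assumption that this is what "keeping in mind <?xml" asks for. The variable names are illustrative only.

```php
<?php
/* Sample input: a shortened version of the question's example,
   with an XML declaration added for illustration. */
$html = <<<'HTML'
<?xml version="1.0"?>
<?
function trans($value) {
}
?>
<div id='test' <?= $extraDiv ?>>
<p>Smart <?=$translation ?> time</p>
</div>
<?
trans("This php tag has no end");
HTML;

/* Pattern from the answer (without the capture group):
   lazy quantifier plus the s modifier so the dot matches newlines. */
preg_match_all('/<\?.*?\?>/s', $html, $closed);

/* Assumed extension, not from the answer:
   (?!xml\b) skips XML declarations,
   \z lets an unclosed final block run to the end of the input. */
preg_match_all('/<\?(?!xml\b).*?(?:\?>|\z)/s', $html, $all);

/* Each result array holds the full matches at index 0. */
print_r($closed[0]);
print_r($all[0]);
```

On this input the answer's pattern finds the XML declaration and the three closed blocks but misses the unclosed one, while the extended pattern skips the declaration and picks up the trailing block. Neither pattern can tell a literal ?> inside a quoted PHP string from a real closing tag, so files that contain such strings would need a tokenizer rather than a regular expression.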
