I have a large set of HTML files that I need to parse the <? and ?> tags out of, keeping in mind <?xml and the fact that an opening <?php tag doesn’t need an ending tag… EOF counts too.
My regular expression knowledge is admittedly lacking. This is what I have so far: /<\?[^(\?>)]*\?>/
Example HTML:
<?
function trans($value) {
// Make sure it does not translate the function call itself
}
?>
<!-- PHP
code -->
<div id='test' <?= $extraDiv ?>>
<?= trans("hello"); ?>
<? if ($something == 'hello'): ?>
<? if ($something == 'hello'): ?>
<p>Hello</p>
<? endif; ?>
<?php
// Some multiline PHP stuff
echo trans("You are \"great'"); // I threw some quotes in to toughen the test
echo trans("Will it still work with two");
echo trans('and single quotes');
echo trans("multiline
stuff
");
echo trans("from array('test')",array('test'));
$counter ++;
?>
<p>Smart <?= $this->translation ?> time</p>
<p>Smart <?=$translation ?> time</p>
<p>Smart <?= $_POST['translation'] ?> time</p>
</div>
<?
trans("This php tag has no end");
Hoped for Array:
[0] => "<?
function trans($value) {
// Make sure it does not translate the function call itself
}
?>",
[1] => "<?= $extraDiv ?>",
[2] => etc...
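No, that isn’t how character classes work. A class matches a single character, so [^(\?>)] means "any one character that is not (, ?, > or )" rather than "anything that is not the sequence ?>". That is why your pattern stops at the first parenthesis or > inside the PHP code. Luckily you don’t need to worry about that, because we can put a ? after the * to make the quantifier non-greedy, so it stops at the first ?> it finds. I’ll also add an s to the end so that . can also match newlines, which it usually can’t.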
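As a rough sketch of the idea, here it is in Python, where re.DOTALL plays the role of the trailing s flag. The sample input is a shortened version of the HTML from the question. The second pattern is my own assumption rather than part of the answer above: it adds a lookahead to skip <?xml and accepts end of input as a terminator, to cover the unclosed-tag case the question mentions. The names basic and extended are illustrative only.

```python
import re

# Shortened sample modelled on the question's HTML.
html = """<?
function trans($value) {
}
?>
<div id='test' <?= $extraDiv ?>>
<p>Smart <?=$translation ?> time</p>
</div>
<?
trans("This php tag has no end");
"""

# Lazy quantifier (.*?) plus "dot matches newline".
# re.DOTALL is Python's equivalent of the trailing `s` flag.
basic = re.compile(r"<\?.*?\?>", re.DOTALL)

# Assumed extension, not from the answer:
#   (?!xml)      skips <?xml declarations
#   (?:\?>|\Z)   ends at ?> or, failing that, at end of input
extended = re.compile(r"<\?(?!xml).*?(?:\?>|\Z)", re.DOTALL)

basic_blocks = basic.findall(html)        # closed tags only
extended_blocks = extended.findall(html)  # also picks up the unclosed final tag

for block in extended_blocks:
    print(repr(block))
```

One caveat with either pattern: a literal ?> inside a PHP string or comment will end the match early, since a regex has no notion of PHP quoting. If that matters for your files, a real tokenizer is the safer route.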