I’m working with a small subset of mostly invalid HTML, and I need to

Question

Asked: May 19, 2026 (2026-05-19T22:22:39+00:00)

I’m working with a small subset of mostly invalid HTML, and I need to extract a small piece of data. Since most of the “markup” isn’t valid, I don’t think that loading everything into a DOM is a good option. It also seems like a lot of overhead for such a simple case.

Here’s an example of the markup that I have:

(a bunch of invalid markup here with unclosed tags, etc.)
<TD><span>Something (random text here)</span></TD>
(a bunch more invalid markup here with more unclosed tags.)

The <TD><span>Something (random text here)</span></TD> portion does not repeat itself anywhere in the document, so I believe a simple regex would do the trick.

However, I’m terrible with regular expressions.

Should I use a regular expression? Is there a simpler way to do this? If possible, I’d just like to extract the text after Something, i.e. the (random text here) portion.

Thanks in advance!

Edit –

Exact example of the HTML (I’ve omitted the stuff prior, which is the invalid markup that the vendor uses. It’s irrelevant for this example, I believe):

<div class="FormTable">
        <TABLE>
        <TR>
                <TD colspan="2">In order to proceed with login operation please 
                answer on the security question below</TD>
        </TR>
        <TR>
                <TD colspan="2">&nbsp;</TD>
        </TR>
        <TR>
                <TD><label class="FormLabel">Security Question</label></TD>
                <TD><span>What is your city of birth?</span></TD>
        </TR>
        <TR>
                <TD><label class="FormLabel">Answer</label></TD>
                <TD><INPUT name="securityAnswer" class="input" type="password" value=""></TD>
        </TR>
        </TABLE>
</div>

1 Answer

Editorial Team · Answer 1 · 2026-05-19T22:22:39+00:00

If you’re sure the opening and closing span tags are on a single line, a plain preg_match will find them:

$ cat test.php
<?php
  $subject = "(a bunch of invalid markup here with unclosed tags, etc.)
              <TD><span>Something (random text here)</span></TD>
              (a bunch more invalid markup here with more unclosed tags.)";

  $pattern = '/<span>.*<\/span>/';

  preg_match($pattern, $subject, $matches);
  print_r($matches);

?>


$ php -f test.php
Array
(
    [0] => <span>Something (random text here)</span>
)
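
The match above still includes the span tags and the word Something. To pull out only the text that follows it, a capture group can be added. This is a minimal sketch, assuming the span sits on one line and that the wanted text is whatever comes after a known prefix; the function name extract_span_suffix and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: returns the text that follows $prefix inside a
// single-line <span>...</span>, or null when nothing matches.
function extract_span_suffix(string $html, string $prefix): ?string
{
    // preg_quote() escapes any regex metacharacters in the prefix.
    // (.*?) is lazy, so the match stops at the first closing tag.
    // The i modifier makes both the tags and the prefix case-insensitive.
    $pattern = '/<span>\s*' . preg_quote($prefix, '/') . '\s*(.*?)\s*<\/span>/i';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // first capture group only
    }
    return null;
}

$subject = "(a bunch of invalid markup here with unclosed tags, etc.)
            <TD><span>Something (random text here)</span></TD>
            (a bunch more invalid markup here with more unclosed tags.)";

echo extract_span_suffix($subject, 'Something'), "\n"; // prints: (random text here)
```

Because the prefix is passed in, the same call should work for whatever fixed text actually precedes the data in the vendor's page.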

If you’re not confident that the span tags are on the same line, you can treat the HTML as a plain text file and grep for the span tags. This prints every line that contains an opening or closing span tag.

$ grep '[</]span>' yourfile.html
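
If the extraction has to stay inside PHP, the s modifier lets . match newlines, so a span split across lines can still be captured. The following is a sketch only, assuming the "Security Question" label from the sample HTML in the question always sits in the cell just before the span; extract_security_question is a hypothetical name, and the line break inside the span is added here to exercise the multi-line case.

```php
<?php
// Hypothetical helper: finds the <span> in the cell that follows the
// "Security Question" label and returns its text, or null if absent.
function extract_security_question(string $html): ?string
{
    // i: the vendor mixes tag case (<TD>, <span>)
    // s: lets . match newlines, so a span split across lines still matches
    $pattern = '/Security Question<\/label>\s*<\/td>\s*<td[^>]*>\s*<span[^>]*>(.*?)<\/span>/is';

    if (preg_match($pattern, $html, $matches) === 1) {
        // Collapse line breaks and runs of whitespace inside the span.
        return trim(preg_replace('/\s+/', ' ', $matches[1]));
    }
    return null;
}

// Nowdoc, so nothing in the sample is interpolated.
$html = <<<'HTML'
<TR>
        <TD><label class="FormLabel">Security Question</label></TD>
        <TD><span>What is your city
        of birth?</span></TD>
</TR>
HTML;

echo extract_security_question($html), "\n"; // prints: What is your city of birth?
```

Anchoring on the label text rather than on a bare span keeps the pattern from picking up some other span elsewhere in the invalid markup.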
