I’ve just run into a pathological case with HTML parsing. I’ve always thought that

Question

Asked: June 17, 2026

I’ve just run into a pathological case with HTML parsing. I’ve always thought that a <script> tag would run until the first closing </script> tag. But it turns out this is not always the case.

This is valid:

<script><!--
alert('<script></script>');
--></script>

And even this is valid:

<script><!--
alert('<script></script>');
</script>

But this is not:

<script><!--
alert('</script>');
--></script>

And neither is this:

<script>
alert('<script></script>');
</script>

This behavior is consistent in Firefox and Chrome. So, hard as it is to believe, browsers seem to accept an opening and closing script tag pair inside an HTML comment inside a script tag. So the question is: how do browsers really parse script tags?
This matters because the HTML parsing library I’m using, Nokogiri, assumed the obvious (but incorrect) until-the-first-closing-tag rule and did not handle this edge case. I imagine most other libraries would not handle it either.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T23:56:01+00:00

After poring over the links given by Tim and Jukka, I came to the following answer:

after the opening <script> tag, the parser goes to data1 state
if <!-- is encountered while in data1 state, switch to data2 state
if --> is encountered while in any state, switch to data1 state
if <script[\s/>] is encountered while in data2 state, switch to data3 state
if </script[\s/>] is encountered while in data3 state, switch to data2 state
if </script[\s/>] is encountered while in any other state, stop parsing
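
The rules above can be sketched as a small scanner. This is only an illustration of the rules as listed, not a full HTML tokenizer: the function name find_script_end and the state labels are made up for this example, tag names are assumed to match case-insensitively, and overlapping markers such as <!--> are not handled.

```python
import re

# Markers named in the rules: "<!--", "-->", "<script" and "</script",
# where the tag forms must be followed by whitespace, "/" or ">".
# Case-insensitive matching is an assumption of this sketch.
_TOKEN = re.compile(r"<!--|-->|</?script(?=[\s/>])", re.IGNORECASE)


def find_script_end(html: str, start: int) -> int:
    """Return the index of the '</script' that ends the element, or -1.

    `start` is the offset just after the opening <script ...> tag.
    """
    state = "data1"
    for match in _TOKEN.finditer(html, start):
        token = match.group(0).lower()
        if token == "<!--":
            if state == "data1":
                state = "data2"
        elif token == "-->":
            state = "data1"
        elif token == "<script":
            if state == "data2":
                state = "data3"
        else:  # token == "</script"
            if state == "data3":
                state = "data2"  # closes the nested <script>, keep going
            else:
                return match.start()  # this tag really ends the script
    return -1
```

Applied to the four snippets in the question, with start set just past the opening <script>, this returns the position of the last </script> for the first two and the position of the first </script> for the last two, which is why those last two end up with truncated, invalid script content.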
