I am trying to extract javascript code from HTML content that I receive via

Question

Editorial Team

Asked: June 13, 2026 (2026-06-13T11:22:03+00:00)

I am trying to extract javascript code from HTML content that I receive via CFHTTP request.

I have this simple regex that catches everything as long as there is no line break in the code between the tags.

var result=REMatch("<script[^>]*>(.*?)</script>",html);

This will catch:

<script>testtesttest</script>

but not

<script>
testtest

</script>

I have tried to use (?m) for multiline, but it doesn’t work like that.
I am using the reference to figure it out but I am just not getting it with regex.

Note that the script tags would normally contain real JavaScript rather than plain text, so the match also has to cope with characters such as {}();:-_ and so on.

Can anyone help me out?

Cheers

[[UPDATE]]
Thanks guys, I will try the solutions. I favor regex, but I will look into the HTML parser too.

1 Answer

Editorial Team · Answer 1 · 2026-06-13T11:22:04+00:00

(?m) multiline mode is for making ^ and $ match on line breaks (not just start/end of string as is default), but what you’re trying to do here is make . include newlines – for that you want (?s) (dot-all mode).
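The difference between the two flags can be illustrated with a small Java sketch. It uses java.util.regex, which may not be the engine behind REMatch, so treat it as a demonstration of what (?m) and (?s) mean rather than of ColdFusion's exact behaviour. The class name DotAllDemo and the method firstBody are hypothetical, made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class DotAllDemo {
    // Returns the first captured group of the first match, or null if the
    // pattern does not match anywhere in the input.
    static String firstBody(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Called with the multi-line block from the question, firstBody("(?m)<script[^>]*>(.*?)</script>", html) finds nothing, because (?m) only changes ^ and $. With (?s) in place of (?m) the same pattern captures the body, line breaks included.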

However, I probably wouldn’t do this with regex – an HTML parser is a more robust solution. Here’s how to do it with jSoup:

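// jsoup treats script contents as data rather than text, so html() is used here; text() would come back empty.
var result = jsoup.parse(html).select('script').html();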

More details on using jSoup in CF are available here, or alternatively you can use the TagSoup parser, which ships with CF10 (so you don’t need to worry about jars/etc).

If you really want regex, then you can use this:

var result = rematch('<script[^>]*>(?:[^<]+|<(?!/script>))+',html);

Unlike using (?s).*? this avoids matching empty blocks (but it will still fail in certain edge cases – if accuracy is required use an HTML parser).

To extract just the text from the first script block, you can strip the script tag with this:

result = ListRest( result[1] , '>' );
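As a rough sketch of how the pattern and the tag-stripping step fit together, here is the same idea in Java. It assumes java.util.regex handles this pattern the same way REMatch does, which is not confirmed here. The class ScriptBodies and the method extract are hypothetical names, and the substring call only approximates what ListRest does with '>' as the delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class ScriptBodies {
    // Pattern from the answer: the opening tag, then one or more runs of
    // non-'<' characters or a single '<' that does not begin "</script>".
    private static final Pattern BLOCK =
        Pattern.compile("<script[^>]*>(?:[^<]+|<(?!/script>))+");

    // Collects the body of every non-empty script block in the input.
    static List<String> extract(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BLOCK.matcher(html);
        while (m.find()) {
            String match = m.group();
            // Approximates ListRest(match, '>'): keep what follows the first '>'.
            bodies.add(match.substring(match.indexOf('>') + 1));
        }
        return bodies;
    }
}
```

Because the group must match at least once, an empty <script></script> pair is skipped, while a '<' inside the code (for example in a comparison) does not end the match. An attribute value containing '>' would still confuse both the pattern and the stripping step, which is one of the edge cases where a parser is safer.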
