I compose a large HTML file out of a huge unformatted text file. My fear is that the text file might contain some malicious JavaScript code. To avoid any damage, I scan the text and replace every < or > with &lt; and &gt;. That is quite effective, but it's not really good for performance.
Is there some tag or attribute or whatever that allows me to turn JavaScript off within the HTML file? In the header, perhaps?
Since you've considered replacing all < and > with HTML entities, a good option would be to send the Content-Type: text/plain header, so that the browser does not parse the file as HTML at all.

If you want to show the contents of the file inside an HTML page, replacing every & with &amp; and every < with &lt; is sufficient to display the contents correctly. Example:

Input:
Huge wall of text 1<a2 &>1

Output:
Huge wall of text 1&lt;a2 &amp;>1

Unmodified output, as displayed in a browser:
Huge wall of text 11 (<..> interpreted as HTML)

If you cannot modify the code at the back-end (server-side), you need an HTML parser that sanitises the content. JavaScript is not the only threat: embedded content (<object>, <iframe>, …) can also be very malicious. Have a look at the following answer for a very detailed HTML parser and sanitizer: Can I load an entire HTML document into a document fragment in Internet Explorer?
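
The two replacements described above can be sketched in a few lines. This is a minimal illustration in Python, not necessarily what your back-end uses; the function name escape_text_for_html is made up for this example. It assumes the escaped text ends up in element content (for instance inside a pre element), not inside an attribute value, where quotes would need escaping as well.

```python
def escape_text_for_html(text: str) -> str:
    """Escape the characters that can start markup or an entity in HTML text content.

    Hypothetical helper for illustration only.
    """
    # Replace "&" first; otherwise the "&" of the "&lt;" inserted by the
    # second replacement would itself be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


sample = "Huge wall of text 1<a2 &>1"
print(escape_text_for_html(sample))  # Huge wall of text 1&lt;a2 &amp;>1
```

Both calls are plain substring replacements, so the cost is linear in the size of the file. If Python happens to be available on the server, the standard library's html.escape does the same job and additionally escapes > and quotes.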