I’m having some difficulty parsing BBCode safely, specifically the [img] and [url] tags. (The language matters less, but this is about JavaScript.)
URLs:
Not long ago, users were able to write [url=#" onclick="alert('test');"]Link[/url] on my site, and when others clicked the link an alert appeared. After I started removing all double and single quotes, the alert hack stopped working. Is that enough security for URLs, or are there other scenarios I need to be aware of?
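To show why the original trick worked, here is a minimal sketch that assumes the site converts [url] with a naive, non-whitelisting regexp. `naiveUrlTag` and `stripQuotes` are hypothetical names, not the site's actual code. The sketch shows the injected double quote closing `href` early, what stripping quotes changes, and one case quote stripping does not cover: a `javascript:` URL needs no quotes at all.

```javascript
// Hypothetical naive converter: copies the [url=...] value straight into href.
function naiveUrlTag(input) {
  return input.replace(/\[url=(.*?)\](.*?)\[\/url\]/g, '<a href="$1">$2</a>');
}

// The fix described above: delete every double and single quote first.
function stripQuotes(input) {
  return input.replace(/["']/g, "");
}

const attack = `[url=#" onclick="alert('test');"]Link[/url]`;

// The injected quote closes href early, so onclick becomes a real attribute.
console.log(naiveUrlTag(attack));
// <a href="#" onclick="alert('test');"">Link</a>

// Without quotes, the whole payload stays inside the href value.
console.log(naiveUrlTag(stripQuotes(attack)));
// <a href="# onclick=alert(test);">Link</a>

// Quote stripping does not help here: no quotes are needed, and clicking runs script.
console.log(naiveUrlTag(stripQuotes("[url=javascript:alert(1)]Link[/url]")));
// <a href="javascript:alert(1)">Link</a>
```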
Images:
What security measures do I need for the [img] tag? Is it enough to remove quotes and check whether the URL ends with a known image file extension, such as .png or .jpg? Or do I need to do more?
Thanks for your help!
With the caveats from my comment, I suggest you just whitelist the characters allowed in a URL: a-z, 0-9, &, ., /, ?, :, =, etc. Then replace the .*? in your regexp with a character class of those allowed characters. This will cover most cases, I think, except international URLs. Quotes are not allowed by this regexp, so there is no need to escape them; in a URL they are meant to be percent-encoded (a double quote is %22). Also, this doesn't validate URLs; I believe it only protects against XSS.
Both [url] and [img] take a URL, so this part of the regexp is the same for both. And you shouldn't check for .png or .jpeg, because many images are served from URLs without an explicit extension.
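A minimal sketch of the whitelist approach. The characters beyond "a-z, 0-9, &, ., /, ?, :, =" (uppercase letters, `-`, `_`, `~`, `#`, `%`, `+`) are an assumed reading of "etc.", and `URL_TAG`, `IMG_TAG` and `urlFromTag` are hypothetical names. As an extra precaution that is not part of the answer, it also requires an `http://` or `https://` prefix, since a whitelist containing `:` would otherwise still accept a value starting with `javascript:`.

```javascript
// Allowed URL characters: the list above plus an assumed "etc." of
// A-Z, _, ~, #, % and + (hyphen last so it is literal). No quotes, spaces, < or >.
const URL_CHARS = "[A-Za-z0-9&./?:=_~#%+-]";

// Extra assumption: require an explicit http:// or https:// prefix.
const URL_PART = `(https?://${URL_CHARS}+)`;

// The same URL sub-pattern is reused for both tags.
const URL_TAG = new RegExp(`\\[url=${URL_PART}\\]([^\\[]*)\\[/url\\]`, "i");
const IMG_TAG = new RegExp(`\\[img\\]${URL_PART}\\[/img\\]`, "i");

// Returns the captured URL, or null when the tag does not fit the whitelist.
function urlFromTag(tagRegex, text) {
  const m = tagRegex.exec(text);
  return m ? m[1] : null;
}

console.log(urlFromTag(URL_TAG, "[url=https://example.com/a?b=1&c=2]Link[/url]"));
// https://example.com/a?b=1&c=2
console.log(urlFromTag(URL_TAG, `[url=#" onclick="alert('test');"]Link[/url]`));
// null
console.log(urlFromTag(IMG_TAG, "[img]https://example.com/avatar?id=42[/img]"));
// https://example.com/avatar?id=42
```

The last call shows an image URL being accepted without any file extension, in line with the advice not to check for .png or .jpeg.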
Then the URL group captured by the regexp only needs to be escaped for HTML.
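A sketch of that HTML-escaping step, assuming the URL has already passed the whitelist. `escapeHtml` and `toAnchor` are hypothetical helpers; escaping the link text as well is an assumption beyond the answer, which only mentions the URL group. The second example shows that escaping keeps an injected quote inside the attribute value, although on its own it would not stop a `javascript:` URL, which is why the whitelist is still needed.

```javascript
// Minimal HTML escaping for text placed in a double-quoted attribute or element body.
// & is replaced first so the entities added afterwards are not escaped twice.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds the link from an already-whitelisted URL.
function toAnchor(url, text) {
  return `<a href="${escapeHtml(url)}">${escapeHtml(text)}</a>`;
}

console.log(toAnchor("https://example.com/?a=1&b=2", "Link"));
// <a href="https://example.com/?a=1&amp;b=2">Link</a>
console.log(toAnchor(`#" onclick="alert(1)`, "x"));
// <a href="#&quot; onclick=&quot;alert(1)">x</a>
```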
Full code: