I have a website that lets users submit URLs of websites to share with other users.
We currently run the URLs through htmlentities() when showing them to other users, but this occasionally breaks a URL because a valid character gets converted.
What’s the best way to remove potentially malicious characters from the URLs while breaking as few URLs as possible?
Example
Original URL: website.com?foo=1&bar=2
Escaped/Broken URL: website.com?foo=1&amp;bar=2
It depends on what you are outputting to, but if it is HTML, you can use htmlspecialchars() instead. That will only escape the characters that have special meaning in HTML.
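A minimal sketch of the difference, assuming PHP with UTF-8 input. The URL below is a made-up example; the `café` path exists only to show a non-ASCII character. htmlentities() rewrites every character that has a named HTML entity, while htmlspecialchars() only touches `&`, `"`, `'`, `<` and `>`:

```php
<?php
// Hypothetical user-submitted URL (assumes this file is saved as UTF-8).
$url = 'https://website.com/café?foo=1&bar=2';

// htmlentities(): é becomes &eacute; and & becomes &amp;.
echo htmlentities($url, ENT_QUOTES, 'UTF-8'), "\n";
// https://website.com/caf&eacute;?foo=1&amp;bar=2

// htmlspecialchars(): only the HTML-special characters are converted.
echo htmlspecialchars($url, ENT_QUOTES, 'UTF-8'), "\n";
// https://website.com/café?foo=1&amp;bar=2

// Escaped once, the value is safe inside a double-quoted href attribute
// and as link text.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $safe . '">' . $safe . '</a>', "\n";
```

Note that `&amp;` inside an `href` attribute is expected: the browser decodes it back to `&` before following the link, so the link still points at `...?foo=1&bar=2`. Passing `ENT_QUOTES` explicitly also escapes single quotes on PHP versions before 8.1, where the default flags did not.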