In PHP I am trying to open a URL using:
$raw = file_get_contents($inlink);
and then I work on the result. $inlink comes from a $_GET variable. This works for most URLs, but when the URL includes the TM (™) symbol I get a 404 error.
The URL in question is http://www.boots.com/en/Soap-Glory-Flake-Away-™-Body-Scrub-300ml_27894/
So it seems like an encoding issue. I have tried urlencode, but this doesn’t help. I have also tried copying and pasting from the address bar in Firefox, so that I enter
http://www.boots.com/en/Soap-Glory-Flake-Away-%E2%84%A2-Body-Scrub-300ml_27894/
instead, but this also does not help.
By debugging and echoing to the screen, I can see that I end up with either
http://www.boots.com/en/Soap-Glory-Flake-Away-â„¢-Body-Scrub-300ml_27894/
or
http%3A%2F%2Fwww.boots.com%2Fen%2FSoap-Glory-Flake-Away-%E2%84%A2-Body-Scrub-300ml_27894%2F
being submitted, neither of which works.
The strange thing is that if I hard-code the link in the program, it works!
Any ideas?
You need to utf8_encode the URL (I assume it comes in as ISO-8859-1(5)/Latin1).
Also, I assume you are utf8_decoding and validating the URL before you run file_get_contents(); you wouldn’t want a user to make your system request an arbitrary URL from anywhere on the Internet.
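As a rough sketch of the two points above (getting the bytes of the URL right and validating it before fetching), something like the following might work. It is not a drop-in fix. It assumes the value in $_GET already holds the raw UTF-8 bytes of the ™ symbol, which the "â„¢" output in the question suggests, and it percent-encodes only the non-ASCII bytes rather than calling urlencode on the whole URL (which also escapes the ":" and "/" characters, as in the second failing URL). It does not use utf8_encode, which is deprecated as of PHP 8.2. The function names encodeNonAscii and isAllowedUrl, the 'inlink' query key, and the host allowlist are all hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: percent-encode every byte outside printable ASCII,
// leaving URL delimiters and existing %XX escapes untouched.
function encodeNonAscii(string $url): string
{
    // No /u flag, so the pattern matches byte by byte and each
    // byte of a multi-byte UTF-8 character becomes its own %XX.
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        static fn(array $m): string => rawurlencode($m[0]),
        $url
    ) ?? $url;
}

// Hypothetical helper: accept only http(s) URLs whose host is on an allowlist.
function isAllowedUrl(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && in_array(strtolower($parts['host']), $allowedHosts, true);
}

// Assumed usage: 'inlink' is a guess at the query parameter name.
$inlink = $_GET['inlink'] ?? '';
$url = encodeNonAscii($inlink);

if (isAllowedUrl($url, ['www.boots.com'])) {
    $raw = file_get_contents($url);
}
```

An allowlist of hosts is only one way to validate; the point is that the check runs on the cleaned URL before any request is made.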