I’m storing URLs in a database, and I want to be able to know if two URLs are identical.
Generally, a trailing slash doesn’t change the response you’d get from a server (e.g. http://www.google.com/ is the same as http://www.google.com).
Can I always blindly remove the trailing slash from any URL, without looking at anything?
Is that safe?
What I mean by “without looking at anything” is that I’d remove the slash from:
http://www.google.com/q?xxx=something&yyy=something/
I know the web server could theoretically return completely different things if it wanted, and I know sometimes going to a URL without the slash will redirect to one with the slash. My only intention here is determining if both URLs are the same.
Is this method safe?
No, it is not always safe. A web server can interpret the path part of the URL any way it likes, so /foo and /foo/ may well be two different resources. You cannot tell how the server will resolve a given URI without issuing a GET or HEAD request for it.
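If the goal is only to compare stored URLs, a more conservative normalization may be enough. The one trailing-slash case that is generally treated as equivalent for http and https is an empty path versus "/", which is exactly the google.com example in the question. Here is a minimal sketch using Python's standard urllib.parse; the function names normalize_url and same_url are made up for illustration, and it deliberately leaves every other trailing slash alone, including the one in the question's second example, which is part of the query string rather than the path.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a conservatively normalized form of url (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # Lowercase only the host[:port] part; any userinfo before "@" is case-sensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()

    # For http(s), an empty path and "/" identify the same resource.
    # Any other trailing slash is left untouched, because the server
    # is free to treat /a and /a/ as different resources.
    path = parts.path
    if scheme in ("http", "https") and path == "":
        path = "/"

    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_url(a: str, b: str) -> bool:
    """Compare two URLs after conservative normalization (hypothetical helper)."""
    return normalize_url(a) == normalize_url(b)
```

With this approach, same_url("http://www.google.com/", "http://www.google.com") is true, while http://example.com/a and http://example.com/a/ stay distinct. Anything beyond that is a guess about one particular server and would need an actual request to confirm.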