I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?
I notice that delicious.com and stackoverflow seem to be able to handle singly encoded special characters. What’s the magic formula?
Here’s an example of what I want to happen:
http://www.example.com/tag/c%2b%2b
Would trigger the following RewriteRule:
RewriteRule ^tag/(.*) script.php?tag=$1
and the value of tag would be "c++"
The normal operation of Apache/mod_rewrite doesn’t work like this, as it seems to turn the plus signs into spaces. If I double encode the plus sign to ‘%252B’ then I get the desired result; however, it makes for messy URLs and seems pretty hacky to me.
I don’t think that’s quite what’s happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.
So then mod_rewrite changes your request ‘/tag/c++’ to ‘script.php?tag=c++’. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules are very slightly different to those that apply in path parts. In particular, ‘+’ is a shorthand for space (which could just as well be encoded as ‘%20’, but this is an old behaviour we’ll never be able to change now).
So PHP’s form-reading code receives the ‘c++’ and dumps it in your $_GET as ‘c’ followed by two spaces.
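Looks like the way around this is to use the RewriteRule flag ‘B’, which re-escapes the backreference ($1) before it is substituted into the query string, so PHP receives ‘c%2b%2b’ rather than ‘c++’. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags (curiously, it uses more or less the same example!).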
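As a rough sketch, the rule from the question with the B flag added might look like the following. It assumes the rule lives in a per-directory context such as .htaccess, which is what the pattern without a leading slash suggests, and that mod_rewrite is enabled. The RewriteEngine line is an addition for completeness and was not in the original rule.

```apache
# Assumption: per-directory context (.htaccess), so the pattern has no leading slash.
RewriteEngine On

# Apache has already decoded /tag/c%2b%2b to /tag/c++ before this rule is matched.
# The B flag escapes the non-alphanumeric characters in the backreference $1 again,
# so the rewritten query string carries the plus signs in percent-encoded form
# instead of literal '+' characters that PHP would read as spaces.
RewriteRule ^tag/(.*) script.php?tag=$1 [B]
```

With this in place, a request for http://www.example.com/tag/c%2b%2b should leave the tag parameter in script.php as "c++" without any double encoding in the link.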